Raptor 3.0.0-rc.1
A fast and space-efficient pre-filter for querying very large collections of nucleotide sequences
 
raptor::hierarchical_interleaved_bloom_filter< data_layout_mode_ >::user_bins Class Reference

Bookkeeping for user and technical bins.

#include <raptor/hierarchical_interleaved_bloom_filter.hpp>


Public Member Functions

std::vector< int64_t > & bin_indices_of_ibf (size_t const idx)
 Returns a vector containing user bin indices for each bin in the idx-th IBF.
 
int64_t filename_index (size_t const ibf_idx, size_t const bin_idx) const
 Returns the filename index for bin bin_idx of the ibf_idx-th IBF.
 
std::string & filename_of_user_bin (size_t const idx)
 Returns the filename of the idx-th user bin.
 
size_t num_user_bins () const noexcept
 Returns the number of managed user bins.
 
auto operator[] (size_t const ibf_idx) const
 Returns a view over the user bin filenames for the ibf_idx-th IBF. The view yields an empty string for merged bins.
 
std::string const & operator[] (std::pair< size_t, size_t > const &index_pair) const
 For a pair (a,b), returns a const reference to the filename of the user bin at IBF a, bin b.
 
template<typename archive_t >
void serialize (archive_t &archive)
 Serialisation support function.
 
void set_ibf_count (size_t const size)
 Changes the number of managed IBFs.
 
void set_user_bin_count (size_t const size)
 Changes the number of managed user bins.
 
template<typename stream_t >
void write_filenames (stream_t &out_stream) const
 Writes all filenames to a stream. Index and filename are tab-separated.
 

Private Attributes

std::vector< std::vector< int64_t > > ibf_bin_to_filename_position {}
 Stores for each bin in each IBF of the HIBF the ID of the filename.
 
std::vector< std::string > user_bin_filenames
 Contains filenames of all user bins.
 

Detailed Description

template<seqan3::data_layout data_layout_mode_ = seqan3::data_layout::uncompressed>
class raptor::hierarchical_interleaved_bloom_filter< data_layout_mode_ >::user_bins

Bookkeeping for user and technical bins.

Member Function Documentation

◆ bin_indices_of_ibf()

template<seqan3::data_layout data_layout_mode_ = seqan3::data_layout::uncompressed>
std::vector< int64_t > & raptor::hierarchical_interleaved_bloom_filter< data_layout_mode_ >::user_bins::bin_indices_of_ibf ( size_t const  idx)
inline

Returns a vector containing user bin indices for each bin in the idx-th IBF.

Parameters
idx  The id of the x-th IBF.

Example

#include <seqan3/core/debug_stream.hpp>

#include <raptor/hierarchical_interleaved_bloom_filter.hpp>

// For this example we have two input fasta files with the following three sequences (= user bins):
// example1.fasta:
// ```fasta
// >chr1
// AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAACGCGTTCATTAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
// ```
// example2.fasta:
// ```fasta
// >chr2
// AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAACGCGTCATTAA
// >chr3
// AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
// ```
// 1-level-IBF: |chr1|chr2,chr3|
// 2-level-IBF: |chr2|chr3|
int main()
{
    // A default-constructed HIBF whose user_bins member is filled below.
    raptor::hierarchical_interleaved_bloom_filter<seqan3::data_layout::uncompressed> hibf{};

    // Reserve bookkeeping for two IBFs, then assign the user bin indices of each.
    hibf.user_bins.set_ibf_count(2);
    hibf.user_bins.bin_indices_of_ibf(0) = {1, 2, 3};
    hibf.user_bins.bin_indices_of_ibf(1) = {2, 3};

    seqan3::debug_stream << "User bin indices of 1-level-IBF: " << hibf.user_bins.bin_indices_of_ibf(0) << '\n';
    seqan3::debug_stream << "User bin indices of 2-level-IBF: " << hibf.user_bins.bin_indices_of_ibf(1) << '\n';
}
// Prints out:
// User bin indices of 1-level-IBF: [1,2,3]
// User bin indices of 2-level-IBF: [2,3]

◆ filename_of_user_bin()

template<seqan3::data_layout data_layout_mode_ = seqan3::data_layout::uncompressed>
std::string & raptor::hierarchical_interleaved_bloom_filter< data_layout_mode_ >::user_bins::filename_of_user_bin ( size_t const  idx)
inline

Returns the filename of the idx-th user bin.

Parameters
idx  The id of the x-th user bin.

Example

#include <seqan3/core/debug_stream.hpp>

#include <raptor/hierarchical_interleaved_bloom_filter.hpp>

// For this example we have two input fasta files with the following three sequences (= user bins):
// example1.fasta:
// ```fasta
// >chr1
// AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAACGCGTTCATTAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
// ```
// example2.fasta:
// ```fasta
// >chr2
// AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAACGCGTCATTAA
// >chr3
// AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
// ```
// 1-level-IBF: |chr1|chr2,chr3|
// 2-level-IBF: |chr2|chr3|
int main()
{
    // A default-constructed HIBF whose user_bins member is filled below.
    raptor::hierarchical_interleaved_bloom_filter<seqan3::data_layout::uncompressed> hibf{};

    // Reserve bookkeeping for three user bins, then assign the filename of each.
    hibf.user_bins.set_user_bin_count(3);
    hibf.user_bins.filename_of_user_bin(0) = "path/example1.fasta";
    hibf.user_bins.filename_of_user_bin(1) = "path/example2.fasta";
    hibf.user_bins.filename_of_user_bin(2) = "path/example2.fasta";

    seqan3::debug_stream << "Filename of user bin 0: " << hibf.user_bins.filename_of_user_bin(0) << '\n';
    seqan3::debug_stream << "Filename of user bin 1: " << hibf.user_bins.filename_of_user_bin(1) << '\n';
    seqan3::debug_stream << "Filename of user bin 2: " << hibf.user_bins.filename_of_user_bin(2) << '\n';
}
// Prints out:
// Filename of user bin 0: path/example1.fasta
// Filename of user bin 1: path/example2.fasta
// Filename of user bin 2: path/example2.fasta

◆ serialize()

template<seqan3::data_layout data_layout_mode_ = seqan3::data_layout::uncompressed>
template<typename archive_t >
void raptor::hierarchical_interleaved_bloom_filter< data_layout_mode_ >::user_bins::serialize ( archive_t &  archive)
inline

Serialisation support function.

Template Parameters
archive_t  Type of archive; must satisfy seqan3::cereal_archive.
Parameters
[in] archive  The archive being serialised from/to.
Attention
These functions are never called directly.
See also
https://docs.seqan.de/seqan/3.2.0/group__io.html#serialisation

◆ write_filenames()

template<seqan3::data_layout data_layout_mode_ = seqan3::data_layout::uncompressed>
template<typename stream_t >
void raptor::hierarchical_interleaved_bloom_filter< data_layout_mode_ >::user_bins::write_filenames ( stream_t &  out_stream) const
inline

Writes all filenames to a stream. Index and filename are tab-separated.

0	<path_to_user_bin_0>
1	<path_to_user_bin_1>
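
The output format described above can be illustrated with a small standalone sketch. It does not use Raptor itself: write_filenames_sketch and filenames_to_string are hypothetical helper names, and the assumption that every entry ends with a newline is ours. The library's actual implementation may differ in details.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical free-function sketch of the documented format: one line per
// user bin, consisting of the index, a tab, and the filename.
// Assumption: each entry is terminated by '\n'.
template <typename stream_t>
void write_filenames_sketch(std::vector<std::string> const & filenames, stream_t & out_stream)
{
    std::size_t position{};
    for (std::string const & filename : filenames)
        out_stream << position++ << '\t' << filename << '\n';
}

// Hypothetical convenience wrapper that collects the output in a string.
inline std::string filenames_to_string(std::vector<std::string> const & filenames)
{
    std::ostringstream stream{};
    write_filenames_sketch(filenames, stream);
    return stream.str();
}
```

Because the stream type is a template parameter, the same sketch works with std::cout, a std::ofstream, or a std::ostringstream.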


Member Data Documentation

◆ ibf_bin_to_filename_position

template<seqan3::data_layout data_layout_mode_ = seqan3::data_layout::uncompressed>
std::vector<std::vector<int64_t> > raptor::hierarchical_interleaved_bloom_filter< data_layout_mode_ >::user_bins::ibf_bin_to_filename_position {}
private

Stores for each bin in each IBF of the HIBF the ID of the filename.

Assume we look up a bin b in IBF i, i.e. ibf_bin_to_filename_position[i][b]. If the value is -1, bin b is a merged bin without a filename, and the lookup must continue in the lower-level IBF. Otherwise, the value j is an index into user_bin_filenames, and the corresponding filename is user_bin_filenames[j].
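
The lookup rule can be sketched with plain standard-library containers. This is an illustration only: user_bins_sketch, lookup_filename and make_example are hypothetical names that mirror the two private members, and the example table is an assumed encoding of the layout |chr1|chr2,chr3| and |chr2|chr3| used in the examples on this page.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for the two private members documented here.
struct user_bins_sketch
{
    std::vector<std::vector<std::int64_t>> ibf_bin_to_filename_position{};
    std::vector<std::string> user_bin_filenames{};
};

// Returns the filename stored for bin `bin_idx` of IBF `ibf_idx`,
// or std::nullopt if the entry is -1, i.e. the bin is a merged bin.
inline std::optional<std::string>
lookup_filename(user_bins_sketch const & bins, std::size_t const ibf_idx, std::size_t const bin_idx)
{
    assert(ibf_idx < bins.ibf_bin_to_filename_position.size());
    assert(bin_idx < bins.ibf_bin_to_filename_position[ibf_idx].size());

    std::int64_t const j = bins.ibf_bin_to_filename_position[ibf_idx][bin_idx];
    if (j == -1) // Merged bin: the caller has to descend into the lower-level IBF.
        return std::nullopt;

    return bins.user_bin_filenames[static_cast<std::size_t>(j)];
}

// Assumed example data: IBF 0 holds chr1 and a merged bin, IBF 1 holds chr2 and chr3.
inline user_bins_sketch make_example()
{
    user_bins_sketch bins{};
    bins.user_bin_filenames = {"path/example1.fasta", "path/example2.fasta", "path/example2.fasta"};
    bins.ibf_bin_to_filename_position = {{0, -1}, {1, 2}};
    return bins;
}
```

With this data, lookup_filename(make_example(), 0, 1) yields std::nullopt because bin 1 of IBF 0 is merged, while lookup_filename(make_example(), 1, 0) yields the filename at index 1.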


The documentation for this class was generated from the following file: