About | Microbiomics

Ocean Microbiomics DataBase (OMDB)

Ocean microbiome research has generated large volumes of data distributed across many repositories and publications. However, these datasets are often siloed by study, processed with non-standardized methods, and described by inconsistent metadata. The resulting fragmentation limits large-scale integration and comparative analysis.

OMDB addresses this issue by providing a centralized, continuously expanding repository together with a systematically analyzed global dataset. It includes 12,260 geo-referenced samples and 274,282 reconstructed genomes, clustered into 32,022 prokaryotic species-level units. This genome-resolved resource lets you explore the diversity, distribution, and function of microbes across the global ocean microbiome: you can browse the genome collection, search for sequences, explore biosynthetic gene clusters (BGCs), navigate sampling locations and metadata through an interactive ocean map, and download supporting data.

Team

Development team

Kang Li
Samuel Miravet-Verde
Gregor Rot
Hans-Joachim Ruscheweyh

Additional support

Infrastructure - ETHZ - Institute of Microbiology IT
Compute - ETHZ - HPC Group
MetaGraph - Andre Kahles, Harun Mustafa
MMseqs2 - Martin Steinegger, Milot Mirdita

Lead

Shinichi Sunagawa

Curation

Dominic Eriksson
Guillem Salazar
Lucas Paoli
Taylor Priest

Contact the team

We encourage you to join our community on Discord through this link. Alternatively, you can send an email to the developers here.

Funding

Cite OMDB

Soon!

OMDB citations

Soon!

OMDB history

Check the version history here.

Microbiomics related resources

motus-db.org - The mOTUs database is constructed by clustering universal single-copy phylogenetic marker genes from reference genomes, single cell-assembled genomes (SAGs), and metagenome-assembled genomes (MAGs).
Reef Home | Microbiomics - A data resource enabling the genome-resolved study of coral reef microbiomes.

Microbiomics methods

BLAST search

We provide blastn searches against the nucleotide gene catalog. A typical command looks like this:


blastn \
  -query query.fasta \
  -db path/to/db \
  -dbsize <database_size> \
  -out results.out \
  -num_threads 8 \
  -outfmt 6

You can adjust the -outfmt parameter if a different output format is required.
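For downstream processing, the default tabular output (-outfmt 6) has twelve tab-separated columns: qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue and bitscore. The minimal Python sketch below parses such output into typed records. The `Hit` class and `parse_outfmt6` function are hypothetical helpers (not part of BLAST or OMDB), and the sketch assumes the default column set; a custom `-outfmt "6 ..."` specifier would change the layout.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class Hit:
    """One row of default BLAST tabular output (-outfmt 6)."""
    qseqid: str
    sseqid: str
    pident: float
    length: int
    mismatch: int
    gapopen: int
    qstart: int
    qend: int
    sstart: int
    send: int
    evalue: float
    bitscore: float


def parse_outfmt6(lines: Iterable[str]) -> Iterator[Hit]:
    """Yield a Hit for every non-empty line of default -outfmt 6 output."""
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue  # tolerate blank lines, e.g. at the end of a file
        fields = line.split("\t")
        if len(fields) != 12:
            raise ValueError(f"expected 12 columns, got {len(fields)}: {line!r}")
        q, s, pid, length, mm, gaps, qs, qe, ss, se, ev, bits = fields
        yield Hit(q, s, float(pid), int(length), int(mm), int(gaps),
                  int(qs), int(qe), int(ss), int(se), float(ev), float(bits))


if __name__ == "__main__":
    # Hypothetical example row, not real OMDB output.
    row = "query_1\tgene_42\t98.50\t200\t3\t0\t1\t200\t51\t250\t1.2e-95\t350\n"
    for hit in parse_outfmt6([row]):
        print(hit.qseqid, hit.sseqid, hit.length, hit.evalue)
```

With a results file on disk, the same generator can be fed an open file handle, for example `parse_outfmt6(open("results.out"))`.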

MMseqs2 expanded cluster search

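We provide MMseqs2 expanded cluster searches (see documentation), which run a fast initial search against cluster representatives and then expand each hit to all members of the corresponding cluster. In the commands below, the query is searched against the representatives (db_main), hits are expanded to cluster members via db_aln, the query is realigned against the member sequences (db_seq), and the final alignments are converted to BLAST-like tabular output (results.m8).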


mmseqs createdb query.fasta queryDB
mmseqs search queryDB db_main res tmp_dir/tmp --threads 8 -a
mmseqs expandaln queryDB db_main res db_aln res_expanded --threads 8
mmseqs align queryDB db_seq res_expanded res_expanded_realign --threads 8 -a
mmseqs convertalis queryDB db_seq res_expanded_realign results.m8
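Conceptually, the expansion step replaces each hit against a cluster representative with candidate hits against every member of that cluster, which are then realigned. The toy sketch below illustrates only that idea with in-memory dictionaries; `expand_hits` and the cluster mapping are hypothetical and do not reflect how MMseqs2 stores or processes clusters internally.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def expand_hits(
    rep_hits: Iterable[Tuple[str, str]],
    clusters: Dict[str, List[str]],
) -> Dict[str, List[str]]:
    """Expand (query, representative) hits to (query, member) candidates.

    rep_hits: pairs from the initial search against cluster representatives.
    clusters: hypothetical mapping representative -> member IDs; the
      representative is assumed to be listed among its own members.
    Returns query -> sorted candidate targets, which would then be realigned.
    """
    expanded = defaultdict(set)
    for query_id, rep_id in rep_hits:
        # A representative without a cluster entry is treated as a singleton.
        for member_id in clusters.get(rep_id, [rep_id]):
            expanded[query_id].add(member_id)
    return {q: sorted(members) for q, members in expanded.items()}


if __name__ == "__main__":
    toy_clusters = {"repA": ["repA", "m1", "m2"], "repB": ["repB"]}
    print(expand_hits([("q1", "repA"), ("q2", "repB")], toy_clusters))
    # {'q1': ['m1', 'm2', 'repA'], 'q2': ['repB']}
```

The appeal of this design is that the initial search only touches representatives, so its cost grows with the number of clusters rather than the number of sequences; the per-member alignment cost is paid only for clusters that produced a hit.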

Result filtering

For both blastn and MMseqs2 searches, we apply a filtering step that retains only alignments that are at least 10 positions long and cover at least 20% of the query length:


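def keep_hit(aln_len, query_length):
    # Keep alignments that are at least 10 positions long and cover >= 20% of the query
    return aln_len >= 10 and (aln_len / query_length) >= 0.2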
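To apply this filter to a tabular results file, the query length has to come from the query FASTA, because neither the default BLAST -outfmt 6 output nor the default MMseqs2 convertalis output reports it. The sketch below assumes that the first whitespace-delimited token of each FASTA header matches the query ID in column 1 and that column 4 holds the alignment length, as in both default formats. `read_fasta_lengths` and `filter_tabular_hits` are hypothetical helpers, not OMDB's actual pipeline.

```python
from typing import Dict, Iterable, Iterator, Optional


def read_fasta_lengths(lines: Iterable[str]) -> Dict[str, int]:
    """Map each FASTA record ID (first token of the header) to its sequence length."""
    lengths: Dict[str, int] = {}
    current: Optional[str] = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            current = line[1:].split()[0]
            lengths[current] = 0
        elif current is not None:
            lengths[current] += len(line)  # sequences may span several lines
    return lengths


def filter_tabular_hits(
    lines: Iterable[str],
    query_lengths: Dict[str, int],
    min_aln_len: int = 10,
    min_query_cov: float = 0.2,
) -> Iterator[str]:
    """Yield tabular result lines that pass the length and query-coverage filter.

    Assumes column 1 is the query ID and column 4 the alignment length, as in
    default BLAST -outfmt 6 and MMseqs2 convertalis output.
    """
    for line in lines:
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        query_id, aln_len = fields[0], int(fields[3])
        query_length = query_lengths[query_id]  # KeyError if the ID is not in the FASTA
        if aln_len >= min_aln_len and aln_len / query_length >= min_query_cov:
            yield line
```

Typical use is to call `read_fasta_lengths` on the lines of query.fasta and pass the result, together with the lines of results.out or results.m8, to `filter_tabular_hits`. Alternatively, both tools can report the query length directly (`qlen` is available as a custom BLAST -outfmt 6 field and as an MMseqs2 `--format-output` field), which avoids reading the FASTA file.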