Ocean Microbiomics DataBase (OMDB)

Ocean microbiome research has generated a large volume of data distributed across various repositories and publications. However, these datasets are often isolated by study, lack standardized methodologies, and include inconsistent metadata, leading to fragmentation that limits large-scale integration and comparative analysis.

OMDB addresses this issue by providing a centralized, evolving repository built on a systematically analyzed global dataset. It includes 12,260 geo-referenced samples and 274,282 reconstructed genomes, which are clustered into 32,022 prokaryotic species-level units. As a genome-resolved resource, OMDB lets you explore the diversity, distribution, and function of microbes in the global ocean. You can browse the genome collection, search for sequences, explore biosynthetic gene clusters (BGCs), navigate sampling locations and metadata through an interactive ocean map, and download supporting data.

Team

Development team
  • Kang Li
  • Samuel Miravet-Verde
  • Gregor Rot
  • Hans-Joachim Ruscheweyh
Additional support
  • Infrastructure - ETHZ - Institute of Microbiology IT
  • Compute - ETHZ - HPC Group
  • MetaGraph - Andre Kahles, Harun Mustafa
  • MMseqs2 - Martin Steinegger, Milot Mirdita
Lead
  • Shinichi Sunagawa
Curation
  • Dominic Eriksson
  • Guillem Salazar
  • Lucas Paoli
  • Taylor Priest

Contact the team

We encourage you to join our community on Discord by following this link. Alternatively, you can send an email to the developers here.

Funding

ETH Zurich, SNSF, SIB, HFSP (LT0050/2023-L)

Cite OMDB

Soon!

OMDB citations

Soon!

OMDB history

Check the version history here.

Microbiomics related resources

  • motus-db.org - The mOTUs database is constructed by clustering universal single-copy phylogenetic marker genes from reference genomes, single cell-assembled genomes (SAGs), and metagenome-assembled genomes (MAGs).
  • Reef Home | Microbiomics - A data resource that enables the genome-resolved study of coral reef microbiomes.

Microbiomics methods

BLAST search

We provide blastn searches against the nucleotide gene catalog. A typical command looks like this:


blastn \
  -query query.fasta \
  -db path/to/db \
  -dbsize <database_size> \
  -out results.out \
  -num_threads 8 \
  -outfmt 6
    

You can adjust the -outfmt parameter if a different output format is required.
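
With -outfmt 6, blastn writes one tab-separated line per hit using twelve default columns: qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, and bitscore. As a minimal sketch, assuming that default column set is left unchanged, the hits could be loaded in Python as shown here. The names BlastHit and parse_outfmt6 are illustrative and are not part of OMDB or BLAST.

```python
from typing import Iterable, Iterator, NamedTuple


class BlastHit(NamedTuple):
    """One row of default BLAST tabular output (-outfmt 6). Illustrative type."""
    qseqid: str
    sseqid: str
    pident: float
    length: int
    mismatch: int
    gapopen: int
    qstart: int
    qend: int
    sstart: int
    send: int
    evalue: float
    bitscore: float


def parse_outfmt6(lines: Iterable[str]) -> Iterator[BlastHit]:
    """Yield one BlastHit per non-empty, 12-column tab-separated line."""
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue  # skip blank lines
        fields = line.split("\t")
        if len(fields) != 12:
            # Assumption: the default column set is in use.
            raise ValueError(f"expected 12 columns, got {len(fields)}: {line!r}")
        yield BlastHit(
            fields[0], fields[1], float(fields[2]),
            int(fields[3]), int(fields[4]), int(fields[5]),
            int(fields[6]), int(fields[7]), int(fields[8]), int(fields[9]),
            float(fields[10]), float(fields[11]),
        )
```

To read the file produced by the command above, pass an open file handle, for example `with open("results.out") as handle: hits = list(parse_outfmt6(handle))`. If you request a custom column list through -outfmt, the field definitions need to be adjusted to match.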

MMseqs2 expanded cluster search

We provide MMseqs2 expanded cluster searches (see documentation), which enable fast initial searches against cluster representatives, followed by expansion to all cluster members.


mmseqs createdb query.fasta queryDB
mmseqs search queryDB db_main res tmp_dir/tmp --threads 8 -a
mmseqs expandaln queryDB db_main res db_aln res_expanded --threads 8
mmseqs align queryDB db_seq res_expanded res_expanded_realign --threads 8 -a
mmseqs convertalis queryDB db_seq res_expanded_realign results.m8
    

Result filtering

For both blastn and MMseqs2 searches, we apply a filtering step to retain only alignments where:


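# Keep a hit only if the alignment spans at least 10 positions
# and covers at least 20% of the query sequence.
keep = aln_len >= 10 and (aln_len / query_length) >= 0.2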
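
The rule above depends on the query length, which is not among the default columns of tabular search output. As a rough sketch, the filter could be applied to a results file as follows, under three assumptions: the results use the default tabular layout (alignment length in the fourth column, which holds for both blastn -outfmt 6 and the default mmseqs convertalis output), query lengths are taken from the query FASTA file, and each query identifier is the first whitespace-delimited token of its FASTA header. All function names are hypothetical, and this is not necessarily how OMDB implements the step.

```python
from typing import Dict, Iterable, Iterator, List

MIN_ALN_LEN = 10          # minimum alignment length, taken from the rule above
MIN_QUERY_COVERAGE = 0.2  # minimum aln_len / query_length, taken from the rule above


def read_query_lengths(fasta_lines: Iterable[str]) -> Dict[str, int]:
    """Map each FASTA record ID (first token of the header) to its sequence length."""
    lengths: Dict[str, int] = {}
    current = None
    for line in fasta_lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            tokens = line[1:].split()
            current = tokens[0] if tokens else ""
            lengths[current] = 0
        elif current is not None:
            lengths[current] += len(line)
    return lengths


def keep_hit(aln_len: int, query_length: int) -> bool:
    """Apply the length and query-coverage thresholds to a single alignment."""
    if query_length <= 0:
        return False  # avoid division by zero for empty records
    return aln_len >= MIN_ALN_LEN and (aln_len / query_length) >= MIN_QUERY_COVERAGE


def filter_hits(result_lines: Iterable[str],
                query_lengths: Dict[str, int]) -> Iterator[List[str]]:
    """Yield the tab-split fields of every result line that passes keep_hit."""
    for line in result_lines:
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        query_id, aln_len = fields[0], int(fields[3])
        # Raises KeyError if a query ID is missing from the FASTA file.
        if keep_hit(aln_len, query_lengths[query_id]):
            yield fields
```

For example, with a 100-position query, an alignment of length 25 is kept (coverage 0.25), while one of length 15 is dropped (coverage 0.15) even though it exceeds the minimum length.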