Ocean Microbiomics DataBase (OMDB)
Ocean microbiome research has generated a large volume of data distributed across various repositories and publications. However, these datasets are often isolated by study, lack standardized methodologies, and include inconsistent metadata, leading to fragmentation that limits large-scale integration and comparative analysis.
OMDB addresses this issue by providing a centralized, evolving repository and a systematically analyzed global dataset. It includes 12,260 geo-referenced samples and 274,282 reconstructed genomes, which are clustered into 32,022 prokaryotic species-level units. As an expanding genome-resolved resource, OMDB supports exploring the diversity, distribution, and function of microbes in the global ocean microbiome. You can browse the genome collection, search for sequences, explore Biosynthetic Gene Clusters (BGCs), navigate sampling locations and metadata through an interactive ocean map, and download supporting data.
Team
Development team
- Kang Li
- Samuel Miravet-Verde
- Gregor Rot
- Hans-Joachim Ruscheweyh
Additional support
- Infrastructure - ETHZ - Institute of Microbiology IT
- Compute - ETHZ - HPC Group
- Metagraph - Andre Kahles, Harun Mustafa
- MMseqs2 - Martin Steinegger, Milot Mirdita
Lead
- Shinichi Sunagawa
Curation
- Dominic Eriksson
- Guillem Salazar
- Lucas Paoli
- Taylor Priest
Contact the team
Funding




Cite OMDB
Microbiomics related resources
- motus-db.org - The mOTUs database is constructed by clustering universal single-copy phylogenetic marker genes from reference genomes, single cell-assembled genomes (SAGs), and metagenome-assembled genomes (MAGs).
- Reef Home | Microbiomics - data resource to enable the genome-resolved study of coral reef microbiomes
Microbiomics methods
BLAST search
We provide blastn searches against the nucleotide gene catalog. A typical command looks like this:
blastn \
-query query.fasta \
-db path/to/db \
-dbsize <database_size> \
-out results.out \
-num_threads 8 \
-outfmt 6
You can adjust the -outfmt parameter if a different output format is required.
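For downstream work with the tabular output produced by -outfmt 6, a small parser can be handy. The following is a minimal sketch, assuming the default twelve-column layout of BLAST tabular output; the function name parse_blast6 and the sample row are hypothetical and only illustrate the idea.

```python
import csv
import io

# Default column order of BLAST tabular output (-outfmt 6 with no custom fields).
BLAST6_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]
INT_COLUMNS = {"length", "mismatch", "gapopen", "qstart", "qend", "sstart", "send"}
FLOAT_COLUMNS = {"pident", "evalue", "bitscore"}


def parse_blast6(handle):
    """Yield one dict per hit from a tab-separated -outfmt 6 stream.

    Assumes the default 12 columns; custom -outfmt field lists need
    a matching column list instead of BLAST6_COLUMNS.
    """
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        hit = dict(zip(BLAST6_COLUMNS, row))
        for name in INT_COLUMNS:
            hit[name] = int(hit[name])
        for name in FLOAT_COLUMNS:
            hit[name] = float(hit[name])
        yield hit


# Made-up example row, for illustration only.
sample = "q1\tgene_0001\t98.5\t200\t3\t0\t1\t200\t51\t250\t1e-100\t355\n"
hits = list(parse_blast6(io.StringIO(sample)))
print(hits[0]["sseqid"], hits[0]["length"], hits[0]["pident"])
```

For a real run, pass an open file handle for results.out instead of the in-memory sample.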
MMseqs2 expanded cluster search
We provide MMseqs2 expanded cluster searches (see documentation), which enable fast initial searches against cluster representatives, followed by expansion to all cluster members.
mmseqs createdb query.fasta queryDB
mmseqs search queryDB db_main res tmp_dir/tmp --threads 8 -a
mmseqs expandaln queryDB db_main res db_aln res_expanded --threads 8
mmseqs align queryDB db_seq res_expanded res_expanded_realign --threads 8 -a
mmseqs convertalis queryDB db_seq res_expanded_realign results.m8
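The five steps above can also be chained from a script. The sketch below assumes mmseqs is on the PATH and reuses the database names from the commands above (db_main, db_aln, db_seq) as placeholders; the Python function names are hypothetical, and the wrapper defaults to a dry run that only prints the commands.

```python
import subprocess


def build_expanded_search_commands(query_fasta, out_m8, threads=8,
                                   db_main="db_main", db_aln="db_aln",
                                   db_seq="db_seq", tmp_dir="tmp_dir/tmp"):
    """Return the five mmseqs commands as argument lists, in execution order."""
    t = ["--threads", str(threads)]
    return [
        # 1. Convert the query FASTA into an MMseqs2 database.
        ["mmseqs", "createdb", query_fasta, "queryDB"],
        # 2. Search against the cluster representatives.
        ["mmseqs", "search", "queryDB", db_main, "res", tmp_dir, *t, "-a"],
        # 3. Expand the hits to all cluster members.
        ["mmseqs", "expandaln", "queryDB", db_main, "res", db_aln, "res_expanded", *t],
        # 4. Realign the queries against the expanded member sequences.
        ["mmseqs", "align", "queryDB", db_seq, "res_expanded", "res_expanded_realign", *t, "-a"],
        # 5. Convert the realigned results to a tabular file.
        ["mmseqs", "convertalis", "queryDB", db_seq, "res_expanded_realign", out_m8],
    ]


def run_expanded_search(query_fasta, out_m8, dry_run=True, **kwargs):
    """Print the commands (dry run) or execute them, stopping at the first failure."""
    commands = build_expanded_search_commands(query_fasta, out_m8, **kwargs)
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)
    return commands


run_expanded_search("query.fasta", "results.m8")
```

Calling run_expanded_search with dry_run=False executes the steps; check=True makes a failing step raise an exception rather than letting later steps run on incomplete results.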
Result filtering
For both blastn and MMseqs2 searches, we apply a filtering step to retain only alignments where:
# keep the hit only if both conditions hold
keep_hit = aln_len >= 10 and (aln_len / query_length) >= 0.2
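To reproduce this filter on a downloaded result table, something like the following could work. It is a sketch under stated assumptions: aln_len is taken from the fourth column of the tabular output (the alignment length in both the default BLAST -outfmt 6 and the default MMseqs2 convertalis layout), and query_length is taken to be the full length of the query sequence, read from the query FASTA. All function names and the sample data are hypothetical.

```python
def read_query_lengths(fasta_lines):
    """Map each FASTA record ID (first word of the header) to its sequence length."""
    lengths = {}
    current = None
    for line in fasta_lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            current = line[1:].split()[0]
            lengths[current] = 0
        elif current is not None:
            lengths[current] += len(line)
    return lengths


def passes_filter(aln_len, query_length, min_len=10, min_fraction=0.2):
    """Apply the two thresholds: minimum length and minimum query fraction."""
    return aln_len >= min_len and (aln_len / query_length) >= min_fraction


def filter_hits(tab_lines, query_lengths):
    """Keep tabular rows whose alignment length (column 4) passes the filter."""
    kept = []
    for line in tab_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        query_id, aln_len = fields[0], int(fields[3])
        if passes_filter(aln_len, query_lengths[query_id]):
            kept.append(fields)
    return kept


# Made-up data: q1 is 100 bases long, q2 is 30 bases long.
fasta = [">q1 example", "A" * 60, "C" * 40, ">q2", "G" * 30]
table = [
    "q1\tt1\t99.0\t25",  # 25 >= 10 and 25/100 = 0.25: kept
    "q1\tt2\t99.0\t15",  # 15/100 = 0.15 < 0.2: dropped
    "q2\tt3\t99.0\t8",   # 8 < 10: dropped
]
kept = filter_hits(table, read_query_lengths(fasta))
print([row[1] for row in kept])
```

Only the first row survives in this example, because the other two fail the fraction and length thresholds respectively.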