[Figure: pipeline overview in three panels. Legend: intermediate data, UHGG data, HRGM data.]

a. De novo genome assembly: raw WMS reads → quality control (Trimmomatic, Bowtie2) → quality-controlled reads → assembly (MetaSPAdes, MEGAHIT) → contigs → binning (MetaBAT2, MaxBin2, CONCOCT → MetaWRAP) → genome bins → bin QC (CheckM).

b. Genome catalog: UHGG all MAGs (286,997); UHGG (4,644); KIJ genomes (29,082); dereplication of genomes, 1st (Mash, dRep); KIJ species representatives (2,199); dereplication of genomes, 2nd (Mash, dRep); HRGM genomes (5,414).

c. Protein catalog: coding sequence prediction (Prodigal); KIJ redundant proteins (64.7M); UHGP redundant proteins (625.3M); redundancy removal, 1st (CD-HIT); KIJ proteins CD-HIT 100 (20.6M); UHGP-100 (170.6M); redundancy removal, 2nd (CD-HIT); HRGM proteins at 100% (103.7M), 95% (20.0M), 90% (14.8M), 70% (8.5M), 50% (4.7M).

Data processing and software