1. Description: KIJ_CD-HIT-100_Proteins and UHGP-100_unique proteins were merged, and identical proteins were de-replicated.
2. Number of proteins: 107 million
3. Protein FASTA file: KIJ-UHGP_unique_Proteins.faa
4. Cluster info file: KIJ-UHGP_unique_Proteins.cluster_info.tsv
   >format: 1st column - representative
            2nd column - member proteins (separated by ';')
   >Representative protein is the longest sequence of the cluster.
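
The cluster info format described above could be read roughly as follows. This is only a minimal sketch, not part of the dataset's own tooling. It assumes the columns are tab-separated (suggested by the .tsv extension) and that the file has no header row. The function names iter_clusters and member_to_representative are hypothetical. Whether the member column also lists the representative itself is not stated, so the sketch maps each representative to itself either way.

```python
from typing import Dict, Iterable, Iterator, List, Tuple


def iter_clusters(lines: Iterable[str]) -> Iterator[Tuple[str, List[str]]]:
    """Yield (representative, members) for each non-empty cluster line."""
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue  # skip blank lines
        # Assumption: two tab-separated columns, no header row.
        representative, member_field = line.split("\t", 1)
        # Members are separated by ';' (2nd column); drop empty fragments.
        members = [m for m in member_field.split(";") if m]
        yield representative, members


def member_to_representative(lines: Iterable[str]) -> Dict[str, str]:
    """Build a lookup from each member protein ID to its representative."""
    mapping: Dict[str, str] = {}
    for representative, members in iter_clusters(lines):
        # Map the representative to itself, whether or not it is listed
        # among the members (the format description does not say).
        mapping[representative] = representative
        for member in members:
            mapping[member] = representative
    return mapping
```

Passing an open file handle for KIJ-UHGP_unique_Proteins.cluster_info.tsv as the lines argument streams the file one cluster at a time. With roughly 107 million proteins, the full dictionary from member_to_representative may need a lot of memory, so iterating with iter_clusters is probably the safer choice for large-scale use.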