1. Description: The proteins in KIJ_unique_Proteins.faa were further clustered with CD-HIT.
   CD-HIT options: -c 1.0 -aS 0.8 -n 5
   (-c 1.0: 100% sequence identity threshold; -aS 0.8: alignment must cover at least 80% of the shorter sequence; -n 5: word length 5)
2. Number of proteins: 20,662,850
3. Protein fasta file: KIJ_CD-HIT-100_Proteins.faa
4. Cluster info file: KIJ_CD-HIT-100_Proteins.cluster_info.tsv
   >format: 1st column - representative
            2nd column - member proteins (separated by ';')
   >Representative protein is the longest sequence of the cluster.
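
The two-column cluster info format described above can be read with a few lines of Python. This is a minimal sketch, not part of the dataset: the function name parse_cluster_info and the protein IDs in the demo are hypothetical, and it assumes the file is tab-separated with no header line. Whether the representative also appears in its own member list is not stated here, so the sketch returns the second column unchanged.

```python
from typing import Iterable, Iterator, List, Tuple


def parse_cluster_info(lines: Iterable[str]) -> Iterator[Tuple[str, List[str]]]:
    """Yield (representative, members) for each line of a cluster info file.

    Assumes two tab-separated columns and no header line (an assumption,
    not confirmed by the dataset description).
    """
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            # Skip blank or malformed lines rather than failing.
            continue
        representative, member_field = fields[0], fields[1]
        # Members are separated by ';'; drop empty entries such as a trailing ';'.
        members = [m for m in member_field.split(";") if m]
        yield representative, members


# Demo on in-memory lines with made-up protein IDs.
sample_lines = ["protA\tprotA;protB;protC\n", "protD\tprotD\n"]
for representative, members in parse_cluster_info(sample_lines):
    print(representative, len(members))
```

For the real file, an open file handle can be passed in place of sample_lines, for example open("KIJ_CD-HIT-100_Proteins.cluster_info.tsv"). Because the function is a generator, the file is processed line by line and never held in memory as a whole, which matters for a file describing more than 20 million proteins.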