Dolphyn compresses the proteome through epitope stitching, reducing the size of a peptide library by 78% compared to traditional tiling and increasing the proportion of antibody-reactive peptides from 10% to 31%. We find that the immune system develops antibodies to human gut bacteria-infecting viruses, particularly [...] and the [...] (Methods and Fig. 1B, data available).

The alanine scan library contains a series of peptides identical to each native peptide sequence but with triple alanine substitutions scanning from the N- to the C-terminus of each peptide. Triple alanine substitutions can interrupt antibody binding and thus reveal the precise determinants of an epitope within a longer peptide sequence. Figure 1C normalizes, centers, and overlays this information for all individuals with reactivity to a native public epitope. 80% of the mean reactivity curve spans 14 amino acid positions, suggesting that most linear public epitopes can be captured by peptides of this length.

Figure 1D presents the results of the [...] phylum. The predicted phage hosts in this cluster are primarily [...]. The heatmap annotation indicates phages whose genomes aligned to a reference in the NCBI nt database. These alignments largely correspond to prophage sequences that have integrated into their host bacterium.

Dolphyn libraries recover observations made with Pepsyn

We confirmed that phages and peptides [...] that showed frequent antibody reactivity (public epitope peptides) in a previous VirScan study⁷.
Within each of these wildtype peptides, a series of shorter peptides of length 15, 20, 25, 30, 35, 40 and 45 amino acids was designed to tile across the initial 56 amino acid peptide in steps of 5 amino acids (Fig. 1B). In addition, each wildtype peptide was subjected to triple alanine mutation scanning, yielding peptides identical to the native sequence but with triple alanine substitutions scanning from the N- to the C-terminus. This library contains 46,070 peptides and has also been used by Shrock et al.¹⁴ with a different cohort. The amino acid sequences are available as row names within the Public Epitope Data Set file hfc_pubEpitopes.csv on Zenodo.

Gut phage database – phageome pilot library (for evaluating the performance and demonstrating the power of the Dolphyn algorithm)

This library contains 48,128 peptides that are 56 amino acids long and is divided into three subsets of peptides, representing the same 112 prevalent phages in 3354 protein cluster representatives:

1) 19,117 peptides (length = 15 amino acids) that are likely to contain an epitope based on the random forest predictions (value > 0.5). The encoding oligonucleotides are padded at the 5′ end to make them the same length as those of the other two peptide libraries (56 amino acids), with three stop codons and a random sequence generated with a pseudo-random generator, i.e. the Python random.choice() function.

2) 5266 peptides designed with the Dolphyn algorithm. 15-mer epitope peptides are grouped if they are present together on more than one protein. A Dolphyn peptide is created for every three epitope 15-mers that are available per protein group. The 15-mer with the highest epitope probability goes first, followed by a GGGGS linker, the 15-mer with the second-highest probability, a second GGGGS linker, the 15-mer with the third-highest probability, and a stop codon, creating a peptide of 56 amino acids in length.
If two or more Dolphyn peptides are created per protein set, the 15-mer with the second-highest probability takes the first position on the second peptide, and all other epitopes are ranked and positioned accordingly.

3) 23,745 Pepsyn peptides created by tiling the protein sequence with 56 amino acid long peptides overlapping by 28 amino acids. If the protein length is not a multiple of 28, a full 56-mer is created at the C-terminus of the protein, potentially overlapping more than 28 amino acids of the previous tile.

All three sub-libraries were reverse translated with the revtrans command of the Python Pepsyn package to obtain oligonucleotides 168 nucleotides long.
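
The construction rules for sub-libraries 2 and 3 can be illustrated with a minimal Python sketch. This is not the Dolphyn or Pepsyn source code: the function names `tile_protein` and `stitch_epitopes` are hypothetical, the stop codon is represented by `*`, epitope probabilities are assumed to be supplied as (15-mer, probability) pairs, and two behaviours not specified in the text are assumptions (proteins of at most 56 residues yield a single peptide, and epitopes left over after forming complete triplets are dropped). The distribution of epitopes across several peptides follows one reading of "ranked and positioned accordingly", namely that ranked 15-mers are dealt out to the peptides in turn.

```python
LINKER = "GGGGS"   # linker between stitched 15-mers, as described in the text
STOP = "*"         # stands in for the terminal stop codon


def tile_protein(seq, tile_len=56, overlap=28):
    """Tile a protein with tile_len-mers overlapping by `overlap` residues.

    If the regular tiles do not end exactly at the C-terminus, one extra
    full-length tile anchored at the C-terminus is added; it may overlap
    the previous tile by more than `overlap` residues.
    """
    if len(seq) <= tile_len:
        # Assumption: short proteins give one (possibly shorter) peptide.
        return [seq]
    step = tile_len - overlap
    starts = list(range(0, len(seq) - tile_len + 1, step))
    last_start = len(seq) - tile_len
    if starts[-1] != last_start:
        starts.append(last_start)
    return [seq[s:s + tile_len] for s in starts]


def stitch_epitopes(epitopes, per_peptide=3):
    """Stitch ranked epitope 15-mers into linker-joined peptides.

    `epitopes` is a list of (peptide, probability) pairs for one protein
    group. One stitched peptide is built per complete triplet. Ranked
    15-mers are dealt out in turn, so the best epitope opens the first
    peptide and the second best opens the second peptide.
    """
    ranked = [pep for pep, _ in
              sorted(epitopes, key=lambda e: e[1], reverse=True)]
    n_peptides = len(ranked) // per_peptide
    stitched = []
    for i in range(n_peptides):
        # Assumption: epitopes beyond the complete triplets are dropped.
        members = ranked[i::n_peptides][:per_peptide]
        stitched.append(LINKER.join(members) + STOP)
    return stitched
```

With three 15-mers, two 5-residue linkers and the stop position, each stitched peptide occupies 15 × 3 + 5 × 2 + 1 = 56 positions, which matches the 56 amino acid tiles and the 168-nucleotide oligonucleotides obtained after reverse translation.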