by Jessie L. Maier, Craig Gin, Benjamin Callahan, Emma K. Sheriff, Breck A. Duerkop, Manuel Kleiner
Salmonella enterica serovar Typhimurium (Salmonella) and its bacteriophage P22 are a model system for the study of horizontal gene transfer by generalized transduction. Typically, the P22 DNA packaging machinery initiates packaging when it recognizes a short DNA sequence, known as the pac site, on the P22 genome. However, sequences in the host genome that resemble the pac site, called pseudo-pac sites, lead to erroneous packaging and subsequent generalized transduction of Salmonella DNA. While the general genomic locations of the Salmonella pseudo-pac sites are known, the sequences themselves have not been determined. We visualized P22 sequencing reads mapped to host Salmonella genomes to define regions of generalized transduction initiation and the likely locations of pseudo-pac sites. We searched each of these genome regions for the sequence with the highest similarity to the P22 pac site and aligned the resulting sequences. We built a regular expression (sequence match pattern) from the alignment and used it to search the genomes of two P22-susceptible Salmonella strains, LT2 and 14028S, for sequence matches. The final regular expression identified pseudo-pac sites in both LT2 and 14028S that correspond to generalized transduction initiation sites in the mapped read coverage. The pseudo-pac site sequences identified in this study can be used to predict locations of generalized transduction in other P22-susceptible hosts or, through genetic engineering, to initiate generalized transduction at specific locations in P22-susceptible hosts. Furthermore, the bioinformatics approach used to identify the Salmonella pseudo-pac sites in this study could be applied to other phage-host systems.
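The regular-expression step described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`regex_from_alignment`, `find_sites`), the rule of collapsing each alignment column into a character class, and the decision to scan both strands are all assumptions made for illustration. The sequences are short invented strings, not the P22 pac site or any Salmonella sequence.

```python
import re

def regex_from_alignment(aligned):
    """Collapse equal-length, gap-free aligned sequences into a regex.

    Assumption: each alignment column becomes either a literal base
    (if all sequences agree) or a character class of the observed bases.
    """
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("aligned sequences must have equal length")
    parts = []
    for column in zip(*aligned):
        bases = sorted(set(column))
        parts.append(bases[0] if len(bases) == 1 else "[" + "".join(bases) + "]")
    return "".join(parts)

_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of an uppercase A/C/G/T string."""
    return seq.translate(_COMPLEMENT)[::-1]

def find_sites(genome, pattern):
    """Return sorted (start, strand, match) tuples for hits on both strands.

    `start` is always a 0-based forward-strand coordinate; for "-" hits the
    matched text is reported as read on the reverse strand.
    Note: re.finditer reports non-overlapping matches only.
    """
    regex = re.compile(pattern)
    hits = [(m.start(), "+", m.group()) for m in regex.finditer(genome)]
    n = len(genome)
    for m in regex.finditer(reverse_complement(genome)):
        hits.append((n - m.end(), "-", m.group()))
    return sorted(hits)

# Toy data (hypothetical, for illustration only).
toy_alignment = ["ACGTTA", "ACGATA", "ACCTTA"]
toy_genome = "TTACGATAGGTAAGGTCC"

pattern = regex_from_alignment(toy_alignment)
sites = find_sites(toy_genome, pattern)
print(pattern)  # AC[CG][AT]TA
print(sites)    # [(2, '+', 'ACGATA'), (10, '-', 'ACCTTA')]
```

A pattern built this way matches every combination of bases seen at each column, so it can be more permissive than the individual aligned sequences. For that reason, candidate hits would still need to be checked against the mapped read coverage, as the abstract describes.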