Error-correcting DNA barcodes for high-throughput sequencing

9 Jul 2018, 18:00
2h
Holme Building, The Refectory (University of Sydney)

20
Board: 114
Poster Presentation | Biochemistry and Cell Biology | Poster Session

Speaker

John Hawkins (Institute for Computational Engineering and Science, Department of Molecular Biosciences, and Institute for Cellular and Molecular Biology, The University of Texas at Austin, Austin, TX 78712, USA)

Description

Many large-scale high-throughput experiments use DNA barcodes (short DNA sequences prepended to DNA libraries) to identify individual members of pooled biomolecule populations. However, DNA synthesis and sequencing errors confound the interpretation of observed barcodes and can lead to significant data loss or spurious results. Widely used error-correcting codes borrowed from computer science (e.g., Hamming and Levenshtein codes) do not properly account for insertions and deletions in DNA barcodes, even though deletions are the most common type of synthesis error. We present and experimentally validate FREE (Filled/truncated Right End Edit) barcodes, which correct substitution, insertion, and deletion errors, even when these errors alter the barcode length. FREE barcodes are designed with experimental considerations in mind, including balanced GC content, minimal homopolymer runs, and reduced internal hairpin propensity. We generate lists of barcodes with different lengths and error-correction levels that may be useful in diverse high-throughput applications, including $>10^6$ single-error-correcting 16-mers that strike a balance between decoding accuracy, barcode length, and library size. Moreover, concatenating two or more FREE codes into a single barcode increases the available barcode space combinatorially, generating lists with $>10^{15}$ error-correcting barcodes. Our software for creating barcode libraries and decoding sequenced barcodes is efficient and designed to be user-friendly for the general biology community.
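
The indel problem described above can be illustrated with a short sketch. This is not the authors' FREE code or their software; it is a minimal Python example, under stated assumptions, of why a single deletion is awkward for fixed-length decoding, plus simple versions of two of the design filters the abstract mentions. The example barcode, the downstream sequence, the function names, and the filter thresholds (40-60% GC, homopolymer runs of at most 2) are all hypothetical choices made for illustration.

```python
import math


def hamming(a: str, b: str) -> int:
    """Count mismatched positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length strings")
    return sum(x != y for x, y in zip(a, b))


def levenshtein(a: str, b: str) -> int:
    """Minimum number of substitutions, insertions, and deletions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,               # delete x
                            curr[j - 1] + 1,           # insert y
                            prev[j - 1] + (x != y)))   # match or substitute
        prev = curr
    return prev[-1]


def gc_fraction(seq: str) -> float:
    """Fraction of bases that are G or C (seq must be non-empty)."""
    if not seq:
        raise ValueError("empty sequence")
    return sum(base in "GC" for base in seq) / len(seq)


def longest_homopolymer(seq: str) -> int:
    """Length of the longest run of one repeated base (0 for an empty string)."""
    if not seq:
        return 0
    longest = run = 1
    for prev_base, base in zip(seq, seq[1:]):
        run = run + 1 if base == prev_base else 1
        longest = max(longest, run)
    return longest


def passes_filters(seq: str, gc_min: float = 0.4, gc_max: float = 0.6,
                   max_run: int = 2) -> bool:
    """Hypothetical design filter: balanced GC content and short homopolymer runs."""
    return gc_min <= gc_fraction(seq) <= gc_max and longest_homopolymer(seq) <= max_run


# Hypothetical 8-mer barcode followed by downstream library sequence.
barcode = "ACGTTGCA"
downstream = "GGATCC"

# One synthesis deletion (the G at index 2) shifts every later base left, so
# the fixed-length window read by the sequencer is filled from downstream.
read = barcode[:2] + barcode[3:] + downstream
observed = read[:len(barcode)]                    # "ACTTGCAG"

print(hamming(barcode, observed))                 # 5: one deletion looks like five substitutions
print(levenshtein(barcode, observed))             # 2: one deletion plus one filled-in base
print(passes_filters(barcode))                    # True under the assumed thresholds

# Concatenating independent code lists multiplies their sizes.
print(math.prod([10**6, 10**6]))                  # 10**12 for two lists of 10**6 codes
```

In this example a single deletion produces a Hamming distance of 5 and an edit distance of 2 within the fixed-length window, so a decoder that budgets for one substitution, or one plain edit, would not recover the barcode. The last line only shows the product rule behind the combinatorial growth; it does not reproduce the specific list sizes reported in the abstract.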

Primary authors

John Hawkins (Institute for Computational Engineering and Science, Department of Molecular Biosciences, and Institute for Cellular and Molecular Biology, The University of Texas at Austin, Austin, TX 78712, USA)
Stephen Jones, Jr. (Department of Molecular Biosciences and Institute for Cellular and Molecular Biology, The University of Texas at Austin, Austin, TX 78712, USA)
Ilya Finkelstein (Department of Molecular Biosciences, Institute for Cellular and Molecular Biology, and Center for Systems and Synthetic Biology, The University of Texas at Austin, Austin, TX 78712, USA)
William Press (Institute for Computational Engineering and Science, Institute for Cellular and Molecular Biology, and Department of Integrative Biology, The University of Texas at Austin, Austin, TX 78712, USA)
