The Icebox

Why is sequencing data of poor quality at the start and end of reads?

Sequencing data produced with Illumina technology decreases in quality along the read because of phasing. Phasing occurs when a reversibly terminating base is incorporated but its terminator (blocker) group is not removed correctly. [1] While the blocker group remains on the nucleotide, that DNA strand cannot be extended, so the colour detected from it stays the same in the next cycle while the other strands in the cluster move on. When this happens to many strands, the cluster emits a mixture of colours and base calling becomes unreliable. The likelihood of this increases with read length.
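The cumulative effect of phasing over a read can be illustrated with a toy model. The sketch below assumes that each strand in a group of identical strands read together (a cluster) independently fails to advance with a fixed probability in every cycle, and that a strand which has lagged once never catches up. The function name and the 0.5% failure rate are illustrative assumptions, not measured Illumina values.

```python
def in_phase_fraction(cycle: int, p_fail: float) -> float:
    """Fraction of strands that have never lagged after `cycle` cycles.

    Toy model (assumption): every strand fails to advance with the same
    independent probability `p_fail` in each cycle, and never re-synchronises.
    """
    if cycle < 0 or not 0.0 <= p_fail <= 1.0:
        raise ValueError("cycle must be >= 0 and p_fail must be in [0, 1]")
    return (1.0 - p_fail) ** cycle


if __name__ == "__main__":
    # Hypothetical per-cycle failure rate of 0.5%, chosen only for illustration.
    for cycle in (1, 50, 100, 150, 250):
        frac = in_phase_fraction(cycle, 0.005)
        print(f"cycle {cycle:3d}: {frac:.3f} of strands still in phase")
```

Under these assumptions the in-phase fraction shrinks geometrically with each cycle, which is consistent with quality dropping more at longer read lengths.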

With Sanger sequencing, it is common to see poor-quality data at the start of reads. Very short sequencing products do not migrate reliably through the electrophoresis medium, so the first 20-30 nucleotides are often unreliable. [2] At the end of a Sanger trace, the chromatogram may show many small peaks, and the base-calling software may be unable to determine which bases are present. Longer sequencing products are produced in smaller numbers, so their lower concentration gives lower peaks, and resolving single-base differences becomes harder as DNA fragments get longer. [2]
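One common way to handle unreliable read ends in practice is to trim them using per-base quality scores. The sketch below is a minimal illustration, assuming the base caller has supplied one Phred-style quality score per base. The function name, the threshold of 20, and the example scores are hypothetical.

```python
def trim_low_quality_ends(bases: str, quals: list[int], min_q: int = 20) -> tuple[str, list[int]]:
    """Remove leading and trailing bases whose quality is below `min_q`.

    Only the ends are trimmed; low-quality bases inside the read are kept.
    """
    if len(bases) != len(quals):
        raise ValueError("bases and quals must have the same length")

    start, end = 0, len(quals)
    # Walk in from the start past the unreliable leading bases.
    while start < end and quals[start] < min_q:
        start += 1
    # Walk in from the end past the unreliable trailing bases.
    while end > start and quals[end - 1] < min_q:
        end -= 1
    return bases[start:end], quals[start:end]


if __name__ == "__main__":
    # Made-up read and scores, for illustration only.
    seq, q = trim_low_quality_ends("ACGTACGT", [5, 10, 30, 35, 40, 30, 12, 8])
    print(seq, q)
```

If every base falls below the threshold, the function returns an empty sequence and an empty score list.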

  1. https://www.ecseq.com/support/ngs/why-does-the-sequence-quality-decrease-over-the-read-in-illumina

  2. https://www.azenta.com/blog/analyzing-sanger-sequencing-data