DNA sequencing is the process of determining the precise order of nucleotides in a DNA molecule. Sequences reported by a DNA sequencing machine are typically very short, usually between 100 and 250 bp but sometimes as short as 25 to 35 bp. These sequences have a variety of useful research applications, most of which are based on either reconstructing complete molecules (such as chromosomes or mRNAs) from smaller fragments or quantifying the relative abundance of a large number of molecules simultaneously.
The following paper provides an excellent review of DNA sequencing, highlighting the advent of NGS technologies in the last decade and their impact on genomics. Please read the paper in preparation for class on Wednesday, January 21st.
Although DNA has a beautiful and intricate chemical structure, when working with sequence data we ignore its atomic structure and instead work at the resolution of whole nucleotides. Since there are only 4 nucleotides, we can use a tiny alphabet of 4 symbols to represent any DNA sequence: A for adenine, C for cytosine, G for guanine, and T for thymine.
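To make the four-symbol alphabet concrete, here is a minimal Python sketch that treats a DNA sequence as an ordinary string, checks that it uses only A, C, G, and T, and counts each nucleotide. The function names (is_valid_dna, base_counts) are made up for this illustration and do not come from any particular library.

```python
from collections import Counter

# The four-symbol DNA alphabet described above.
DNA_ALPHABET = frozenset("ACGT")


def is_valid_dna(seq: str) -> bool:
    """Return True if seq is non-empty and contains only A, C, G, or T."""
    return bool(seq) and set(seq.upper()) <= DNA_ALPHABET


def base_counts(seq: str) -> dict:
    """Count how many times each of the four nucleotides occurs in seq."""
    counts = Counter(seq.upper())
    return {base: counts.get(base, 0) for base in "ACGT"}


if __name__ == "__main__":
    example = "AACGT"  # hypothetical sequence, for illustration only
    print(is_valid_dna(example))
    print(base_counts(example))
```

Note that real data often contains other symbols as well (for instance N for an undetermined base), so a strict check like this one is only a starting point.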
Data files produced by sequencing machines contain many reads. Each read is the instrument's readout of the nucleotide sequence of a single DNA fragment, encoded in a string of As, Cs, Gs, and Ts. See this page for more information about common sequence data formats.
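As a rough illustration of what "a file containing many reads" looks like in practice, the sketch below parses records in FASTQ, one widely used read format. This is an assumption made for the example: the text above does not say which format your instrument produces. The sketch also assumes the simple four-lines-per-record layout (header, sequence, separator, quality string), and parse_fastq is a hypothetical helper name, not a library function.

```python
import io
from typing import Iterable, Iterator, Tuple


def parse_fastq(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (read_id, sequence, quality) for each four-line FASTQ record."""
    it = iter(lines)
    for header in it:
        header = header.strip()
        if not header:
            continue  # tolerate blank lines between records
        if not header.startswith("@"):
            raise ValueError(f"expected '@' header line, got: {header!r}")
        seq = next(it, "").strip()
        separator = next(it, "").strip()
        qual = next(it, "").strip()
        # Each base must have exactly one quality character.
        if not separator.startswith("+") or len(seq) != len(qual):
            raise ValueError(f"malformed record: {header!r}")
        yield header[1:], seq, qual


# Two made-up reads standing in for a real data file.
example_file = io.StringIO(
    "@read1\nACGTAC\n+\nIIIIII\n"
    "@read2\nGGCATT\n+\nIIII##\n"
)
reads = list(parse_fastq(example_file))

if __name__ == "__main__":
    for read_id, seq, qual in reads:
        print(read_id, seq, qual)
```

For real work an established parser is usually a better choice than a hand-written one; the point here is only that each read is a short string over the A, C, G, T alphabet plus some metadata.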
How can you be sure the reads reported by the sequencing instrument are from your sample of interest and not from, for example, primers or barcode sequences? Can you identify and correct any errors in the reads? Quality control is always an important first step when you are working with NGS data. See this page for more information about the types of quality control you should consider.
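One very simple form of the first check can be sketched as follows: given a known primer or barcode sequence, measure how many reads begin with it and strip it off. This is a deliberately naive sketch. It assumes the contaminant appears as an exact match at the very start of the read, whereas dedicated trimming tools also handle mismatches, partial matches, and contaminants at the 3' end. The function names and the primer sequence are hypothetical.

```python
from typing import List


def trim_prefix(read: str, contaminant: str) -> str:
    """Remove contaminant from the start of read if it matches exactly."""
    if contaminant and read.startswith(contaminant):
        return read[len(contaminant):]
    return read


def contamination_rate(reads: List[str], contaminant: str) -> float:
    """Fraction of reads that start with the contaminant sequence."""
    if not reads or not contaminant:
        return 0.0
    hits = sum(1 for read in reads if read.startswith(contaminant))
    return hits / len(reads)


if __name__ == "__main__":
    primer = "ACGT"  # hypothetical primer, for illustration only
    sample_reads = ["ACGTTTGACCA", "TTGACCAGGA", "ACGTGGGCAT", "CCATTGACGT"]
    print(contamination_rate(sample_reads, primer))
    print([trim_prefix(read, primer) for read in sample_reads])
```

A high contamination rate would suggest that trimming is needed before any downstream analysis. Identifying and correcting sequencing errors is a separate problem that this sketch does not address.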