Mapping: alignment of short NGS reads

The most fundamental task in analyzing NGS data is read alignment, also called read mapping: determining the region of the genome from which each read was derived. This technique has a variety of applications for studying genome biology.

  • Mapping genomic DNA reads to the genome can reveal variation between individuals, such as single nucleotide polymorphisms (SNPs), small insertions/deletions (indels), and large-scale variations in genome structure.
  • Bisulfite treatment of genomic DNA converts unmethylated cytosines to uracil but leaves methylated cytosines unchanged. Thus, mapping sequence data obtained from bisulfite-treated DNA back to the genome can identify sites of methylation.
  • Mapping RNA-seq data to a reference genome and assembling transcript models from the aligned reads is a common method for gene expression profiling. The reads not only reconstruct transcript sequences, but also measure the relative abundance of each transcript in the sample. This technique is most commonly used to profile gene expression in different samples (tissues, conditions, or treatments) and identify genes that are present at different abundances in different samples (differential expression analysis).
  • If high-quality transcript sequences are already available, mapping RNA-seq reads directly to the transcripts (instead of the genome) is an alternative for measuring transcript abundance.
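
The bisulfite application above rests on a simple comparison between an aligned read and the reference, which a few lines of Python can illustrate. This is a minimal sketch under strong assumptions: the read is already mapped to the forward strand at a known offset, the alignment has no gaps, and converted uracils appear as T in the sequenced read. The function name call_methylation and the toy sequences are hypothetical, chosen only for illustration.

```python
def call_methylation(reference, read, offset):
    """Classify each reference cytosine covered by an ungapped, forward-strand read.

    Assumes `read` aligns to `reference` starting at `offset` with no indels.
    """
    calls = {}
    for i, base in enumerate(read):
        ref_pos = offset + i
        if reference[ref_pos] != "C":
            continue  # only reference cytosines are informative here
        if base == "C":
            calls[ref_pos] = "methylated"    # protected from conversion
        elif base == "T":
            calls[ref_pos] = "unmethylated"  # converted to U, sequenced as T
        else:
            calls[ref_pos] = "ambiguous"     # sequencing error or real variant
    return calls


reference = "ACGTCCGATCGA"
read = "TTCGATC"  # toy bisulfite read aligned at reference position 3
print(call_methylation(reference, read, 3))
# {4: 'unmethylated', 5: 'methylated', 9: 'methylated'}
```

A real analysis would pool calls from many reads per site and would also have to handle reads from the reverse strand, where conversion appears as G-to-A changes relative to the reference.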

Efficient algorithms for read mapping and new applications of the technique are topics of intense research interest: many mapping tools already exist, and new ones continue to be developed and published. This review paper provides a thorough overview of the topic: a definition of the read mapping problem, proposed approaches to solving that problem, and features of specific implementations (software tools). The authors also created a website, maintained since the paper's publication, that provides a comprehensive list of read mapping software.
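
To make the read mapping problem concrete, here is a minimal Python sketch of one simple approach: index every k-mer of the genome, look up short "seeds" from the read, and verify each candidate location by counting mismatches. It is not the method of any particular tool. The function names, the seed length k, and the mismatch threshold are assumptions made for illustration, and the sketch ignores indels, base qualities, reverse-complement matches, and paired reads.

```python
from collections import defaultdict


def build_kmer_index(genome, k):
    """Map every k-mer in the genome to the list of positions where it starts."""
    index = defaultdict(list)
    for i in range(len(genome) - k + 1):
        index[genome[i:i + k]].append(i)
    return index


def map_read(read, genome, index, k, max_mismatches=1):
    """Return sorted (start, mismatches) pairs for ungapped placements of the read."""
    hits = set()
    # Use non-overlapping seeds taken from the read.
    for seed_start in range(0, len(read) - k + 1, k):
        seed = read[seed_start:seed_start + k]
        for pos in index.get(seed, []):
            start = pos - seed_start  # implied start of the whole read
            if start < 0 or start + len(read) > len(genome):
                continue
            window = genome[start:start + len(read)]
            mismatches = sum(a != b for a, b in zip(read, window))
            if mismatches <= max_mismatches:
                hits.add((start, mismatches))
    return sorted(hits)


genome = "GATTACAGATTTCAGGCA"
index = build_kmer_index(genome, k=4)
print(map_read("ACAGATTT", genome, index, k=4))  # [(4, 0)]
print(map_read("ACAGATGT", genome, index, k=4))  # [(4, 1)]
```

The second read carries one mismatch but is still placed, because one of its seeds matches the genome exactly. Production mappers rely on far more compact indexes and more sophisticated scoring than this toy version.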

