Transcriptome assembly from NGS data

As we discussed in the mapping unit, RNA-seq data allow us to identify the genes expressed in a sample and, at the same time, quantify their abundance. When a reference genome is available, software tools such as TopHat and Cufflinks provide the quickest and potentially most accurate way of identifying transcripts.
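
As a rough illustration of the reference-based route, the sketch below builds (but does not run) the two command lines involved: TopHat aligns the reads to an indexed genome, and Cufflinks assembles transcripts from the resulting alignments. The function name, file names, and output directory layout are hypothetical placeholders chosen for this example; only the `-o` output-directory option and TopHat's `accepted_hits.bam` output file are taken from the tools themselves.

```python
from pathlib import Path


def reference_based_commands(bowtie_index, reads_1, reads_2, out_dir):
    """Build command lines for a TopHat -> Cufflinks run (nothing is executed).

    All arguments are placeholders for illustration:
      bowtie_index -- basename of a Bowtie index built from the reference genome
      reads_1/2    -- paired-end FASTQ files
      out_dir      -- parent directory for both tools' output
    """
    out = Path(out_dir)
    tophat_out = out / "tophat"
    cufflinks_out = out / "cufflinks"

    # Step 1: spliced alignment of the reads against the reference genome.
    tophat_cmd = ["tophat", "-o", str(tophat_out), bowtie_index, reads_1, reads_2]

    # Step 2: transcript assembly from the alignments TopHat wrote
    # (TopHat names its alignment file accepted_hits.bam).
    alignments = tophat_out / "accepted_hits.bam"
    cufflinks_cmd = ["cufflinks", "-o", str(cufflinks_out), str(alignments)]

    return [tophat_cmd, cufflinks_cmd]


if __name__ == "__main__":
    # Hypothetical inputs; print the commands rather than running them.
    for cmd in reference_based_commands(
        "genome_index", "sample_1.fastq", "sample_2.fastq", "results"
    ):
        print(" ".join(cmd))
```

Keeping each command as a list of arguments makes it straightforward to hand to `subprocess.run` later, once the tools are installed and the paths point at real data.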

When a reference genome is not available, transcripts must be assembled de novo. The challenges of transcriptome assembly are similar to those of genome assembly, but the differences are substantial enough that a genome assembler cannot simply be reused for the task. For example, read coverage varies widely between transcripts because it depends on expression level, and alternative splicing can produce several isoforms from a single gene. The following publications describe assembly tools designed specifically for assembling transcripts de novo from RNA-seq data. Please read these papers as an introduction to the topic.

Common assembly programs

Feel free to add to this list!

