Transcriptome Assembly

Assembly of RNA-seq short reads into a transcriptome. There is a 12-hour minimum ($876 internal, $1,116 external) per project.

1. Quality Assessment

Data quality is assessed with FastQC.

  • Deliverables
    • Reports generated by FastQC.
  • Tools Used
    • FastQC (Andrews, 2010), used to generate quality summaries of the data:
      • Per base sequence quality report: useful for deciding whether trimming is necessary.
      • Sequence duplication levels: an evaluation of library complexity. Higher levels of sequence duplication may be expected for high-coverage RNA-seq data.
      • Overrepresented sequences: an evaluation of adapter contamination.
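
As an illustration of how the FastQC reports listed above might be triaged, here is a minimal Python sketch. It assumes the tab-separated summary.txt file that FastQC writes alongside each report (status, module name, file name on each line). The function name and the sample file name are hypothetical, and this is not necessarily how reports are reviewed for a given project.

```python
from collections import defaultdict


def parse_fastqc_summary(text):
    """Group FastQC module names by their status (PASS, WARN or FAIL)."""
    by_status = defaultdict(list)
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        # Assumed layout: status <TAB> module name <TAB> input file name
        status, module, _filename = line.split("\t", 2)
        by_status[status].append(module)
    return dict(by_status)


# Hypothetical summary.txt content for one FASTQ file.
example = (
    "PASS\tPer base sequence quality\tsample_R1.fastq.gz\n"
    "WARN\tSequence Duplication Levels\tsample_R1.fastq.gz\n"
    "FAIL\tOverrepresented sequences\tsample_R1.fastq.gz\n"
)

flags = parse_fastqc_summary(example)
for status in ("FAIL", "WARN"):
    for module in flags.get(status, []):
        print(f"{status}: {module}")
```

A WARN on duplication levels is often unremarkable for high-coverage RNA-seq data, so a sketch like this is best treated as a pointer to which reports to read, not as a pass/fail gate.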

2. Assembly

We use community-standard assemblers to generate a de novo assembly. Assembly is computationally demanding and may not finish within the time limits imposed on compute jobs at TACC, especially for large data sets. If an initial assembly run does not complete within those limits, we employ strategies such as in silico normalization to obtain a complete assembly.

  • Deliverables
    • FASTA file of assembly.

    • If we are unable to finish an assembly, there is no charge.

  • Tools Used
    • Trinity for eukaryotes
    • rnaSPAdes for bacteria
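
To give a sense of what in silico normalization does, the toy Python sketch below implements the general idea behind digital normalization: a read is kept only if the median count of its k-mers, among the reads kept so far, is still below a cutoff. This is an illustration under simplifying assumptions (exact in-memory k-mer counts, no reverse complements, no paired reads). It is not the normalization routine shipped with any particular assembler, and the function names and default values are hypothetical.

```python
from collections import Counter
from statistics import median


def kmers(seq, k):
    """Return every overlapping k-mer in seq, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def normalize(reads, k=5, cutoff=2):
    """Keep a read only while its median k-mer count is below cutoff."""
    counts = Counter()  # k-mer counts over the reads kept so far
    kept = []
    for read in reads:
        ks = kmers(read, k)
        if not ks:
            continue  # read is shorter than k
        if median(counts[x] for x in ks) < cutoff:
            kept.append(read)
            counts.update(ks)
    return kept


# Five copies of a highly expressed read plus one rare read (made-up data).
reads = ["ACGTTGCAAG"] * 5 + ["TTTTGGGGCC"]
kept = normalize(reads)
print(len(reads), "reads in,", len(kept), "reads kept")
```

With these inputs, two copies of the abundant read and the single rare read are kept. Redundant coverage is discarded while low-coverage transcripts are retained, which is what reduces the memory and run time of the subsequent assembly.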

3. Optional: Homology Against Standard Databases

For an additional charge, we can search a completed assembly against UniProt with BLAST or against Pfam with HMMER. These homology searches give some indication of what the assembled transcripts represent.

  • Deliverables
    • Table of BLAST results against UniProt, with the option of appending the best hits to the FASTA file tags.

    • Table of HMMER results against Pfam, with the option of appending the best hits to the FASTA file tags.

  • Tools Used
    • BLASTx (Altschul et al., 1997) for nucleotide-to-protein homology search in the UniProt protein database.
    • hmmscan (Eddy, 1998) for HMM-based homology search against the Pfam database of proteins and protein domains.
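
The option of appending best hits to the FASTA file tags can be pictured with the Python sketch below. It assumes the BLAST table is in the standard 12-column tabular layout (query ID in column 1, subject ID in column 2, bit score in column 12) and picks the highest bit score per transcript. The `best_hit=` tag, the function names, and all sequence and subject IDs are hypothetical, and the delivered files may be formatted differently.

```python
def best_hits(blast_tsv):
    """Return {query_id: (subject_id, bitscore)} for the top hit per query."""
    best = {}
    for line in blast_tsv.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        cols = line.split("\t")
        # Assumed columns: 0 = query ID, 1 = subject ID, 11 = bit score
        query, subject, bitscore = cols[0], cols[1], float(cols[11])
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return best


def annotate_fasta(fasta_text, hits):
    """Append a hypothetical best_hit= tag to headers that have a hit."""
    out = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            seq_id = line[1:].split()[0]  # ID is the first word of the header
            if seq_id in hits:
                line = f"{line} best_hit={hits[seq_id][0]}"
        out.append(line)
    return "\n".join(out) + "\n"


# Made-up table and assembly; the subject IDs are placeholders.
blast = (
    "t1\tsp|HYPO1|EXAMPLE_A\t85.0\t100\t15\t0\t1\t300\t1\t100\t1e-50\t180\n"
    "t1\tsp|HYPO2|EXAMPLE_B\t60.0\t90\t36\t0\t1\t270\t5\t94\t1e-20\t90\n"
)
fasta = ">t1 len=300\nACGT\n>t2 len=200\nGGCC\n"

annotated = annotate_fasta(fasta, best_hits(blast))
print(annotated, end="")
```

Transcripts without a hit, such as `t2` here, keep their original header, so the annotated FASTA file still contains every assembled sequence.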