
Tricks to preprocess SOLiD and 454 data


Created by Dhivya Arasappan, last modified on Dec 16, 2011



Some tricks to preprocess/assess ABI SOLiD data

Look for dominant sequences in your data
- grep -v '^>' F3.csfasta |sort|uniq -c -w 25|sort -n -r|head -20

- F3.csfasta : Input file, the raw csfasta file from ABI SOLiD
- This command groups reads that share the same first 25 characters, counts each group, and lists the 20 most frequent. Change 25 if you want more or less of the read to be considered when looking for dominant sequences.
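
To show what the output looks like, here is a small sketch run on a made-up file. The file name toy.csfasta and its reads are invented for illustration, and the width is shortened to 6 so the grouping is easy to see. This variant also drops lines starting with '#', on the assumption that your csfasta file begins with comment lines. Note that -w is an option of GNU uniq and may not exist in other implementations.

```shell
# Build a tiny, made-up csfasta file (illustrative data only).
tmpdir=$(mktemp -d)
cat > "$tmpdir/toy.csfasta" <<'EOF'
# hypothetical header comment
>1_1_1_F3
T0120301
>1_1_2_F3
T0120302
>1_1_3_F3
T3321000
EOF

# Drop read-id lines (>) and comment lines (#), then count reads
# that share the same first 6 characters, most frequent first.
top=$(grep -v '^[>#]' "$tmpdir/toy.csfasta" | sort | uniq -c -w 6 | sort -n -r | head -20)
echo "$top"
```

Each output line is a count followed by one representative read from that group, so the first two reads above are reported together with a count of 2.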

Some tricks to preprocess/assess 454 data

Make 454 data into format of one sequence per line
- makeSeqsOneLine 454.fna > 454.modified.fna

- 454.fna : Input file of raw 454 data
- 454.modified.fna : Output file of modified 454 data
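
makeSeqsOneLine is a locally installed utility. If it is not available on your system, the same kind of reformatting can be sketched in awk. The function name one_line_fasta and the toy file below are hypothetical, and this is an assumption about what the tool does (join wrapped sequence lines so each read is a header line followed by one sequence line), not its actual implementation.

```shell
# one_line_fasta: hypothetical helper, not the makeSeqsOneLine tool itself.
# $1 = fasta file whose sequences may be wrapped over several lines.
one_line_fasta() {
  awk '
    # Header line: flush the previous sequence, then print the header.
    /^>/ { if (seq != "") print seq; print; seq = ""; next }
    # Sequence line: append it to the sequence collected so far.
         { seq = seq $0 }
    # End of file: flush the last sequence.
    END  { if (seq != "") print seq }
  ' "$1"
}

# Toy input (invented data): read1 is wrapped over two lines.
tmpdir=$(mktemp -d)
printf '>read1\nACGT\nTTAA\n>read2\nGGCC\n' > "$tmpdir/toy.fna"
one_line_fasta "$tmpdir/toy.fna" > "$tmpdir/toy.modified.fna"
cat "$tmpdir/toy.modified.fna"
```

On real data the call would look like: one_line_fasta 454.fna > 454.modified.fna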

Pull out read sequences (with read id) containing a certain pattern (Let's say 'TAGGAC')
- grep -B 1 'TAGGAC' 454.modified.fna |grep -v '^-' > 454.pattern.fna

- 454.modified.fna : Modified 454 data
- 454.pattern.fna : Fasta file with only reads containing the specified pattern.

Pull out read sequences (with read id) starting with a certain pattern (Let's say 'TAGGAC')
- grep -B 1 '^TAGGAC' 454.modified.fna |grep -v '^-' > 454.pattern.fna

- 454.modified.fna : Modified 454 data
- 454.pattern.fna : Fasta file with only reads starting with the specified pattern.
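
One caveat with the grep approach is that the pattern is also tested against the read id lines, so a read id that happens to contain the pattern would be picked up as well. A minimal awk sketch that tests only the sequence lines is shown below. The function name filter_reads and the toy reads are hypothetical, and it assumes the one-sequence-per-line format described above.

```shell
# filter_reads: hypothetical helper.
# $1 = regular expression, $2 = fasta file with one sequence per line.
filter_reads() {
  awk -v pat="$1" '
    # Remember the most recent read id and move on.
    /^>/     { header = $0; next }
    # Sequence line matching the pattern: print read id, then sequence.
    $0 ~ pat { print header; print }
  ' "$2"
}

# Toy input (invented data); the first read id itself contains TAGGAC.
tmpdir=$(mktemp -d)
printf '>TAGGAC01\nAAAA\n>r2\nCCTAGGACCC\n>r3\nTAGGACTT\n' > "$tmpdir/toy.modified.fna"

filter_reads 'TAGGAC'  "$tmpdir/toy.modified.fna" > "$tmpdir/contains.fna"   # pattern anywhere in the read
filter_reads '^TAGGAC' "$tmpdir/toy.modified.fna" > "$tmpdir/starts.fna"     # pattern at the start of the read
cat "$tmpdir/contains.fna" "$tmpdir/starts.fna"
```

Because the match is made per sequence line, no separator lines are produced and the second grep is not needed.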

To get the reverse complement sequences for a fasta file, run the following command on fourierseq:

- reversecomplement.pl test.fasta|sed 's/U/T/g' > test.revcomp.fasta

- test.fasta: Fasta input file
- test.revcomp.fasta : Fasta output file, with reverse complemented sequences
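
reversecomplement.pl is a script installed on fourierseq. If you are working elsewhere, a rough awk equivalent is sketched below. The function name revcomp_fasta and the toy reads are hypothetical. It assumes DNA sequences with one sequence per line, leaves read id lines untouched, and passes through any character it does not recognise. It is not a reimplementation of the Perl script.

```shell
# revcomp_fasta: hypothetical helper, not the reversecomplement.pl script.
# $1 = fasta file with one DNA sequence per line.
revcomp_fasta() {
  awk '
    BEGIN {
      # Build the complement lookup table for upper and lower case bases.
      n = split("A C G T a c g t", from, " ")
      split("T G C A t g c a", to, " ")
      for (i = 1; i <= n; i++) comp[from[i]] = to[i]
    }
    # Read id lines are printed unchanged.
    /^>/ { print; next }
    {
      # Walk the sequence from its last base to its first,
      # appending the complement of each base.
      out = ""
      for (i = length($0); i > 0; i--) {
        b = substr($0, i, 1)
        if (b in comp) out = out comp[b]
        else out = out b
      }
      print out
    }
  ' "$1"
}

# Toy input (invented data).
tmpdir=$(mktemp -d)
printf '>r1\nAACGT\n>r2\nGGNTA\n' > "$tmpdir/test.fasta"
revcomp_fasta "$tmpdir/test.fasta" > "$tmpdir/test.revcomp.fasta"
cat "$tmpdir/test.revcomp.fasta"
```

If your sequences are wrapped over several lines, convert them to one sequence per line first, otherwise each line is reverse complemented separately.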
