Some tricks to preprocess/assess ABI SOLiD data
- Look for dominant sequences in your data
- grep -v '^>' F3.csfasta |sort|uniq -c -w 25|sort -n -r|head -20
-
- F3.csfasta : Input file (raw csfasta file from ABI SOLiD)
- This command lists the 20 most frequent read prefixes, treating reads as identical when their first 25 characters match. Change 25 if you want more or less of the read to be considered when looking for dominant sequences.
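The pipeline above can be tried on a small file first. The sketch below is only an illustration: the file names `toy.csfasta` and `toy.dominant.txt` and the reads themselves are made up, and the prefix width is shortened to 5 so the toy reads group together. It also assumes that csfasta files may begin with `#` comment lines and that each read starts with a primer base (such as `T`) followed by color digits, so the width counts the primer base as well; the extra `#` in the grep pattern drops those comment lines.

```shell
# Toy csfasta file (hypothetical reads, for illustration only)
cat > toy.csfasta <<'EOF'
# Title: toy run (hypothetical)
>1_1_1_F3
T01203
>1_1_2_F3
T01201
>1_1_3_F3
T33110
>1_1_4_F3
T01202
EOF

# Drop read ids (">") and comment lines ("#"), then count reads that
# share the same first 5 characters (primer base + 4 colors here).
# uniq -w is a GNU extension, as in the original command.
grep -v '^[>#]' toy.csfasta | sort | uniq -c -w 5 | sort -n -r | head -20 > toy.dominant.txt

cat toy.dominant.txt
```

On this toy input the prefix `T0120` is reported with a count of 3, ahead of `T3311` with a count of 1. Note that `uniq` prints only the first read of each group, so the characters after the compared prefix are just one representative.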
Some tricks to preprocess/assess 454 data
- Convert 454 data to a format with one sequence per line
- makeSeqsOneLine 454.fna > 454.modified.fna
-
- 454.fna : Input file of raw 454 data
- 454.modified.fna : Output file of modified 454 data
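If makeSeqsOneLine is not available on your machine, the same kind of conversion can be approximated with awk. This is a minimal sketch under the assumption that makeSeqsOneLine simply joins the wrapped sequence lines of each record and leaves the read id lines untouched; the file names `toy.454.fna` and `toy.454.modified.fna` and the reads are hypothetical.

```shell
# Toy 454-style fasta file with wrapped sequence lines (hypothetical reads)
cat > toy.454.fna <<'EOF'
>read1 length=12
ACGTAC
GTACGT
>read2 length=4
TTGA
EOF

awk '
  # Header line: flush the previous sequence (if any), then print the header
  /^>/ { if (NR > 1) print seq; print; seq = ""; next }
  # Sequence line: append to the sequence collected so far
  { seq = seq $0 }
  # End of input: flush the last sequence
  END { if (NR > 0) print seq }
' toy.454.fna > toy.454.modified.fna

cat toy.454.modified.fna
```

Every record then occupies exactly two lines (read id, then sequence), which is what the grep -B 1 commands on this page rely on.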
- Pull out read sequences (with read id) containing a certain pattern (Let's say 'TAGGAC')
- grep -B 1 'TAGGAC' 454.modified.fna |grep -v '^-' > 454.pattern.fna
-
- 454.modified.fna : Modified 454 data
- 454.pattern.fna : Fasta file with only reads containing the specified pattern.
- Pull out read sequences (with read id) starting with a certain pattern (Let's say 'TAGGAC')
- grep -B 1 '^TAGGAC' 454.modified.fna |grep -v '^-' > 454.pattern.fna
-
- 454.modified.fna : Modified 454 data
- 454.pattern.fna : Fasta file with only reads starting with the specified pattern.
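The difference between "containing" and "starting with" a pattern comes down to the `^` anchor in the grep expression. The sketch below shows both on a made-up one-sequence-per-line file; the file names `toy.modified.fna`, `toy.contains.fna` and `toy.startswith.fna` are hypothetical. The second grep removes the `--` separator lines that GNU grep prints between non-adjacent groups of matches when -B is used.

```shell
# Toy fasta file, one sequence per line (hypothetical reads)
cat > toy.modified.fna <<'EOF'
>read1
TAGGACAAAT
>read2
CCCCGGGG
>read3
GGTAGGACTT
>read4
TAGGACCC
EOF

# Reads containing TAGGAC anywhere in the sequence
grep -B 1 'TAGGAC' toy.modified.fna | grep -v '^-' > toy.contains.fna

# Reads whose sequence starts with TAGGAC (note the ^ anchor)
grep -B 1 '^TAGGAC' toy.modified.fna | grep -v '^-' > toy.startswith.fna

grep -c '^>' toy.contains.fna
grep -c '^>' toy.startswith.fna
```

Here read1, read3 and read4 contain the pattern, but only read1 and read4 start with it.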
- To get the reverse complement sequences for a fasta file, run the following command on fourierseq:
-
- reversecomplement.pl test.fasta|sed '/^>/!s/U/T/g' > test.revcomp.fasta
-
- test.fasta: Fasta input file
- test.revcomp.fasta : Fasta output file, with reverse complemented sequences
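Away from fourierseq, where reversecomplement.pl may not be installed, a rough equivalent can be put together from the standard `rev` and `tr` tools. This is a sketch, not the script's actual implementation: it assumes one sequence per line, handles only A, C, G, T and U (any U in the input is complemented to A), passes other characters such as N through unchanged, and will be slow on large files. The file names `toy.fasta` and `toy.revcomp.fasta` are hypothetical.

```shell
# Toy fasta file, one sequence per line (hypothetical reads)
cat > toy.fasta <<'EOF'
>r1
AACG
>r2
TTNAC
EOF

while IFS= read -r line; do
  case "$line" in
    # Read id lines are copied unchanged
    '>'*) printf '%s\n' "$line" ;;
    # Sequence lines: reverse, then complement each base
    *) printf '%s\n' "$line" | rev | tr 'ACGTUacgtu' 'TGCAAtgcaa' ;;
  esac
done < toy.fasta > toy.revcomp.fasta

cat toy.revcomp.fasta
```

For the toy input, AACG becomes CGTT and TTNAC becomes GTNAA.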