Small RNA analysis

Scripts on fourierseq that are useful for small RNA analysis, particularly with the ABI SREK (rna2map) pipeline.

findpositionmiRNA: Given the miRBase FASTA files of mature microRNA sequences (mature.fa) and hairpin sequences (hairpin.fa), finds the start position of each mature miRNA relative to the start of its hairpin (zero-based offset).
- Inputs: mature.fa; hairpin.fa (preferably filtered to sequences from the organism of interest); organism (three-letter miRBase code, e.g. hsa or mmu)
  Outputs: File with miRNAid\tmiRNAsequence\thairpinid\thairpinsequence\tstartposition
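
As a rough illustration of what findpositionmiRNA computes, here is a minimal Python sketch, not the original script. It assumes each mature miRNA is located by exact substring search against every hairpin of the same organism (matched by the "hsa-"/"mmu-" style ID prefix); the names read_fasta and find_mature_positions are hypothetical.

```python
def read_fasta(lines):
    """Parse FASTA lines into {id: sequence}; the id is the first word after '>'."""
    records = {}
    current = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            current = line[1:].split()[0]
            records[current] = []
        elif current is not None:
            records[current].append(line.upper())
    return {rid: "".join(parts) for rid, parts in records.items()}


def find_mature_positions(mature, hairpins, organism):
    """Yield (miRNA id, miRNA seq, hairpin id, hairpin seq, start) for every
    mature miRNA of `organism` found as an exact substring of one of its
    hairpins. `start` is the zero-based offset from the start of the hairpin."""
    prefix = organism + "-"  # assumption: IDs look like hsa-miR-21 / hsa-mir-21
    for mid, mseq in mature.items():
        if not mid.startswith(prefix):
            continue
        for hid, hseq in hairpins.items():
            if not hid.startswith(prefix):
                continue
            start = hseq.find(mseq)
            if start != -1:
                yield mid, mseq, hid, hseq, start
```

Joining each yielded tuple with tabs gives one row in the miRNAid\tmiRNAsequence\thairpinid\thairpinsequence\tstartposition layout above. A mature miRNA encoded by several hairpin loci yields one row per hairpin.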
mapreads_interpreter_SREK: Converts an SREK mapping output file into a tab-delimited info file.
- Inputs: SREK mapping output file (after extension); reference fasta file
  Outputs: Info file with readid\tgi#\tmismatches\tdirection\tstartlocation\tstart%\tend%\tcoverage%\tgenedescription\tgenelength\tmappinglength
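
The info file is the common input to the scripts that follow, so a typed reader can help when post-processing it outside these scripts. The sketch below is a hypothetical helper, not part of the pipeline; it assumes exactly the 11 tab-separated columns listed above, no header line, and percentage fields that are plain numbers (an optional trailing "%" is stripped).

```python
from typing import NamedTuple


class InfoRecord(NamedTuple):
    read_id: str
    gi: str
    mismatches: int
    direction: str
    start: int
    start_pct: float
    end_pct: float
    coverage_pct: float
    description: str
    gene_length: int
    mapping_length: int


def _pct(value):
    # Assumption: percentages may or may not carry a trailing '%'.
    return float(value.rstrip("%"))


def parse_info(lines):
    """Yield one InfoRecord per non-empty line of an info file."""
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        f = line.split("\t")
        if len(f) != 11:
            raise ValueError(f"expected 11 tab-separated fields, got {len(f)}")
        yield InfoRecord(f[0], f[1], int(f[2]), f[3], int(f[4]),
                         _pct(f[5]), _pct(f[6]), _pct(f[7]),
                         f[8], int(f[9]), int(f[10]))
```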
mapreads_select_mismatches: Filters an info file by number of mismatches.
- Inputs: info file generated by mapreads_interpreter_SREK; mismatch cutoff
  Outputs: info file filtered to include only mappings with mismatches less than or equal to the user-specified cutoff
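
The filter amounts to keeping rows whose mismatches field (the third column of the info file) is at most the cutoff. A minimal Python sketch, assuming the info file has no header line; select_mismatches is a hypothetical name, not the original script:

```python
MISMATCH_COL = 2  # zero-based index of the "mismatches" column


def select_mismatches(lines, cutoff):
    """Yield info-file lines whose mismatch count is <= cutoff (inclusive)."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) > MISMATCH_COL and int(fields[MISMATCH_COL]) <= cutoff:
            yield line  # emit the original line unchanged
```

Because lines are yielded unchanged, the output keeps the info-file format and can be fed to the next filter directly.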
mapreads_select_by_length: Filters an info file by mapping length.
- Inputs: info file generated by mapreads_interpreter_SREK; minimum length; maximum length
  Outputs: info file filtered to include only mappings whose length falls within the user-specified minimum and maximum
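
The length filter works the same way on the last column (mappinglength). The sketch below assumes both bounds are inclusive and that lines with fewer than 11 columns should be skipped; select_by_length is a hypothetical name.

```python
MAPPING_LENGTH_COL = 10  # zero-based index of "mappinglength", the last column


def select_by_length(lines, min_len, max_len):
    """Yield info-file lines whose mapping length is in [min_len, max_len]."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) <= MAPPING_LENGTH_COL:
            continue  # skip malformed or truncated lines
        if min_len <= int(fields[MAPPING_LENGTH_COL]) <= max_len:
            yield line
```

For mature miRNA work, bounds such as 18 and 26 would be a typical choice, but the right range depends on the library and read length.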
findmaturemicro_SREK_hsa: From SREK miRBase mapping results, extracts reads mapping within ±3 bp of mature miRNA start sites and reports read counts for each mature miRNA.
- Inputs: info file; file with the positions of mature miRNAs relative to their hairpins (the output of findpositionmiRNA)
  Outputs: counts file with read counts for each mature miRNA; file with information about the reads and the mature miRNAs they mapped to
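
The ±3 bp window can be sketched as follows. This is an illustration under stated assumptions, not the original script: it assumes the info file's gi# column holds the hairpin ID used in the findpositionmiRNA output, that startlocation uses the same zero-based coordinates, and it does not filter on mapping direction. The names load_positions and count_mature are hypothetical.

```python
from collections import Counter

WINDOW = 3  # reads starting within +/- 3 bp of a mature start are counted


def load_positions(lines):
    """Map hairpin id -> [(mature id, start)] from findpositionmiRNA output."""
    positions = {}
    for line in lines:
        f = line.rstrip("\n").split("\t")
        if len(f) < 5:
            continue
        mature_id, _, hairpin_id, _, start = f[:5]
        positions.setdefault(hairpin_id, []).append((mature_id, int(start)))
    return positions


def count_mature(info_lines, positions, window=WINDOW):
    """Return (counts per mature miRNA, list of (read id, mature id, offset))."""
    counts = Counter()
    assignments = []
    for line in info_lines:
        f = line.rstrip("\n").split("\t")
        if len(f) < 5:
            continue
        read_id, ref_id, read_start = f[0], f[1], int(f[4])
        for mature_id, mature_start in positions.get(ref_id, []):
            offset = read_start - mature_start
            if abs(offset) <= window:
                counts[mature_id] += 1
                assignments.append((read_id, mature_id, offset))
    return counts, assignments
```

The two return values correspond loosely to the counts file and the per-read information file described above.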
combine_mutlicounts: Combines two files based on the first column (used to combine counts files generated by findmaturemicro_SREK above).
- Inputs: file1; file2 (for example, two counts files); maximum number of columns in file1 (file1 may have any number of columns, but file2 must have exactly two)
  Outputs: file resulting from combining file1 and file2 based on first column
  Note: Use this script multiple times to combine multiple counts files.
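
A minimal Python sketch of the combining step, assuming it behaves as an outer join on the first column in which a miRNA missing from one file gets a count of 0; how the original script treats missing keys is not stated here. The n_cols1 parameter plays the role of "maximum number of columns in file1" and is used to pad short or missing rows. combine_counts is a hypothetical name.

```python
def combine_counts(file1_lines, file2_lines, n_cols1):
    """Outer-join file1 (n_cols1 columns) with a two-column file2 on column 1.
    Returns tab-joined lines; missing values are filled with "0"."""
    width = n_cols1 - 1  # value columns in file1 after the key
    rows = {}  # dicts keep insertion order, so file1 order is preserved
    for line in file1_lines:
        fields = line.rstrip("\n").split("\t")
        if not fields[0]:
            continue  # skip blank lines
        values = fields[1:n_cols1]
        rows[fields[0]] = values + ["0"] * (width - len(values))
    counts2 = {}
    for line in file2_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 2:
            counts2[fields[0]] = fields[1]
    for key in counts2:
        rows.setdefault(key, ["0"] * width)  # keys only present in file2
    return ["\t".join([key, *values, counts2.get(key, "0")])
            for key, values in rows.items()]
```

Chaining calls mirrors the note above: the result of the first call (two value columns, so n_cols1 = 3) becomes file1 for the next counts file.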

For generating miRNA read coverage graphs

These scripts generate simple read coverage graphs, one for each miRNA. However, they must be modified for the samples, miRNAs and files of interest.

gethsastart.sh: Generates a histogram of read coverage for every specified mature miRNA. Must be edited to point to the information file generated by findmaturemicro_SREK.
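
gethsastart.sh is a shell script whose input format is not documented here, so the following only sketches the underlying computation in Python: per-position read depth along one hairpin, built with a difference array. It assumes each read is given as a zero-based start and a mapped length relative to the hairpin, and that reads overhanging either end are clipped; coverage_histogram is a hypothetical name.

```python
def coverage_histogram(reads, ref_length):
    """Return per-position read depth for positions 0..ref_length-1.
    `reads` is an iterable of (start, length) pairs with zero-based starts."""
    diff = [0] * (ref_length + 1)
    for start, length in reads:
        lo = max(start, 0)                      # clip reads overhanging the 5' end
        hi = min(start + length, ref_length)    # clip reads overhanging the 3' end
        if lo < hi:
            diff[lo] += 1   # depth rises where the read begins
            diff[hi] -= 1   # and falls just past where it ends
    depth, running = [], 0
    for pos in range(ref_length):
        running += diff[pos]
        depth.append(running)
    return depth
```

Writing each position and its depth as a tab-separated line would produce a simple two-column file that an R script can read into a histogram.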

plot_hist_hsa.R: Uses the histogram files generated above to plot an R graph (output as a PDF file). Again, must be edited to point to the output of gethsastart.sh.