...
Log in to ls6, start an idev session, then load the BioContainers bedtools module and check its version.
```bash
idev -m 120 -N 1 -A OTH21164 -r CoreNGS-Fri   # or -A TRA23004
# or
idev -m 90 -N 1 -A OTH21164 -p development    # or -A TRA23004

module load biocontainers
module load bedtools
bedtools --version   # should be bedtools v2.27.1
```
Input format considerations
- Most BEDTools functions now accept either BAM or BED files as input.
- BED format files must be BED3+, or BED6+ if strand-specific operations are requested.
- When comparing against a set of regions, those regions are usually supplied in either BED or GTF/GFF.
- All text-format input files (BED, GTF/GFF, VCF) should use Unix line endings (linefeed only).
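Two of these points are easy to check before running any BEDTools command. The sketch below uses only standard Unix tools on a made-up file (`regions_crlf.bed` and the temporary directory are hypothetical names, not part of the course data): it counts lines containing a carriage return, strips them, and reports the smallest column count so you can tell whether the file qualifies as BED3+ or BED6+.

```bash
# Toy input: a BED3 file saved with Windows (CRLF) line endings.
workdir=$(mktemp -d)
printf 'chrI\t334\t649\r\nchrI\t537\t792\r\n' > "$workdir/regions_crlf.bed"

# Hold a literal carriage return in a variable so grep can search for it.
cr=$(printf '\r')

# Count lines that contain a carriage return; anything above 0 needs fixing.
crlf_lines=$(grep -c "$cr" "$workdir/regions_crlf.bed")
echo "lines with CR: $crlf_lines"

# Strip carriage returns to get Unix (linefeed-only) line endings.
tr -d '\r' < "$workdir/regions_crlf.bed" > "$workdir/regions.bed"

# Smallest tab-separated column count: 3 or more is BED3+, 6 or more is BED6+.
min_cols=$(awk -F'\t' 'NR == 1 || NF < min { min = NF } END { print min }' "$workdir/regions.bed")
echo "minimum columns: $min_cols"
```

Checking the minimum rather than the first line's column count catches files where only some rows are short.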
...
Which type of RNA-seq library you have depends on the library preparation method, so ask your sequencing center! Our yeast RNA-seq library is sense stranded; note, however, that most RNA-seq libraries these days, including ones prepared by GSAF, are antisense stranded.
If you have a stranded RNA-seq library, you should use either -s or -S to avoid false counting against a gene on the wrong strand.
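To see why the choice matters, here is a toy sketch in plain awk (not bedtools itself) that tallies overlapping reads by whether they lie on the same strand as the gene or the opposite one, which is the distinction that -s (same strand) and -S (opposite strand) draw. The gene, the reads and the file names are all made up for illustration.

```bash
workdir=$(mktemp -d)

# One hypothetical gene on the + strand (BED6: chrom, start, end, name, score, strand).
printf 'chrI\t100\t200\tgeneA\t0\t+\n' > "$workdir/gene.bed"

# Four hypothetical reads overlapping geneA: three on +, one on -.
printf 'chrI\t110\t160\tr1\t0\t+\nchrI\t120\t170\tr2\t0\t+\nchrI\t150\t210\tr3\t0\t+\nchrI\t130\t180\tr4\t0\t-\n' > "$workdir/reads.bed"

# The first file loads the gene; the second tallies overlapping reads by strand relationship.
awk -F'\t' '
  NR == FNR { chrom = $1; gs = $2; ge = $3; gstrand = $6; next }
  $1 == chrom && $2 < ge && $3 > gs {
    if ($6 == gstrand) same++; else opposite++
  }
  END { printf "same=%d opposite=%d\n", same, opposite }
' "$workdir/gene.bed" "$workdir/reads.bed" > "$workdir/strand_counts.txt"
cat "$workdir/strand_counts.txt"
```

For a sense-stranded library the "same" tally is the signal and the "opposite" tally is what you want to keep out of the gene's count; for an antisense-stranded library it is the other way around.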
...
- seqname - The name of the chromosome or contig.
- source - Name of the program that generated this feature, or other data source (e.g. public database)
- feature_type - Type of the feature, for example:
  - chromosome
  - CDS (coding sequence), exon
  - gene, transcript
  - start_codon, stop_codon
- start - Start position of the feature, with sequence numbering starting at 1.
- end - End position of the feature, with sequence numbering starting at 1.
- score - A numeric value, often but not always an integer. Its meaning varies by source and is usually not important.
- strand - Defined as + (forward), - (reverse), or . (no relevant strand)
- frame - For a CDS, one of 0, 1 or 2, specifying the reading frame of the first base; otherwise '.'
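Because GFF positions are 1-based while BED start positions are 0-based, converting a GFF record to BED means subtracting 1 from the start and leaving the end alone. Here is a minimal sketch of that on a single made-up record (the file names are hypothetical, and the 9th GFF column, attributes, is ignored here):

```bash
workdir=$(mktemp -d)

# One illustrative GFF record with 9 tab-separated columns.
printf 'chrI\tSGD\tgene\t335\t649\t.\t+\t.\tID=YAL069W\n' > "$workdir/toy.gff"

# Print a few fields by name to confirm the column order described above.
awk -F'\t' '!/^#/ { printf "seqname=%s feature_type=%s start=%s end=%s strand=%s\n", $1, $3, $4, $5, $7 }' "$workdir/toy.gff"

# Write BED6-style output: chrom, start - 1, end, feature type as name, score, strand.
awk 'BEGIN { FS = OFS = "\t" } !/^#/ { print $1, $4 - 1, $5, $3, $6, $7 }' "$workdir/toy.gff" > "$workdir/toy.bed"
cat "$workdir/toy.bed"
```

The score is passed through unchanged, so a "." in the GFF stays a "." in the output; replace it with a number if a downstream tool insists on a numeric score.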
...
...
One of the first things you want to know about your annotation file is which feature types it contains. Here's how to find out (read more about what's going on here at piping a histogram):
```bash
cd $SCRATCH/core_ngs/bedtools
cat sacCer3.R64-1-1_20110208.gff | grep -v '^#' | cut -f 3 | \
  sort | uniq -c | sort -k1,1nr | more
```
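If the pipeline above looks opaque, it can help to run the same stages on a tiny file where you can predict the answer. This sketch builds a three-record toy GFF (made-up records and file names, not the yeast annotation) and annotates what each stage contributes:

```bash
workdir=$(mktemp -d)

# A header line plus three records: two "gene" features and one "CDS".
printf '##gff-version 3\nchrI\tSGD\tgene\t335\t649\t.\t+\t.\tID=g1\nchrI\tSGD\tCDS\t335\t649\t.\t+\t0\tID=c1\nchrI\tSGD\tgene\t538\t792\t.\t+\t.\tID=g2\n' > "$workdir/toy.gff"

grep -v '^#' "$workdir/toy.gff" |   # drop comment and header lines
  cut -f 3 |                         # keep only column 3, the feature type
  sort |                             # bring identical values together
  uniq -c |                          # collapse each run to "count value"
  sort -k1,1nr > "$workdir/feature_counts.txt"   # largest counts first
cat "$workdir/feature_counts.txt"
```

The first `sort` is what makes `uniq -c` work, since `uniq` only collapses adjacent duplicates.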
...
```
chrI  334    649    YAL069W    315   +  YAL069W    Dubious
chrI  537    792    YAL068W-A  255   +  YAL068W-A  Dubious
chrI  1806   2169   YAL068C    363   -  PAU8       Verified
chrI  2479   2707   YAL067W-A  228   +  YAL067W-A  Uncharacterized
chrI  7234   9016   YAL067C    1782  -  SEO1       Verified
chrI  10090  10399  YAL066W    309   +  YAL066W    Dubious
chrI  11564  11951  YAL065C    387   -  YAL065C    Uncharacterized
chrI  12045  12426  YAL064W-B  381   +  YAL064W-B  Uncharacterized
chrI  13362  13743  YAL064C-A  381   -  YAL064C-A  Uncharacterized
chrI  21565  21850  YAL064W    285   +  YAL064W    Verified
chrI  22394  22685  YAL063C-A  291   -  YAL063C-A  Uncharacterized
chrI  23999  27968  YAL063C    3969  -  FLO9       Verified
chrI  31566  32940  YAL062W    1374  +  GDH3       Verified
chrI  33447  34701  YAL061W    1254  +  BDH2       Uncharacterized
chrI  35154  36303  YAL060W    1149  +  BDH1       Verified
chrI  36495  36918  YAL059C-A  423   -  YAL059C-A  Dubious
chrI  36508  37147  YAL059W    639   +  ECM1       Verified
chrI  37463  38972  YAL058W    1509  +  CNE1       Verified
chrI  38695  39046  YAL056C-A  351   -  YAL056C-A  Dubious
chrI  39258  41901  YAL056W    2643  +  GPB2       Verified
```
Note the value in the 8th column. In the yeast annotations from SGD there are 3 gene classifications: Verified, Uncharacterized and Dubious. The Dubious ones have no experimental evidence, so they are generally excluded.
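One way to exclude them is to filter on that 8th column. Below is a minimal sketch using three rows from the listing above, rebuilt as tab-separated text; the file names `genes.bed` and `genes.good.bed` are hypothetical, so substitute your own.

```bash
workdir=$(mktemp -d)

# Three rows from the listing above: one Dubious, one Verified, one Uncharacterized.
printf 'chrI\t334\t649\tYAL069W\t315\t+\tYAL069W\tDubious\nchrI\t1806\t2169\tYAL068C\t363\t-\tPAU8\tVerified\nchrI\t2479\t2707\tYAL067W-A\t228\t+\tYAL067W-A\tUncharacterized\n' > "$workdir/genes.bed"

# Tally the classifications found in column 8.
cut -f 8 "$workdir/genes.bed" | sort | uniq -c

# Keep every gene whose classification is not Dubious.
awk -F'\t' '$8 != "Dubious"' "$workdir/genes.bed" > "$workdir/genes.good.bed"
wc -l < "$workdir/genes.good.bed"
```

Matching on the exact column value, rather than grepping for the word anywhere on the line, avoids dropping a gene that merely has "Dubious" in some other field.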
...