...
This data is derived from a human (HeLa) Ago2 CLIP-seq experiment (Ameyar-zazoua M et al., Nat Struct Mol Biol. 2012;19(10):998-1004), and since you already copied it over at the start of the last exercise, it should be at the path:
Code Block |
---|
ls $SCRATCH/ripseq_exercises/CLIP |
...
Results is a directory containing pre-made, well, results. The two BAM files are our test datasets, and they are a sub-sample of reads from only chr19. The ripseeker.cmds and ripseeker_script.R files are both commands files, which we will examine later. Before that, we need to make some adjustments so we can install some new libraries. First, execute these commands:
Code Block |
---|
cd $SCRATCH/ripseq_exercises/CLIP cp /scratch/02423/nsabell/RIPSeeker_1.4.0.tar.gz . cp /scratch/02423/nsabell/ripseeker_script.R . mkdir lib chmod 777 lib |
RIPSeeker
RIPSeeker is a Bioconductor R package, and we are going to run the job using the "Rscript" utility using the script and commands file that exist in the directory you just staged. Even on TINY files like the ones we are using, it will take a while to run, so we're going to kick it off now, and then look at what we're doing while it's processing.
...
Now, let's take a closer look at what we just did. Check out the "ripseeker_script.R" file by using cat or less:
Code Block |
---|
sourceinstall.packages(lib="http://bioconductor.org/biocLite.Rlib"), biocLite(pkgs="./RIPSeeker_1.4.0.tar.gz", repos=NULL) library(RIPSeeker, lib.loc="./lib") bamFiles = list.files(".", "\\.bam$", recursive=TRUE, full.names=TRUE) ripGal=combineAlignGals(bamFiles[2], genomeBuild="hg19") ctlGal=combineAlignGals(bamFiles[1], genomeBuild="hg19") cNAME="input" binSize=200 genomeBuild = "hg19" outDir <- file.path(getwd()) reverseComplement = FALSE uniqueHit = FALSE assignMultihits = FALSE rerunWithDisambiguatedMultihits = FALSE seekOut.AGO2 <- ripSeek(bamPath = bamFiles, cNAME = cNAME, reverseComplement = FALSE, genomeBuild = "hg19", \ uniqueHit = FALSE, assignMultihits = FALSE, rerunWithDisambiguatedMultihits = FALSE, binSize=binSize, outDir=outDir) |
Let's go through this section by section. First, we need to get the RIPSeeker packages from Bioconductor and load them into R.
Code Block |
---|
> sourceinstall.packages(lib="http://bioconductor.org/biocLite.Rlib") > biocLite("RIPSeeker", pkgs="./RIPSeeker_1.4.0.tar.gz", repos=NULL) > library(RIPSeeker, lib.loc="./lib") |
Then, we need to read in the BAM files we're going to use with their paths and use the RIPSeeker (or rather GenomicRanges) function "combineAlignGals" to generate a "gapped alignment" object in R using hg19 genomic and contig coordinates.
...
Expand | ||
---|---|---|
Go here to look at it in the genome browser - see anything interesting? |
We can also open R interactively to look at the contents of the big "seekOutput.RData" file we made. We need to use the R set up by the BioITeam, rather than the one in the module system, so we can use a version 3.x. These are a possible sequence of commands:
Code Block |
---|
cd $SCRATCH/ripseq_exercises/CLIP/
$BI/bin/R
#Now we are in R
library(RIPSeeker, lib.loc="./lib")
rip_data = load("seekOutput.RData")
rip
# [1] "mainSeekOutputRIP" "RIPGRList" "mainSeekOutputCTL"
# [4] "annotatedRIPGR" |
Here, "rip_data" is a list of four objects that we just loaded into R, and each contains different sub-objects and information.
Object | Component Objects | Information Content |
---|---|---|
mainSeekOutputRIP | nbhGRList alignGal alignGalFiltered | Information such as the HMM posterior probabilities and filtered alignment objects for the RIP sample |
RIPGRList | a GRangesList | A list (in GRanges object form) of all peak regions called by RIPSeeker |
mainSeekOutputCTL | nbhGRList alignGal alignGalFiltered | Information such as the HMM posterior probabilities and filtered alignment objects for the CTL sample |
annotatedRIPGR | sigGRangesAnnotated enrichedGO | Annotations for each of the regions flagged in RIPGRList, as well as calling of significantly enriched GO terms for those regions; this only works if "ChIPpeakAnno" is loaded and you set the functions biomaRt, biomaRt_dataset, and goAnno (see below) |
Alternatives
We could have added these lines to the ripseeker_script.R file:
...