...
This data is derived from a human (HeLa) Ago2 CLIP-seq experiment (Ameyar-zazoua M et al., Nat Struct Mol Biol. 2012;19(10):998-1004), and since you already copied it over at the start of the last exercise, it should be at the path:
Code Block |
---|
ls $SCRATCH/ripseq_exercises/CLIP |
...
Now, let's take a closer look at what we just did. Check out the "ripseeker_script.R" file by using cat or less:
Code Block |
---|
sourceinstall.packages(lib="http://bioconductor.org/biocLite.Rlib"), biocLite(pkgs="./RIPSeeker_1.4.0.tar.gz", repos=NULL) library(RIPSeeker, lib.loc="./lib") bamFiles = list.files(".", "\\.bam$", recursive=TRUE, full.names=TRUE) ripGal=combineAlignGals(bamFiles[2], genomeBuild="hg19") ctlGal=combineAlignGals(bamFiles[1], genomeBuild="hg19") cNAME="input" binSize=200 genomeBuild = "hg19" outDir <- file.path(getwd()) reverseComplement = FALSE uniqueHit = FALSE assignMultihits = FALSE rerunWithDisambiguatedMultihits = FALSE seekOut.AGO2 <- ripSeek(bamPath = bamFiles, cNAME = cNAME, reverseComplement = FALSE, genomeBuild = "hg19", \ uniqueHit = FALSE, assignMultihits = FALSE, rerunWithDisambiguatedMultihits = FALSE, binSize=binSize, outDir=outDir) |
Let's go through this section by section. First, we need to get the RIPSeeker packages from Bioconductor and load them into R.
Code Block |
---|
> sourceinstall.packages(lib="http://bioconductor.org/biocLite.Rlib") > biocLite("RIPSeeker", pkgs="./RIPSeeker_1.4.0.tar.gz", repos=NULL) > library(RIPSeeker, lib.loc="./lib") |
Then, we need to read in the BAM files we're going to use with their paths and use the RIPSeeker (or rather GenomicRanges) function "combineAlignGals" to generate a "gapped alignment" object in R using hg19 genomic and contig coordinates.
...
Expand | ||
---|---|---|
Go here to look at it in the genome browser - see anything interesting? |
We can also open R interactively to look at the contents of the big "seekOutput.RData" file we made. We need to use the R set up by the BioITeam, rather than the one in the module system, so we can use a version 3.x. These are a possible sequence of commands:
Code Block |
---|
cd $SCRATCH/ripseq_exercises/CLIP/
$BI/bin/R
#Now we are in R
library(RIPSeeker, lib.loc="./lib")
rip_data = load("seekOutput.RData")
rip
# [1] "mainSeekOutputRIP" "RIPGRList" "mainSeekOutputCTL"
# [4] "annotatedRIPGR" |
Here, "rip_data" is a list of four objects that we just loaded into R, and each contains different sub-objects and information.
Object | Component Objects | Information Content |
---|---|---|
mainSeekOutputRIP | nbhGRList alignGal alignGalFiltered | Information such as the HMM posterior probabilities and filtered alignment objects for the RIP sample |
RIPGRList | a GRangesList | A list (in GRanges object form) of all peak regions called by RIPSeeker |
mainSeekOutputCTL | nbhGRList alignGal alignGalFiltered | Information such as the HMM posterior probabilities and filtered alignment objects for the CTL sample |
annotatedRIPGR | sigGRangesAnnotated enrichedGO | Annotations for each of the regions flagged in RIPGRList, as well as calling of significantly enriched GO terms for those regions; this only works if "ChIPpeakAnno" is loaded and you set the functions biomaRt, biomaRt_dataset, and goAnno (see below) |
Alternatives
We could have added these lines to the ripseeker_script.R file:
...