Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.
Comment: Migrated to Confluence 5.3

...

This data is derived from a human (HeLa) Ago2 CLIP-seq experiment (Ameyar-zazoua M et al., Nat Struct Mol Biol. 2012;19(10):998-1004), and since you already copied it over at the start of the last exercise, it should be at the path:

Code Block
ls $SCRATCH/ripseq_exercises/CLIP

...

Results is a directory containing pre-made, well, results.  The two BAM files are our test datasets, and they are a sub-sample of reads from only chr19.  The ripseeker.cmds and ripseeker_script.R files are both commands files, which we will examine later.  Before that, we need to make some adjustments so we can install some new libraries.  First, execute these commands:

Code Block
cd $SCRATCH/ripseq_exercises/CLIP
cp /scratch/02423/nsabell/RIPSeeker_1.4.0.tar.gz .
cp /scratch/02423/nsabell/ripseeker_script.R .
mkdir lib
chmod 777 lib

RIPSeeker

RIPSeeker is a Bioconductor R package, and we are going to run the job using the "Rscript" utility using the script and commands file that exist in the directory you just staged.  Even on TINY files like the ones we are using, it will take a while to run, so we're going to kick it off now, and then look at what we're doing while it's processing.

...

Now, let's take a closer look at what we just did.  Check out the "ripseeker_script.R" file by using cat or less:

Code Block
sourceinstall.packages(lib="http://bioconductor.org/biocLite.Rlib"), biocLite(pkgs="./RIPSeeker_1.4.0.tar.gz", repos=NULL)
library(RIPSeeker, lib.loc="./lib")
bamFiles = list.files(".", "\\.bam$", recursive=TRUE, full.names=TRUE)
ripGal=combineAlignGals(bamFiles[2], genomeBuild="hg19")
ctlGal=combineAlignGals(bamFiles[1], genomeBuild="hg19")

cNAME="input"
binSize=200
genomeBuild = "hg19"
outDir <- file.path(getwd())
reverseComplement = FALSE
uniqueHit = FALSE
assignMultihits = FALSE
rerunWithDisambiguatedMultihits = FALSE

seekOut.AGO2 <- ripSeek(bamPath = bamFiles, cNAME = cNAME, reverseComplement = FALSE, genomeBuild = "hg19", \
uniqueHit = FALSE, assignMultihits = FALSE, rerunWithDisambiguatedMultihits = FALSE, binSize=binSize, outDir=outDir)

Let's go through this section by section.  First, we need to get the RIPSeeker packages from Bioconductor and load them into R.

Code Block
> sourceinstall.packages(lib="http://bioconductor.org/biocLite.Rlib")
> biocLite("RIPSeeker", pkgs="./RIPSeeker_1.4.0.tar.gz", repos=NULL)
> library(RIPSeeker, lib.loc="./lib")

Then, we need to read in the BAM files we're going to use with their paths and use the RIPSeeker (or rather GenomicRanges) function "combineAlignGals" to generate a "gapped alignment" object in R using hg19 genomic and contig coordinates.

...

Expand
Code Block
cat RIPregions.gff3
chr19	rtracklayer	sequence_feature	47545201	47545400	.	*	.	count=829; fpk=4145; logOddScore=65.8422503917559; pval=0; pvalAdj=0; CTL_count=1491; CTL_fpk=7455; CTL_logOddScore=-61.5676075214777; eFDR=0

Go here to look at it in the genome browser - see anything interesting?

We can also open R interactively to look at the contents of the big "seekOutput.RData" file we made.  We need to use the R set up by the BioITeam, rather than the one in the module system, so we can use a version 3.x.  These are a possible sequence of commands:

Code Block
cd $SCRATCH/ripseq_exercises/CLIP/
$BI/bin/R
 
#Now we are in R
 
library(RIPSeeker, lib.loc="./lib")
rip_data = load("seekOutput.RData")
rip
# [1] "mainSeekOutputRIP" "RIPGRList"         "mainSeekOutputCTL"
# [4] "annotatedRIPGR"

Here, "rip_data" is a list of four objects that we just loaded into R, and each contains different sub-objects and information.

ObjectComponent ObjectsInformation Content
mainSeekOutputRIP

nbhGRList

alignGal

alignGalFiltered

Information such as the HMM posterior probabilities and filtered alignment objects for the RIP sample
RIPGRLista GRangesListA list (in GRanges object form) of all peak regions called by RIPSeeker
mainSeekOutputCTL

nbhGRList

alignGal

alignGalFiltered

Information such as the HMM posterior probabilities and filtered alignment objects for the CTL sample
annotatedRIPGR

sigGRangesAnnotated

enrichedGO

Annotations for each of the regions flagged in RIPGRList, as well as calling of significantly enriched GO terms for those regions; this only works if "ChIPpeakAnno" is loaded and you set the functions biomaRt, biomaRt_dataset, and goAnno (see below)

Alternatives

We could have added these lines to the ripseeker_script.R file:

...