Exercise #2: HITS-CLIP

Objectives

In this exercise, you will use a recently developed R package called RIPSeeker to identify binding regions in a small CLIP-seq dataset.

Primary Data

Ameyar-zazoua M, Rachez C, Souidi M, et al. Argonaute proteins couple chromatin silencing to alternative splicing. Nat Struct Mol Biol. 2012;19(10):998-1004.

RIPSeeker

RIPSeeker is a Bioconductor R package, and we are going to run the job using the "Rscript" utility using the script and commands file that exist in the directory you just staged. Check out the "ripseeker_script.R" file.

source("http://bioconductor.org/biocLite.R")
biocLite("RIPSeeker")
library(RIPSeeker)

bamFiles = list.files(".", "\\.bam$", recursive=TRUE, full.names=TRUE)
ripGal=combineAlignGals(bamFiles[2], genomeBuild="hg19")
ctlGal=combineAlignGals(bamFiles[1], genomeBuild="hg19")

cNAME="input"
genomeBuild = "hg19"
outDir <- file.path(getwd())
biomart="ensembl"
biomaRt_dataset="hsapiens_gene_ensembl"
goAnno="org.Hs.eg.db"
reverseComplement = FALSE
uniqueHit = FALSE
assignMultihits = TRUE
rerunWithDisambiguatedMultihits = TRUE

seekOut.AGO2 <- ripSeek(bamPath = bamFiles, cNAME = cNAME, reverseComplement = FALSE, genomeBuild = "hg19", uniqueHit = TRUE, assignMultihits = TRUE, rerunWithDisambiguatedMultihits = FALSE, binSize=binSize, biomart=biomart, biomaRt_dataset = biomaRt_dataset, goAnno = goAnno, outDir=outDir)

Now, before running anything, let's go through this section by section. First, we need to get the RIPSeeker packages from Bioconductor and load them into R.

> source("http://bioconductor.org/biocLite.R")
> biocLite("RIPSeeker")
> library(RIPSeeker)

Then, we need to read in the BAM files we're going to use with their paths and use the RIPSeeker (or rather GenomicRanges) function "combineAlignGals" to generate a "gapped alignment" object in R.

> bamFiles = list.files(".", "\\.bam$", recursive=TRUE, full.names=TRUE)
> ripGal=combineAlignGals(bamFiles[2], genomeBuild="hg19")
Processing ./ip.sort.bam ... All hits are returned with flags.
1 BAM files are combined
> ctlGal=combineAlignGals(bamFiles[1], genomeBuild="hg19")
Processing ./input.sort.bam ... All hits are returned with flags.
1 BAM files are combined

Now, we're ready to execute the primary RIPSeeker function, appropriately called ripSeek(). Since it takes so many options, we're going to declare a bunch of variables before executing the command, and then have the command refer back to those already-declared variables. This is what it would look like:

> cNAME="input"  #Which files are input
> genomeBuild = "hg19"   #Genome to use
> outDir <- file.path(getwd())  #Output file prefix (here, the current directory)
> biomart="ensembl"   #Source of genome annotations
> biomaRt_dataset="hsapiens_gene_ensembl"   #Specific dataset from biomaRt source
> goAnno="org.Hs.eg.db"   #Annotation database
> reverseComplement = FALSE   #Do not reverse complement BAM entries
> uniqueHit = FALSE   #If TRUE, use only unique hits to train initial model
> assignMultihits = TRUE  #If TRUE and uniqueHit is TRUE, assign multi-hits to regions based on initial model
> rerunWithDisambiguatedMultihits = TRUE   #If TRUE and prior two options are TRUE, re-train model after multi-hit assignment
 
> seekOut.AGO2 <- ripSeek(bamPath = bamFiles, cNAME = cNAME, reverseComplement = FALSE, genomeBuild = "hg19", uniqueHit = TRUE, assignMultihits = TRUE, rerunWithDisambiguatedMultihits = FALSE, binSize=200, biomart=biomart, biomaRt_dataset = biomaRt_dataset, goAnno = goAnno, outDir=outDir)

If we execute the ripSeek() command, we get an output that looks something like this:

source("http://bioconductor.org/biocLite.R")
biocLite("RIPSeeker")
library(RIPSeeker)
 
bamFiles = list.files(".", "\\.bam$", recursive=TRUE, full.names=TRUE)
ripGal=combineAlignGals(bamFiles[2], genomeBuild="hg19")
ctlGal=combineAlignGals(bamFiles[1], genomeBuild="hg19")
cNAME="input"
genomeBuild = "hg19"
binSize=200
outDir <- file.path(getwd())
biomart="ensembl"
biomaRt_dataset="hsapiens_gene_ensembl"
goAnno="org.Hs.eg.db"
reverseComplement = FALSE
uniqueHit = FALSE
assignMultihits = TRUE
rerunWithDisambiguatedMultihits = TRUE
seekOut.AGO2 <- ripSeek(bamPath = bamFiles, cNAME = cNAME, reverseComplement = FALSE, genomeBuild = "hg19", uniqueHit = TRUE, assignMultihits = TRUE, rerunWithDisambiguatedMultihits = FALSE, binSize=binSize, biomart=biomart, biomaRt_dataset = biomaRt_dataset, goAnno = goAnno, outDir=outDir)

Exercise #2: HITS-CLIP

Primary Data

RIPSeeker

Alternatives