Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

Code Block
source("http://bioconductor.org/biocLite.R")
biocLite("RIPSeeker")
library(RIPSeeker)
bamFiles = list.files(".", "\\.bam$", recursive=TRUE, full.names=TRUE)
ripGal=combineAlignGals(bamFiles[2], genomeBuild="hg19")
ctlGal=combineAlignGals(bamFiles[1], genomeBuild="hg19")

cNAME="input"
binSize=200
genomeBuild = "hg19"
outDir <- file.path(getwd())
biomart="ensembl"
biomaRt_dataset="hsapiens_gene_ensembl"
reverseComplement = FALSE
uniqueHit = FALSE
assignMultihits = TRUE
rerunWithDisambiguatedMultihits = TRUE

seekOut.AGO2 <- ripSeek(bamPath = bamFiles, cNAME = cNAME, reverseComplement = FALSE, genomeBuild = "hg19", \
uniqueHit = TRUE, assignMultihits = TRUE, rerunWithDisambiguatedMultihits = FALSE, binSize=binSize, biomart=biomart, biomaRt_dataset = biomaRt_dataset, outDir=outDir)

Now, before running anything, let's go through this section by section.  First, we need to get the RIPSeeker packages from Bioconductor and load them into R.

...

Code Block
> binSize=NULL
> cNAME="input"  #Which files are input
> genomeBuild = "hg19"   #Genome to use
> outDir <- file.path(getwd())  #Output file prefix (here, the current directory)
> biomart="ensembl"   #Source of genome annotations
> biomaRt_dataset="hsapiens_gene_ensembl"   #Specific dataset from biomaRt source
> goAnno="org.Hs.eg.db"   #Annotation database
> reverseComplement = FALSE   #Do not reverse complement BAM entries
> uniqueHit = FALSE   #If TRUE, use only unique hits to train initial model
> assignMultihits = TRUE  #If TRUE and uniqueHit is TRUE, assign multi-hits to regions based on initial model
> rerunWithDisambiguatedMultihits = TRUE   #If TRUE and prior two options are TRUE, re-train model after multi-hit assignment
 
> seekOut.AGO2 <- ripSeek(bamPath = bamFiles, cNAME = cNAME, reverseComplement = FALSE, genomeBuild = "hg19", \
uniqueHit = TRUE, assignMultihits = TRUE, rerunWithDisambiguatedMultihits = FALSE, binSize=binSize, biomart=biomart, biomaRt_dataset = biomaRt_dataset, outDir=outDir)

The way we're going to run this at TACC is by having the actual R commands in a script called ripseeker_script.R, which is then called by a commands file using Rscript, which is a utility to run R scripts from the command line (or, in this case, a batch job) when it can't be interactive.

...

The output should deposit several files in your current directory, which includes the following:

Code Block
$ 

 

 

 

 

Alternatives

We could have added these lines to the ripseeker_script.R file:

Code Block
biocLite("ChIPpeakAnno")
library(ChIPpeakAnno)
 
> biomart="ensembl"   #Source of genome annotations
> biomaRt_dataset="hsapiens_gene_ensembl"   #Specific dataset from biomaRt source
> goAnno="org.Hs.eg.db"   #Annotation database

Which would automatically annotate the resulting peak coordinates with their gene names.  We didn't do this because the package that is used, ChIPpeakAnno, has quite a few dependencies and takes a while to download (maybe ~15 minutes) and run.