...
Code Block |
---|
source("http://bioconductor.org/biocLite.R") biocLite("RIPSeeker") library(RIPSeeker) bamFiles = list.files(".", "\\.bam$", recursive=TRUE, full.names=TRUE) ripGal=combineAlignGals(bamFiles[2], genomeBuild="hg19") ctlGal=combineAlignGals(bamFiles[1], genomeBuild="hg19") cNAME="input" binSize=200 genomeBuild = "hg19" outDir <- file.path(getwd()) biomart="ensembl" biomaRt_dataset="hsapiens_gene_ensembl" reverseComplement = FALSE uniqueHit = FALSE assignMultihits = TRUE rerunWithDisambiguatedMultihits = TRUE seekOut.AGO2 <- ripSeek(bamPath = bamFiles, cNAME = cNAME, reverseComplement = FALSE, genomeBuild = "hg19", \ uniqueHit = TRUE, assignMultihits = TRUE, rerunWithDisambiguatedMultihits = FALSE, binSize=binSize, biomart=biomart, biomaRt_dataset = biomaRt_dataset, outDir=outDir) |
Now, before running anything, let's go through this section by section. First, we need to get the RIPSeeker packages from Bioconductor and load them into R.
...
Code Block |
---|
> binSize=NULL > cNAME="input" #Which files are input > genomeBuild = "hg19" #Genome to use > outDir <- file.path(getwd()) #Output file prefix (here, the current directory) > biomart="ensembl" #Source of genome annotations > biomaRt_dataset="hsapiens_gene_ensembl" #Specific dataset from biomaRt source > goAnno="org.Hs.eg.db" #Annotation database > reverseComplement = FALSE #Do not reverse complement BAM entries > uniqueHit = FALSE #If TRUE, use only unique hits to train initial model > assignMultihits = TRUE #If TRUE and uniqueHit is TRUE, assign multi-hits to regions based on initial model > rerunWithDisambiguatedMultihits = TRUE #If TRUE and prior two options are TRUE, re-train model after multi-hit assignment > seekOut.AGO2 <- ripSeek(bamPath = bamFiles, cNAME = cNAME, reverseComplement = FALSE, genomeBuild = "hg19", \ uniqueHit = TRUE, assignMultihits = TRUE, rerunWithDisambiguatedMultihits = FALSE, binSize=binSize, biomart=biomart, biomaRt_dataset = biomaRt_dataset, outDir=outDir) |
The way we're going to run this at TACC is by having the actual R commands in a script called ripseeker_script.R, which is then called by a commands file using Rscript, which is a utility to run R scripts from the command line (or, in this case, a batch job) when it can't be interactive.
...
The output should deposit several files in your current directory, which includes the following:
Code Block |
---|
$ |
Alternatives
We could have added these lines to the ripseeker_script.R file:
Code Block |
---|
biocLite("ChIPpeakAnno")
library(ChIPpeakAnno)
> biomart="ensembl" #Source of genome annotations
> biomaRt_dataset="hsapiens_gene_ensembl" #Specific dataset from biomaRt source
> goAnno="org.Hs.eg.db" #Annotation database |
Which would automatically annotate the resulting peak coordinates with their gene names. We didn't do this because the package that is used, ChIPpeakAnno, has quite a few dependencies and takes a while to download (maybe ~15 minutes) and run.