Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

Code Block
titleset up idev session in another tab
ssh <username>@stampede2.tacc.utexas.edu

idev -m 120 -q development -A UT-2015-05-18 -r intro_NGSBIO_DATA_week_1
Code Block
titleGet set up for the exercise
 #We will be doing all this in the idev session
 cds
 cd my_rnaseq_course/day_4_partA/wgcna
 module load Rstats
 module load RstatsPackages
 R
Code Block
titleInstall required R packages into your personal library
#These commands will work for any Bioconductor package!
R
 
#within R
source("http://bioconductor.org/biocLite.R")
biocLite("WGCNA")
install.packages("flashClust") 
 session
 cds
 cd my_rnaseq_course/day_4_partA/wgcna


Explanation of sample dataset: 
Time series of coral larval development from 4 hours post fertilization (Day 0) to 245 hours post fertilization (Day 12). Multiple other quantitative traits were measured through the time series. Only green and red fluorescence are added as quantitative traits in the sample dataset. Dataset has 48 samples total, four replicates (A-D) over 12 days. The goal is to find genes that correlate with developmental traits through time and differences in gene expression between early larval development and late larval development. 

...

Code Block
titleLoad data into WGCNA and reformat
# Only run the following commands once to install WGCNA and flashClust on your computer
#source("http://bioconductor.org/biocLite.R")
#biocLite("WGCNA")
#install.packages("flashClust") 


# Load WGCNA and flashClust libraries every time you open R
library(WGCNA)
library(flashClust)


# Uploading data into R and formatting it for WGCNA 
# This creates an object called "datExpr" that contains the normalized counts file output from DESeq2
datExpr = read.csv("SampleTimeSeriesRLD.csv")
# "head" the file to preview it
head(datExpr) # You see that genes are listed in a column named "X" and samples are in columns


# Manipulate file so it matches the format WGCNA needs 
row.names(datExpr) = datExpr$X
datExpr$X = NULL
datExpr = as.data.frame(t(datExpr)) # now samples are rows and genes are columns
dim(datExpr) # 48 samples and 1000 genes (you will have many more genes in reality)


# Run this to check if there are gene outliers
gsg = goodSamplesGenes(datExpr, verbose = 3)
gsg$allOK 


#If the last statement returns TRUE, all genes have passed the cuts. If not, we remove the offending genes and samples from the data with the following:
#if (!gsg$allOK)
#	{if (sum(!gsg$goodGenes)>0)
#		printFlush(paste("Removing genes:", paste(names(datExpr)[!gsg$goodGenes], collapse= ", ")));
#		if (sum(!gsg$goodSamples)>0)
#			printFlush(paste("Removing samples:", paste(rownames(datExpr)[!gsg$goodSamples], collapse=", ")))
#		datExpr= datExpr[gsg$goodSamples, gsg$goodGenes]
#		}


#Create an object called "datTraits" that contains your trait data
datTraits = read.csv("Traits_23May2015.csv")
head(datTraits)
#form a data frame analogous to expression data that will hold the clinical traits.
rownames(datTraits) = datTraits$Sample
datTraits$Sample = NULL
table(rownames(datTraits)==rownames(datExpr)) #should return TRUE if datasets align correctly, otherwise your names are out of order
head(datTraits)


# You have finished uploading and formatting expression and trait data
# Expression data is in datExpr, corresponding traits are datTraits


save(datExpr, datTraits, file="SamplesAndTraits.RData")
#load("SamplesAndTraits.RData")
 

...