Practice time/Bring your own RNA-Seq Data
IF YOU HAVE YOUR OWN DATA:
- Make sure data is on stampede2 and in your scratch directory
- Make sure genome/transcriptome files are on stampede2 and in your scratch directory. If you need to download reference files, some options include:
- Make sure the files are readable
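The readability check above can be sketched as two small shell helpers. This is a minimal sketch, not a prescribed course procedure: the function names `list_unreadable` and `make_readable` are hypothetical, and it assumes you own the files and that checking the owner-read permission bit is what you need.

```shell
# Sketch: print regular files under a directory that lack the owner-read bit.
# Assumes you own the files; pass the directory that holds your data.
list_unreadable() {
    find "$1" -type f ! -perm -u=r
}

# Sketch: add owner-read permission to everything under a directory.
# The capital X adds execute (search) permission on directories, and on
# files that already have an execute bit set, so directories can be entered.
make_readable() {
    chmod -R u+rX "$1"
}
```

For example, `list_unreadable "$SCRATCH/my_rnaseq_course"` should print nothing if every file is readable by its owner. If collaborators also need access, group permissions (for example `chmod -R g+rX`) would need to be set separately.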
IF YOU DON'T HAVE YOUR OWN DATA:
You have several choices for datasets to play with, depending on the stage in the analysis that you would like to work on.
RAW DATA: Start with 6 raw fastq files from the following dataset:
Bottomley et al. mouse dataset SRA026846.1 (http://dx.doi.org/10.1371/journal.pone.0017820)
Single-end RNA-Seq data generated on an Illumina GAIIx for 2 strains of mice (B6 and D2) to detect differential striatal gene expression between the two inbred mouse strains.
We have provided three B6 and three D2 fastq files for you to work with, as well as the MM10 reference files.
Get the data
cds
cd my_rnaseq_course
cp -r /work2/projects/BioITeam/projects/courses/rnaseq_course/day_4_bottomley_raw_data . &
MAPPED FILES: Start with 6 bam files (mapped to the mouse MM9 genome) from the following dataset:
Bottomley et al. mouse dataset SRA026846.1 (http://dx.doi.org/10.1371/journal.pone.0017820)
Single-end RNA-Seq data generated on an Illumina GAIIx for 2 strains of mice (B6 and D2) to detect differential striatal gene expression between the two inbred mouse strains.
We have provided three B6 and three D2 bam files for you to work with.
Get the data
cds
cd my_rnaseq_course
cp -r /work2/projects/BioITeam/projects/courses/rnaseq_course/day_4_bottomley_mapped_data . &
GENE COUNT DATA: Start with a table of per-gene read counts for 3 treated and 4 untreated Drosophila samples from the following dataset:
Brooks et al., 2011 dataset GSE18508 (PMID: 20921232)
The experiment studied the effects of RNAi knockdown of Pasilla, the Drosophila melanogaster ortholog of mammalian NOVA1 and NOVA2, on the transcriptome. Treated samples have been RNAi depleted of mRNAs encoding RNA binding proteins and untreated samples have not.
Get the data
cds
cd my_rnaseq_course
cp -r /work2/projects/BioITeam/projects/courses/rnaseq_course/day_4_brooks_gene_count_data . &
- Want to find a different RNA-Seq dataset of your own?
- Search GEO for gene expression matrices (also download the series matrix file for metadata): https://www.ncbi.nlm.nih.gov/geo/
- COVID RNA-Seq datasets I've vetted:
RNA-Seq of PBMC samples from 18 patients with mild, moderate or severe COVID-19 symptoms during the treatment, convalescence and rehabilitation stages. Both mRNA and non-coding RNA were sequenced.
ii. https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE171110
Whole blood RNA-Seq of 44 severe COVID-19 patients and 10 healthy donors used to identify predictors for disease severity.
iii. https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE157103
Large RNA-Seq dataset consisting of 128 plasma and leukocyte samples from hospitalized patients with or without COVID-19 (n=102 and 26 respectively) and with differing degrees of disease severity (ICU or non-ICU).
iv. https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE152418
RNAseq of PBMCs in a group of 17 COVID-19 subjects and 17 healthy controls.
REMINDERS
How to request an interactive session on one of stampede2's compute nodes?
idev -m 120 -q normal -A UT-2015-05-18 -r RNAday5
(OR)
idev -m 120 -q development -A UT-2015-05-18 # say no when it asks whether you want to use the reservation
How to submit jobs? Submitting Jobs to Stampede2
How to submit jobs in a sequence?
You can make one job depend on another, so that it starts only after the first job has completed successfully, using this command:
sbatch --dependency=afterok:<job-ID> launcher.slurm
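To avoid copying the job ID by hand, the dependency can be chained in a small shell function. This is a sketch under stated assumptions: `submit_chain` and the script names in the usage line are hypothetical, and it relies on sbatch's `--parsable` option, which prints only the job ID (optionally followed by `;` and a cluster name).

```shell
# Sketch: submit two SLURM job scripts so that the second starts only if
# the first one finishes successfully. Both arguments are paths to your
# own .slurm files (hypothetical placeholders here).
submit_chain() {
    first_script="$1"
    second_script="$2"
    # --parsable prints just the job ID, possibly followed by ";cluster"
    first_id=$(sbatch --parsable "$first_script") || return 1
    # Strip any ";cluster" suffix so only the numeric ID remains
    first_id=${first_id%%;*}
    # afterok: run only after the first job completes with exit code 0
    sbatch --dependency=afterok:"$first_id" "$second_script"
}
```

For example, `submit_chain map.slurm count.slurm` would queue `count.slurm` to run only once `map.slurm` has finished without error. If the first job fails, the dependent job will not start and may need to be cancelled manually with `scancel`.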
How to move files between your computer and stampede2?
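One common option is `scp`, run from a terminal on your own computer rather than on stampede2. The helpers below are only a sketch: the function names, the username, and all paths in the usage line are hypothetical placeholders, and the login host is assumed to be `stampede2.tacc.utexas.edu`.

```shell
# Run these on your own computer, not on stampede2.
# Assumed login host; usernames and paths are supplied by the caller.
STAMPEDE_HOST="stampede2.tacc.utexas.edu"

# Copy a local file up to a directory on stampede2.
# usage: upload_to_stampede USER LOCAL_FILE REMOTE_DIR
upload_to_stampede() {
    scp "$2" "$1@$STAMPEDE_HOST:$3"
}

# Copy a remote file or directory (-r recurses into directories)
# down to a local directory.
# usage: download_from_stampede USER REMOTE_PATH LOCAL_DIR
download_from_stampede() {
    scp -r "$1@$STAMPEDE_HOST:$2" "$3"
}
```

For example, `download_from_stampede my_username /path/on/stampede2/results .` would copy a results directory into your current local directory. You will be prompted for your password and any additional login token that the system requires.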