Practice time/Bring your own RNA-Seq Data

IF YOU HAVE YOUR OWN DATA:

  • Make sure data is on stampede2 and in your scratch directory
  • Make sure genome/transcriptome files are on stampede2 and in your scratch directory. If you need to download reference files, some options include:
    • iGenomes: With one download, you can get genome fasta files, annotation, and bowtie and bwa (slightly older version) indexes for many organisms.
    • Ensembl: Fasta files (genome and transcriptome), gtf files (annotation) for multiple organisms.
  • Make sure the files are readable
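
One way to check the last point is sketched below. It assumes a bash-like shell on stampede2; the function name check_readable and the file names in the usage line are hypothetical placeholders, not files provided by the course.

```shell
# Report any file that is missing, unreadable, or empty.
# Returns 0 if every file passed, 1 otherwise.
check_readable() {
    cr_status=0
    for cr_file in "$@"; do
        if [ ! -r "$cr_file" ]; then
            echo "NOT READABLE: $cr_file"
            cr_status=1
        elif [ ! -s "$cr_file" ]; then
            echo "EMPTY: $cr_file"
            cr_status=1
        fi
    done
    return $cr_status
}

# Usage (hypothetical file names):
# check_readable sample1.fastq sample2.fastq genome.fa genes.gtf
```

If a file you own is reported as not readable, `chmod u+r <file>` restores read permission for your user.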

IF YOU DON'T HAVE YOUR OWN DATA:

 You have several choices for datasets to play with, depending on the stage in the analysis that you would like to work on.

  1. RAW DATA:  Start with 6 raw fastq files from the following dataset:
    Bottomley et al. mouse dataset SRA026846.1 (http://dx.doi.org/10.1371/journal.pone.0017820)
    Single-end RNA-Seq data generated on Illumina GAIIx for 2 strains of mice (B6 and D2) to detect differential striatal gene expression between the two inbred mouse strains.
    We have provided three B6 and three D2 fastq files for you to work with as well as the MM10 reference files.

    Get the data
     
    cds
    cd my_rnaseq_course
    cp -r /work2/projects/BioITeam/projects/courses/rnaseq_course/day_4_bottomley_raw_data . &
    
  2. MAPPED FILES: Start with 6 bam files (mapped to the mouse MM9 genome) from the following dataset:
    Bottomley et al. mouse dataset SRA026846.1 (http://dx.doi.org/10.1371/journal.pone.0017820)
    Single-end RNA-Seq data generated on Illumina GAIIx for 2 strains of mice (B6 and D2) to detect differential striatal gene expression between the two inbred mouse strains.
    We have provided three B6 and three D2 bam files for you to work with.

    Get the data
     
    cds
    cd my_rnaseq_course
    cp -r /work2/projects/BioITeam/projects/courses/rnaseq_course/day_4_bottomley_mapped_data . &
     
  3. GENE COUNT DATA: Start with a table of per-gene read counts for 3 treated and 4 untreated Drosophila samples from the following dataset:
    Brooks et al, 2011 dataset GSE18508 (PMID: 20921232)
    The experiment studied the effects of RNAi knockdown of Pasilla, the Drosophila melanogaster ortholog of mammalian NOVA1 and NOVA2, on the transcriptome. Treated samples have been RNAi depleted of mRNAs encoding RNA binding proteins and untreated samples have not.

    Get the data
     
    cds
    cd my_rnaseq_course
    cp -r /work2/projects/BioITeam/projects/courses/rnaseq_course/day_4_brooks_gene_count_data . &
     
  4. Want to find a different RNA-Seq dataset of your own?
    1. Search GEO for gene expression matrices (also download the series matrix file for metadata): https://www.ncbi.nlm.nih.gov/geo/
    2. COVID RNA-Seq datasets I've vetted: 
      1. https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE157859
         RNA-Seq of PBMC samples from 18 patients with mild, moderate or severe COVID-19 symptoms during the treatment, convalescence and rehabilitation stages. Both mRNA and non-coding RNA were sequenced.

      2. https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE171110
         Whole blood RNA-Seq of 44 severe COVID-19 patients and 10 healthy donors, used to identify predictors of disease severity.

      3. https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE157103
         Large RNA-Seq dataset consisting of 128 plasma and leukocyte samples from hospitalized patients with or without COVID-19 (n=102 and 26 respectively) and with differing degrees of disease severity (ICU or non-ICU).

      4. https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE152418
         RNA-Seq of PBMCs from 17 COVID-19 subjects and 17 healthy controls.
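
The cp commands in options 1 to 3 above end with &, so they run in the background and the prompt returns before the copy has finished. A minimal sketch for waiting on such a copy and then listing what arrived, assuming a bash-like shell; the helper name copy_and_list is hypothetical:

```shell
# Copy a directory in the background, wait for it to finish,
# then list the copied directory.
copy_and_list() {
    cl_src=$1     # source directory, e.g. one of the course data directories
    cl_dest=$2    # destination directory, e.g. "." for the current directory
    cp -r "$cl_src" "$cl_dest" &
    wait $! || return 1    # block until this copy is done; stop if cp failed
    ls -lh "$cl_dest/$(basename "$cl_src")"
}

# Usage, from inside my_rnaseq_course:
# copy_and_list /work2/projects/BioITeam/projects/courses/rnaseq_course/day_4_bottomley_raw_data .
```

If you already started the copy with & in your current shell, running `wait` on its own blocks until all background jobs of that shell have finished.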

REMINDERS

How to request an interactive session on one of stampede2's compute nodes?

idev -m 120 -q normal -A UT-2015-05-18 -r RNAday5
(OR)
idev -m 120 -q development -A UT-2015-05-18
# Say no when it asks whether you want to use the reservation


How to submit jobs? See the page "Submitting Jobs to Stampede2".

How to submit jobs in a sequence?

You can make one job depend on another, so that it starts only after the first job has completed successfully, using this command:


sbatch --dependency=afterok:<job-ID> launcher.slurm
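
To avoid copying the job ID by hand, the two submissions can be chained. This is a minimal sketch, assuming a bash-like shell and that sbatch's --parsable option (which prints only the job ID, optionally followed by ";cluster") is available on the system; the function name submit_chain and the script names in the usage line are hypothetical.

```shell
# Submit two job scripts so that the second starts only if the first
# finishes successfully (exit code 0).
submit_chain() {
    sc_first=$1
    sc_second=$2
    # --parsable makes sbatch print just the job ID ("jobid" or "jobid;cluster")
    sc_id=$(sbatch --parsable "$sc_first") || return 1
    sc_id=${sc_id%%;*}    # drop a trailing ";cluster" part if present
    sbatch --dependency=afterok:"$sc_id" "$sc_second"
}

# Usage (hypothetical script names):
# submit_chain mapping.slurm counting.slurm
```

With afterok, the second job is never started if the first one fails, so check the queue with squeue if the dependent job does not run.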

How to move files between your computer and stampede2?


Transferring files to stampede2

On your computer's side:

Go to the directory where you want to copy files from.

scp stuff.fastq my_user_name@stampede2.tacc.utexas.edu:/home/.../

Replace "/home/.../" with the full path of the destination directory on stampede2 (the output of "pwd" run in that directory).

This command transfers "stuff.fastq" from your current directory to the specified directory on stampede2.

Transferring files from stampede2

On your computer's side:

Go to the directory where you want to copy files to.

scp my_user_name@stampede2.tacc.utexas.edu:/home/.../stuff.fastq ./

Replace "/home/.../" with the full path of the source directory on stampede2 (the output of "pwd" run in that directory).

This command transfers "stuff.fastq" from the specified directory on stampede2 to your current directory on your computer.

Copying Directories

Sometimes you may want to transfer more than one file.

To transfer a whole directory, use the -r option:

scp -r my_folder my_user_name@stampede2.tacc.utexas.edu:/...

You can also transfer directories from stampede2 in the same manner:

scp -r my_user_name@stampede2.tacc.utexas.edu:/home/.../my_folder ./

