...

  • Go to the SRA search page: http://www.ncbi.nlm.nih.gov/sra
  • Type in SRX112044, then click Search
  • On the experiment summary page, click SRR390925
    • takes you to the Run browser where you can see example reads
  • Under "Download", "Run" click "ftp" under .sra
    • save the file locally
  • Open a Terminal window and change into the directory where the file was saved
  • Copy the file from your local machine to TACC
    Code Block
    scp SRR390925.sra username@lonestar.tacc.utexas.edu:~/
    
    • the colon ( : ) after the hostname indicates this is a remote destination
    • the ~/ indicates your home directory
  • Log in to Lonestar:
    Code Block
    ssh username@lonestar.tacc.utexas.edu
    
    • check that the file is in your home directory
      Code Block
      login2$ ls
      SRR390925.sra
      
  • Find the SRA toolkit module
    Code Block
    login2$ module spider sratoolkit
    
      ----------------------------------------------------------------------------
      sratoolkit: sratoolkit/2.1.9
      ----------------------------------------------------------------------------
        Description:
          The SRA Toolkit and SDK from NCBI is a collection of tools and
          libraries for using data in the INSDC Sequence Read Archives.
    
        This module can be loaded directly: module load sratoolkit/2.1.9
    
        Help:
          The sratoolkit module file defines the following environment variables:
          TACC_SRATOOLKIT_DIR for the location of the sratoolkit distribution.
    
          Version 2.1.9
    
  • Load the module
    Code Block
    login2$ module load sratoolkit
    
  • Invoke fastq-dump with no arguments to get basic usage
    Code Block
    login2$ fastq-dump
    
    Usage:
      /opt/apps/sratoolkit/2.1.9//fastq-dump [options] [ -A ] <accession>
      /opt/apps/sratoolkit/2.1.9//fastq-dump [options] <path [path...]>
    
    Use option --help for more information
    
    /opt/apps/sratoolkit/2.1.9//fastq-dump : 2.1.9
    
  • Extract to fastq
    Code Block
    login2$ $TACC_SRATOOLKIT_DIR/fastq-dump SRR390925.sra
    Written 1981132 spots for SRR390925.sra
    Written 1981132 spots total
    
  • Look at some data
    Code Block
    login2$ ls
    SRR390925.fastq  SRR390925.sra
    login2$ head SRR390925.fastq
    @SRR390925.1 ROCKFORD:1:1:0:1260 length=36
    NCAACAAGTTTCTTTGGTTATTAACTACGACTTACC
    +SRR390925.1 ROCKFORD:1:1:0:1260 length=36
    #CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
    @SRR390925.2 ROCKFORD:1:1:0:293 length=36
    NAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
    +SRR390925.2 ROCKFORD:1:1:0:293 length=36
    ####################################
    @SRR390925.3 ROCKFORD:1:1:0:330 length=36
    NAAAAAAAAAAAAAAAAAAAAAAAATAAAAAAAAAA
    
  • Count the lines and compute the number of reads (FASTQ has 4 lines per read)
    Code Block
    login2$ wc -l SRR390925.fastq
    7924528 SRR390925.fastq
    login2$ echo $((7924528 / 4))
    1981132
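
The last two steps (looking at the data and counting reads) can be wrapped into small shell functions so the line count does not have to be copied by hand. This is a minimal sketch, not part of the SRA toolkit: the function names count_reads and first_reads are hypothetical, and both assume a FASTQ file with exactly 4 lines per read (no wrapped sequence or quality lines), as in the fastq-dump output above.

```shell
#!/usr/bin/env bash

# Hypothetical helper: print the number of reads in a FASTQ file.
# Assumes exactly 4 lines per read; warns on stderr if the line
# count is not a multiple of 4.
count_reads() {
    local fastq="$1"
    local lines
    lines=$(wc -l < "$fastq")
    if (( lines % 4 != 0 )); then
        echo "warning: $fastq has $lines lines, not a multiple of 4" >&2
    fi
    echo $(( lines / 4 ))
}

# Hypothetical helper: print the first N reads (default 2),
# i.e. the first 4*N lines of the file.
first_reads() {
    local fastq="$1"
    local n="${2:-2}"
    head -n $(( n * 4 )) "$fastq"
}
```

After pasting these definitions into the shell, `count_reads SRR390925.fastq` should report the same number of reads as the "Written ... spots" line from fastq-dump, and `first_reads SRR390925.fastq 3` prints the first three complete reads.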
    

...