...

Code Block
# Make a new "wget" directory in your student Home directory and change into it
mkdir -p ~/wget; cd ~/wget

# download a Gencode statistics file using default output file naming
wget "https://ftp.ebi.ac.uk/pub/databases/gencode/_README_stats.txt"
wc -l _README_stats.txt

# if you execute the same wget again, and the output file already exists
# wget will create a new one with a numeric extension
wget "https://ftp.ebi.ac.uk/pub/databases/gencode/_README_stats.txt"
wc -l _README_stats.txt.1

# download the same Gencode statistics file to a different local filename
wget -O gencode_stats.txt "https://ftp.ebi.ac.uk/pub/databases/gencode/_README_stats.txt"
wc -l gencode_stats.txt

The find command

The find command is a powerful (and, of course, complex!) way of looking for files in a nested directory hierarchy. The general form I use is:

  • find <in_directory> [ options ] -name <expression> [ tests ]
    • looks for files matching <expression> in <in_directory> and its sub-directories
    • <expression> can be a double-quoted string including pathname wildcards (e.g. "[a-g]*.txt"); -name compares it against the file name only, not the full path
    • there are tons of options and tests:
      • -type f (file) and -type d (directory) are useful tests
      • -maxdepth NN is a useful option to limit the depth of recursion
    • returns a list of matching pathnames, one per output line; each starts with <in_directory> as you typed it, so a relative <in_directory> gives relative paths
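To see how these pieces fit together without touching real data, here is a minimal sketch that builds a throwaway directory tree and runs a few find commands against it. The scratch directory and every file and directory name in it are made up for illustration.

```bash
# Build a small scratch hierarchy (all names below are hypothetical)
demo=$(mktemp -d)
mkdir -p "$demo/docs/old_docs" "$demo/data"
touch "$demo/a.txt" "$demo/docs/b.txt" "$demo/docs/old_docs/c.txt" "$demo/data/d.csv"

cd "$demo"

# all regular files whose name ends in .txt, at any depth
# prints ./a.txt, ./docs/b.txt and ./docs/old_docs/c.txt
find . -name "*.txt" -type f | sort

# same search, but descend at most 2 levels below the starting directory
# prints ./a.txt and ./docs/b.txt
find . -maxdepth 2 -name "*.txt" -type f | sort

# directories with "docs" in the directory name
# prints ./docs and ./docs/old_docs
find . -name "*docs*" -type d | sort

# an absolute starting directory gives absolute paths in the output
find "$demo" -maxdepth 1 -name "*.txt" -type f
```

The output of find is not sorted, so the sketch pipes it through sort to get a predictable order.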

Examples:

Code Block
cd
find . -name "*.txt" -type f     # find all .txt files in the Home directory
find . -name "*docs*" -type d    # find all directories with "docs" in the directory name
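The double quotes around the -name pattern in these examples matter. The sketch below, run in a scratch directory with made-up file names, shows what can happen when they are left off: the shell expands the wildcard before find ever sees it.

```bash
# Scratch directory with one .txt file at the top and one in a sub-directory
# (all names below are hypothetical)
demo=$(mktemp -d)
mkdir -p "$demo/sub"
touch "$demo/top.txt" "$demo/sub/nested.txt"
cd "$demo"

# Quoted: find receives the pattern *.txt itself and applies it at every level
# prints ./sub/nested.txt and ./top.txt
find . -name "*.txt" -type f | sort

# Unquoted: the shell first expands *.txt against the current directory,
# so find actually runs as:  find . -name top.txt -type f
# prints only ./top.txt
find . -name *.txt -type f
```

If the current directory held several matching files, the unquoted form would expand to several words and find would typically stop with an error instead.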

Exercise 2-1

The /stor/work/CBRS_unix/fastq/ directory contains sequencing data from a GSAF Job. Its structure, as shown by tree, is:

[Image: tree listing of /stor/work/CBRS_unix/fastq/]

Use find to find all fastq.gz files in /stor/work/CBRS_unix/fastq/.

Answer:

find /stor/work/CBRS_unix/fastq/ -name "*.fastq.gz" -type f
returns 4 file paths

How many fastq.gz files in /stor/work/CBRS_unix/fastq/ were run in sequencer lane L001?

Answer:

find /stor/work/CBRS_unix/fastq/ -name "*L001*fastq.gz" -type f | wc -l
reports 2 file paths

How many sample directories in /stor/work/CBRS_unix/fastq/ were run on July 10, 2020?

Answer:

find /stor/work/CBRS_unix/fastq/ -name "*2020-07-10*" -type d | wc -l
reports 2 directory paths

...