...

Code Block
languagebash
# create a symlink to a non-existent "../xxx.txt" file, naming the symlink "bad_link.txt"
mkdir -p ~/syms; cd ~/syms
ln -sf ../xxx.txt bad_link.txt
ls -l

Now both the symlink and the linked-to file are displayed in red, indicating a broken link.


Multiple files can be linked at once by providing multiple file name arguments and using the -t (target) option to specify the directory where the links to all the files will be created.
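As a minimal sketch of linking several files with one command: the link_demo directory and the a/b/c file names below are made up for illustration, and -t is a GNU ln option that may not exist on non-GNU systems.

```bash
# build a small scratch area under the current directory (names are hypothetical)
mkdir -p link_demo/data link_demo/links
touch link_demo/data/a.txt link_demo/data/b.txt link_demo/data/c.txt

# link all three files into link_demo/links in one command;
# absolute paths keep the links valid no matter where the links live
ln -sf -t link_demo/links \
  "$PWD/link_demo/data/a.txt" \
  "$PWD/link_demo/data/b.txt" \
  "$PWD/link_demo/data/c.txt"

ls -l link_demo/links
```

Each link takes the name of the file it points to, so the target directory ends up with a.txt, b.txt and c.txt symlinks.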

...

Here are some ways to work with a compressed file:

Code Block
languagebash
cd ~/gzips
cat ../jabberwocky.txt | gzip > jabber.gz    # make a compressed copy of the "jabberwocky.txt" file
less jabber.gz                               # use 'less' to view the compressed "jabber.gz" file (q to exit)

zcat jabber.gz | wc -l                       # count lines in the compressed "jabber.gz" file
zcat jabber.gz | tail -4                     # view the last 4 lines of the "jabber.gz" file
zcat jabber.gz | cat -n                      # view "jabber.gz" text with line numbers (no zcat -n option)
zcat jabber.gz | cat -n | tail -n +6 | head -4  # display lines 6 - 9 of "jabber.gz" text

...

Recall the three standard Unix streams: they each have a number, a name and redirection syntax:

  • standard output is stream 1
    • redirect standard output to a file with the > or 1> operator
      • a single > or 1> overwrites any existing data in the target file
      • a double >> or 1>> appends to any existing data in the target file
  • standard error is stream 2
    • redirect standard error to a file with the 2> operator
      • a single 2> overwrites any existing data in the target file
      • a double 2>> appends to any existing data in the target file
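The overwrite and append behavior listed above can be tried with a small sketch. The talk function and the stream_demo directory are hypothetical names used only for this illustration.

```bash
mkdir -p stream_demo

# hypothetical helper that writes one line to each output stream
talk() {
  echo "to stdout"
  echo "to stderr" 1>&2
}

talk 1>  stream_demo/out.txt 2>  stream_demo/err.txt   # single > overwrites both files
talk 1>> stream_demo/out.txt 2>> stream_demo/err.txt   # double >> appends to both files

cat stream_demo/out.txt   # only the standard output lines
cat stream_demo/err.txt   # only the standard error lines
```

After the second call each file holds two lines, because the first call overwrote any earlier contents and the second call appended.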

We also saw that 3rd party bioinformatics tools are often written as a top-level program that handles multiple sub-commands. Examples include the bwa NGS aligner and samtools and bedtools tool suites. To see their menu of sub-commands, you usually just need to enter the top-level command, or <command> --help. Similarly, sub-command usage is usually available as <command> <sub-command> or <command> <sub-command> --help.

Tip
title3rd party tools and standard streams

Many tools write their main output to standard output by default but have options to write it to a file instead.

Similarly, tools often write processing status and diagnostics to standard error, and it is usually your responsibility to redirect this elsewhere (e.g. to a log file).

Finally, tools may support taking their main input from standard input, but need a "placeholder" argument where you'd usually specify a file. That standard input placeholder is usually a single dash ( - ) but can also be a reserved word such as stdin.
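The dash placeholder can be seen with the standard cat command, which treats a - argument as "read standard input here". This is only a sketch; the dash_demo directory and file names are invented for the example.

```bash
mkdir -p dash_demo
printf 'line one\nline two\n' > dash_demo/file.txt

# cat reads file.txt first, then standard input where the dash appears
echo "piped line" | cat dash_demo/file.txt - > dash_demo/combined.txt

cat dash_demo/combined.txt
```

The piped text lands after the file contents because the dash is the second argument; swapping the two arguments would put it first.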

Now let's see how these concepts fit together when running 3rd party tools.

Exercise 1-1 bwa aln

Where does the bwa aln sub-command write its output?

Expand
titleAnswer...

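The bwa aln usage says:

Usage:   bwa aln [options] <prefix> <in.fq>

This does not specify an output file, so it must write its alignment information to standard output.

How can this be changed?

Expand
titleAnswer...

The bwa aln options usage says:

      -f FILE   file to write output to instead of stdout

bwa aln also writes diagnostic progress as it runs, to standard error. Show how you would invoke bwa aln to capture both its alignment output and its progress diagnostics. Use input from a my_fastq.fq file and ./refs/hg38 as the <prefix>.

Expand
titleAnswers...

Redirecting the output to a file:
bwa aln ./refs/hg38 my_fastq.fq 1> my_fastq.aln 2> my_fastq.aln.log

Using the -f option:
bwa aln -f my_fastq.aln ./refs/hg38 my_fastq.fq 2> my_fastq.aln.log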

...

3rd party tool files and streams

In Intro Unix: The Bash shell and commands: Getting help we saw that 3rd party bioinformatics tools are often written to perform sub-command processing; that is, they have a top-level program that handles multiple sub-commands. Examples include the bwa NGS aligner and the samtools and bedtools tool suites.

To see their menu of sub-commands, you usually just need to enter the top-level command, or <command> --help. Similarly, sub-command usage is usually available as <command> <sub-command> or <command> <sub-command> --help.


Exercise 2-3 bwa mem

Where does the bwa mem sub-command write its output?

Expand
titleAnswer...

The bwa mem usage says:

Usage: bwa mem [options] <idxbase> <in1.fq> [in2.fq]

This does not specify an output file, so it must write its alignment information to standard output.

How can this be changed?

Expand
titleAnswer...

The bwa mem options usage says:

  -o FILE       sam file to output results to [stdout]

bwa mem also writes diagnostic progress as it runs, to standard error. Show how you would invoke bwa mem to capture both its alignment output and its progress diagnostics. Use input from a my_fastq.fq file and ./refs/hg38 as the <idxbase>.

Expand
titleAnswers...

Redirecting the output to a file:
bwa mem ./refs/hg38 my_fastq.fq 1> my_fastq.sam 2> my_fastq.aln.log

Using the -o option:
bwa mem -o my_fastq.sam ./refs/hg38 my_fastq.fq 2> my_fastq.aln.log

Exercise 2-4 cutadapt

The cutadapt adapter trimming command reads NGS sequences from a FASTQ file, and writes adapter-trimmed reads to a FASTQ file. Find its usage.

Expand
titleAnswer...

cutadapt --help | less

Note that it also points you to https://cutadapt.readthedocs.io/ for full documentation.

Usage:

    cutadapt -a ADAPTER [options] [-o output.fastq] input.fastq

Where does cutadapt write its output by default? How can that be changed?

Expand
titleAnswer...

The cutadapt usage says that output can be written to a file using the -o option:

    cutadapt -a ADAPTER [options] [-o output.fastq] input.fastq

But the brackets around [-o output.fastq] suggest this is optional. Reading a bit further we see:

... Without the -o option, output is sent to standard output.

This suggests output can be specified in 2 ways:

  • to a file, using the -o option
    • cutadapt -a CGTAATTCGCG -o trimmed.fastq small.fq
  • to standard output without the -o option
    • cutadapt -a CGTAATTCGCG small.fq > trimmed.fastq

Where does cutadapt read its input from by default? How can that be changed? Can the input FASTQ be in compressed format?

Expand
titleAnswer...

The cutadapt usage says an input.fastq file is a required argument:

    cutadapt -a ADAPTER [options] [-o output.fastq] input.fastq

But again, reading a bit further we see:


Input may also be in FASTA format. Compressed input and output is supported and
auto-detected from the file name (.gz, .xz, .bz2). Use the file name '-' for
standard input/output. Without the -o option, output is sent to standard output.



This suggests input can be specified in 2 ways:

  • from a file, named as the input.fastq argument
    • cutadapt -a CGTAATTCGCG -o trimmed.fastq small.fq
  • from standard input if the input.fastq argument is replaced with a dash ( - )
    • cat small.fq | cutadapt -a CGTAATTCGCG -o trimmed.fastq -

It also says that the input file can be compressed in one of three formats (.gz, .xz, .bz2), which is auto-detected from the file name.

Where does cutadapt write its diagnostic output by default? How can that be changed?

Expand
titleAnswer...

The cutadapt usage doesn't say anything about diagnostics:

    cutadapt -a ADAPTER [options] [-o output.fastq] input.fastq


But again, reading in the Output: options section:


   -o FILE, --output=FILE
        Write trimmed reads to FILE. FASTQ or FASTA format is
        chosen depending on input. The summary report is sent
        to standard output. Use '{name}' in FILE to
        demultiplex reads into multiple files. Default: write
        to standard output


Careful reading of this suggests that:

  • When the trimmed output is sent to a file with the -o output.fastq option,
    • diagnostics are written to standard output
      • so can be redirected to a log file with 1> trim.log
    • cutadapt -a CGTAATTCGCG -o trimmed.fastq  small.fq 1> trim.log
  • But when the -o option is omitted, and output goes to standard output,
    • diagnostics must be written to standard error
      • so can be redirected to a log file with 2> trim.log
    • cutadapt -a CGTAATTCGCG small.fq 1> trimmed.fastq 2> trim.log
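The two cases above (report on standard output when the data goes to a file, report on standard error otherwise) can be mimicked with a toy shell function. This is an illustrative stand-in only: fake_trim and the trim_demo directory are hypothetical names, and the function says nothing about how cutadapt is actually implemented.

```bash
mkdir -p trim_demo

# fake_trim [-o FILE]: copies standard input to FILE (or to standard output)
# and prints a one-line report on whichever stream is not carrying the data
fake_trim() {
  local outfile=""
  if [ "$1" = "-o" ]; then outfile="$2"; fi
  if [ -n "$outfile" ]; then
    cat > "$outfile"            # data goes to the named file
    echo "report: done"         # so the report can use standard output
  else
    cat                         # data goes to standard output
    echo "report: done" 1>&2    # so the report must use standard error
  fi
}

# case 1: data to a file with -o, report captured with 1>
printf 'ACGT\n' | fake_trim -o trim_demo/trimmed1.txt 1> trim_demo/trim1.log

# case 2: data captured with 1>, report captured with 2>
printf 'ACGT\n' | fake_trim 1> trim_demo/trimmed2.txt 2> trim_demo/trim2.log
```

In both cases the data file and the log file end up with the same contents; only the redirection needed to capture the report differs.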