Files and File systems

First, let's review Intro Unix: Files and File Systems. The most important takeaways are:

Working with remote files

...

Tip

When transferring files between your computer and a remote server, you always need to execute the command on your local computer. This is because your personal computer does not have an entry in the global hostname database (a.k.a. the Domain Name System, or DNS), whereas the remote computer does.

The global DNS database maps full host names to their IP (Internet Protocol) addresses. Computers that can be accessed from anywhere on the Internet have their host names registered in DNS.
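You can ask your system's resolver to do this host-name-to-address mapping directly. The sketch below assumes a Linux system with the `getent` utility. `getent hosts` consults local files such as /etc/hosts as well as DNS. `localhost` is used only so the example runs offline. For a real remote server you would pass its full host name instead.

```bash
# Ask the system resolver (local files such as /etc/hosts, then DNS) to map
# a host name to its IP address; localhost keeps this runnable offline.
getent hosts localhost
# For a server registered in DNS you would pass its full host name instead,
# e.g. getent hosts <full.host.name>
```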

wget (web get)

The wget <url> command lets you retrieve the contents of a valid Internet URL (e.g. http, https, ftp).

...

  • ln -s <path> [link_file_name] says to create a symbolic link (symlink) to the specified file (or directory) in the current directory
    • always use the -s option to avoid creating a hard link, which behaves quite differently
  • the default link name corresponds to the last name component in <path>
    • you can name the link file differently by supplying an optional link_file_name.
  • it is best to change into (cd) the directory where you want the link before executing ln -s
  • a symbolic link can be deleted without affecting the linked-to file
  • the -f (force) option says to overwrite any existing symbolic link with the same name
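The rules above can be tried safely in a scratch directory. In this sketch, the directory layout and the file names (jabber.txt, other.txt, jab) are hypothetical. It shows the default link name, a custom link name, replacing a link with -f, and deleting a link without touching its target.

```bash
# Scratch area (hypothetical layout) so nothing real is touched
tmp=$(mktemp -d)
mkdir -p "$tmp/data" "$tmp/work"
echo "hello" > "$tmp/data/jabber.txt"
echo "bye"   > "$tmp/data/other.txt"

cd "$tmp/work"                      # cd to where the link should live first
ln -s "$tmp/data/jabber.txt"        # link name defaults to jabber.txt
ln -s "$tmp/data/jabber.txt" jab    # or give the link its own name
ls -l                               # both show as links: name -> target

ln -sf "$tmp/data/other.txt" jab    # -f replaces the existing 'jab' link
rm jabber.txt                       # deletes only the link ...
cat "$tmp/data/jabber.txt"          # ... so the target still prints: hello
```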

...

  • find returns a list of matching file paths on its standard output
  • ln wants its files listed as arguments, not on standard input
    • so the paths are piped to the standard input of xargs
  • xargs takes the data on its standard input and calls the specified command (here ln -sf -t .) with that data as the command's argument list.
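The find-to-xargs pattern can be sketched in a scratch directory as follows. The directory and file names are hypothetical. The sketch assumes GNU coreutils, because `-t` (`--target-directory`) is a GNU `ln` option that BSD/macOS `ln` lacks.

```bash
# Scratch layout (hypothetical): two .gz files and one other file
tmp=$(mktemp -d)
mkdir -p "$tmp/gzips" "$tmp/links"
touch "$tmp/gzips/a.gz" "$tmp/gzips/b.gz" "$tmp/gzips/notes.txt"

cd "$tmp/links"
# find writes matching paths to stdout; xargs reads them from stdin and
# appends them as arguments to "ln -sf -t .", linking each one into "."
find "$tmp/gzips" -name "*.gz" | xargs ln -sf -t .
ls -l
```

Plain xargs splits its input on whitespace, so paths containing spaces would be broken apart. In that case, `find ... -print0 | xargs -0 ...` passes NUL-separated paths instead.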

...

Display lines 7 - 9 of the compressed "jabber.gz" text

Answer:

zcat jabber.gz | cat -n | tail -n +7 | head -n 3
- or -
zcat jabber.gz | cat -n | head -n 9 | tail -n 3
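To check that the two pipelines agree, you can build a stand-in jabber.gz. Its 12 lines of "line N" text are hypothetical, not the real file's contents. `tail -n +7` starts output at line 7, while `head -n 9 | tail -n 3` keeps the last 3 of the first 9 lines.

```bash
# Hypothetical stand-in for jabber.gz: 12 lines, "line 1" .. "line 12"
tmp=$(mktemp -d); cd "$tmp"
seq 12 | sed 's/^/line /' | gzip > jabber.gz

# Start at line 7, then keep 3 lines
zcat jabber.gz | cat -n | tail -n +7 | head -n 3
# Keep the first 9 lines, then the last 3 of those
zcat jabber.gz | cat -n | head -n 9 | tail -n 3
```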

...

The cutadapt adapter trimming command reads NGS sequences from a FASTQ file, and writes adapter-trimmed reads to a FASTQ file. Find its usage.

Answer:

cutadapt    # overview; tells you to run cutadapt --help for details
cutadapt --help | less
cutadapt --help | more

Note that it also points you to https://cutadapt.readthedocs.io/ for full documentation.

Usage:

    cutadapt -a ADAPTER [options] [-o output.fastq] input.fastq

Where does cutadapt write its output by default? How can that be changed?

Answer:

The cutadapt usage says that output can be written to a file using the -o option:

Usage:
    cutadapt -a ADAPTER [options] [-o output.fastq] input.fastq

But the brackets around [-o output.fastq] suggest this is optional. Reading a bit further we see:

... Without the -o option, output is sent to standard output.

This suggests output can be specified in 2 ways:

  • to a file, using the -o option
    • cutadapt -a CGTAATTCGCG -o trimmed.fastq  small.fq
  • to standard output without the -o option
    • cutadapt -a CGTAATTCGCG small.fq 1> trimmed.fastq

...

Answer:

The cutadapt usage says an input.fastq file is a required argument:

    cutadapt -a ADAPTER [options] [-o output.fastq] input.fastq

But again, reading a bit further we see:

...                           Compressed input and output is supported and
auto-detected from the file name (.gz, .xz, .bz2). Use the file name '-' for
standard input/output. ...

This says that the input.fastq file can also be compressed in one of three formats (.gz, .xz, .bz2), which cutadapt detects from the file name.

And the usage also suggests input can be specified in 2 ways:

  • from a file, by naming it as the input.fastq argument
    • cutadapt -a CGTAATTCGCG -o trimmed.fastq  small.fq
  • from standard input, if the input.fastq argument is replaced with a dash ( - )
    • cat small.fq | cutadapt -a CGTAATTCGCG -o trimmed.fastq  -
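The dash-means-standard-input convention is shared by many Unix tools, not just cutadapt. A quick way to see it without cutadapt installed is with `cat`, which treats a `-` argument as "read standard input here". The file names below are hypothetical.

```bash
tmp=$(mktemp -d); cd "$tmp"
echo "from file" > a.txt
# cat reads a.txt first, then the piped data in place of the '-' argument
echo "from stdin" | cat a.txt - > both.txt
cat both.txt
```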

Where does cutadapt write its diagnostic output by default? How can that be changed?

Answer:

The cutadapt usage doesn't say anything directly about diagnostics:

    cutadapt -a ADAPTER [options] [-o output.fastq] input.fastq

But again, reading in the Output: options section:

   -o FILE, --output=FILE
        Write trimmed reads to FILE. FASTQ or FASTA format is
        chosen depending on input. The summary report is sent
        to standard output. Use '{name}' in FILE to
        demultiplex reads into multiple files. Default: write
        to standard output

Careful reading of this suggests that:

  • When the trimmed output is sent to a file with the -o output.fastq option,
    • diagnostics are written to standard output
      • so can be redirected to a log file with 1> trim.log
    • cutadapt -a CGTAATTCGCG -o trimmed.fastq  small.fq 1> trim.log
  • But when the -o option is omitted, and output goes to standard output,
    • diagnostics must be written to standard error
      • so can be redirected to a log file with 2> trim.log
    • cutadapt -a CGTAATTCGCG small.fq 1> trimmed.fastq 2> trim.log
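The stream-splitting in the second case can be tried without cutadapt. The sketch below uses a hypothetical shell function, fake_trim. It is only a stand-in that mimics cutadapt run without -o, writing its "reads" to standard output (stream 1) and its "report" to standard error (stream 2).

```bash
tmp=$(mktemp -d); cd "$tmp"

# Hypothetical stand-in for a tool like cutadapt run without -o
fake_trim() {
  echo "@read1"                    # the "trimmed reads" -> standard output
  echo "Processed reads: 1" >&2    # the "summary report" -> standard error
}

fake_trim 1> trimmed.fastq 2> trim.log   # send each stream to its own file
cat trimmed.fastq                        # prints: @read1
cat trim.log                             # prints: Processed reads: 1
```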


Real example:


cd ~/gzips 
cutadapt -a AGATCGGAAGAGCACACGTCTGA small.fq  > trimmed.fq