...

Code Block
languagebash
# copy a small.fq file into a new ~/gzips directory
cd; mkdir -p ~/gzips
cp -p /stor/work/CCBB_Workshops_1/misc_data/fastq/small.fq ~/gzips/
    
cd ~/gzips
ls -lh          # small.fq is 66K (~66,000) bytes long
wc -l small.fq  # small.fq is 1000 lines long

...

The gunzip command does the reverse – decompresses the file and writes the results back without the .gz extension. gzip -d (decompress) does the same thing.

Code Block
languagebash
gunzip small.fq.gz    # decompress the small.fq.gz file in place, producing the small.fq file
# or
gzip -d small.fq.gz
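
As a minimal sketch of the in-place behavior described above, the following can be run anywhere. It assumes a throwaway directory from mktemp and a hypothetical "demo.txt" file, neither of which is part of the course data.

```bash
# Work in a throwaway directory (assumption: mktemp is available)
workdir=$(mktemp -d)
cd "$workdir"
printf 'some text\n' > demo.txt   # hypothetical demo file, not course data

gzip demo.txt          # replaces demo.txt with demo.txt.gz
ls                     # only demo.txt.gz is listed
gunzip demo.txt.gz     # replaces demo.txt.gz with demo.txt
ls                     # only demo.txt is listed

gzip demo.txt          # compress again ...
gzip -d demo.txt.gz    # ... and decompress with gzip -d instead of gunzip
ls                     # only demo.txt is listed
```

In each step the input file is replaced by the output file, so only one of the two exists at any time.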

Both gzip and gunzip also have a -c (or --stdout) option that tells the command to write to standard output, keeping the original files unchanged.

Code Block
languagebash
cd ~/gzips            # change into your ~/gzips directory
ls small.fq           # make sure you have an uncompressed "small.fq" file

gzip -c small.fq > sm2.fq.gz  # compress the "small.fq" into a new file called "sm2.fq.gz"
gunzip -c sm2.fq.gz > sm3.fq  # decompress "sm2.fq.gz" into a new "sm3.fq" file
ls -lh
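
One way to confirm that a -c round trip leaves the originals alone and loses no data is to compare the files with cmp. This is only a sketch under assumptions: it uses a temporary directory and hypothetical "demo" file names rather than the small.fq files above.

```bash
# Work in a throwaway directory (assumption: mktemp is available)
workdir=$(mktemp -d)
cd "$workdir"
printf 'line %s\n' 1 2 3 4 5 > demo.txt   # hypothetical 5-line demo file

gzip -c demo.txt > demo.txt.gz      # compress to a new file; demo.txt is kept
gunzip -c demo.txt.gz > demo2.txt   # decompress to a new file; demo.txt.gz is kept

ls -1                               # demo.txt, demo.txt.gz and demo2.txt all exist
cmp demo.txt demo2.txt && echo "round trip OK"
```

cmp prints nothing and exits successfully when the two files are byte-for-byte identical, so the final message appears only if decompression reproduced the original exactly.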

Both gzip and gunzip can also accept data on standard input. In that case, the output is always on standard output.

Code Block
languagebash
cd ~/gzips            # change into your ~/gzips directory
ls small.fq           # make sure you have an uncompressed "small.fq" file

cat small.fq | gzip > sm4.fq.gz   # compress "small.fq" data read from standard input into "sm4.fq.gz"
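
The same standard input behavior applies to gunzip. The sketch below pipes data through both commands; the temporary directory and the "reads.txt" file are hypothetical stand-ins, not course files.

```bash
# Work in a throwaway directory (assumption: mktemp is available)
workdir=$(mktemp -d)
cd "$workdir"
printf 'read %s\n' 1 2 3 4 > reads.txt   # hypothetical 4-line stand-in for small.fq

cat reads.txt | gzip > reads.txt.gz      # gzip reads stdin, writes compressed data to stdout
cat reads.txt.gz | gunzip | wc -l        # gunzip reads stdin, writes text to stdout
gunzip < reads.txt.gz | wc -l            # same result using input redirection instead of cat
```

Both wc -l commands count 4 lines, the number written to "reads.txt", and no files are created or removed by the gunzip steps.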

The good news is that most bioinformatics programs can accept data in compressed gzipped format. But how do you view these compressed files?

...

Code Block
languagebash
cd ~/gzips                                    
cat ../jabberwocky.txt | gzip > jabber.gz    # make a compressed copy of the "jabberwocky.txt" file
less jabber.gz                               # use 'less' to view the compressed "jabber.gz" file
                                             #   (type 'q' to exit)
zcat jabber.gz | wc -l                       # count lines in the compressed "jabber.gz" file
zcat jabber.gz | tail -4                     # view the last 4 lines of the "jabber.gz" file
zcat jabber.gz | cat -n                      # view "jabber.gz" text with line numbers
                                             #   (zcat does not have an -n option)

Exercise 2-2

Display lines 7 - 9 of the compressed "jabber.gz" text

Hint

zcat jabber.gz | cat -n | tail -n +7 | head -3
- or -
zcat jabber.gz | cat -n | head -9 | tail -3
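
To check that the two approaches to selecting lines 7 - 9 agree, here is a small sketch that runs both on generated data. The "demo.gz" file and its 12 "verse" lines are hypothetical, created only for this illustration, and `tail -n +7` is used as the portable spelling of "start at line 7".

```bash
# Work in a throwaway directory (assumption: mktemp is available)
workdir=$(mktemp -d)
cd "$workdir"
seq 1 12 | sed 's/^/verse /' | gzip > demo.gz      # 12 lines, hypothetical stand-in for jabber.gz

a=$(zcat demo.gz | cat -n | tail -n +7 | head -3)  # start at line 7, keep the first 3 lines
b=$(zcat demo.gz | cat -n | head -9 | tail -3)     # keep lines 1-9, then the last 3 of those
[ "$a" = "$b" ] && echo "both pipelines select the same lines"
echo "$a"                                          # numbered lines 7, 8 and 9
```

The first pipeline skips ahead and then truncates, while the second truncates and then takes the end; either order works as long as the counts match the requested range.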

Working with 3rd party program I/O

...