When dealing with large data files, sometimes scattered in many directories, it is often convenient to create multiple symbolic links (symlinks) to those files in a directory where you plan to work with them. You can use them in your analysis as if they were local to your analysis working directory, without the storage cost of copying them.

Tip: Always symlink large files

Storage is a limited resource, so never copy large data files! Create symbolic links to them in your analysis directory instead.

...

  • ln -s <path> says to create a symbolic link (symlink) to the specified file (or directory) in the current directory
    • always use the -s option to avoid creating a hard link, which behaves quite differently
  • the default link name corresponds to the last name component in <path>
    • you can name the link file differently by supplying an optional link_file_name.
  • it is best to change into (cd) the directory where you want the link before executing ln -s
  • a symbolic link can be deleted without affecting the linked-to file
  • the -f (force) option says to overwrite any existing file with the same name
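
The points above can be tried out safely in a scratch directory. The following is only a minimal sketch: the temporary directory and the file names (data.txt, my_link.txt) are made up for illustration and are not part of the course data.

```bash
# Work in a throwaway directory so nothing real is touched (all names are hypothetical)
workdir=$(mktemp -d)
mkdir -p "$workdir/data" "$workdir/analysis"
echo "sample content" > "$workdir/data/data.txt"

# Change into the directory where the links should live
cd "$workdir/analysis"

# Default link name: the last name component of the path (data.txt)
ln -s ../data/data.txt

# Explicit link name supplied as an optional second argument
ln -s ../data/data.txt my_link.txt

# -f overwrites the existing link of the same name instead of failing
ln -s -f ../data/data.txt my_link.txt

# Reading through a link reaches the linked-to file
cat my_link.txt

# Deleting a link leaves the linked-to file in place
rm data.txt
ls -l "$workdir/data/data.txt"
```

Without the -f option, the third ln command would stop with a "File exists" error because my_link.txt is already present.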

Examples:

Code Block
languagebash
# create a symlink to the sampleinfo.txt file using absolute path syntax
mkdir -p ~/test; cd ~/test
ln -s -f /stor/work/CCBB_Workshops_1/bash_scripting/data/sampleinfo.txt
ls -l

Multiple files can be linked by providing multiple file name arguments and using the -t (target) option to specify the directory where links to all the files will be created.

...

rm -rf ~/test; mkdir ~/test; cd ~/test
ln -s -f -t . /stor/work/CCBB_Workshops_1/bash_scripting/data/*.txt
ls -l

...

Code Block
languagebash
# create a symlink to the haiku.txt file using relative path syntax
mkdir -p ~/syms; cd ~/syms
ln -s -f ../haiku.txt
ls -l

The ls -l long listing in the ~/syms directory displays the symlink like this:

Image Added

  • The 10-character permissions field (lrwxrwxrwx) has an l in the left-most file type position, indicating this is a symbolic link.
  • The symlink itself is colored differently – in cyan
  • There are two extra fields after the symlink name
    • field 10 has an arrow -> pointing to ...
    • field 11 the path of the linked-to file ("../haiku.txt")
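
The same information shown in the long listing can also be checked from the command line or a script. This is a minimal sketch that assumes a scratch directory and a stand-in haiku.txt file (both hypothetical): the -L test reports whether a name is a symbolic link, and readlink prints the stored path that ls -l shows after the arrow.

```bash
# Scratch setup (hypothetical names, mirroring the ~/syms example)
workdir=$(mktemp -d)
echo "an old silent pond" > "$workdir/haiku.txt"
mkdir -p "$workdir/syms"; cd "$workdir/syms"
ln -s -f ../haiku.txt

# -L is true only for symbolic links
if [ -L haiku.txt ]; then
    echo "haiku.txt is a symlink"
fi

# readlink prints the path stored in the link, exactly as it was given to ln -s
target=$(readlink haiku.txt)
echo "$target"
```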

Now create a symlink to a non-existent file:

Code Block
languagebash
# create a symlink to a non-existent "~/xxx.txt" file, naming the symlink "bad_link.txt"
mkdir -p ~/syms; cd ~/syms
ln -sf ~/xxx.txt bad_link.txt
ls -l

Now both the symlink and the linked-to file are displayed in red, indicating a broken link.

Image Added
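
Terminal colors are not available everywhere, so it can help to detect broken links with a test instead. The sketch below is one possible approach, using made-up file names in a scratch directory: a name is treated as a broken link when it is a symlink (-L) whose target does not exist (-e follows the link).

```bash
# Scratch setup: one broken link and one working link (hypothetical names)
workdir=$(mktemp -d)
cd "$workdir"
ln -s "$workdir/xxx.txt" bad_link.txt    # the target does not exist
echo "data" > real.txt
ln -s real.txt good_link.txt

broken=""
for f in *; do
    # -L: the name itself is a symlink
    # -e: follows the link, so it fails when the linked-to file is missing
    if [ -L "$f" ] && [ ! -e "$f" ]; then
        echo "broken link: $f -> $(readlink "$f")"
        broken="$broken $f"
    fi
done
```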

Multiple files can also be linked using relative paths: provide multiple file name arguments and use the -t (target) option to specify the directory where links to all the files will be created.

Code Block
languagebash
# create multiple symlinks to the *.bed files in the ~/data/bedfiles/ directory
# the -t . says create all the symlinks in the current directory
mkdir -p ~/syms; cd ~/syms
ln -sf -t . ../data/bedfiles/*.bed
ls -l

What about the case where the files you want are scattered in sub-directories? Consider a typical GSAF project directory structure, where FASTQ files are nested in sub-directories:

Image Added

Here's a solution using find and xargs:

Code Block
languagebash
mkdir -p ~/syms/fa; cd ~/syms/fa
find /stor/work/CBRS_unix/fastq -name "*.gz" | xargs ln -sf -t .

Step by step:

  • find returns a list of matching file paths on its standard output
  • the paths are piped to the standard input of xargs
  • xargs takes the data on its standard input and runs the specified command (here ln -sf -t .) with that data as the command's argument list.
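
One caveat with the pipeline above is that xargs splits its input on whitespace, so paths containing spaces are broken apart. The following sketch shows a common variation, assuming GNU find, xargs and ln (the -t option of ln is a GNU extension); the directory layout and file names are invented for the example.

```bash
# Scratch layout with FASTQ-like files nested in sub-directories (hypothetical names)
workdir=$(mktemp -d)
mkdir -p "$workdir/fastq/run 1" "$workdir/fastq/run2" "$workdir/links"
touch "$workdir/fastq/run 1/sampleA_R1.fastq.gz" "$workdir/fastq/run2/sampleB_R1.fastq.gz"
touch "$workdir/fastq/run2/notes.txt"

cd "$workdir/links"

# -print0 and -0 separate paths with NUL characters, so spaces in names are safe
find "$workdir/fastq" -name "*.gz" -print0 | xargs -0 ln -sf -t .

ls -l
n_links=$(find . -type l | wc -l)
echo "links created: $n_links"
```

Note that because of -f, two files with the same name in different sub-directories would produce a single link, with the later one silently replacing the earlier one.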

About compressed files

Because a lot of scientific data is large, it is often stored in a compressed format to conserve storage space. The most common compression program used for individual files is gzip, whose compressed files have the .gz extension. The tar and zip programs are most commonly used for bundling directories into a single archive file; tar itself only bundles, and is usually combined with gzip to compress the result.
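
As a quick illustration of the .gz convention, here is a minimal sketch using a small made-up text file in a scratch directory. It shows that gzip replaces the original file with a .gz version, and that the contents can still be read without writing an uncompressed copy to disk.

```bash
# Scratch file (hypothetical name and contents)
workdir=$(mktemp -d); cd "$workdir"
printf 'line1\nline2\nline3\n' > reads.txt

# gzip replaces reads.txt with the compressed reads.txt.gz
gzip reads.txt
ls -l

# gzip -dc decompresses to standard output and leaves the .gz file in place
first_line=$(gzip -dc reads.txt.gz | head -n 1)
echo "$first_line"
```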

...