...
- Understanding the tree-like structure of directories and files in the file system hierarchy
- Absolute paths start with a slash ( / ), the root of the file system hierarchy
- More at: Intro Unix: Files and File Systems: The File System hierarchy
- Knowing how to navigate the file system using the cd (change directory) command, Tab key completion, and relative path syntax:
- use the dot ( . ) metacharacter for the current directory
- use the dot-dot ( .. ) metacharacters for the parent directory
- More at:
- Selecting multiple files using pathname wildcards (a.k.a. "globbing")
- asterisk ( * ) to match any length of characters
- brackets ( [ ] ) to match any single character listed between the brackets, including hyphen ( - ) delimited character ranges such as [A-G]
- More at: Intro Unix: Files and File Systems: Pathname wildcards (globbing)
- A basic understanding of file attributes such as
- file type (file, directory)
- owner and group
- permissions (read, write, execute) for the owner, group and everyone
- More at: Intro Unix: Files and File Systems: File attributes
- Familiarity with basic file manipulation commands (mkdir, cp, mv, rm)
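As a quick refresher on the wildcard and relative path syntax listed above, here is a minimal sketch. The sandbox directory and every file name in it are invented for illustration, and the locale is pinned to C so the [A-G] range behaves the same on every system.

```shell
# Hypothetical sandbox: all names below are made up for this sketch.
export LC_ALL=C                      # make bracket ranges and sort order predictable
demo=$(mktemp -d)
mkdir -p "$demo/project/data"
cd "$demo/project/data"
touch Alpha.txt Beta.txt Hotel.txt notes.md

# Asterisk: matches any run of characters, so all three .txt files match.
txt_files=$(echo *.txt)
echo "txt files: $txt_files"

# Bracket range: only names whose first character is A through G.
ag_files=$(echo [A-G]*.txt)
echo "A-G files: $ag_files"

# Relative paths: .. is the parent directory, . is the current one.
cd ..
parent=$(pwd)
echo "now in: $parent"
cd ./data
```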
...
When dealing with large data files, sometimes scattered in many directories, it is often convenient to create multiple symbolic links (symlinks) to those files in a directory where you plan to work with them. You can use them in your analysis as if they were local to your analysis working directory, without the storage cost of copying them.
Tip | ||
---|---|---|
| ||
Storage is a limited resource, so never copy large data files! Create symbolic links to them in your analysis directory instead. |
...
- ln -s <path> says to create a symbolic link (symlink) to the specified file (or directory) in the current directory
- always use the -s option to avoid creating a hard link, which behaves quite differently
- the default link name corresponds to the last name component in <path>
- you can name the link file differently by supplying an optional link_file_name.
- it is best to change into (cd) the directory where you want the link before executing ln -s
- a symbolic link can be deleted without affecting the linked-to file
- the -f (force) option says to overwrite any existing file with the same name
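The points above can be tried out safely in a scratch directory. This is only a sketch: the directory layout, sample.txt and alias.txt are hypothetical names chosen for the demonstration.

```shell
# Hypothetical sandbox: data/ holds the "real" file, work/ holds the links.
demo=$(mktemp -d)
mkdir -p "$demo/data" "$demo/work"
echo "sample text" > "$demo/data/sample.txt"
cd "$demo/work"

# Default link name: the last component of the path (sample.txt).
ln -s ../data/sample.txt

# Optional second argument: give the link a different name.
ln -s ../data/sample.txt alias.txt

# -f replaces the existing sample.txt link instead of failing with "File exists".
ln -s -f ../data/sample.txt
ls -l

# Reading through a link returns the contents of the linked-to file.
cat alias.txt

# Removing a link leaves the linked-to file untouched.
rm alias.txt
ls -l ../data/sample.txt
```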
Examples:
...
Code Block | ||
---|---|---|
| ||
mkdir ~/test; cd ~/test
ln -s -f /stor/work/CCBB_Workshops_1/bash_scripting/data/sampleinfo.txt
ls -l |
Multiple files can be linked by providing multiple file name arguments and using the -t (target) option to specify the directory where the links should be created.
...
Code Block | ||
---|---|---|
| ||
rm -rf ~/test; mkdir ~/test; cd ~/test
ln -s -f -t . /stor/work/CCBB_Workshops_1/bash_scripting/data/*.txt
ls -l |
...
Code Block | ||
---|---|---|
| ||
# create a symlink to the ~/haiku.txt file using relative path syntax
mkdir -p ~/syms; cd ~/syms
ln -s -f ../haiku.txt
ls -l |
The ls -l long listing in the ~/syms directory displays the symlink like this:
- The 10-character permissions field ( lrwxrwxrwx ) has an l in the left-most file type position, indicating this is a symbolic link.
- The symlink itself is colored differently, in cyan
- There are two extra fields after the symlink name
- field 10 has an arrow -> pointing to ...
- field 11 is the path of the linked-to file ("../haiku.txt")
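The same information can be read without relying on colors. The sketch below rebuilds the haiku.txt link in a throwaway directory (the file contents are invented) and uses the standard test -L check plus readlink, which is available on Linux and macOS.

```shell
# Hypothetical sandbox that mirrors the ~/syms layout under a temp directory.
demo=$(mktemp -d)
echo "an old silent pond" > "$demo/haiku.txt"
mkdir -p "$demo/syms"; cd "$demo/syms"
ln -s -f ../haiku.txt

ls -l haiku.txt

# The first character of the long listing is the file type; "l" means symlink.
file_type=$(ls -l haiku.txt | cut -c1)
echo "file type character: $file_type"

# readlink prints the path stored in the link (the text after the -> arrow).
link_target=$(readlink haiku.txt)
echo "points to: $link_target"

# test -L (written here as [ -L ... ]) is true only for symbolic links.
if [ -L haiku.txt ]; then
    echo "haiku.txt is a symbolic link"
fi
```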
Now create a symlink to a non-existent file:
Code Block | ||
---|---|---|
| ||
# create a symlink to a non-existent "~/xxx.txt" file, naming the symlink "bad_link.txt"
mkdir -p ~/syms; cd ~/syms
ln -sf ~/xxx.txt bad_link.txt
ls -l |
Now both the symlink and the linked-to file are displayed in red, indicating a broken link.
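Colors are not available in scripts, so it can help to detect broken links programmatically. One possible approach, sketched here with invented file names, combines two standard tests: -L is true for any symlink, while -e follows the link and is false when the linked-to file is missing.

```shell
# Hypothetical sandbox with one working link and one broken link.
demo=$(mktemp -d); cd "$demo"
echo "data" > real.txt
ln -s real.txt good_link.txt
ln -s missing.txt bad_link.txt     # missing.txt is never created

broken=""
for f in *; do
    # A symlink whose target does not exist is a broken (dangling) link.
    if [ -L "$f" ] && [ ! -e "$f" ]; then
        echo "broken: $f -> $(readlink "$f")"
        broken="$broken$f"
    fi
done
```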
Multiple files can be linked by providing multiple file name arguments and using the -t (target) option to specify the directory where the links should be created.
Code Block | ||
---|---|---|
| ||
# create multiple symlinks to the *.bed files in the ~/data/bedfiles/ directory
# the -t . says create all the symlinks in the current directory
mkdir -p ~/syms; cd ~/syms
ln -sf -t . ../data/bedfiles/*.bed
ls -l |
What about the case where the files you want are scattered in sub-directories? Consider a typical GSAF project directory structure, where FASTQ files are nested in sub-directories:
Here's a solution using find and xargs:
Code Block | ||
---|---|---|
| ||
mkdir -p ~/syms/fa; cd ~/syms/fa
find /stor/work/CBRS_unix/fastq -name "*.gz" | xargs ln -sf -t . |
Step by step:
- find returns a list of matching file paths on its standard output
- the paths are piped to the standard input of xargs
- xargs reads the paths from its standard input and runs the specified command (here ln -sf -t .) with those paths appended as the command's arguments.
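By default xargs splits its input on whitespace, so the pipeline above can misbehave if a path contains spaces. A possible variant, assuming GNU find, xargs and ln (ln -t is a GNU coreutils option), separates the paths with null characters instead. The directory and file names below are made up for the sketch.

```shell
# Hypothetical sandbox: FASTQ-like files nested in sub-directories,
# one of which has spaces in its path.
demo=$(mktemp -d)
mkdir -p "$demo/fastq/run 1" "$demo/fastq/run2" "$demo/links"
touch "$demo/fastq/run 1/sample A.fastq.gz" "$demo/fastq/run2/sampleB.fastq.gz"
cd "$demo/links"

# -print0 ends each path with a null character; xargs -0 splits on nulls,
# so spaces inside a path are passed through to ln unchanged.
find "$demo/fastq" -name "*.gz" -print0 | xargs -0 ln -sf -t .
ls -l
```

Note that all links land in one directory, so files with the same base name in different sub-directories would collide there.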
About compressed files
Because a lot of scientific data is large, it is often stored in a compressed format to conserve storage space. The most common compression program used for individual files is gzip whose compressed files have the .gz extension. The tar and zip programs are most commonly used for compressing directories.
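As a small illustration of gzip, the sketch below compresses a file, reads it without writing an uncompressed copy to disk, then restores it. The file name and contents are hypothetical.

```shell
# Hypothetical sandbox with a tiny two-line text file.
demo=$(mktemp -d); cd "$demo"
printf 'line one\nline two\n' > reads.txt

# gzip replaces reads.txt with the compressed reads.txt.gz
gzip reads.txt
ls -l

# gunzip -c writes the uncompressed data to standard output,
# leaving the .gz file on disk as it is.
first_line=$(gunzip -c reads.txt.gz | head -n 1)
echo "first line: $first_line"

# gunzip without -c restores reads.txt and removes reads.txt.gz
gunzip reads.txt.gz
```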
...