...

Using the less program's search function is one way to find text in a file, especially when you want to see the context surrounding the searched-for text. But for general text searching, the grep program is used much more often.

The name grep comes from the ed editor command g/re/p ("globally search for a regular expression and print the matching lines"). Nearly every programming language offers similar regular-expression searching, where a pattern you specify – a regular expression, or regex – describes what text to match.

In Unix, the grep program performs regular-expression text searching.

There are many grep regular expression metacharacters that control how the search is performed. This is a complex topic and we defer it to the Intermediate Unix workshop (for a preview, see the grep command).
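As a minimal sketch of basic grep use, the commands below create a small hypothetical file, sample.txt, holding the first stanza of Lewis Carroll's "Jabberwocky" (a stand-in for the workshop's jabberwocky.txt, whose exact contents we don't assume). They then search it for a plain literal word, so no metacharacters are involved:

```bash
# Create a hypothetical sample file in the current directory
printf '%s\n' "'Twas brillig, and the slithy toves" \
              "Did gyre and gimble in the wabe:" \
              "All mimsy were the borogoves," \
              "And the mome raths outgrabe." > sample.txt

# Print every line that contains the literal text "mimsy"
grep 'mimsy' sample.txt

# grep is case-sensitive by default, so this pattern matches nothing;
# grep then exits with status 1, and the || branch reports it
grep 'MIMSY' sample.txt || echo "no match"
```

An exit status of 0 means at least one line matched, and 1 means none did, which is handy when grep is used inside scripts.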

...

Tip

Note that less and grep both support case-insensitive matching and displaying line numbers, but they use slightly different options:

  • Case insensitive matching:
    • less -I or less --IGNORE-CASE
    • grep -i or grep --ignore-case
  • Display line numbers:
    • less -N or less --LINE-NUMBERS
    • grep -n or grep --line-number

It would be great if all Linux commands used the same option for the same meaning – and some do, but some don't.
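The grep side of the tip can be tried with this minimal sketch, again assuming a hypothetical sample.txt holding the first stanza of "Jabberwocky". Only line 4 starts with a capital "And", so case-insensitive matching adds exactly that line:

```bash
# Create a hypothetical sample file in the current directory
printf '%s\n' "'Twas brillig, and the slithy toves" \
              "Did gyre and gimble in the wabe:" \
              "All mimsy were the borogoves," \
              "And the mome raths outgrabe." > sample.txt

# Case-sensitive (the default): matches lines 1 and 2 only
grep -n 'and' sample.txt

# -i (--ignore-case) also matches "And" on line 4;
# -n (--line-number) prefixes each matching line with its line number
grep -i -n 'and' sample.txt
```

The second command prints lines 1, 2 and 4, each prefixed with its line number and a colon (for example 4:And the mome raths outgrabe.).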

...

A key to text manipulation is understanding Unix streams. Every Unix command and program has three "built-in" streams: standard input, standard output and standard error.


Most programs/commands read input data from some source, then write output to some destination. A data source can be a file, but can also be standard input. Similarly, a data destination can be a file but can also be a stream such as standard output.

The power of the Linux command line is due in no small part to the power of piping. The pipe operator ( | ) connects one program's standard output to the next program's standard input.

[Figure: piping.png]
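As a minimal sketch of a multi-stage pipeline, using a hypothetical sample.txt holding the first stanza of "Jabberwocky" in place of jabberwocky.txt, each command after the first pipe below reads its standard input from the previous command rather than opening a file. The last command, which names a hypothetical nonexistent file, shows that standard error is not carried through the pipe:

```bash
# Create a hypothetical sample file in the current directory
printf '%s\n' "'Twas brillig, and the slithy toves" \
              "Did gyre and gimble in the wabe:" \
              "All mimsy were the borogoves," \
              "And the mome raths outgrabe." > sample.txt

# cat writes the file to its standard output; the first pipe feeds that
# to grep's standard input, and the second pipe feeds grep's matching
# lines to wc, which counts them (2 lines contain lowercase "and")
cat sample.txt | grep 'and' | wc -l

# cat's error message goes to standard error, which the pipe does not
# connect, so it still appears on your Terminal while wc counts 0 lines
cat no-such-file.txt | wc -l
```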

The key to the power of piping is that most Unix commands can accept input from standard input instead of from files. So, for example, these two expressions appear equivalent:

...

  1. In the 1st, the more command reads some input from the jabberwocky.txt file
    • then writes the output to standard output, which is displayed on your Terminal
    • it pauses at page boundaries (--More--), waiting for a keypress from your Terminal
    • when you press the space bar, it reads more input from jabberwocky.txt
    • then writes that output to standard output, which is displayed on your Terminal
  2. In the 2nd, the cat command reads its input from the jabberwocky.txt file
    • then writes its output to standard output
    • the pipe operator ( | ) connects the standard output of cat to the standard input of more
    • the more command then reads its input from standard input, instead of from a file
    • then writes its output to standard output, which is displayed on your Terminal
    • more continues its processing as described in #1, except that it reads from standard input instead of from the file

Notes:

  • In #2, the cat command "blocks" (pauses) when writing to its standard output whenever the pipe's buffer is full, until more reads more of its input.
    • This "write until blocked" / "read when input is available" behavior makes streams a very efficient means of inter-process communication.
  • In #1, more can report how much of the file has been read, e.g. --More-- (24%), because it knows the size of the file it is reading.
    • But in #2, the text is "anonymous input" – from standard input – so more doesn't know how much of the total has been provided.
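The blocking behavior can be observed with a minimal sketch using the standard yes command, which writes the line y over and over, forever. Because head -n 3 stops after three lines, yes blocks once the pipe's buffer fills, and it is terminated (by the SIGPIPE signal) the next time it writes after head has exited, so the pipeline finishes instead of running forever:

```bash
# yes would write "y" lines endlessly; the pipe makes it wait whenever
# head is not reading, and head exits after printing 3 lines
yes | head -n 3
```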

Exercise 2-3

Use the pipe operator to provide jabberwocky.txt data to the less command so that line numbers are displayed.

...

```bash
cat -n haiku.txt | tail -n 7       # display the last 7 lines of haiku.txt
cat -n haiku.txt | tail -n +7      # display text in haiku.txt starting at line 7
cat -n haiku.txt | tail -n +1210   # display text in haiku.txt starting at line 1210
```
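To try the tail -n +<integer> form without haiku.txt, seq can generate numbered input. This minimal sketch assumes seq is available (it ships with typical Linux and macOS systems) and uses 10 numbers as a stand-in for a 10-line file:

```bash
# seq 10 writes 1 through 10, one per line; tail -n +7 keeps
# everything from line 7 to the end of its input
seq 10 | tail -n +7
```

This prints 7, 8, 9 and 10, each on its own line.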

When you use the tail -n +<integer> syntax, tail displays all of its input from that line to the end. So to view only a few lines starting at a specified line number, pipe the output to head:

...

Tip

Note the slight difference when you give wc -l a file name versus when you pipe input to it.

  • wc -l <filename> displays the number of lines followed by the file name.
  • cat <filename> | wc -l only displays the number of lines in its anonymous input from standard input.
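A minimal sketch of the difference, using a hypothetical four-line sample.txt (the first stanza of "Jabberwocky"). Exact spacing varies: BSD/macOS wc pads the count with leading spaces, while GNU wc does not:

```bash
# Create a hypothetical sample file in the current directory
printf '%s\n' "'Twas brillig, and the slithy toves" \
              "Did gyre and gimble in the wabe:" \
              "All mimsy were the borogoves," \
              "And the mome raths outgrabe." > sample.txt

# Given a file name, wc -l prints the line count followed by the file name
wc -l sample.txt

# Reading anonymous input from standard input, wc -l has no file name
# to report, so it prints only the count
cat sample.txt | wc -l
```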

What is text?

We've talked about viewing text using various Unix commands – but what exactly is text? That is, what is stored in files that the shell interprets as text?

...