Pipeline (Unix)

From Wikipedia, the free encyclopedia
This is an old revision of this page, as edited by Jorge Stolfi (talk | contribs) at 01:34, 3 December 2004 (Created page from chunk of Pipeline). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

In UNIX and other UNIX-like operating systems, a pipeline is a set of filter processes, written on a single command line and chained by their standard I/O streams, so that the output of each process is automatically fed as input to the next one. This feature of Unix gave rise to the pipes and filters design pattern of software engineering.
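As a minimal sketch of the idea, assuming a POSIX-style shell with the standard printf and sort utilities, the following two-stage pipeline connects the standard output of one process to the standard input of the next with the | operator. The sample words are made up for illustration.

```shell
# printf writes three lines to its standard output;
# sort -u reads them on its standard input, sorts them, and drops duplicates.
printf '%s\n' pear apple pear | sort -u
```

Neither program knows about the other; the shell sets up the connection between them.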

History

Douglas McIlroy, one of the authors of the early UNIX command shells, noticed that much of the time they were passing the output of one program as the input to another. The UNIX pioneers established a means of chaining the running programs together as co-processes, so that the output of the first program becomes the input to the second.

Example

Below is an example of a pipeline that implements a kind of spell checker for the web resource indicated by a URL [1].

curl http://www.wikipedia.org/wiki/Pipeline |
sed 's/[^a-zA-Z ]//g' |
tr 'A-Z ' 'a-z\n' |
grep '[a-z]' |
sort -u |
comm -23 - /usr/dict/words

Here is an explanation of the pipeline:

  • First the curl program obtains the HTML contents of a web page.
  • The contents of this page are piped through sed, which removes all characters which are not spaces or letters.
  • tr then changes all of the uppercase letters into their corresponding lowercase counterparts, and converts the spaces in the lines of text to newlines.
  • Each 'word' is now on a separate line.
  • grep keeps only the lines that contain at least one letter, which removes the empty lines.
  • sort sorts the list of 'words' into alphabetical order, and removes duplicates.
  • Finally, comm finds which of the words in the list are not in the given dictionary file (in this case, /usr/dict/words).
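The steps above can also be tried without network access or a system dictionary. The sketch below wraps the same filter stages in a shell function and runs them on a one-line sample. The function name unknown_words, the sample sentence, and the four-word dictionary are assumptions made for illustration, and LC_ALL=C is added so that sort and comm agree on the ordering of lines.

```shell
# Hypothetical helper: reads text on standard input and prints the words
# that do not appear in the sorted word list named by the first argument.
unknown_words() {
    sed 's/[^a-zA-Z ]//g' |        # keep only letters and spaces
    tr 'A-Z ' 'a-z\n' |            # lowercase; put each word on its own line
    grep '[a-z]' |                 # drop empty lines
    LC_ALL=C sort -u |             # sort and remove duplicates
    LC_ALL=C comm -23 - "$1"       # keep lines absent from the word list
}

# A tiny stand-in dictionary (an assumption, not /usr/dict/words).
dict=$(mktemp)
printf '%s\n' a is pipeline the | LC_ALL=C sort > "$dict"

# Prints the single misspelled word: pipleine
printf 'The pipeline is a pipleine.\n' | unknown_words "$dict"

rm -f "$dict"
```

Because comm expects both inputs to be sorted in the same order, the stand-in dictionary is sorted under the same locale setting as the word list.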

See also