Pipeline (Unix)

2007-02-18T23:43:36Z

Slugger: Reverted edits by 201.243.38.17 (talk) using Mike's Wiki Tool 0.9.2

[[Image:Pipeline.svg|thumb|A pipeline of three programs run on a text terminal]]
In [[Unix-like]] computer [[operating system]]s, a '''pipeline''' is the original ''[[pipeline (software)|software pipeline]]'': a set of [[process (computing)|process]]es chained by their [[standard streams]], so that the output of each process (''[[stdout]]'') feeds directly into the input (''[[stdin]]'') of the next one. Each connection is implemented by an [[anonymous pipe]]. [[Filter (Unix)|Filter program]]s are often used in this configuration. The concept was invented by [[Douglas McIlroy]] for [[Unix shell]]s and was named by analogy to a physical [[pipeline transport|pipeline]].

==Example==

Below is an example of a pipeline that implements a kind of [[spell checker]] for the [[World Wide Web|web]] resource indicated by a [[Uniform Resource Locator|URL]]; an explanation of each step follows. The dictionary file is assumed to be /usr/dict/words (some machines have /usr/share/dict/words instead).

[[CURL|curl]] <nowiki>"http://en.wikipedia.org/wiki/Pipeline_(Unix)"</nowiki> | \
[[sed]] 's/[^a-zA-Z ]/ /g' | \
[[tr (program)|tr]] 'A-Z ' 'a-z\n' | \
[[grep]] '[a-z]' | \
[[Sort (Unix)|sort]] -u | \
[[comm (Unix)|comm]] -23 - /usr/dict/words

*First, '''<tt>curl</tt>''' obtains the [[HTML]] contents of a web page.
*Second, '''<tt>sed</tt>''' removes all characters which are not spaces or letters from the web page's content, replacing them with spaces.
*Third, '''<tt>tr</tt>''' changes all of the uppercase letters into lowercase and converts the spaces in the lines of text to newlines (each 'word' is now on a separate line).
*Fourth, '''<tt>grep</tt>''' keeps only the lines that contain at least one letter, which removes the empty lines left behind by runs of [[whitespace]].
*Fifth, '''<tt>sort</tt>''' sorts the list of 'words' into alphabetical order, and removes duplicates.
*Finally, '''<tt>comm</tt>''' finds which of the words in the list are not in the given dictionary file (in this case, /usr/dict/words).
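
The filter stages can also be tried without network access. The following is a minimal sketch, assuming a POSIX-style shell with the usual utilities; the function name <code>misspelled</code>, the sample sentence and the four-word dictionary are made up for illustration and are not part of the example above.

```shell
# Hypothetical helper: read text on stdin and print the words that are
# missing from the sorted word list named by the first argument.
misspelled() {
    sed 's/[^a-zA-Z ]/ /g' |
    tr 'A-Z ' 'a-z\n' |
    grep '[a-z]' |
    LC_ALL=C sort -u |
    LC_ALL=C comm -23 - "$1"
}

# A tiny stand-in dictionary (illustrative, already in sorted order).
dict=$(mktemp)
printf '%s\n' a is pipe this > "$dict"

printf 'This is a pipe, a pype!\n' | misspelled "$dict"   # prints: pype
rm -f "$dict"
```

Setting <code>LC_ALL=C</code> for <code>sort</code> and <code>comm</code> is a choice made for this sketch so that both agree on the ordering; <code>comm</code> expects both of its inputs to be sorted the same way.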

== Pipelines in command line interfaces ==
Most [[Unix shell]]s have a special syntax construct for the creation of pipelines. Typically, one simply writes the filter commands in sequence, separated by the [[ASCII]] [[vertical bar]] character "|" (which, for this reason, is often called the "pipe character" by Unix users). The shell starts the processes and arranges for the necessary connections between their standard streams (including some amount of [[Buffer (computer science)|buffer]] storage).

===Error stream===
By default, the standard error streams ("[[stderr]]") of the processes in a pipeline are not passed on through the pipe; instead, they are merged and directed to the [[computer console|console]]. However, many shells have additional syntax for changing this behaviour. In the [[C shell|csh]] shell, for instance, using "|&" instead of "|" signifies that the [[standard error]] stream too should be merged with the standard output and fed to the next process. The [[Bourne Shell]] can also merge standard error into standard output (with "2>&1"), as well as redirect it to a different file.
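
The difference can be seen with a short sketch in Bourne-style syntax. The function <code>emit</code> is a hypothetical stand-in for any command that writes to both streams, and <code>wc -l</code> simply counts the lines that arrive through the pipe.

```shell
# Hypothetical command that writes one line to each stream.
emit() {
    echo "to stdout"
    echo "to stderr" >&2
}

# Plain pipe: only standard output reaches wc, so it counts 1 line;
# "to stderr" bypasses the pipe and appears on the terminal.
emit | wc -l

# 2>&1 points standard error at the pipe as well, so wc counts 2 lines.
emit 2>&1 | wc -l

# Standard error can instead be sent to a file (here /dev/null),
# leaving only the standard output line in the pipe.
emit 2>/dev/null | wc -l
```

The shell connects standard output to the pipe before it processes the redirections of the command, which is why <code>2>&1</code> placed before the "|" sends both streams to the next process.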

==Creating pipelines programmatically==

Pipelines can be created under program control.
The <code>pipe()</code> [[system call]] asks the operating system to construct a new [[anonymous pipe]] object.
This results in two new open file descriptors in the process: the read-only end of the pipe and the write-only end.
The pipe ends appear to be normal, anonymous [[file descriptor]]s, except that they cannot seek.

To avoid [[deadlock]] and exploit parallelism, the process with one or more new pipes will then, generally, call
<code>[[fork (computing)|fork()]]</code> to create new
processes. Each process will then close the end(s) of
the pipe that it will not be using before producing or consuming any data.
Alternatively, a process might create a new [[pthreads|thread]] and use the pipe to communicate between the two threads.

''[[Named pipe]]s'' may also be created using <code>mkfifo()</code> or <code>mknod()</code> and then presented as the input or output file to programs as they are invoked. They allow multi-path pipes to be created, and are especially effective when combined with standard error redirection, or with [[tee (Unix)|tee]].
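
A multi-path arrangement of this kind can be sketched from the shell. The example below uses the <code>mkfifo</code> command-line utility rather than the C function, and the function name <code>count_and_sort</code> and the temporary directory layout are assumptions made for illustration. One copy of the input is sorted through an ordinary pipe while <code>tee</code> feeds a second copy through a named pipe to a line counter.

```shell
# Hypothetical helper: print stdin sorted, followed by its line count.
count_and_sort() {
    dir=$(mktemp -d) || return 1
    mkfifo "$dir/fifo"

    # Reader on the named pipe, started in the background. Its open()
    # blocks until a writer opens the other end.
    wc -l < "$dir/fifo" > "$dir/count" &

    # tee writes one copy into the named pipe and passes the other
    # copy down the ordinary (anonymous) pipe to sort.
    tee "$dir/fifo" | sort

    wait                          # let the background reader finish
    tr -d ' ' < "$dir/count"      # strip padding some wc versions add
    rm -rf "$dir"
}

printf 'b\na\nc\n' | count_and_sort   # prints a, b, c, then 3
```

The reader is started before the writer so that neither side waits indefinitely for the other end of the named pipe to be opened.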

==Implementation==

In most Unix-like systems, all processes of a pipeline are started at the same time, with their streams appropriately connected, and managed by the [[scheduler]] together with all other processes running on the machine. An important aspect of this, setting Unix pipes apart from other pipe implementations, is the concept of [[Buffer (computer science)|buffering]]: for example, a sending program may produce 5000 [[bytes]] per [[second]] while a receiving program may only be able to accept 100 bytes per second, yet no data are lost. Instead, the output of the sending program is held in a buffer, or [[Queue (data structure)|queue]]. When the receiving program is ready to read data, the operating system sends it data from the buffer, then removes that data from the buffer. If the buffer fills up, the sending program is suspended (blocked) until the receiving program has had a chance to read some data and make room in the buffer.
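
This behaviour can be observed with a fast producer and a deliberately slow consumer. The sketch below is illustrative only: the function names <code>produce</code> and <code>slow_consume</code> are hypothetical, and the assumption is that 100,000 short lines (roughly 590 kilobytes) exceed the pipe buffer, as they do on typical systems, so that the producer is blocked while the consumer sleeps.

```shell
# Hypothetical fast producer: writes the numbers 1..100000, one per line.
produce() {
    awk 'BEGIN { for (i = 1; i <= 100000; i++) print i }'
}

# Hypothetical slow consumer: is not ready for one second, then counts
# the lines it receives.
slow_consume() {
    sleep 1
    wc -l | tr -d ' '
}

# The producer fills the buffer and is suspended until the consumer
# starts reading; every line still arrives.
produce | slow_consume   # prints 100000
```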

=== Network pipes ===
Tools like [[netcat]] and [[socat]] can connect pipes to TCP/IP [[socket]]s, following the Unix philosophy of "[[everything is a file]]".

== History ==
[[Image: Automator_Icon.png|thumb|75px|right|Apple Automator logo]]
The pipeline concept and the vertical-bar notation were invented by [[Douglas McIlroy]], one of the authors of the early [[Unix shell|command shells]], after he noticed that much of the time they were processing the output of one program as the input to another. The idea was eventually ported to other operating systems, such as [[DOS]], [[OS/2]], [[Windows NT]], and [[BeOS]], often with the same notation.

The robot in the icon for [[Apple Computer|Apple]]'s [[Automator (software)|Automator]], which also uses the pipeline concept to chain repetitive commands together, holds a pipe in recognition of the application's Unix heritage.<ref name="automator"/>

=== Other operating systems ===
{{main|pipeline (software)}}

This feature of [[Unix]] was borrowed by other operating systems, such as [[Taos operating system|Taos]] and [[MS-DOS]], and eventually became the [[pipeline (software)|pipes and filters design pattern]] of [[software engineering]].

== See also ==
* [[Tee (Unix)]] for fitting together two pipes
* [[Pipeline (software)]] for the general software engineering concept.
* [[Pipeline (computer)]] for other computer-related pipelines.
* [[Hartmann pipeline]]
* [[Anonymous pipe]], a [[FIFO]] structure used for [[interprocess communication]]
* [[Named pipe]], persistent pipes used for interprocess communication
* [[XML pipeline]] for processing of XML files

==External links==
*[http://en.wikibooks.org/w/index.php?title=Ad_Hoc_Data_Analysis_From_The_Unix_Command_Line ''Ad Hoc Data Analysis From The Unix Command Line'' at Wikibooks] shows how to use pipelines composed of simple filters to do complex data analysis.
*[http://www.pixelbeat.org/programming/stdio_buffering/ stdio buffering]

==References==
* [[Sal Soghoian]] on [[MacBreak]] Episode 5 "Enter the Automatrix"

[[Category:Inter-process communication]]
[[Category:Unix]]

[[it:Pipeline (Unix)]]
[[ja:パイプ (コンピュータ)]]
[[pt:Pipeline (Unix)]]
[[zh:Pipe]]
