Pipeline (Unix)

Version of 19 February 2007, 01:43

[Image: A pipeline of three programs run on a text terminal]

In Unix-like computer operating systems, a pipeline is the original software pipeline: a set of processes chained by their standard streams, so that the output of each process (stdout) feeds directly as input (stdin) to the next one. Each connection is implemented by an anonymous pipe. Filter programs are often used in this configuration. The concept was invented by Douglas McIlroy for Unix shells, and it was named by analogy to a physical pipeline.

Example

Below is an example of a pipeline that implements a kind of spell checker for the web resource indicated by a URL; an explanation of each stage follows. The dictionary file used here, /usr/dict/words, is located at /usr/share/dict/words on some machines.

```sh
curl -sL "http://en.wikipedia.org/wiki/Pipeline_(Unix)" |
sed 's/[^a-zA-Z ]/ /g' |
tr 'A-Z ' 'a-z\n' |
grep '[a-z]' |
sort -u |
comm -23 - /usr/dict/words
```

First, curl obtains the HTML contents of a web page.
Second, sed replaces every character that is not a letter or a space with a space.
Third, tr changes all uppercase letters into lowercase and converts the spaces into newlines (each 'word' is now on a separate line).
Fourth, grep keeps only lines that contain at least one letter, discarding the empty lines left behind by the previous steps.
Fifth, sort sorts the list of 'words' into alphabetical order and removes duplicates.
Finally, comm -23 prints only the words that appear in the list but not in the given dictionary file (in this case, /usr/dict/words); comm expects both of its inputs to be sorted.

Pipelines in command line interfaces

Most Unix shells have a special syntax construct for the creation of pipelines. Typically, one simply writes the filter commands in sequence, separated by the ASCII vertical bar character "|" (which, for this reason, is often called the "pipe character" by Unix users). The shell starts the processes and arranges for the necessary connections between their standard streams (including some amount of buffer storage).

Error stream

By default, the standard error streams ("stderr") of the processes in a pipeline are not passed on through the pipe. Instead, each process inherits the shell's standard error, so their error output is merged and usually directed to the console. However, many shells have additional syntax for changing this behaviour. In the csh shell, for instance, using "|&" instead of "|" signifies that the standard error stream should also be merged with the standard output and fed to the next process. The Bourne shell can also merge standard error into the pipe (cmd 2>&1 | next) or redirect it to a different file (cmd 2>file | next).

Creating pipelines programmatically

Pipelines can be created under program control. The pipe() system call asks the operating system to construct a new anonymous pipe object. This results in two new, open file descriptors in the process: the read-only end of the pipe and the write-only end. The pipe ends behave like normal, anonymous file descriptors, except that they cannot seek (lseek() on a pipe fails).

To avoid deadlock and exploit parallelism, the process that owns the new pipe(s) will then generally call fork() to create new processes. Each process then closes the end(s) of the pipe it will not be using before producing or consuming any data. Alternatively, a process might create a new thread and use the pipe to communicate between the threads.

Named pipes may also be created with mkfifo() or mknod() and then given to programs as their input or output file when they are invoked. They allow multi-path pipes to be built, and are especially effective when combined with standard error redirection or with tee.

Implementation

In most Unix-like systems, all processes of a pipeline are started at the same time, with their streams appropriately connected, and managed by the scheduler together with all other processes running on the machine. An important aspect of this, setting Unix pipes apart from other pipe implementations, is the concept of buffering: a sending program may produce 5000 bytes per second, and a receiving program may only be able to accept 100 bytes per second, but no data are lost. Instead, the output of the sending program is held in a buffer, or queue. When the receiving program is ready to read data, the operating system sends it data from the buffer, then removes that data from the buffer. If the buffer fills up, the sending program is suspended (blocked) until the receiving program has had a chance to read some data and make room in the buffer.

Network pipes

Tools like netcat and socat can connect pipes to TCP/IP sockets, following the Unix philosophy of "everything is a file".

History

[Image: Apple Automator logo]

The pipeline concept and the vertical-bar notation were invented by Douglas McIlroy, one of the authors of the early command shells, after he noticed that much of the time people were processing the output of one program as the input to another. The idea was eventually ported to other operating systems, such as DOS, OS/2, Windows NT and BeOS, often with the same notation.

The robot in the icon for Apple's Automator, which also uses the pipeline concept to chain repetitive commands together, holds a pipe in recognition of the application's Unix heritage.^[1]

Other operating systems

This feature of Unix was borrowed by other operating systems, such as Taos and MS-DOS, and eventually became the pipes and filters design pattern of software engineering.

External links

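Ad Hoc Data Analysis From The Unix Command Line (http://en.wikibooks.org/w/index.php?title=Ad_Hoc_Data_Analysis_From_The_Unix_Command_Line) at Wikibooks shows how to use pipelines composed of simple filters to do complex data analysis.
stdio buffering (http://www.pixelbeat.org/programming/stdio_buffering/)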

References

[1] Sal Soghoian on MacBreak Episode 5, "Enter the Automatrix".

