Preprocessor

From Wikipedia, the free encyclopedia
This is an old revision of this page, as edited by Absgomz66 (talk | contribs) at 06:46, 11 September 2019 (Lexical preprocessors: reprocessed). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

In computer science, a preprocessor is a program that processes its input data to produce output that is used as input to another program. The output is said to be a preprocessed form of the input data, and it is often consumed by subsequent programs such as compilers. The amount and kind of processing done depends on the nature of the preprocessor: some preprocessors can only perform relatively simple textual substitutions and macro expansions, while others have the power of full-fledged programming languages.

A common example from computer programming is the processing performed on source code before the next step of compilation. In some computer languages (e.g., C and PL/I) there is a phase of translation known as preprocessing. It can also include macro processing, file inclusion and language extensions.

Lexical preprocessors

Lexical preprocessors are the lowest level of preprocessors, as they only require lexical analysis; that is, they operate on the source text, prior to any parsing, by performing simple substitution of tokenized character sequences for other tokenized character sequences, according to user-defined rules. They typically perform macro substitution, textual inclusion of other files, and conditional compilation or inclusion.
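The substitution step described above can be sketched as a toy pass in C. This is only an illustration under stated assumptions, not how any real preprocessor is implemented: the function name `substitute_token` and the helper `is_ident_char` are hypothetical, only a single user-defined rule is supported, and string literals and comments are not treated specially.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: true if c may appear inside an identifier-like token. */
static int is_ident_char(int c) {
    return isalnum(c) || c == '_';
}

/*
 * Copy src to out, replacing every whole-token occurrence of `name`
 * with `body`. No parsing is done: the input is only split into
 * identifier-like tokens and everything else.
 * Returns 0 on success, -1 if out is too small.
 */
int substitute_token(const char *src, const char *name, const char *body,
                     char *out, size_t out_size) {
    assert(src != NULL && name != NULL && body != NULL && out != NULL);

    size_t name_len = strlen(name);
    size_t body_len = strlen(body);
    size_t o = 0; /* number of bytes written so far */

    while (*src != '\0') {
        if (is_ident_char((unsigned char)*src)) {
            /* Scan one identifier-like token. */
            const char *start = src;
            while (is_ident_char((unsigned char)*src)) {
                src++;
            }
            size_t tok_len = (size_t)(src - start);

            /* Emit the replacement if the whole token matches the rule. */
            const char *text = start;
            size_t text_len = tok_len;
            if (tok_len == name_len && strncmp(start, name, name_len) == 0) {
                text = body;
                text_len = body_len;
            }
            if (o + text_len >= out_size) {
                return -1;
            }
            memcpy(out + o, text, text_len);
            o += text_len;
        } else {
            /* Anything that is not part of a token is copied unchanged. */
            if (o + 1 >= out_size) {
                return -1;
            }
            out[o++] = *src++;
        }
    }
    if (out_size == 0) {
        return -1;
    }
    out[o] = '\0';
    return 0;
}
```

Because matching is done on whole tokens rather than raw characters, a rule for `MAX` leaves a longer identifier such as `MAXIMUM` untouched.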

C preprocessor

The most common example of this is the C preprocessor, which takes lines beginning with '#' as directives. Because it knows nothing about the underlying language, its use has been criticized, and many of its features have been built directly into other languages. For example, macros have been replaced with aggressive inlining and templates, and includes with compile-time imports (this requires the preservation of type information in the object code, making this feature impossible to retrofit into a language); conditional compilation is effectively accomplished with if-then-else and dead code elimination in some languages. A key point to remember is that every preprocessor directive should start on a new line.
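A minimal sketch of these directives in C follows. All macro and function names here (`NAIVE_SQUARE`, `SAFE_SQUARE`, `USE_FAST_PATH`, `MODE_NAME` and so on) are made up for illustration. The two squaring macros show what it means for the preprocessor to know nothing about the language: expansion is purely textual, so operator precedence is only respected if the macro author adds parentheses.

```c
#include <assert.h>

/* Each directive starts on its own line, introduced by '#'. */

/* Purely textual: NAIVE_SQUARE(1 + 2) expands to 1 + 2 * 1 + 2. */
#define NAIVE_SQUARE(x) x * x

/* Parenthesized: SAFE_SQUARE(1 + 2) expands to ((1 + 2) * (1 + 2)). */
#define SAFE_SQUARE(x) ((x) * (x))

/* Conditional compilation: only one branch ever reaches the compiler.
   USE_FAST_PATH is a hypothetical flag, e.g. passed as -DUSE_FAST_PATH. */
#ifdef USE_FAST_PATH
#define MODE_NAME "fast"
#else
#define MODE_NAME "portable"
#endif

int naive_result(void) { return NAIVE_SQUARE(1 + 2); }
int safe_result(void)  { return SAFE_SQUARE(1 + 2); }

const char *mode_name(void) { return MODE_NAME; }

int checked_square(int n) {
    /* assert is itself a macro; it expands to a no-op when NDEBUG is defined. */
    assert(n >= 0);
    return SAFE_SQUARE(n);
}
```

With these definitions, `naive_result()` evaluates `1 + 2 * 1 + 2` and `safe_result()` evaluates `(1 + 2) * (1 + 2)`, which is the kind of surprise that language-level features such as inline functions avoid.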

Other lexical preprocessors

Other lexical preprocessors include the general-purpose m4, most commonly used in cross-platform build systems such as autoconf, and GEMA, an open source macro processor which operates on patterns of context.

Syntactic preprocessors

Syntactic preprocessors were introduced with the Lisp family of languages. Their role is to transform syntax trees according to a number of user-defined rules. For some programming languages, the rules are written in the same language as the program (compile-time reflection). This is the case with Lisp and OCaml. Some other languages rely on a fully external language to define the transformations, such as the XSLT preprocessor for XML, or its statically typed counterpart CDuce.

Syntactic preprocessors are typically used to customize the syntax of a language, extend a language by adding new primitives, or embed a domain-specific programming language (DSL) inside a general purpose language.

Customizing syntax

A good example of syntax customization is the existence of two different syntaxes in the Objective Caml programming language.[1] Programs may be written interchangeably using the "normal syntax" or the "revised syntax", and may be pretty-printed with either syntax on demand.

Similarly, a number of programs written in OCaml customize the syntax of the language by the addition of new operators.

Extending a language

The best examples of language extension through macros are found in the Lisp family of languages. While the languages, by themselves, are simple dynamically typed functional cores, the standard distributions of Scheme or Common Lisp permit imperative or object-oriented programming, as well as static typing. Almost all of these features are implemented by syntactic preprocessing, although it bears noting that the "macro expansion" phase of compilation is handled by the compiler in Lisp. This can still be considered a form of preprocessing, since it takes place before other phases of compilation.

Specializing a language

One of the unusual features of the Lisp family of languages is the possibility of using macros to create an internal DSL. Typically, in a large Lisp-based project, a module may be written in a variety of such mini-languages, one perhaps using a SQL-based dialect of Lisp, another written in a dialect specialized for GUI or pretty-printing, etc. Common Lisp's standard library contains an example of this level of syntactic abstraction in the form of the LOOP macro, which implements an Algol-like mini-language to describe complex iteration, while still enabling the use of standard Lisp operators.

The MetaOCaml preprocessor/language provides similar features for external DSLs. This preprocessor takes the description of the semantics of a language (i.e. an interpreter) and, by combining compile-time interpretation and code generation, turns that definition into a compiler to the OCaml programming language, and from that language, either to bytecode or to native code.

General purpose preprocessor

Most preprocessors are specific to a particular data processing task (e.g., compiling the C language). A preprocessor may be promoted as being general purpose, meaning that it is not aimed at a specific usage or programming language, and is intended to be used for a wide variety of text processing tasks.

M4 is probably the most well known example of such a general purpose preprocessor, although the C preprocessor is sometimes used in a non-C specific role. Examples:

  • using the C preprocessor for JavaScript preprocessing.[2]
  • using the C preprocessor for device tree processing within the Linux kernel.[3]
  • using M4 (see on-article example) or the C preprocessor[4] as a template engine for HTML generation.
  • imake, a make interface using the C preprocessor, written for the X Window System but now deprecated in favor of automake.
  • grompp, a preprocessor for simulation input files for GROMACS (a fast, free, open-source code for some problems in computational chemistry), which calls the system C preprocessor (or another preprocessor as determined by the simulation input file) to parse the topology, using mostly the #define and #include mechanisms to determine the effective topology at grompp run time.
  • using GPP for preprocessing markdown files.[5]

References