
User:Steamerandy/editing/metacompiler2

From Wikipedia, the free encyclopedia
This is an old revision of this page, as edited by Steamerandy (talk | contribs) at 00:52, 31 October 2014 (Created page with 'A '''Metacompiler''' is a compiler that is used chiefly to construct compilers for other programming languages.<ref name=McGraw/> They are a subset of a speciali...'). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

A Metacompiler is a compiler that is used chiefly to construct compilers for other programming languages.[1] Metacompilers are a subset of a specialized class of compiler-writing tools called compiler-compilers. The feature that sets a metacompiler apart from a standard compiler-compiler is that a metacompiler is written in its own language and translates itself. This should not be confused with a self-hosting compiler, which only compiles the language it is written in.

A metacompiler is a program whose input is a description of the grammar of a programming language, plus a description of its semantics. The language in which such a description is coded is called a metalanguage. The output of a metacompiler is a compiler for the described programming language.[1][2][3][4]

Metacompilers reduce the task of writing compilers by automating the aspects that are the same regardless of the language the produced compiler is to translate. This makes possible the design of domain-specific languages which are appropriate to the specification of a particular problem. A metacompiler reduces the cost of producing translators for such languages to a point where it becomes economically feasible to begin the solution of a problem with language design.[2][3]

Metacompilers are powerful string- and symbol-processing languages that are also useful for generating a wide range of other software engineering and analysis tools.[2][3][4][5]

Besides being useful for parsing domain-specific languages, a metacompiler is itself a prime example of a domain-specific language, designed for the domain of compiler writing.

A metacompiler is defined by a set of grammar production rules defining itself, written in its own specialized language. The metacompiler translates this grammar definition into the executable form of itself. Usually the grammar reduction rules are intermixed with semantic translation rules. Defining itself and translating itself this way constitutes the meta-step that sets a metacompiler apart from other compiler-compilers.[6][7]

Since the metacompiler defines and translates itself, the question arises as to how it is initially created (a chicken-and-egg problem). This is solved in one of two ways: by cross-compiling or by bootstrapping. Cross-compiling involves translating the new metacompiler using some other compiler or metacompiler running on another platform.

Bootstrapping, the other method, is an elegant (and usually mind-bending) process whereby the metacompiler is defined in progressively more sophisticated stages and "bootstraps" itself. The first translation of the metacompiler is performed by hand: the programmer pretends to be the metacompiler, parsing its rules and generating code just as the metacompiler would do if it existed, which at that first stage it does not. Once the initial metacompiler is up and running in a simple embryonic form, the full-strength metacompiler is created by successively defining and translating more sophisticated versions of itself. That is, at each succeeding stage, version n of the metacompiler is used to generate its successor, version n+1.

A runtime module consisting of support functions required for the translation process usually rounds out the full metacompiler package. This would include input/output, symbol table, and string processing functions.

Historical

Metacompilers have played a significant role in both computer science and the build-up of the computer industry. Early metacompilers include Meta II[5] and its descendant TreeMeta.[4]

The ideas about grammars, self-generation and extensions are widely used by Program transformation systems.

Early History of Metacompilers

The early history of metacompilers is closely tied with the history of SIG/PLAN Working Group 1 on Syntax Driven Compilers. The group was started primarily through the effort of Howard Metcalfe in the Los Angeles area.[8] In the fall of 1962 Howard Metcalfe designed two compiler-writing interpreters. One used a bottom-to-top analysis technique based on a method described by Ledley and Wilson.[9] The other used a top-to-bottom approach based on work by Glennie to generate random English sentences from a context-free grammar.[10]

At the same time, Val Schorre described two "meta machines", one generative and one analytic. The generative machine was implemented and produced random algebraic expressions. Meta I, the first metacompiler, was implemented by Schorre on an IBM 1401 at UCLA in January 1963. His original interpreters and metamachines were written directly in a pseudo-machine language. Meta II, however, was written in a higher-level metalanguage able to describe its own compilation into the pseudo-machine language.[11][12]

A Metacompiler Tutorial provides an online way to learn about Meta II.

Lee Schmidt at Bolt, Beranek, and Newman wrote a metacompiler in March 1963 that utilized a CRT display on the time-sharing PDP-1.[13] This compiler produced actual machine code rather than interpretive code and was partially bootstrapped from Meta I.

Schorre bootstrapped Meta II from Meta I during the spring of 1963. The paper on the refined metacompiler system, presented at the 1964 Philadelphia ACM conference, is the first paper on a metacompiler available as a general reference. The syntax and implementation technique of Schorre's system laid the foundation for most of the systems that followed. The system was implemented on a small IBM 1401 and was used to implement a small ALGOL-like language.

Many similar systems immediately followed.

Roger Rutman of A. C. Sparkplug developed and implemented LOGIK, a language for logical design simulation, on the IBM 7090 in January 1964.[14] This compiler used an algorithm that produced efficient code for Boolean expressions.

Another paper in the 1964 ACM proceedings describes Meta III, developed by Schneider and Johnson at UCLA for the IBM 7090.[15] Meta III represents an attempt to produce efficient machine code for a large class of languages. Meta III was implemented completely in assembly language. Two compilers were written in Meta III: CODOL, a compiler-writing demonstration compiler, and PUREGOL, a dialect of ALGOL 60. (It was pure gall to call it ALGOL.)

Late in 1964, Lee Schmidt bootstrapped the metacompiler EQGEN from the PDP-1 to the Beckman 420. EQGEN was a logic-equation-generating language.

In 1964, System Development Corporation began a major effort in the development of metacompilers. This effort included the powerful metacompilers Book1 and Book2, written in LISP, which have extensive tree-searching and backup capability. An outgrowth of one of the Q-32 systems at SDC is Meta 5.[16] This system was successfully released to a wide number of users and had many string-manipulation applications other than compiling. The Meta 5 system incorporates backup of the input stream and enough other facilities to parse any context-sensitive language. It has many elaborate push-down stacks, attribute setting and testing facilities, and output mechanisms. The fact that Meta 5 successfully translates JOVIAL programs to PL/I programs clearly demonstrates its power and flexibility.

The LOT system was developed during 1966 at Stanford Research Institute and was modeled very closely after Meta II.[17] It had new special-purpose constructs allowing it to generate a compiler which would in turn be able to compile a subset of PL/I. This system had extensive statistic-gathering facilities and was used to study the characteristics of top-down analysis.

SIMPLE is a specialized translator system designed to aid the writing of pre-processors for PL/I. SIMPLE, written in PL/I, is composed of three components: an executive, a syntax analyzer, and a semantic constructor.[18]

The TREE META compiler was developed at Stanford Research Institute in Menlo Park, California, in April 1968.

The early metacompiler history is well documented in the TREE META manual. TREE META paralleled some of the SDC developments. Unlike earlier metacompilers, it separated the semantics processing from the syntax processing. The syntax rules contained tree-building operations that combined recognized language elements with tree nodes. The tree-structure representation of the input was then processed by a simple form of unparse rules. The unparse rules used node recognition and attribute testing that, when matched, resulted in the associated action being performed. In addition, like tree elements could also be tested in an unparse rule. Unparse rules were also a recursive language, able to call unparse rules, passing elements of the tree, before the action of the unparse rule was performed.

The concept of the metamachine originally put forth by Glennie is so simple that three hardware versions have been designed and one actually implemented, the latter at Washington University in St. Louis. This machine was built from macro-modular components and has as its instructions the codes described by Schorre.

CWIC (Compiler for Writing and Implementing Compilers) is the last known Schorre metacompiler. It was developed at System Development Corporation. Documentation on CWIC is available in "The CWIC/360 system, a compiler for writing and implementing compilers" (ACM SIGPLAN Notices, June 1970, volume 5, issue 6, pages 11–29) and in the System Development Corporation CWIC Technical Manual, archived at the Charles Babbage Institute Center for the History of Information Technology (Box 12, folder 21). With the full power of LISP 2, a list-processing language, optimizing algorithms could operate on syntax-generated lists and trees before code generation. CWIC also had a symbol table built into the language.

Information about later descendants of these metacompilers is not generally available. With the resurgence of domain-specific languages and the need for parser generators which are easy to use, easy to understand, and easy to maintain, metacompilers are becoming a valuable tool for advanced software engineering projects.

The implementations of Forth have had a long history of bootstrapping using Forth-based metacompilers.[19] This approach is based on the concept of the Forth dictionary and domain-specific word definitions, quite different from the approach used in the Meta II family of metacompilers.

Versions of metacompilers using different approaches from any of the above have been implemented in industry and academia.[7]

CWIC

In 1969–1970, Erwin Book, Dewey Val Schorre, and Steven J. Sherman developed CWIC[3] (Compiler for Writing and Implementing Compilers) at System Development Corporation. The CWIC Technical Manual is archived at the Charles Babbage Institute Center for the History of Information Technology (Box 12, folder 21).

CWIC is a meta-compiler system composed of three special-purpose languages, each intended to permit the description of certain aspects of translation in a straightforward, natural manner. The syntax language is used to describe the recognition of source text and the construction from it of an intermediate tree structure. The generator language is used to describe the transformation of the tree into the appropriate object language. MOL/360 (Machine Oriented Language-360) is a mid-level systems programming language for the IBM System/360 family of computers, used to provide an interface with the machine and its operating system. MOL/360 was independently developed in 1968 and used to develop the ADEPT time-sharing system.

The syntax language is similar to that of Dewey Val Schorre's line of metacompilers. It most resembles TREE META, having tree-building operations in the syntax language. The unparse rules of TREE META are extended to work with the object-oriented generator language based on LISP 2.

CWIC and subsequent versions of this technology became part of a classified project at SDC, so information about any advancements is now unavailable, apart from the published paper "The CWIC/360 system, a compiler for writing and implementing compilers" (ACM SIGPLAN Notices, June 1970, volume 5, issue 6, pages 11–29) and the System Development Corporation CWIC Technical Manual archived at the Charles Babbage Institute Center for the History of Information Technology (Box 12, folder 21).

CWIC includes three languages:

  • Syntax: transforms the source program input into list structures using specialized grammars. A parsed expression structure is passed to a generator by placement of a generator call in a rule. A tree is a list whose first element is a node. The language has operators, < and >, specifically for making lists. The colon : operator is used to create node objects: :ADD creates an ADD node. The exclamation ! operator combines a number of parsed entries with the most recently created node to make a tree. The syntax rules are processed by generator functions, returning success or failure. The syntax language is very close to that of TREE META. Both use a colon to create a node. CWIC's tree-building !<number> operator functions the same as TREE META's [<number>].
  • Generator: a named series of transforming rules, each consisting of an unparse (pattern matching) rule and an output production written in a LISP 2-like language. The translation was to IBM 360 binary machine code. Other facilities of the generator language generalized output.[3]
  • MOL-360: an independent mid-level implementation language developed in 1968 and used for writing the underlying support library.
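The tree-building behavior of the syntax language's : and ! operators can be modeled in a few lines of Python. This is an illustrative sketch only, not CWIC's implementation; the function name make_tree and the explicit stack are inventions of this example.

```python
# A CWIC tree is a list whose first element is a node name.
# :node followed by !n pops the n most recently parsed entries
# off the parse stack and combines them with the node into a tree,
# which is pushed back onto the stack.

def make_tree(node, stack, n):
    """Model of :node !n — combine the top n parse-stack entries
    with the node to form a tree."""
    args = stack[-n:]
    del stack[-n:]
    stack.append([node] + args)

stack = ["x", "y"]          # two recognized language elements
make_tree("ADD", stack, 2)  # models :ADD !2
print(stack)                # [['ADD', 'x', 'y']]
```

Under this model, parsing x + y leaves the tree [ADD, x, y] on the stack, ready to be handed to a generator.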

Generators Language

The Generators Language had semantics similar to LISP. The parse tree was thought of as a recursive list. The general form of a Generators Language function is:

  function-name(first-unparse_rule) => first-production_code_generator
                   (second-unparse_rule) => second-production_code_generator
                   (third-unparse_rule) => third-production_code_generator
               ...

The code to process a given tree included the features of a general-purpose programming language, plus a form <stuff>, which would emit (stuff) onto the output file. A generator call may be used in the unparse_rule. The generator is passed the element of the unparse_rule pattern in which it is placed, and its return values are listed in (). For example:

  expr_gen(ADD[expr_gen(x),expr_gen(y)]) =>
                                <AR + (x*16)+y;>
                                releasereg(y);
                                return x;
               (SUB[expr_gen(x),expr_gen(y)])=>
                                <SR + (x*16)+y;>
                                releasereg(y);
                                return x;
               (MUL[expr_gen(x),expr_gen(y)])=>
                              .
                              .
                              .
               (x)=>     r1 = getreg();
                            load(r1, x);
                            return r1;
...

That is, if the parse tree looks like (ADD[<something1>,<something2>]), expr_gen(x) would be called with <something1> and return x. A variable in the unparse rule is a local variable that can be used in the production_code_generator. expr_gen(y) is called with <something2> and returns y. Note that a generator call in an unparse rule is passed the element in the position it occupies. Ideally, in the above, x and y will be registers on return. The last transform is intended to load an atom into a register and return the register. The first production would be used to generate the 360 "AR" (Add Register) instruction with the appropriate values in general registers. The above example is only a part of a generator. Every generator expression evaluates to a value that can then be further processed. The last transform could just as well have been written as:

               (x)=>     return load(getreg(), x);

In this case load returns its first parameter, the register returned by getreg(). The functions load and getreg are other CWIC generators.
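The dispatch that expr_gen's unparse rules perform can be sketched as a recursive function in Python. This is a loose modern analogue for illustration, not CWIC's actual mechanism: the register allocator is a simple free list, and the emitted strings only mimic the shape of 360 instructions.

```python
# Rough analogue of the expr_gen generator: match a tree node,
# recurse into the subtrees, emit 360-style code, manage registers.

free_regs = [1, 2, 3, 4]   # available general registers (simulated)
code = []                  # emitted instructions

def getreg():
    return free_regs.pop(0)

def releasereg(r):
    free_regs.insert(0, r)

def load(r, atom):
    code.append(f"L  R{r},{atom}")   # load an atom into a register
    return r                         # load returns its first parameter

def expr_gen(tree):
    # Unparse rules tried in order: ADD pattern, SUB pattern, atom.
    if isinstance(tree, list) and tree[0] == "ADD":
        x, y = expr_gen(tree[1]), expr_gen(tree[2])
        code.append(f"AR R{x},R{y}")  # Add Register
        releasereg(y)
        return x
    if isinstance(tree, list) and tree[0] == "SUB":
        x, y = expr_gen(tree[1]), expr_gen(tree[2])
        code.append(f"SR R{x},R{y}")  # Subtract Register
        releasereg(y)
        return x
    # (x) => return load(getreg(), x);
    return load(getreg(), tree)

expr_gen(["ADD", "a", ["SUB", "b", "c"]])
print(code)  # ['L  R1,a', 'L  R2,b', 'L  R3,c', 'SR R2,R3', 'AR R1,R2']
```

As in the CWIC example, each rule returns the register holding its result, and the register holding the consumed right operand is released for reuse.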

CWIC addressed domain-specific languages before the term "domain-specific language" existed.

From the authors of CWIC:

"A metacompiler assists the task of compiler-building by automating its noncreative aspects, those aspects that are the same regardless of the language which the produced compiler is to translate. This makes possible the design of languages which are appropriate to the specification of a particular problem. It reduces the cost of producing processors for such languages to a point where it becomes economically feasible to begin the solution of a problem with language design."[3]

See also

References

  1. ^ a b Sci-Tech Dictionary McGraw-Hill Dictionary of Scientific and Technical Terms, 6th edition, published by The McGraw-Hill Companies, Inc. http://www.answers.com/topic/metacompiler
  2. ^ a b c Neighbors, J.M. Software Construction using Components. Technical Report 160, Department of Information and Computer Sciences, University of California, Irvine, 1980.
  3. ^ a b c d e f Book, Erwin; Dewey Val Schorre; Steven J. Sherman (June 1970). "The CWIC/360 system, a compiler for writing and implementing compilers". ACM SIGPLAN Notices. 5 (6): 11–29. doi:10.1145/954344.954345.
  4. ^ a b c C. Stephen Carr, David A. Luther, Sherian Erdmann, The TREE-META Compiler-Compiler System: A Meta Compiler System for the Univac 1108 and General Electric 645, University of Utah Technical Report RADC-TR-69-83.
  5. ^ a b Schorre, D.V., META II a syntax-oriented compiler writing language, Proceedings of the 1964 19th ACM National Conference, pp. 41.301-41.3011, 1964
  6. ^ MetaCompiler -- an introductory reference point to Forth metacompilers
  7. ^ a b MetaStic -- a specialized metacompiler based on Forth
  8. ^ Howard Metcalfe, "A Parameterized Compiler Based on Mechanical Linguistics" Planning Research Corporation R-311, March 1, 1963, also in Annual Review in Automatic Programming, Vol. 4
  9. ^ Robert Ledley and J. B. Wilson, "Automatic Programming, Language Translation Through Syntactical Analysis," Communications of the Association for Computing Machinery, Vol. 5, No. 3, pp. 145–155, March 1962.
  10. ^ A. E. Glennie, "On the Syntax Machine and the Construction of a Universal Computer," Technical Report Number 2, AD 240-512, Computation Center, Carnegie Institute of Technology, 1960.
  11. ^ Schorre, Dewey Val (1963). "A Syntax-Directed SMALGOL for the 1401". ACM Natl. Conf., Denver, Colo.
  12. ^ Meta I is described in the paper given at the 1963 Colorado ACM conference. See SMALGOL
  13. ^ L. O. Schmidt, "The Status Bitt," ACM SIGPLAN "Special Interest Group on Programming Languages" Working Group 1 News Letter, 1964.
  14. ^ Roger Rutman, "LOGIK. A Syntax Directed Compiler for Computer Bit-Time Simulation," Master Thesis, UCLA, August 1964
  15. ^ F. W. Schneider and G. D. Johnson, "A Syntax-Directed Compiler-writing Compiler to Generate Efficient Code," Proceedings of the 19th National Conference of the Association for Computing Machinery, 1964.
  16. ^ D. Oppenheim and D. Haggerty, "META 5: A Tool to Manipulate Strings of Data," Proceedings of the 21st National Conference of the Association for Computing Machinery, 1966.
  17. ^ Charles R. Kirkley and Johns F. Rulifson, "The LOT System of Syntax Directed Compiling," Stanford Research Institute Internal Report ISR 187531-139, 1966.
  18. ^ George J. E. (1967a). Syntax Analyzer, Recognizer, Parser and Semantic interpretation System, Stanford Linear Accelerator Center, November 15, 1967
  19. ^ Retro -- "Retro is a concatenative, stack based language with roots in Forth."