Talk:Compiler-compiler/Archive 1
This is an archive of past discussions about Compiler-compiler. Do not edit the contents of this page. If you wish to start a new discussion or revive an old one, please do so on the current talk page.
Archive 1
Info for 'citation needed'
"The first compiler-compiler to use that name was written by Tony Brooker in 1960 and was used to create compilers for the Atlas computer at the University of Manchester, including the Atlas Autocode compiler.[citation needed]" - Here's an article on how one version of the Atlas Autocode compiler was written using the Compiler Compiler: https://academic.oup.com/comjnl/article/9/4/350/390181
While I'm here I feel obliged to comment on some of the ramblings below.
A true compiler-compiler is what is nowadays called an 'extensible language'. Brooker and Irons were working in similar areas and interacted at conferences etc., so it's not a coincidence that there are similarities in their work. Brooker coined the term compiler-compiler, but Irons preferred the term extensible language. Systems such as Yacc are *not* compiler-compilers - they're parser generators. The Brooker compiler-compiler is effectively a very simple compiler that is modified on the fly to add new syntax to the underlying system, and which, having made these modifications, goes on in the same execution to compile a source program in the defined language.
If by 'metacompiler' you mean a compiler engine that has no built-in input language, but will compile a program for any language if given a configuration file that defines that language, then yes, that would be a compiler-compiler. Yacc is *not* a compiler-compiler - or a metacompiler - because it doesn't compile. It just generates a parser.
However, this stuff about metacompilers being defined as compilers that compile themselves is utter nonsense. That's just bootstrapping and has nothing to do with compiler-compilers.
The later derivative of Atlas Autocode, "Imp", also had a compiler-compiler facility that was extremely similar in use to the Brooker compiler-compiler. In the description of the Imp extensible language facility, however, it is described as a macro facility, because it was implemented as a separate phase from the compiler itself; the end result is identical to what would have been done with code such as was built in to the compiler-compiler. It is documented at https://gtoal.com/history.dcs.ed.ac.uk/archive/languages/imp/ImpMacros.pdf with an example file at https://gtoal.com/history.dcs.ed.ac.uk/archive/languages/imp/macros.imp This style of language extension using syntax-directed macros (i.e. macros which require the same parser as used in the compiler itself) was also implemented in Robert Rae's WonderPop, the Dec10 implementation of Pop2. It too could reasonably be described as a compiler-compiler, as the language extension was built in (and I've implemented a simple Pascal using WonderPop macros, so I know first-hand it was a proper extensible language).
Oh - and a little throwaway detail that I think has been overlooked - the parsing mechanism used by Tony Brooker in the compiler-compiler, and Atlas Autocode, and later in every Imp compiler ... was what is now called a Parsing Expression Grammar (PEG). It wasn't invented by Brian Ford in 2004; it was invented by Tony Brooker in the late 1950s. You won't find a reference to that online because the term wasn't invented until the 2000s, but if you know enough about parsers to actually read and work on any of the ones I mentioned above, it's obvious that the two technologies are identical.
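For anyone who wants to see what that equivalence means concretely: the defining feature of a PEG is ordered choice with backtracking - try the alternatives in the order written and commit to the first that matches. Here's a minimal illustrative sketch in Python (my own toy code, obviously not Brooker's):

# Minimal PEG-style recognizer: ordered choice tries alternatives in
# written order and commits to the first one that matches, so grammars
# are never ambiguous. Illustrative only; all names here are invented.
def literal(s):
    def parse(text, pos):
        if text.startswith(s, pos):
            return pos + len(s)      # success: return new position
        return None                  # failure: caller backtracks
    return parse

def choice(*alternatives):           # PEG's ordered '/' operator
    def parse(text, pos):
        for alt in alternatives:
            result = alt(text, pos)
            if result is not None:   # first match wins
                return result
        return None
    return parse

def sequence(*parts):
    def parse(text, pos):
        for part in parts:
            pos = part(text, pos)
            if pos is None:
                return None
        return pos
    return parse

# "if" must be tried before "i", or the longer keyword is never seen --
# the hallmark of ordered choice.
keyword = choice(literal("if"), literal("i"))
print(keyword("if x", 0))   # 2: the "if" alternative matched first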
Clarity
This article is a perfect example of an explanation written by and for people who already know what the subject is in detail. I am a tech writer specializing in software engineering, and even though I have more than just a few notions of what this is about, I am struggling to make sense of this. What it is missing is real-world anchors - e.g. "A compiler-compiler or compiler generator is a tool" should read "A compiler-compiler or compiler generator is a computer program" - there are many more examples throughout the text that show that the author assumes not only that all the readers of the article are software engineers but that they all work in the same field. Some readers may be well-informed onlookers; others may have no knowledge whatsoever of software engineering but have enough intelligence to understand what compiler-compilers do. Those people are not being catered for in this article. Tolken (talk) 05:57, 28 May 2010 (UTC)
Contradictory
Main description:
The earliest and still most common form of compiler-compiler is a parser generator, whose input is a grammar (usually in BNF) of a programming language, and whose generated output is the source code of a parser often used as a component of a compiler.
What was the earliest parser generator? Reference!!
The earliest Compiler-Compilers I can find referenced are:
"A SYNTAX DIRECTED COMPILER FOR ALGOL 60" Edgar T. Irons, Communications of the ACM Volume 4 Issue 1, Jan. 1961.
"A General Translation Program for Phrase Structure Languages" Brooker and Morris, Journal of the ACM (JACM) JACM Homepage archive Volume 9 Issue 1, Jan. 1962
"A compiler-building system developed by Brooker and Morris" Brooker and Morris, Communications of the ACM Volume 7 Issue 7, July 1964
"A general-purpose table-driven compiler" Stephen Warshall and Robert M. Shapiro , AFIPS '64 (Spring) Proceedings of the April 21-23, 1964, spring joint computer conference Found no early parser generators. Maybe just not used back then. The earliest ACM topic is in 1971
Also: The compiler-compiler, a seminal idea first presented at a British Computer Society Conference in July 1960 by Brooker and Morris.
As far as I know, usable stand-alone parser generators, separate from compiler-compiler systems, didn't appear until yacc in the 1970s.
There were notable compiler-compilers developed before parser generators separately existed.
This misconception highlights a problem. Computers existed before computer science existed. Computer terminology existed before computer science. Very few colleges in the late sixties offered computer science. Computer technology was studied under engineering. Programming was taught by math and data processing departments. That is when I was in college. Computer science was just beginning at universities. A few like Stanford and MIT were pioneers. ACM members were leading the technology progress. Academic computer science has just ignored what came beforehand. Steamerandy (talk) 12:17, 10th and 20th of September 2018 (UTC) Update of earlier post from me --Steamerandy (talk) 20:04, 1 November 2014 (UTC)--
I think it might be useful to merge this article (Compiler-compiler) with the article on Parsers (subsection "parser generator"). (comment by Netizen 2003-10-31)
METACOMPILER example
The history section looks a bit confusing; is there any example of a real compiler-compiler?
I am posting this for use here. I developed SLIC, which was based on work done by Val Schorre.
TREE-META and META II are compiler-compilers / translator systems that document themselves as META "language" compilers. That is, they were not called metacompilers by their creators. I know from attending ACM L.A. SIGPLAN meetings that they were being referred to as metacompilers.
I feel that the term metacompiler came out of its common use when talking about the compilers of these early META "languages".
We had several META "languages": META I, META II, META III, META 5, etc. Of course the languages had compilers. We had a compiler for the META II language, a META III compiler, etc. So I believe that metacompiler is a portmanteau of META "language x" compiler. So metacompiler is a generic term for a compiler whose input is a programming metalanguage. Since metalanguages are languages for talking about a language or languages, a metacompiler is a compiler that takes as input a metaprogramming metalanguage. There was an early metalanguage compiler that was based on Schorre's META II, only it took as input boolean formulas and compiled them into boolean logic simulations used in circuitry design. So, no, a metacompiler does not necessarily have to compile itself. But they do generally take as input a programming language designed for describing the analysis and translation of programming languages.
I have a real problem with the claim that it is computer science that dictates the meaning of terminology. It's kind of the other way around: computer science tries to put meaningful labels on specific topics.
Such an idea that computer science dictates anything runs afoul of the concept of science. Science is not like religious dogma. Even if old technology is obsolete, its terminology must be understood as to what it meant at the time. Terminology changes can very much affect the understanding of subjects. The terms compiler-compiler and metacompiler have long been misused. There is no possibility of straightening this mess out without a general consensus. The oldest definition of metacompiler makes the most sense: a compiler that takes a metalanguage as input. The term compiler-compiler meant a compiler that is specifically used to write compilers (not parser generators). yacc, a parser generator, calls itself a compiler-compiler. So the main reason for the confusion can be directly blamed on the yacc developers calling yacc a compiler-compiler when it is less than half a compiler. It was a bullshit move by the authors of yacc. Where is the code generation? There isn't any. It is a bold-faced lie that yacc is a compiler-compiler as the term was originally used at that time. But now we are stuck with it.
CWIC (Compiler for Writing and Implementing Compilers), a compiler-compiler developed by Val Schorre and Erwin Book at System Development Corporation, also documents itself as a META "language" compiler.
CWIC produced binary IBM 360 executable code. It was a further development by Val Schorre, the developer of META II.
The syntax language was similar to TREE-META's, producing a tree structure, but very different in that generator procedures were called to process the trees. The generator language was far more powerful, having several unique features.
A simple arithmetic expression example in the syntax language.
EXPR = TERM $(('+':ADD | '-':SUB) TERM !2);
TERM = FACTOR $(('*':MPY | '/':DIV) FACTOR !2);
FACTOR = ('(' EXPR ')' | NUMBER | VAR) ('^' FACTOR:POW!2 | .EMPTY);
The syntax language is composed of rules that specify the valid syntax of input. CWIC maintains two stacks: one containing nodes and the other the parse trees. The ":" operator pushes a node on the node stack, and the "!" operator combines the top node-stack entry with the given number of entries from the parse stack into a list, pushing that tree onto the parse stack.
The EXPR rule defines an expression to be a TERM followed by a series of zero or more TERMs separated by + or - operators. The "$" operator specifies that zero or more of the following may be present.
The expression "a + 2*(b+c) + D" would be turned into a tree representation as
- ADD[a,ADD[MPY[2,ADD[b,c]],D]] or in list form:
- [ADD,a,[ADD,[MPY,2,[ADD,b,c]],D]]
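For the curious, here is a rough Python sketch (my own illustration, not CWIC source) of the node-stack/parse-stack mechanics the EXPR rule drives. The '^' exponent alternative is omitted for brevity, and because !2 fires each time around the $ loop, this sketch builds left-nested ADD trees:

# Rough sketch of the node-stack / parse-stack mechanism.
# ':ADD' pushes a node; '!2' pops the node plus two parse-stack
# entries and pushes the combined tree (in list form, as above).
import re

node_stack, parse_stack = [], []

def make_tree(n):                      # the '!' operator
    node = node_stack.pop()
    branches = parse_stack[-n:]
    del parse_stack[-n:]
    parse_stack.append([node] + branches)

def expr(tokens):    # EXPR = TERM $(('+':ADD | '-':SUB) TERM !2);
    term(tokens)
    while tokens and tokens[0] in "+-":
        node_stack.append("ADD" if tokens.pop(0) == "+" else "SUB")
        term(tokens)
        make_tree(2)

def term(tokens):    # TERM = FACTOR $(('*':MPY | '/':DIV) FACTOR !2);
    factor(tokens)
    while tokens and tokens[0] in "*/":
        node_stack.append("MPY" if tokens.pop(0) == "*" else "DIV")
        factor(tokens)
        make_tree(2)

def factor(tokens):  # FACTOR = '(' EXPR ')' | NUMBER | VAR
    if tokens and tokens[0] == "(":
        tokens.pop(0); expr(tokens); tokens.pop(0)
    else:
        parse_stack.append(tokens.pop(0))

tokens = re.findall(r"\w+|\S", "a + 2 * ( b + c ) + D")
expr(tokens)
print(parse_stack[0])
# ['ADD', ['ADD', 'a', ['MPY', '2', ['ADD', 'b', 'c']]], 'D']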
CWIC provides three types of rules: syntax, token, and character class.
A digit character class rule for example:
DIGIT: '0'|'1'|'2'|'3'|'4'|'5'|'6'|'7'|'8'|'9';
A simple integer number.
NUMBER .. DIGIT $DIGIT MAKENUM[];
MAKENUM[] is a call to a library generator function that takes the token parsed into the token buffer, converts it to a numeric value, and pushes it on the parse stack. In the absence of a generator function such as MAKENUM, the token buffer would simply be entered into the symbol table and placed on the parse stack.
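In modern terms a token rule is just a scanner loop plus a conversion (or interning) step. A loose Python equivalent of the NUMBER rule (function names are mine):

# Loose equivalent of: NUMBER .. DIGIT $DIGIT MAKENUM[];
# DIGIT $DIGIT gathers characters into the token buffer; MAKENUM[]
# converts the buffer to a numeric value instead of letting the token
# default into the symbol table.
def number(text, pos, parse_stack):
    start = pos
    while pos < len(text) and text[pos].isdigit():   # DIGIT $DIGIT
        pos += 1
    if pos == start:
        return None                                  # rule failed
    parse_stack.append(int(text[start:pos]))         # MAKENUM[]
    return pos                                       # new scan position

stack = []
number("42+x", 0, stack)
print(stack)    # [42]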
The GENERATOR language was an algorithmic procedural language that emitted 360 machine code.
The CWIC generator language provided pattern matched arguments. For example:
gen(ADD[gen(x),gen(y)]) => {...}
   (SUB[gen(x),gen(y)]) => {...}
   ...
   (integer(x)) => return x;
   (variable(x)) => return x;
A generator consisted of its name (gen above) and pattern/production pairs.
(<pattern>) => <production>
The pattern could be simply a variable list or, as in the above, a specific pattern match. The pattern validated the input and could break it apart. The unique feature of a function call in a pattern is that the function is called on the argument at the position where it resides, and its parameter list, if any, receives the returned values.
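The dispatch-on-tree-shape idea maps fairly naturally onto modern structural pattern matching. A crude Python 3.10 analogue of the gen generator above (illustrative only - it returns strings where CWIC planted 360 code, and since Python patterns can't call functions the way CWIC patterns called gen(x), the recursion is spelled out in the production bodies):

# Crude analogue of a CWIC generator: dispatch on the shape of the
# tree, recursing on the branches.
def gen(tree):
    match tree:
        case ["ADD", x, y]:
            return f"({gen(x)} + {gen(y)})"
        case ["SUB", x, y]:
            return f"({gen(x)} - {gen(y)})"
        case int(x):
            return str(x)        # (integer(x)) => return x;
        case str(x):
            return x             # (variable(x)) => return x;

print(gen(["ADD", "a", ["SUB", 2, "b"]]))   # (a + (2 - b))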
In 1969 I started implementation of a metacompiler system called SLIC, developed at Cerritos College in Norwalk, California, on a DEC-System-10.
It consisted of several special-purpose languages. The syntax language, much like CWIC's and TREE-META's, was a rule-based language producing tree or list structures. (I now use "formula" instead of "rule", hopefully avoiding confusion with production grammar rewrite rules.) Formulas are in part reductive grammar rules combined with directed transform operators that in the early parser languages produced stack machine code. In later versions they produce abstract syntax trees that are passed to generator functions programmed in a LISP 2 dialect. The CWIC LISP 2 based generator language produces 8-bit byte code instructions targeting IBM 360 computers; SLIC's GENERATOR language instead generates PSEUDO instructions. PSEUDO instructions are programmed procedures that emit machine code. MACHINE instructions are defined in the MACHOP language: a MACHOP defines an instruction's mnemonic and its assembly into binary code. The generator language emitted PSEUDO instructions to named section lists. A section was simply a data structure that defined the section parameters and a list containing the emitted pseudo instructions. A call to flush a section would sequentially process the section's pseudo code list, calling the pseudo code procedures, whose execution output relocatable machine code.
The MACHOP defined a target machine instruction or instruction class. It could define a single instruction using its mnemonic name, or a set of instructions having the same binary format using a vectored entry. In a vectored entry the MACHOP name was instead a variable preceded by a "#" character; the variable was set to the opcode value of the entry matched. A list of opcode mnemonic/value pairs followed the instruction definition.
Machine code was defined in a bit addressable memory space.
An example of a vectored entry MACHOP:
.MACHOP #OP;
  .MORG 8: H(16): $/8;
  H(8): OP;
#AAA 0b00110111;
#AAS 0b00111111;
#CBW 0b10011000;
#CLC 0b11111000;
#CLD 0b11111100;
#CLI 0b11111010;
#CMC 0b11110101;
#CWD 0b10011001;
#DAA 0b00100111;
#DAS 0b00101111;
#HLT 0b11110100;
#INT3 0b11001100;
#INTO 0b11001110;
#IRET 0b11001111;
#LAHF 0b10011111;
#LEAVE 0b11001001;
#NOP 0b10010000;
#POPA 0b01100001;
#POPF 0b10011101;
#PUSHA 0b01100000;
#PUSHF 0b10011100;
#SAHF 0b10011110;
#STC 0b11111001;
#STD 0b11111101;
#STI 0b11111011;
#XLAT 0b11010111;
The above is a very simple instruction consisting of only an 8-bit opcode. #OP specifies a vectored entry point; OP is a variable that contains the opcode binary value. The line:
#AAA 0b00110111;
is an AAA opcode entry. OP would contain the binary value 0b00110111.
The .MORG operation is a modular org. The first operand, 8, specifies aligning the plant location to an 8-bit boundary.
.MORG 8: H(16): $/8;
H(16) specifies listing output of the 16-bit field $/8 in hex format. $ is the plant location in the bit-addressable memory space.
H(8): OP;
The above outputs OP to the output code file. H(8) specifies listing output in hex of the 8-bit opcode in OP. Assembly listing output is optional.
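For readers who have never seen one of these, the vectored entry is essentially a table-driven emitter. A rough Python reconstruction of what this one-byte-opcode MACHOP accomplishes (my names; byte-granular rather than bit-addressable, for brevity):

# Rough reconstruction of the vectored one-byte-opcode MACHOP.
# Each mnemonic maps to the opcode value that OP receives; emitting
# appends the 8-bit opcode (H(8): OP) at a byte-aligned plant location
# (.MORG 8) and prints an assembly-listing line.
OPCODES = {
    "AAA": 0b00110111, "AAS": 0b00111111, "CBW": 0b10011000,
    "CLC": 0b11111000, "NOP": 0b10010000, "HLT": 0b11110100,
    # ... remaining entries as in the table above
}

code = bytearray()                  # plant location = len(code) bytes

def machop(mnemonic):
    op = OPCODES[mnemonic]          # the vectored entry sets OP
    code.append(op)                 # H(8): OP
    print(f"{len(code)-1:04X}  {op:02X}  {mnemonic}")   # listing line

machop("NOP")
machop("HLT")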
Another MACHOP example of a more complex instruction:
.MORG 8: H(16): $/8;
if isnumber(source) then
  { if isreg(dest) then
      { +H(4): 0b1011;
         (1): if size:(dest) = 8 then 0 else 1;
         (3): dest; }
    else
      { if !frame:(dest) and segment:(dest) != curSegment then
          +H(8): overide(dest);
        +H(7): 0b1100011;
         (1): if size:(dest) = 8 then 0 else 1;
        +H(2): mod(dest);
         (3): 0;
         (3): R_M(dest);
        +H(szoff): offset:(dest/8); }
    +H(size:(dest)): source }
else if isreg(source) && isreg(dest) then
  { +H(7): 0b1000100;
     (1): if size:(dest) = 8 then 0 else 1;
    +H(2): 0b11;
     (3): source;
     (3): dest; }
else if isreg(source) then
  { +H(7): 0b1000100;
     (1): if size:(dest) = 8 then 0 else 1;
    +H(2): mod(dest);
     (3): source;
     (3): R_M(dest); }
else if isreg(dest) then
  { +H(7): 0b1000101;
     (1): if size:(source) = 8 then 0 else 1;
    +H(2): mod(source);
     (3): source;
     (3): R_M(source); }
else if issegreg(dest) then
  { +H(8): 0b10001110;
     (2): mod(source);
     (3): issegreg(dest);
     (3): R_M(source);
    if !isreg(source)
      +H(szoff): offset:(source/8); }
else
  { +H(8): 0b10001100;
     (2): mod(dest);
     (3): issegreg(source);
     (3): R_M(dest);
    if !isreg(source)
      +H(szoff): offset:(dest/8); }
Many optimizations can be done in the generator language. To provide optimization on the sequential instructions, an ISO language was planned but so far not implemented. The ISO language was planned to operate on the pseudo instruction lists.
The main thing SLIC did was to separate the target machine code generation from the semantics processing. The pseudo instructions were designed with the language being compiled in mind. A pseudo instruction was much like an assembly macro, e.g. the pseudo code for a function call:
- Output a procedure call instruction to the object file
PSEUDO call proc
{ ; if a 'far proc' use a far call
  ; if a 'proc' use a near call
  ; if a memory reference use memory size to deduce far or near call needed
  ; Some of this could be done in the call and farcall MACHOPs
  if TypeAttr:(proc) == farproc
    if segment:(proc) == CurrentSegment
      { ^pushreg cs        ; Simulate far call
        ^call proc }
    else
      ^farcall proc;
  else if TypeAttr:(proc) == proc
    ^call proc;
  else if SizeAttr(proc) == 2
    ^call proc;
  else
    ^farcall proc }
The basic concept was to produce code for any target machine architecture and make it a simple matter to target different machines, simply by defining the PSEUDO code using the target machine's instructions.
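A bare-bones Python sketch of that section/flush idea (my names, not SLIC's): pseudo instructions are deferred procedures queued on a named section list, and only flushing them plants target machine code.

# The generator phase emits pseudo instructions (deferred procedures)
# onto a named section list; flushing the section runs them in order,
# and only then is target machine code produced.
sections = {"code": [], "data": []}
object_code = []                       # stands in for relocatable output

def emit_pseudo(section, proc, *args):
    sections[section].append((proc, args))

def flush(section):
    for proc, args in sections[section]:
        proc(*args)                    # pseudo proc plants machine code
    sections[section].clear()

def pseudo_call(proc_name, far=False): # cf. the PSEUDO call example above
    object_code.append(("FARCALL" if far else "CALL", proc_name))

emit_pseudo("code", pseudo_call, "PRINT")
emit_pseudo("code", pseudo_call, "EXIT", True)
flush("code")
print(object_code)   # [('CALL', 'PRINT'), ('FARCALL', 'EXIT')]

Retargeting then means rewriting only the pseudo procedures (and MACHOPs) while the generator phase stays unchanged.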
In 1972 SLIC was used to program a COBOL compiler running on a DEC-System-10, generating code for a TI990 minicomputer.
Development time for the COBOL compiler was <4 months for myself and one other programmer.
Edit of previous post from 2014. Steamerandy (talk) 13:20, 10 September 2018 (UTC) Steamerandy (talk) 21:40, 7 August 2019 (UTC)
CWIC
I added a section about CWIC to the "metacompiler" article. (We might want to consider merging "Metacompiler" and "Compiler-compiler" -- does anybody know how to propose that?)
The problem I'm seeing is that CWIC is probably too small in the great scheme of things to deserve its own article, but adding it as a section to "metacompiler" means it takes up too much of the article, compared with other material. I'm not sure what the "right" thing to do is.
I came into the CWIC project (we called it "The CWIC Mafia" because Erwin Book always wore a suit and looked a little bit like a mafioso) shortly after the initial development, and I basically became the maintenance engineer while Erwin, Val, and Steve worked on direct and indirect applications of the language. The Generators Language compiler was a masterpiece of passing the buck, aka "recursion".
One thing that has bothered me is that the work done on code generation in CWIC seems to have been lost to posterity. YACC, Bison, and friends concentrate on the easy problem: parsing, but offer no help with code generation. CWIC had many features that aided code generation -- the pattern matching, the <...> operator that emitted code, and operators that automatically handled repetitive sections of the tree.
If you used a pattern like (...$x...)=>, then "$x" was a list of things, and in the right hand side of the production you could write something like:
.do (... x ...)
This meant to repeatedly execute the code in parens, assigning successive elements of the list $x to "x" in each iteration. Or you could write $(...x...), which would generate a new list, with whatever transformation you specified on each element of $x. Similarly,
$&(...x...) was true only if the parenthesized expression was true for every element of $x;
$|(...x...) was true if the expression was true for any element;
$+(...x...) was the sum, $*(...x...) was the product.
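Those operators correspond almost one-for-one to today's map/fold idioms; roughly, in Python terms (my mapping, not CWIC syntax):

# Rough modern equivalents of CWIC's list operators (my mapping).
import math

xs = [1, 2, 3, 4]

every   = all(x > 0 for x in xs)        # $&(...x...)  true for every element
some    = any(x > 3 for x in xs)        # $|(...x...)  true for any element
total   = sum(x * 2 for x in xs)        # $+(...x...)  the sum
product = math.prod(x + 1 for x in xs)  # $*(...x...)  the product
doubled = [x * 2 for x in xs]           # $(...x...)   new transformed list
for x in xs:                            # .do (...x...)  execute per element
    print(x)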
So CWIC was a significant advance on parser-only metalanguages. What it lacked (and AFAIK no metacompiler has tackled since) was assistance in maintaining the symbol dictionary. Of course, every "atom" that got scanned turned into a symbol that you could assign Lisp-style attributes to, but by now we should have meta-compilers that assist the compiler writer in at least the following two problems:
- The syntax says there's supposed to be an ID here. There should be operators to check that (a) the ID has been previously defined (or not defined), and (b) the ID represents a specific type of name (e.g., a variable name, a function name, etc.)
- Block-structured languages: you should be able to say "start a new block", which allows re-defining already defined variable names (and even function names).
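A toy Python sketch of the kind of block-structure support I mean (hypothetical API, not from any actual metacompiler):

# Toy sketch of scoped symbol table support: a stack of scopes with
# "start a new block" allowing redefinition of outer names, plus
# lookups that check an ID is defined and of the expected kind.
class SymbolTable:
    def __init__(self):
        self.scopes = [{}]                     # global scope

    def enter_block(self):                     # "start a new block"
        self.scopes.append({})

    def leave_block(self):
        return self.scopes.pop()

    def define(self, name, kind):
        if name in self.scopes[-1]:
            raise NameError(f"{name} already defined in this block")
        self.scopes[-1][name] = kind

    def lookup(self, name, kind=None):         # defined? right kind of name?
        for scope in reversed(self.scopes):
            if name in scope:
                if kind and scope[name] != kind:
                    raise TypeError(f"{name} is a {scope[name]}, not a {kind}")
                return scope[name]
        raise NameError(f"{name} not defined")

syms = SymbolTable()
syms.define("f", "function")
syms.enter_block()
syms.define("f", "variable")                   # redefines outer f
print(syms.lookup("f"))                        # variable
syms.leave_block()
print(syms.lookup("f", "function"))            # function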
Bgoldnyxnet (talk) 18:27, 5 June 2013 (UTC)
- This article should probably be moved to Parser generator, as that seems to be the main focus. Compiler-compiler can then be made into a disambiguation page to both Parser generator and Metacompiler.
- You might also want to have a look at attribute grammars, they add a semantic layer to parser generators to e.g. maintain symbol tables. —Ruud 20:02, 5 June 2013 (UTC)
- Thank you, Ruud, I'll look at attribute grammar. Bgoldnyxnet (talk) 04:55, 7 November 2014 (UTC)
- I thought CWIC had a symbol table stack. SLIC has a symbol table stack. Built-in generators are used to push and pop the symbol table stack. GETSYMTBL returns the top stack entry; it can be assigned to a symbol attribute, for example. Scoped local variables are handled that way. The debugger needed scope info. I am fairly sure CWIC had those functions. Block-structured languages were handled by CWIC, according to Erwin Book. POPSYMTBL also returned the popped symbol table object.
- Bgoldnyxnet, you may be a bit rusty on CWIC symbol tables. Remember MAKEINT[], MAKESTR[], MAKEOCT[], etc.? Interceding functions stopped the default symbol table entry.
- The early Schorre metacompilers all produced code. META II generated stack machine code. TREE-META added unparse rules and the tree-making constructs : and [ ]. The : followed by a node name is exactly the same as in CWIC. The [ ] enclosed a number and created a tree with the specified number of branches, just like the ! operator in CWIC. TREE-META did not have the power of CWIC, but was an intermediate step, creating a tree and using unparse rules, precursors to CWIC's generators. One could describe CWIC's generators as named, grouped unparse rules having LISP 2 actions. I do not find Schorre mentioned in the TREE-META documents.
- I would say that CWIC should have its own article. Compared to TREE-META or META II, there is a lot more to CWIC. CWIC had features that the earlier ones did not: 1) a symbol table; 2) linkable binary output; 3) token-making rules instead of the built-in functions .ID, .NUM, .STR; 4) programmed backtracking, with a backtracking alternative operator; 5) the much more powerful LISP 2 generator language action code, and more powerful unparse rules calling generators and returning results to named local variables; etc. Steamerandy (talk) 09:17, 10 November 2015 (UTC)
Merge
As was stated in #CWIC, we don't need two (or more) pages on the same topic:
I would pick name of the first public occurrence. Thus, not "Language workbench". Ushkin N (talk) 05:22, 23 May 2016 (UTC)
- Support merge, as the topics seem to be identical. Given the early 1960s META II, Metacompiler seems to be the better name. I note that the lede for that page includes the statement that a metacompiler is "a subset of a specialized class of compiler writing tools called compiler-compilers"; however, I can't find examples of compiler-compilers that would not be described as metacompilers. The contents of Compiler-compiler seem easier to digest, so material from that page should be used over the more arcane text at metacompiler. 13:35, 27 February 2018 (UTC) by Klbrain
- When working on implementing this, I have changed the merge direction, given the 'subset' argument above, as I'm no longer confident in my own interpretation; I'll assume that the text in the lede is correct. {{merge done}} Klbrain (talk) 07:57, 7 April 2018 (UTC)