Compiled language
A compiled language is a programming language whose implementations are typically compilers (translators that generate machine code from source code), and not interpreters (step-by-step executors of source code, where no pre-runtime translation takes place).
The term is somewhat vague. In principle, any language can be implemented with a compiler or with an interpreter.[1] A combination of both solutions is also common: a compiler can translate the source code into some intermediate form (often called p-code or bytecode), which is then passed to an interpreter which executes it.
Advantages and disadvantages
Programs compiled into native code at compile time tend to be faster than those translated at run time, because run-time translation adds overhead. Newer technologies such as just-in-time compilation, and general improvements in the translation process, are narrowing this gap. Mixed solutions using bytecode tend toward intermediate efficiency.
Low-level programming languages are typically compiled, especially when efficiency is the main concern, rather than cross-platform support. For such languages, there are more one-to-one correspondences between the programmed code and the hardware operations performed by machine code, making it easier for programmers to control the use of central processing unit (CPU) and memory in fine detail.
With some effort, it is always possible to write compilers even for traditionally interpreted languages. For example, Common Lisp can be compiled to Java bytecode (then interpreted by the Java virtual machine), to C code (then compiled to native machine code), or directly to native code. Programming languages that support multiple compilation targets give developers more control in choosing between execution speed and cross-platform compatibility.
Traditional compiled languages
The prototypical compiled languages that remain in widespread use are Fortran, COBOL, C, and C++, all from the procedural family tree.
C and C++ both descend from ALGOL 60, which was an important stepping stone in the development of the strong type system and structured programming idiom that now effectively define this language family. Within this family, Pascal was notable for a time in the educational setting, Ada became prominent in the military setting, and Java became prominent both in education and for applications deployed on the internet. A modern spin within this family is the Go programming language, now widely used in the modern cloud data center.
In many contexts in modern computer science and computer engineering, the phrase "compiled language" is implicitly taken to mean "traditional compiled language", referencing this particular rich and diverse tradition: a fundamental building block upon which much of the rest of computing, including most other compiled languages, is erected to this day.
Origins of traditional compiled languages
ENIAC plugboard era and early assemblers
Historically, computers were originally programmed in a form of machine code that predated any standardized instruction set architecture; these early machine codes were effectively a direct outgrowth of the plugboards used to program ENIAC in 1945. Work at this level is arduous and painstaking to an extreme degree: for the most part there is no such thing as a simple change, and code reuse is limited at best, confined to a small set of highly related problems on a single machine architecture.
Cognitive science professor Douglas Hofstadter has compared machine code to genetic code, saying that "Looking at a program written in machine language is vaguely comparable to looking at a DNA molecule atom by atom."[2] The first level of abstraction in the development of computer software was the assembler which performed a fairly direct translation from a human-readable notation for machine instructions into the actual machine codes.
The greatest initial benefit was having the assembler compute branch target addresses automatically, as these are sensitive to code size and often change in multiple places with each instruction added or deleted. Assemblers are technically a form of compiler, but are rarely spoken of as compilers, because there is little conceptual distance between assembly language and machine code, and because assembly language remains far too low-level for the assembler to diagnose coding mistakes that would almost surely lead to erratic runtime behaviour, such as failing to preserve a register in an interrupt handler whose preservation is vital to the interrupted program.
Fortran era
As processors became more complex, a second burden emerged when working at the level of assembly language: register allocation. An early application of computers was numerical simulation, which involved the evaluation of complex mathematical expressions, often with common elements which could be cleverly arranged to evaluate once, with the result shared between expression evaluations.
Reuse of these intermediate values was most efficient when they were saved in spare registers, but registers were a precious and scarce resource. With enough cleverness on the part of an assembly language programmer, these resources could be orchestrated with the efficiency of a factory assembly line, but this was arduous, and the code in a numerical simulation could change daily.
In 1957 the Fortran compiler was introduced to automate formula translation. Register allocation, common subexpression elimination, and related forms of low-level optimization are the bread and butter of traditional compilers in the Fortran mold, notably including those for C and C++, which are extensively used in modern software development when code performance is paramount.
Compilers within this compiled language family evolved to contain highly detailed models of machine architectures and the specific implementation of each chip generation within the chip family, and the most efficient instruction sequences for each common operation.
Note also that details of the machine architecture are not entirely abstracted away from the application programmer in this class of compiled languages: in C and C++, the size of the machine word that implements the integer data type is decided by the compiler at compile time, based on the execution model selected for the target architecture.
Compiled language/compiler relationship as mixed blessing
Ever since the early days of Fortran, traditional compiled languages and their compilation environments have largely been treated as synonymous, because the abstraction between the language definition and the machine model was deliberately left incomplete, so that a determined programmer could explicitly obtain the highest possible performance by exploiting low-level details (quirks) of the target machine, if so desired.
Modern academic computer science regards this tradition as extraordinarily error prone and favours a far more complete abstraction between the language and the execution environment. John Backus, a key progenitor of the first Fortran compiler, himself recanted his involvement in what he termed "the von Neumann style" when he introduced one of the first functional programming languages, FP.
Despite their reputation for being difficult and error prone to code correctly, Fortran, C, and C++ have maintained large footprints in the industrial setting, because each of these languages has a very large ecosystem of robust tools, capable of generating efficient machine code for everything from the smallest microcontrollers to the very largest supercomputers, and for applications ranging from tens of lines of code to tens of millions.
Front ends and back ends
The machine model is to a large degree independent from the syntax of any particular compiled language, and so it became commonplace to implement several languages within a single compiler package. In addition to C, C++, and Fortran, the widely used GNU Compiler Collection (GCC) implements front ends for Objective-C/C++, Java, Ada, and Go, among others.
All these front ends share the same code-generation back end. Most of these compiled languages are classified as procedural languages from the ALGOL 60 family tree, sharing an emphasis on structured programming and strong typing. C++ is more abstract than C in its type system, with considerable support for type inference at compile time; however, once a type is inferred at compile time, it remains fixed in the generated machine code. (Primitive support for runtime polymorphism exists in C++, but only if explicitly coded, as a special language construction mainly used in programs that cannot reasonably avoid it.)
Hosting and self-hosting with a traditional compiled language
GCC is itself coded in C/C++, which are system programming languages originally developed explicitly to support the coding of operating systems and this kind of traditional compiler. At the other end of the spectrum, the V8 JavaScript engine, targeting an extremely dynamic programming language, is also implemented in C++.
Mixed programming with a traditional compiled language
This type of compiler maintains a close relationship with assembly language, and usually generates an assembly language translation of the source program as input to the back end for final code generation. For this reason, it is generally a trivial matter for the programmer to code in assembler directly within a C or C++ program; this is used for especially time-critical programming tasks (often interrupt handlers) or to access esoteric features of a computer platform that the available compiler is not yet able to harness. When using a debugging tool at runtime, such as the GNU Debugger (GDB), the programmer can often view the source code and the generated machine code directly interleaved.
Bootstrapping with traditional compiled languages
A common task in computer engineering is porting an operating system to a new hardware design, often with a novel instruction set (e.g. the RISC-V architecture introduced in 2010) for which no software preexists. The first step is usually to take a compiler such as GCC and develop a cross compiler mode, where the compiler runs on an existing host architecture while generating code for the novel, foreign architecture (often termed the target architecture). This permits the developer to compile enough operating system support for the target to implement a native assembler on the target itself, and from there bootstrap into a self-hosted C compiler to achieve initial autonomy. Once a native C compiler is available on the target, further development of the target machine environment can continue on the target machine itself, or with continued support from a cross compiler on a foreign host, as proves most convenient.[3]
Because so many compilers for other languages are themselves implemented in C/C++ (or employ C/C++ as stepping stones to achieve their own self-hosted operation), this is often the route to obtain every other desired compiler on the target system.
A few other languages, such as Forth, have a primitive kernel small enough to implement directly in machine code (without even an assembler at first) and can thus be bootstrapped fairly quickly, involving no other compiler or compiled language at all. Forth is entirely capable of implementing a native assembler, at which point the target machine can be considered fully self-hosted for binary executables.
Binary standardization
An important aspect of traditional compiled languages is the support provided in their compilers for a standardized application binary interface and standardized register and stack-based calling conventions. This permits a variety of different compiled languages to be compiled and linked together into the same program.
Languages
Some languages that are commonly considered to be compiled:
- Ada
- ALGOL
- BASIC
- PowerBasic
- Visual Basic (to bytecode)
- PureBasic
- C
- C++
- C# (to bytecode)
- CLEO
- COBOL
- Cobra
- Crystal
- D
- eC
- Eiffel
- Erlang (to bytecode)
- F# (to bytecode)
- Factor (later versions)
- Forth
- Fortran
- Go
- Haskell
- Haxe (to bytecode or C++)
- Java (to bytecode)
- JavaScript (to bytecode, then JIT-compiled)
- JOVIAL
- Julia
- LabVIEW, G
- Lisp
- Lush
- Mercury
- ML
- Nim (to C, C++, or Objective-C)
- Open-URQ
- Pascal
- Objective-C
- PL/I
- RPG
- Rust
- Seed7
- SPITBOL
- Swift
- Visual FoxPro
- Visual Prolog
- W
- Zig
References
- ^ Ullah, Asmat. "Features and Characteristics of Compiled Languages". www.sqa.org.uk.
- ^ Hofstadter, Douglas (1980). Gödel, Escher, Bach: An Eternal Golden Braid. Basic Books. p. 290.
- ^ Jolitz, William; Jolitz, Lynne (1991). "Porting Unix to the 386". Dr. Dobb's Journal. Retrieved 24 September 2020. Note: individual links to the 18-part magazine article series are available from the authors' personal web site under the first drop-down control, which is not initially obvious given the non-standard site design.
- ^ Hickey, Rich. "Clojure is a compiled language". Retrieved 11 September 2020.