Extensible programming
'Note: article is currently a work in progress.'
Extensible programming is a term used in computer science to describe a style of computer programming that focuses on mechanisms to extend the programming language(s), compiler and runtime environment. A system that supports extensible programming will provide all of the following features:
- Extensible syntax
- Extensible compiler
- Extensible runtime
- Content separated from form
- Source language debugging support
Each of these aspects of extensible programming is described in further detail below.
Extensible Syntax
Extensible syntax means that the source language(s) to be compiled must not be closed, fixed or static. It must be possible to add new keywords, concepts and structures to the source language(s). While it is acceptable for some fundamental and intrinsic language features to be immutable, the system must not rely solely on those features. It must be possible to add new ones.
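As an illustration only, the following Python sketch shows one way a language could accept new keywords at runtime. The `Language` class, its `add_keyword` method and the `repeat` statement are hypothetical names invented for this example; they do not describe any particular system.

```python
from typing import Callable, Dict, List

# A handler receives the tokens that follow the keyword and returns a parsed statement.
KeywordHandler = Callable[[List[str]], dict]


class Language:
    """A toy language whose set of keywords is open to extension (hypothetical)."""

    def __init__(self) -> None:
        self._keywords: Dict[str, KeywordHandler] = {}

    def add_keyword(self, name: str, handler: KeywordHandler) -> None:
        """Extend the language with a new keyword."""
        self._keywords[name] = handler

    def parse(self, line: str) -> dict:
        """Parse one statement by dispatching on its leading keyword."""
        tokens = line.split()
        if not tokens or tokens[0] not in self._keywords:
            raise SyntaxError(f"unknown statement: {line!r}")
        return self._keywords[tokens[0]](tokens[1:])


lang = Language()

# A statement that ships with the base language.
lang.add_keyword("print", lambda args: {"kind": "print", "value": " ".join(args)})

# A user-defined extension: "repeat <count> <words...>".
lang.add_keyword(
    "repeat",
    lambda args: {"kind": "repeat", "count": int(args[0]), "body": " ".join(args[1:])},
)

print(lang.parse("repeat 3 hello world"))
```

The base language and the extension are registered through the same call, so nothing in the parser distinguishes intrinsic keywords from added ones.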
Extensible Compiler
In extensible programming, a compiler is not a monolithic program that converts source code input into binary executable output. The compiler itself must be extensible to the point that it is really a collection of plugins that assist with the translation of source language input into any desired form of output. For example, an extensible compiler will support the generation of object code, code documentation, re-formatted source code, or any other desired output. The architecture of the compiler must permit its users to "get inside" the compilation process and provide alternative processing tasks at every reasonable step in that process.
For just the task of translating source code into something that can be executed on a computer, an extensible compiler should:
- utilize a plug-in or component architecture for nearly every aspect of its function
- determine which language or language variant is being compiled and locate the appropriate plug-in to recognize and validate that language
- use formal language specifications to syntactically and structurally validate arbitrary source languages
- assist with the semantic validation of arbitrary source languages by invoking an appropriate validation plug-in
- allow users to select from different kinds of code generators so that the resulting executable can be targeted for different processors, operating systems, virtual machines, or other execution environments
- provide facilities for error generation that can themselves be extended
- allow new kinds of nodes in the abstract syntax tree (AST)
- allow new values in nodes of the AST
- allow new kinds of edges between nodes,
- support the transformation of the input AST, or portions thereof, by some external "pass"
- support the translation of the input AST, or portions thereof, into another form by some external "pass"
- assist with the flow of information between internal and external passes as they both transform and translate the AST into new ASTs or other representations
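Several of the points above (open-ended AST node kinds, external transformation passes, and selectable code generators) can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not a real compiler: `Node`, `Compiler`, `add_transform`, `add_backend` and the two back ends are hypothetical names chosen for this example.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class Node:
    kind: str  # an open-ended string, so plug-ins can introduce new node kinds
    value: Any = None
    children: List["Node"] = field(default_factory=list)


class Compiler:
    """A hypothetical compiler assembled from externally supplied passes."""

    def __init__(self) -> None:
        self._transforms: List[Callable[[Node], Node]] = []
        self._backends: Dict[str, Callable[[Node], Any]] = {}

    def add_transform(self, fn: Callable[[Node], Node]) -> None:
        self._transforms.append(fn)  # AST -> AST pass

    def add_backend(self, name: str, fn: Callable[[Node], Any]) -> None:
        self._backends[name] = fn  # AST -> some other representation

    def compile(self, tree: Node, target: str) -> Any:
        for transform in self._transforms:
            tree = transform(tree)
        return self._backends[target](tree)


def fold_constants(node: Node) -> Node:
    """Transformation pass: replace additions of constants with their sum."""
    children = [fold_constants(c) for c in node.children]
    if node.kind == "add" and children and all(c.kind == "num" for c in children):
        return Node("num", sum(c.value for c in children))
    return Node(node.kind, node.value, children)


def to_source(node: Node) -> str:
    """Translation pass: re-formatted source text."""
    if node.kind == "add":
        return "(" + " + ".join(to_source(c) for c in node.children) + ")"
    return str(node.value)


def to_stack_code(node: Node) -> List[str]:
    """Translation pass: instructions for an imaginary stack machine."""
    if node.kind == "num":
        return [f"PUSH {node.value}"]
    if node.kind == "var":
        return [f"LOAD {node.value}"]
    code: List[str] = []
    for child in node.children:
        code += to_stack_code(child)
    return code + ["ADD"] * (len(node.children) - 1)


compiler = Compiler()
compiler.add_transform(fold_constants)
compiler.add_backend("source", to_source)
compiler.add_backend("stack", to_stack_code)

# x + (2 + 3)
tree = Node("add", children=[Node("var", "x"),
                             Node("add", children=[Node("num", 2), Node("num", 3)])])
print(compiler.compile(tree, "source"))  # (x + 5)
print(compiler.compile(tree, "stack"))   # ['LOAD x', 'PUSH 5', 'ADD']
```

The same tree yields either re-formatted source or executable instructions depending on which back end the user selects, and neither pass is built into the `Compiler` class itself.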
Extensible Runtime
At runtime, an extensible programming system must permit languages to extend the set of operations that the system supports. For example, if the system uses a bytecode interpreter, it must allow new bytecode values to be defined. As with extensible syntax, it is acceptable for there to be some (smallish) set of fundamental or intrinsic operations that are immutable. However, it must be possible to overload or augment those intrinsic operations so that new or additional behavior can be supported.
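The bytecode example can be illustrated with a toy stack machine in Python. The `VM` class, its `define_op` and `lookup` methods, and the `SQUARE` operation are assumptions made for this sketch and are not taken from any existing interpreter.

```python
from typing import Callable, Dict, List, Tuple

Stack = List[int]
Handler = Callable[[Stack, int], None]


class VM:
    """A toy stack machine whose opcode table can be extended at runtime."""

    def __init__(self) -> None:
        # A small set of intrinsic operations.
        self._ops: Dict[str, Handler] = {
            "PUSH": lambda stack, arg: stack.append(arg),
            "ADD": lambda stack, arg: stack.append(stack.pop() + stack.pop()),
        }

    def lookup(self, name: str) -> Handler:
        return self._ops[name]

    def define_op(self, name: str, handler: Handler) -> None:
        """Add a new operation, or replace an existing one."""
        self._ops[name] = handler

    def run(self, program: List[Tuple[str, int]]) -> Stack:
        stack: Stack = []
        for name, arg in program:
            self._ops[name](stack, arg)
        return stack


vm = VM()

# A new operation contributed by a language extension.
vm.define_op("SQUARE", lambda stack, arg: stack.append(stack.pop() ** 2))

# Augment the intrinsic ADD: record its operands, then defer to the original.
trace: List[Tuple[int, ...]] = []
base_add = vm.lookup("ADD")


def traced_add(stack: Stack, arg: int) -> None:
    trace.append(tuple(stack[-2:]))
    base_add(stack, arg)


vm.define_op("ADD", traced_add)

result = vm.run([("PUSH", 3), ("PUSH", 4), ("ADD", 0), ("SQUARE", 0)])
print(result, trace)  # [49] [(3, 4)]
```

Here the augmented `ADD` wraps the intrinsic handler rather than replacing its behavior, which is one possible reading of "overload or augment"; a system with truly immutable intrinsics would need a different mechanism.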
Content Separated From Form
Extensible programming systems should regard programs as data to be processed. Those programs should be completely devoid of any kind of formatting information. The visual display and editing of programs to users should be a translation function, supported by the extensible compiler, that translates the program data into forms more suitable for viewing or editing. Naturally, this should be a two-way translation. This is important because it must be possible to easily process extensible programs in a variety of ways. It is unacceptable for the only uses of source language input to be editing, viewing and translation to machine code. The arbitrary processing of programs is facilitated by de-coupling the source input from specifications of how it should be processed (formatted, stored, displayed, edited, etc.).
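To make the two-way translation concrete, the following Python sketch stores a program as plain nested data and derives two textual views from it, one of which can be parsed back into the same data. The tuple encoding and the function names `render_prefix`, `render_infix` and `parse_prefix` are hypothetical choices for this example.

```python
from typing import Tuple, Union

# The program itself: nested tuples such as ("add", left, right), integers,
# or variable names. It carries no formatting information.
Expr = Union[int, str, Tuple]


def render_prefix(e: Expr) -> str:
    """One view of the program: parenthesized prefix notation."""
    if isinstance(e, tuple):
        return "(" + " ".join([e[0]] + [render_prefix(a) for a in e[1:]]) + ")"
    return str(e)


def render_infix(e: Expr) -> str:
    """Another view of the same data: conventional infix notation."""
    symbols = {"add": "+", "mul": "*"}
    if isinstance(e, tuple):
        op, left, right = e
        return f"({render_infix(left)} {symbols[op]} {render_infix(right)})"
    return str(e)


def parse_prefix(text: str) -> Expr:
    """The reverse translation: recover the program data from the prefix view."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()

    def read(pos: int):
        tok = tokens[pos]
        if tok == "(":
            items = []
            pos += 1
            while tokens[pos] != ")":
                item, pos = read(pos)
                items.append(item)
            return tuple(items), pos + 1
        return (int(tok) if tok.lstrip("-").isdigit() else tok), pos + 1

    expr, _ = read(0)
    return expr


program = ("add", "x", ("mul", 2, "y"))
print(render_prefix(program))  # (add x (mul 2 y))
print(render_infix(program))   # (x + (2 * y))
assert parse_prefix(render_prefix(program)) == program
```

Because the stored program is independent of either view, further processors (a documentation generator or a structure editor, for instance) could be added without touching the two renderers.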
Source Language Debugging Support
Extensible programming systems must support the debugging of programs using the constructs of the original source language, regardless of the extensions or transformations the program has undergone in order to make it executable. Most notably, it cannot be assumed that the only way to display runtime data is in structures or arrays. The debugger, or more correctly 'program inspector', must permit the display of runtime data in forms suitable to the source language. For example, if the language supports a data structure for a business process or workflow, it must be possible for the debugger to display that data structure as a fishbone chart or other form provided by a plugin.
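A program inspector of this kind could, for example, choose a display routine based on the type of the runtime value. The Python sketch below assumes a hypothetical `Inspector` class with a `register_view` method and a made-up `Workflow` structure; a real plugin might draw a chart, whereas this one only produces a line of text.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass
class Workflow:
    """A source-language data structure (invented for this example)."""
    name: str
    steps: List[str]


class Inspector:
    """A hypothetical program inspector with pluggable views of runtime data."""

    def __init__(self) -> None:
        self._views: Dict[type, Callable[[Any], str]] = {}

    def register_view(self, cls: type, view: Callable[[Any], str]) -> None:
        """Install a plugin that knows how to display values of type cls."""
        self._views[cls] = view

    def show(self, value: Any) -> str:
        # Prefer the most specific registered view; otherwise fall back to
        # the generic representation.
        for cls in type(value).__mro__:
            if cls in self._views:
                return self._views[cls](value)
        return repr(value)


inspector = Inspector()
inspector.register_view(Workflow, lambda w: f"{w.name}: " + " -> ".join(w.steps))

flow = Workflow("publish", ["draft", "review", "approve"])
print(inspector.show(flow))  # publish: draft -> review -> approve
print(inspector.show(42))    # 42
```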