ECL (data-centric programming language)

ECL
Paradigm	declarative, structured, data-centric
First appeared	2000
Typing discipline	static, strong, safe
Website	hpccsystems.com
Influenced by
	Prolog, Pascal, SQL, SNOBOL4, C++, Clarion
Influenced
	Big Data

ECL is a declarative, data-centric programming language designed in 2000. It lets a team of programmers process big data across a high-performance computing cluster without having to make many of the lower-level, imperative decisions themselves.^[1]

History

ECL was initially designed and developed in 2000 as an in-house productivity tool at Seisint Inc. It was considered a ‘secret weapon’ that allowed Seisint to gain market share in its data business. The technology was cited as a driving force behind the acquisition of Seisint by LexisNexis, and again as a major source of synergies when LexisNexis acquired ChoicePoint Inc.

Implementations

The first implementation of ECL, in June 2000, translated ECL source into a variant of SQL. That SQL ran on a now-retired in-memory query engine known as hOle. Later in 2000 a second execution engine, known as Thor, was created to run on a cluster of Windows 2000 servers. To support it, the ECL compiler was extended to generate C++ code, which was then compiled with MSVC into executable DLLs that the engine would load and run. In 2002 the engines were ported to Linux, and the compiler was extended to generate code for GNU g++. Around the same time a third execution engine, known as Roxie, was developed for rapid repeated execution of similar queries. Roxie uses the same ECL compiler, language, and generated-DLL technology.

Language Constructs

ECL, at least in its purest form, is a declarative, data-centric language. Programs, in the strictest sense, do not exist. Rather, an ECL application specifies a number of core datasets (or data values) and then the operations to be performed on those values.

Hello world

ECL aims to offer succinct solutions to problems, with sensible defaults. The ‘Hello World’ program is characteristically short:

'Hello World'

A more flavorful example takes a list of strings, sorts them into order, and returns that as the result instead.

// First declare a dataset with one column containing a list of strings
// Datasets can also be binary (flat), CSV, XML, or externally defined structures

D := DATASET([{'ECL'},{'Declarative'},{'Data'},{'Centric'},{'Programming'},{'Language'}],{STRING Value;});
SD := SORT(D,Value);
OUTPUT(SD);

Statements containing := are, in ECL terms, attribute definitions. They do not denote an action but rather the definition of a term. Thus, logically, an ECL program can be read “bottom to top”:

OUTPUT(SD)

What is an SD?

SD := SORT(D,Value);

SD is D, sorted by ‘Value’.

What is a D?

D := DATASET([{'ECL'},{'Declarative'},{'Data'},{'Centric'},{'Programming'},{'Language'}],{STRING Value;});

D is a dataset with one column, labeled ‘Value’, containing the list of strings given above.

ECL Primitives

ECL primitives that act upon datasets include: SORT, ROLLUP, DEDUP, ITERATE, PROJECT, JOIN, NORMALIZE, DENORMALIZE, PARSE, CHOOSEN, ENTH, TOPN, DISTRIBUTE.
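To show how a few of these primitives fit together, here is a minimal sketch. The record layout, the sample data, and the definition names (SalesRec, Sales, DoubleAmount, AddUp, and so on) are all hypothetical. The sketch covers PROJECT with a TRANSFORM, DEDUP, ROLLUP, TOPN, and CHOOSEN.

```ecl
// Hypothetical sample data: one record per (Name, Amount) pair
SalesRec := RECORD
  STRING10  Name;
  UNSIGNED4 Amount;
END;

Sales := DATASET([{'Ann', 10}, {'Bob', 5}, {'Ann', 7},
                  {'Cid', 12}, {'Bob', 5}], SalesRec);

// PROJECT applies a TRANSFORM to every record; here each Amount is doubled
SalesRec DoubleAmount(SalesRec L) := TRANSFORM
  SELF.Amount := L.Amount * 2;
  SELF := L;                  // copy the remaining fields unchanged
END;
Doubled := PROJECT(Sales, DoubleAmount(LEFT));

// DEDUP drops adjacent records that match on the listed fields,
// so the input is sorted on those fields first
Unique := DEDUP(SORT(Sales, Name, Amount), Name, Amount);

// ROLLUP merges adjacent records with the same Name, summing their Amounts
SalesRec AddUp(SalesRec L, SalesRec R) := TRANSFORM
  SELF.Amount := L.Amount + R.Amount;
  SELF := L;
END;
Totals := ROLLUP(SORT(Sales, Name), LEFT.Name = RIGHT.Name, AddUp(LEFT, RIGHT));

// TOPN keeps the 2 records with the largest Amount;
// CHOOSEN caps a result at the first 3 records
Top2   := TOPN(Totals, 2, -Amount);
First3 := CHOOSEN(Sales, 3);

OUTPUT(Doubled);
OUTPUT(Unique);
OUTPUT(Totals);
OUTPUT(Top2);
OUTPUT(First3);
```

As in the sorting example, each := line only defines a term. The OUTPUT actions at the end are what cause the engine to evaluate them.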

ECL Encapsulation

Although ECL is terse (LexisNexis claims that one line of ECL is roughly equivalent to 120 lines of C++), it still has significant support for large-scale programming, including data encapsulation and code reuse. The constructs available include: MODULE, FUNCTION, INTERFACE, MACRO, EXPORT, SHARED.
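The following minimal sketch shows how MODULE, FUNCTION, EXPORT, and SHARED combine to hide data behind a small public interface. The module name Pets, its fields, and its data are hypothetical, and INTERFACE and MACRO are not shown.

```ecl
// Hypothetical module bundling a record layout, its data and reusable logic
Pets := MODULE
  EXPORT Layout := RECORD
    STRING20  Name;
    UNSIGNED1 Age;
  END;

  // SHARED: usable by other definitions in this module,
  // but not part of the module's exported interface
  SHARED Raw := DATASET([{'Rex', 3}, {'Tom', 12}, {'Kit', 1}], Layout);

  // EXPORT: visible to code outside the module
  EXPORT Roster := SORT(Raw, Name);

  // FUNCTION groups several definitions behind a parameter list
  EXPORT OlderThan(UNSIGNED1 minAge) := FUNCTION
    Filtered := Raw(Age > minAge);
    RETURN SORT(Filtered, Name);
  END;
END;

OUTPUT(Pets.Roster);
OUTPUT(Pets.OlderThan(2));
```

Code outside the module can call Pets.Roster and Pets.OlderThan, but it cannot refer to Pets.Raw directly. This lets the underlying data source change without affecting callers.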

Support for Parallelism in ECL

In the HPCC implementation, most ECL constructs execute in parallel across the available hardware by default. Many of the primitives also have a LOCAL option, which specifies that the operation should run independently on each node, using only the data that node already holds.
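This minimal sketch contrasts a global SORT with a LOCAL one. The record layout and data are hypothetical. For a per-node sort to group related records, records sharing a key first need to be placed on the same node, which DISTRIBUTE with a hash of the key does.

```ecl
// Hypothetical sample data
Rec := RECORD
  UNSIGNED4 Id;
  STRING10  Region;
END;
Ds := DATASET([{1, 'North'}, {2, 'South'}, {3, 'North'}, {4, 'East'}], Rec);

// Global sort: the engine coordinates across all nodes to produce one total order
GlobalSorted := SORT(Ds, Region, Id);

// Local sort: each node sorts only the records it already holds.
// Records with equal Region are co-located first, so each Region's
// records end up together on one node.
ByRegion    := DISTRIBUTE(Ds, HASH32(Region));
LocalSorted := SORT(ByRegion, Region, Id, LOCAL);

OUTPUT(GlobalSorted);
OUTPUT(LocalSorted);
```

The LOCAL version avoids the cross-node coordination a global sort needs. The trade-off is that the result is ordered only within each node, not across the whole cluster.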

Comparison to MapReduce

The Hadoop MapReduce paradigm consists of three phases (map, shuffle, and reduce). They correspond to ECL primitives as follows, with the shuffle phase mapping onto two ECL steps.

Hadoop name/term	ECL equivalent	Comments
Map (within the mapper)	PROJECT/TRANSFORM	Takes a record and converts it to a different format; in the Hadoop case the conversion is into a key-value pair
Shuffle (phase 1)	DISTRIBUTE(,HASH(KeyValue))	The records from the mapper are distributed according to the key value
Shuffle (phase 2)	SORT(,LOCAL)	The records arriving at a particular reducer are sorted into key order
Reduce	ROLLUP(,Key,LOCAL)	The records for a particular key value are combined
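The correspondence in the table above can be sketched as a word count written as an explicit ECL pipeline. This is a minimal sketch: the input data, layouts, and definition names are hypothetical. It uses HASH32, one of ECL's hash functions, where the table writes HASH.

```ecl
// Hypothetical input: one word per record
WordIn := RECORD
  STRING20 Word;
END;
WordRec := RECORD
  STRING20  Word;
  UNSIGNED4 Cnt;
END;
Words := DATASET([{'big'}, {'data'}, {'big'}, {'ecl'}, {'data'}, {'big'}], WordIn);

// Map: PROJECT/TRANSFORM turns each input record into a (key, value) pair
WordRec ToPair(WordIn L) := TRANSFORM
  SELF.Word := L.Word;
  SELF.Cnt  := 1;
END;
Pairs := PROJECT(Words, ToPair(LEFT));

// Shuffle phase 1: send all records with the same key to the same node
Distributed := DISTRIBUTE(Pairs, HASH32(Word));

// Shuffle phase 2: sort each node's records into key order, locally
Sorted := SORT(Distributed, Word, LOCAL);

// Reduce: combine adjacent records with the same key, locally on each node
WordRec AddCounts(WordRec L, WordRec R) := TRANSFORM
  SELF.Cnt := L.Cnt + R.Cnt;
  SELF := L;
END;
Counts := ROLLUP(Sorted, LEFT.Word = RIGHT.Word, AddCounts(LEFT, RIGHT), LOCAL);

OUTPUT(Counts);
```

Because DISTRIBUTE places every record for a given word on one node, the LOCAL sort and LOCAL ROLLUP can produce complete per-word totals without further data movement.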

References

^ A Guide to ECL, LexisNexis.
