Extract, load, transform

Extract, load, transform (ELT) is a data pipeline model^[1] and an alternative to extract, transform, load (ETL), used with data lake implementations. In ELT, data is not processed on entry to the data lake, which enables faster loading times. However, ELT requires sufficient processing capacity within the data processing engine to carry out the transformation on demand and return the results to the consumer in a timely manner. Because the data is not processed on entry, the query and schema do not need to be defined a priori, although the schema is often available at load time, since many data sources are extracts from databases or similar structured data systems and hence have an associated schema.
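The load-first, transform-on-demand ordering can be illustrated with a minimal sketch. It uses an in-memory SQLite database as a stand-in for the data lake and its processing engine; the table name raw_orders, the sample extract, and the function names are hypothetical and chosen only for illustration, not taken from any particular product.

```python
import csv
import io
import sqlite3

# Hypothetical source extract: a CSV export with untyped, uncleaned values.
RAW_EXTRACT = """order_id,amount,country
1001, 19.99 ,us
1002,5.00,DE
1003, 12.50 ,us
"""


def extract(source):
    """Read rows from the source exactly as they arrive (no cleaning)."""
    reader = csv.reader(io.StringIO(source))
    next(reader)  # skip the header row
    return [tuple(row) for row in reader]


def load(conn, rows):
    """Load the raw rows unchanged; every column is stored as TEXT."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS raw_orders "
        "(order_id TEXT, amount TEXT, country TEXT)"
    )
    conn.executemany("INSERT INTO raw_orders VALUES (?, ?, ?)", rows)


def transform(conn):
    """Clean, type, and aggregate inside the engine, only when queried."""
    query = """
        SELECT UPPER(TRIM(country)) AS country,
               ROUND(SUM(CAST(TRIM(amount) AS REAL)), 2) AS total
        FROM raw_orders
        GROUP BY UPPER(TRIM(country))
        ORDER BY country
    """
    return conn.execute(query).fetchall()


conn = sqlite3.connect(":memory:")
load(conn, extract(RAW_EXTRACT))   # E and L: fast, no processing on entry
totals = transform(conn)           # T: deferred until a consumer asks
print(totals)
```

In this sketch the load step does no parsing or validation, so it stays cheap, while the cost of trimming, casting, and aggregating is paid by the engine each time the query runs. That is the trade-off described above.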

Cloud Data Lake Components

Common Storage Options

AWS
- Simple Storage Service (S3)
- AWS RDS
Azure
- Azure Blob Storage
GCP
- Google Cloud Storage (GCS)

Querying

References

[1] "Using Redshift Spectrum to load data pipelines", dativa.com, January 17, 2018. Retrieved April 3, 2019.

External links

Dull, Tamara, "The Data Lake Debate: Pro is Up First", smartdatacollective.com, March 20, 2015.
