Trino (SQL query engine)

Original author(s)	Martin Traverso, Dain Sundstrom, David Phillips, Eric Hwang
Initial release	10 November 2013
Repository	Trino Repository
Written in	Java
Operating system	Cross-platform
Standard(s)	ANSI SQL, JDBC
Type	Data Warehouse
License	Apache License 2.0
Website	trino.io

Trino is an open-source distributed SQL query engine designed to query large data sets distributed over one or more heterogeneous data sources.^[1] It is commonly used as a query engine over data lakes and data warehouses that use the Hive and Iceberg^[2] table formats. In these configurations, Trino can query data stored in open column-oriented file formats such as ORC or Parquet on storage systems such as HDFS, AWS S3, Google Cloud Storage, or Azure Blob Storage. Trino can also run federated queries across multiple disparate data sources such as MySQL, PostgreSQL, Cassandra, Kafka, MongoDB and Elasticsearch. Trino is community driven and released under the Apache License 2.0.

History

Trino was originally designed and developed by Martin Traverso, Dain Sundstrom, David Phillips, and Eric Hwang at Facebook to allow data analysts to run interactive queries on its large data warehouse in Apache Hadoop. The project was originally named Presto and shares the first six years of development with the Presto project.^[3]^[4] Before Presto, data analysts at Facebook relied on Apache Hive, which was too slow for running interactive SQL analytics on their 250-petabyte data warehouse.^[5]

Traverso, Sundstrom, Phillips, and Hwang began development in 2012 and deployed an initial version later that year. Facebook announced its release as open source in November 2013.^[5]^[6] As Presto gained popularity, well-known companies such as Netflix^[7] and Airbnb^[8] disclosed that they used Presto in both on-premises and cloud deployments at comparable petabyte scales. In late 2016, Amazon announced that it would provide Presto as a service called Athena.^[9]

In late 2018, a disagreement over the stewardship of Presto arose between the founders and Facebook as Facebook management pushed for tighter control over the project. This push included granting automatic committer rights to Facebook developers who had no prior experience with the project. Shortly after Facebook management moved forward with these changes, the creators left the original Presto project to create a fork.^[10] The fork was also initially named Presto, so to tell the two apart, users called the original project PrestoDB and the fork PrestoSQL, after their respective web addresses, https://prestodb.io and https://prestosql.io. The split resembles the earlier split between Jenkins and Hudson.

In January 2019, the Trino Software Foundation (formerly Presto Software Foundation) was announced. The foundation is a not-for-profit organization dedicated to the advancement of the Trino open source distributed SQL query engine.^[11]^[12]

In September 2019, Facebook donated PrestoDB to the Linux Foundation, establishing the Presto Foundation.^[13] Neither the creators of Presto nor the top contributors and committers were invited to join this foundation.^[14]^[10]

In December 2020, PrestoSQL was rebranded as Trino.^[10]

Architecture

Trino is written in Java. A Trino cluster contains two types of nodes: a coordinator and workers.

The coordinator is responsible for parsing, analyzing, optimizing, planning, and scheduling a query submitted by a client. It interacts with the service provider interface (SPI) to list the available tables, obtain table statistics, check permissions, and retrieve other information needed to carry out its tasks.

The workers are responsible for executing the tasks and operators assigned to them by the scheduler. These tasks process rows from data sources and produce results that are returned to the coordinator and ultimately to the client.
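
The division of labor described above can be illustrated with a toy sketch. This is not Trino code: the names `SPLITS`, `worker_task`, and `coordinator` are hypothetical, a thread pool stands in for worker nodes, and for simplicity the coordinator function itself merges the partial results.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical table: each inner list stands in for one chunk of rows
# (country, amount) that can be processed independently.
SPLITS = [
    [("us", 10), ("de", 5)],
    [("us", 7), ("fr", 3)],
    [("de", 1), ("fr", 4)],
]


def worker_task(split):
    """Worker side: compute a partial SUM(amount) GROUP BY country for one chunk."""
    partial = {}
    for country, amount in split:
        partial[country] = partial.get(country, 0) + amount
    return partial


def coordinator(splits, num_workers=2):
    """Coordinator side: schedule one task per chunk, then merge the partial results."""
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        partials = list(pool.map(worker_task, splits))
    final = {}
    for partial in partials:
        for country, subtotal in partial.items():
            final[country] = final.get(country, 0) + subtotal
    return final


result = coordinator(SPLITS)
print(result)
```

The sketch only shows the shape of the interaction: planning and scheduling happen in one place, while row processing happens in independent tasks whose results flow back toward the client.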

Trino attempts to follow the ANSI SQL standard as closely as possible, including features from SQL-92, SQL:1999, SQL:2003, SQL:2008, SQL:2011, and SQL:2016. It prioritizes SQL features that are more relevant to OLAP than to OLTP.

Trino supports separation of compute and storage and may be deployed both on premises and in the cloud.

Trino has a distributed massively parallel processing (MPP) architecture, a significant departure from the MapReduce design used by earlier data lake systems such as Hive. Trino first distributes work over multiple workers by running ad hoc partitioning operations or by relying on existing partitions in the underlying data store. Once data reaches a worker, it is processed by pipelined operators running on multiple threads. Another defining characteristic of Trino is that it avoids the checkpointing operations, which involve expensive writes, used by systems such as Hive and Spark. As a result, a query must be restarted if a failure occurs during execution. In practice, such failures are reported to be infrequent.
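
As a rough illustration of partitioning followed by pipelined operators, the sketch below hash-partitions rows by key and then streams each partition through chained Python generators, so no intermediate result is written out between operators. All names here (`scan`, `filter_rows`, `project`, `hash_partition`, `run_pipeline`) are hypothetical and do not correspond to Trino's internal classes; partitions are processed one after another rather than on separate threads or nodes.

```python
def scan(rows):
    """Source operator: emit rows one at a time."""
    for row in rows:
        yield row


def filter_rows(rows, predicate):
    """Filter operator: pass through only the rows that satisfy the predicate."""
    for row in rows:
        if predicate(row):
            yield row


def project(rows, columns):
    """Projection operator: keep only the requested columns."""
    for row in rows:
        yield {column: row[column] for column in columns}


def hash_partition(rows, key, num_workers):
    """Assign each row to a worker by its integer key modulo the worker count."""
    partitions = [[] for _ in range(num_workers)]
    for row in rows:
        partitions[row[key] % num_workers].append(row)
    return partitions


def run_pipeline(partition):
    """Chain the operators; each row flows through all of them before the next is read."""
    scanned = scan(partition)
    filtered = filter_rows(scanned, lambda row: row["amount"] >= 20)
    return list(project(filtered, ["user_id"]))


ROWS = [{"user_id": i, "amount": i * 10} for i in range(6)]
PARTITIONS = hash_partition(ROWS, "user_id", num_workers=2)
RESULTS = [run_pipeline(partition) for partition in PARTITIONS]
print(RESULTS)
```

Because nothing is checkpointed between operators in this sketch, a failure partway through a partition would mean rerunning the pipeline, which mirrors the trade-off described above.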

Use Cases

In general, Trino is used for OLAP scenarios rather than OLTP workloads.^[15]

Data Lake Query Engine

Trino was originally created to replace the Apache Hive runtime while maintaining the ability to query data in HDFS or object storage. Many companies use Trino as a query engine to speed up analytics reads from the data lake.

Federated Query Engine

Trino can combine data from multiple sources in a single query. Using the SPI, Trino connectors can query data sources including files in HDFS and Amazon S3, MySQL, PostgreSQL, Microsoft SQL Server, Amazon Redshift, Apache Kudu, Apache Pinot, Apache Kafka, Apache Cassandra, Apache Druid, MongoDB, Elasticsearch, and Redis. Unlike Apache Impala and other earlier Hadoop-specific tools, Trino can work with any underlying system for which a connector exists.
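
The idea of a federated query can be sketched with a toy model in which each data source sits behind a small connector object and a join reads from two of them. The `InMemoryConnector` class, the `CATALOGS` mapping, and `federated_join` are hypothetical illustrations, not the real Trino SPI, and the sample rows are invented.

```python
class InMemoryConnector:
    """Hypothetical stand-in for a connector; it only lists and scans tables."""

    def __init__(self, tables):
        self._tables = tables

    def list_tables(self):
        return sorted(self._tables)

    def scan(self, table):
        return iter(self._tables[table])


# Two independent "systems", each reachable through its own connector.
CATALOGS = {
    "mysql": InMemoryConnector({
        "orders": [
            {"customer_id": 1, "total": 30},
            {"customer_id": 2, "total": 12},
            {"customer_id": 1, "total": 8},
        ],
    }),
    "postgresql": InMemoryConnector({
        "customers": [
            {"id": 1, "name": "Ada"},
            {"id": 2, "name": "Lin"},
        ],
    }),
}


def federated_join(left, right, left_key, right_key):
    """Hash join between two (catalog, table) pairs that live in different systems."""
    left_catalog, left_table = left
    right_catalog, right_table = right
    # Build side: index the right-hand table by its join key.
    build = {}
    for row in CATALOGS[right_catalog].scan(right_table):
        build.setdefault(row[right_key], []).append(row)
    # Probe side: stream the left-hand table and emit the matching combinations.
    for row in CATALOGS[left_catalog].scan(left_table):
        for match in build.get(row[left_key], []):
            yield {**row, **match}


joined = list(federated_join(
    ("mysql", "orders"), ("postgresql", "customers"), "customer_id", "id"
))
print(joined)
```

The join logic never needs to know what kind of system sits behind each connector, which is the property that lets a single query span several sources.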

References

[1] "Overview — Trino 361 Documentation". trino.io. Retrieved 20 September 2021.
[2] "About - Apache Iceberg". iceberg.apache.org. Retrieved 18 September 2021.
[3] "Contributors to trinodb/trino". GitHub. Retrieved 20 September 2021.
[4] "Contributors to prestodb/presto". GitHub. Retrieved 20 September 2021.
[5] Joab Jackson (6 November 2013). "Facebook goes open source with query engine for big data". Computerworld. Retrieved 26 April 2017.
[6] Jordan Novet (6 June 2013). "Facebook unveils Presto engine for querying 250 PB data warehouse". Gigaom. Retrieved 26 April 2017.
[7] "Using Presto in our Big Data Platform on AWS". Netflix technical blog. 7 October 2014. Retrieved 26 April 2017.
[8] "Airpal: a Web UI for PrestoDB". Medium. 4 April 2016. Retrieved 20 September 2021.
[9] "AWS Launches Amazon Athena | Amazon.com, Inc. - Press Room". press.aboutamazon.com. Retrieved 20 September 2021.
[10] Traverso, Martin; Sundstrom, Dain; Phillips, David (27 December 2020). "We're rebranding PrestoSQL as Trino". trino.io. Retrieved 7 September 2021.
[11] "Presto Software Foundation Launches to Advance Presto Open Source Community". PRWeb. Retrieved 1 February 2019.
[12] "Presto's New Foundation Signals Growth for the Big Data SQL Engine". The New Stack. 31 January 2019. Retrieved 1 February 2019.
[13] "Facebook, Uber, Twitter and Alibaba form Presto Foundation to Tackle Distributed Data Processing at Scale". Retrieved 12 November 2019.
[14] "What's the relationship between prestosql and prestodb?". 22 November 2019.
[15] "Use cases — Trino 361 Documentation". trino.io. Retrieved 20 September 2021.
