Trino (SQL query engine)

Trino is an open-source distributed SQL query engine designed to query large data sets distributed over one or more heterogeneous data sources^[1]. It is commonly used as a query engine over data lakes and data warehouses that use the Hive and Iceberg^[2] table formats. In these configurations, Trino queries data stored in open column-oriented file formats such as ORC or Parquet, residing on storage systems such as HDFS, AWS S3, Google Cloud Storage, or Azure Blob Storage. Trino can also run federated queries across multiple disparate data sources such as MySQL, PostgreSQL, Cassandra, Kafka, MongoDB, and Elasticsearch. Trino is community driven and released under the Apache License.

History

Trino was originally designed and developed by Martin Traverso, Dain Sundstrom, David Phillips, and Eric Hwang at Facebook to allow data analysts to run interactive queries on its large data warehouse in Apache Hadoop. The project was originally named Presto and shares the first six years of development with the Presto project^[3]^[4]. Before Presto, data analysts at Facebook relied on Apache Hive, which was too slow for running interactive SQL analytics on their 250-petabyte data warehouse^[5].

Traverso, Sundstrom, Phillips, and Hwang began development in 2012 and deployed an initial version later that year. Facebook announced its release as open source in the fall of 2013^[5]^[6]. As Presto gained popularity, well-known companies such as Netflix^[7] and Airbnb^[8] disclosed that they used Presto in both on-premises and cloud deployments at comparable petabyte scales. In late 2016, Amazon announced that it would provide Presto as a service called Athena^[9].

In late 2018, a disagreement over the stewardship of Presto arose between the founders and Facebook as Facebook management pushed for tighter control over the project. This included granting automatic committer rights to Facebook developers without prior experience with the project. Shortly after Facebook management moved forward with these changes, the creators left the original Presto project to create a fork.^[10] The fork was also initially named Presto, so to tell the two apart, users called the original project PrestoDB and the fork PrestoSQL, after their respective web addresses, https://prestodb.io and https://prestosql.io. The split closely resembles the earlier split between Jenkins and Hudson.

In January 2019, the Trino Software Foundation (formerly Presto Software Foundation) was announced. The foundation is a not-for-profit organization dedicated to the advancement of the Trino open source distributed SQL query engine.^[11]^[12]

In September 2019, Facebook donated PrestoDB to the Linux Foundation, establishing the Presto Foundation.^[13] Neither the creators of Presto nor the top contributors and committers were invited to join this foundation.^[14]^[10]

In December 2020, PrestoSQL was rebranded as Trino.^[10]

Architecture

Trino is written in Java. A Trino cluster contains two types of nodes: a coordinator and workers.

The coordinator is responsible for parsing, analyzing, optimizing, planning, and scheduling a query submitted by a client. The coordinator interacts with the service provider interface (SPI) to obtain the available tables, table statistics, and other information needed to carry out its tasks.

The workers are responsible for executing the tasks and operators assigned to them by the scheduler. These tasks process rows from data sources and produce results that are returned to the coordinator and ultimately back to the client.
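The division of labor between coordinator and workers can be illustrated with a small toy model. This is only a sketch under simplifying assumptions: the function names (plan_splits, run_split, run_query) are hypothetical and are not part of Trino, and a thread pool stands in for the worker nodes.

```python
from concurrent.futures import ThreadPoolExecutor

# Toy model only: these names are illustrative and do not exist in Trino.

def plan_splits(rows, split_size):
    """Coordinator side: cut the input into fixed-size splits."""
    return [rows[i:i + split_size] for i in range(0, len(rows), split_size)]

def run_split(split):
    """Worker side: filter one split and compute a partial sum."""
    return sum(value for value in split if value % 2 == 0)

def run_query(rows, split_size=3, workers=2):
    """Coordinator: schedule splits on workers, then merge partial results."""
    splits = plan_splits(rows, split_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(run_split, splits))
    return sum(partials)

# Sum of the even numbers from 1 to 10.
result = run_query(list(range(1, 11)))
```

In this model the coordinator never touches individual rows; it only plans the splits and merges the partial results that the workers return.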

Trino adheres to the ANSI SQL standard, including SQL-92, SQL:1999, SQL:2003, SQL:2008, SQL:2011, and SQL:2016. Within these standards, Trino implements only the syntax relevant to OLAP workloads.

Trino supports separation of compute and storage and may be deployed both on premises and in the cloud.

Trino has a distributed massively parallel processing (MPP) architecture. It first distributes work over multiple workers by running ad hoc partitioning operations or by relying on existing partitions in the underlying data store. Once the data reaches a worker, it is processed by pipelined operators running on multiple threads. Another deliberate design characteristic of Trino was the lack of fault tolerance, which avoids checkpointing operations that involve expensive writes to disk. As a result, a query must be restarted if a failure occurs during its execution. In practice, this is reported to be uncommon.
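The two ideas in this paragraph, partitioning rows across workers and streaming them through pipelined operators, can be sketched as follows. This is a minimal illustration, not Trino's implementation: all names are hypothetical, a CRC32 hash stands in for the partitioning function, and the partitions are processed one after another rather than on separate threads or nodes.

```python
import zlib
from collections import defaultdict

# Hypothetical illustration; none of these names come from Trino itself.

def hash_partition(rows, key, num_workers):
    """Route each row to a worker by hashing its partitioning key."""
    partitions = [[] for _ in range(num_workers)]
    for row in rows:
        bucket = zlib.crc32(str(row[key]).encode()) % num_workers
        partitions[bucket].append(row)
    return partitions

def scan(partition):
    """Source operator: emit rows one at a time."""
    for row in partition:
        yield row

def filter_op(rows, predicate):
    """Filter operator: pass on only the rows that match the predicate."""
    for row in rows:
        if predicate(row):
            yield row

def aggregate(rows, key, value):
    """Aggregation operator: sum `value` per `key` for one partition."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[key]] += row[value]
    return dict(totals)

def run(rows, num_workers=3):
    """Partition the rows, run the operator pipeline per partition, merge."""
    merged = {}
    for partition in hash_partition(rows, "region", num_workers):
        pipeline = filter_op(scan(partition), lambda r: r["amount"] > 0)
        # Each key lands in exactly one partition, so the merge is a union.
        merged.update(aggregate(pipeline, "region", "amount"))
    return merged

orders = [
    {"region": "eu", "amount": 10},
    {"region": "us", "amount": 5},
    {"region": "eu", "amount": 7},
    {"region": "us", "amount": -2},
    {"region": "apac", "amount": 3},
]
totals = run(orders)
```

Because the operators are generators, each row flows through scan and filter before the next row is read, so no intermediate result is materialized between them. Nothing is written to disk along the way either, which is why a failure in this model would mean running the whole query again.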

Use Cases

In general, Trino is intended for OLAP scenarios rather than OLTP workloads^[15].

Data Lake Query Engine

Trino was originally created to replace the Apache Hive runtime while maintaining the ability to query data in HDFS or object storage. Many companies use Trino as a query engine to speed up analytics reads from the data lake.

Federated Query Engine

Trino can combine data from multiple sources in a single query. Through the SPI, Trino connectors can query a range of data sources, including files in HDFS, Amazon S3, MySQL, PostgreSQL, Microsoft SQL Server, Amazon Redshift, Apache Kudu, Apache Pinot, Apache Kafka, Apache Cassandra, Apache Druid, MongoDB, Elasticsearch, and Redis. Unlike Apache Impala and other earlier Hadoop-specific tools, Trino can work with any underlying system for which a connector is available.
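A federated query addresses each table by a fully qualified name of the form catalog.schema.table, where the catalog identifies the configured data source. The sketch below only builds such a query as a string. The helper names are hypothetical, and the catalog, schema, and table names (hive.sales.orders, mysql.crm.customers) are assumptions chosen for illustration; real names depend on how a given cluster is configured.

```python
# Hypothetical helpers; catalog, schema, and table names are assumptions.

def qualified(catalog, schema, table):
    """Return a quoted, fully qualified catalog.schema.table name."""
    return ".".join(f'"{part}"' for part in (catalog, schema, table))

def federated_join(left, right, join_column, columns):
    """Build a SELECT that joins two tables living in different catalogs."""
    select_list = ", ".join(columns)
    return (
        f"SELECT {select_list} "
        f"FROM {qualified(*left)} AS l "
        f"JOIN {qualified(*right)} AS r "
        f"ON l.{join_column} = r.{join_column}"
    )

query = federated_join(
    left=("hive", "sales", "orders"),
    right=("mysql", "crm", "customers"),
    join_column="customer_id",
    columns=["r.name", "l.total"],
)
```

The resulting string could then be submitted through any Trino client. The join itself is carried out by Trino, with each side read through its own connector.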

References

[1] Overview — Trino 361 Documentation. In: trino.io. Retrieved 20 September 2021.
[2] About - Apache Iceberg. In: iceberg.apache.org. Retrieved 18 September 2021.
[3] Contributors to trinodb/trino. In: GitHub. Retrieved 20 September 2021.
[4] Contributors to prestodb/presto. In: GitHub. Retrieved 20 September 2021.
[5] Joab Jackson: Facebook goes open source with query engine for big data. In: Computer World, 6 November 2013. Retrieved 26 April 2017.
[6] Jordan Novet: Facebook unveils Presto engine for querying 250 PB data warehouse. In: Giga Om, 6 June 2013. Retrieved 26 April 2017.
[7] Using Presto in our Big Data Platform on AWS. In: Netflix technical blog, 7 October 2014. Retrieved 26 April 2017.
[8] Airpal: a Web UI for PrestoDB. In: Medium. 4 April 2016. Retrieved 20 September 2021.
[9] AWS Launches Amazon Athena | Amazon.com, Inc. - Press Room. In: press.aboutamazon.com. Retrieved 20 September 2021.
[10] Martin Traverso, Dain Sundstrom, David Phillips: We’re rebranding PrestoSQL as Trino. In: trino.io. 27 December 2020. Retrieved 7 September 2021.
[11] Presto Software Foundation Launches to Advance Presto Open Source Community. In: PRWeb. Retrieved 1 February 2019.
[12] Presto's New Foundation Signals Growth for the Big Data SQL Engine. In: The New Stack. 31 January 2019. Retrieved 1 February 2019.
[13] Facebook, Uber, Twitter and Alibaba form Presto Foundation to Tackle Distributed Data Processing at Scale. Retrieved 12 November 2019.
[14] What's the relationship between prestosql and prestodb? 22 November 2019.
[15] Use cases — Trino 361 Documentation. In: trino.io. Retrieved 20 September 2021.