Apache Arrow

Basic data
Developer(s)	Wes McKinney, Antoine Pitrou, Sutou Kouhei, Matt Topol^[1]
Initial release	17 February 2016^[2]
Current version	22.0.0^[3] (24 October 2025)
License	Apache License, Version 2.0
Website	arrow.apache.org

Apache Arrow is a language-agnostic software framework for developing applications that efficiently load and consume in-memory columnar data in a standardized manner. It also specifies a standard memory format that represents flat and hierarchical data in an optimised columnar manner for efficient analytic operations on modern CPU and GPU hardware.^[4]^[5]^[6]^[7]^[8] This reduces or eliminates factors that limit the feasibility of working with large sets of data, such as the cost, volatility, or physical constraints of dynamic random-access memory.^[9]
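The idea of a columnar memory layout can be illustrated with a minimal sketch. It uses only the Python standard library, not the Arrow libraries, and the names `to_column` and `is_valid` are hypothetical helpers introduced for illustration. It assumes a simplified layout of one contiguous values buffer per column plus a validity bitmap that marks missing entries; the actual Arrow format specification defines further details (alignment, padding, nested types) that are omitted here.

```python
from array import array

# Row-oriented records, as an application might hold them.
rows = [
    {"id": 1, "price": 9.5},
    {"id": 2, "price": None},  # missing value
    {"id": 3, "price": 12.0},
]


def to_column(rows, key, typecode):
    """Pack one field into a contiguous values buffer plus a validity bitmap."""
    values = array(typecode)
    validity = bytearray((len(rows) + 7) // 8)  # one bit per row
    for i, row in enumerate(rows):
        value = row[key]
        if value is None:
            values.append(0)  # placeholder slot; the validity bit stays 0
        else:
            values.append(value)
            validity[i // 8] |= 1 << (i % 8)  # mark row i as present
    return values, validity


def is_valid(validity, i):
    """Return True if row i holds a value rather than a null."""
    return bool(validity[i // 8] >> (i % 8) & 1)


ids, ids_valid = to_column(rows, "id", "q")          # 64-bit signed integers
prices, prices_valid = to_column(rows, "price", "d")  # 64-bit floats

# An analytic scan over one attribute touches only that column's buffer.
total = sum(p for i, p in enumerate(prices) if is_valid(prices_valid, i))
```

Because each column is stored contiguously, a query that aggregates `price` never has to read the `id` values, which is the access pattern that columnar formats are meant to favour.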

Interoperability

Arrow can be used with Apache Parquet, Apache Spark, NumPy, PySpark, pandas and other data processing libraries. The project provides an open-source software library written in C++ with bindings for many other programming languages, e.g. Python and Java. Arrow allows zero-copy reads and fast data access, and lets these languages and systems interchange data without serialisation overhead.^[4]
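The difference between zero-copy access and serialised interchange can be sketched with the Python standard library alone. This is an analogy based on Python's buffer protocol (`memoryview`) and `pickle`, not a use of the Arrow API, and the variable names are chosen for illustration only.

```python
import pickle
from array import array

# A producer holds a column as one contiguous buffer of 64-bit integers.
column = array("q", range(1000))

# Zero-copy: a memoryview exposes the same memory to a consumer.
view = memoryview(column)
window = view[10:20]         # slicing a memoryview does not copy the data
column[10] = -1              # a write by the producer...
shared = window[0] == -1     # ...is visible through the consumer's view

# Serialisation: the data is encoded into a separate byte string and decoded
# again, which yields an independent copy.
restored = pickle.loads(pickle.dumps(column))
column[11] = -2              # a later write by the producer...
independent = restored[11] == 11  # ...does not reach the deserialised copy
```

In the first case the consumer reads the producer's memory directly; in the second, every value is encoded and decoded once, which is the overhead a shared in-memory format is intended to avoid.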

Applications

Arrow has been used in diverse domains, including analytics,^[10] genomics,^[11]^[9] and cloud computing.^[12]

Comparison to Apache Parquet and ORC

Apache Parquet and Apache ORC are popular examples of on-disk columnar data formats. Arrow is designed as a complement to these formats for processing data in memory.^[13] The hardware resource engineering trade-offs for in-memory processing differ from those associated with on-disk storage.^[14] The Arrow and Parquet projects include libraries that allow for reading and writing data between the two formats.^[15]

Reception

Daniel Abadi, Darnell-Kanal Professor of Computer Science at the University of Maryland^[16] and a pioneer of column-oriented databases,^[17] reviewed Apache Arrow in March 2018.^[18] "The time is right for database systems architects to agree on and adhere to a main memory data representation standard," he concluded. "[If your] workloads are typically scanning through a few attributes of many entities, I do not see any reason not to embrace the Arrow standard."

Governance

Arrow was announced by Cloudera^[19] and donated to the Apache Software Foundation^[20] in 2016, where it has been maintained and extended since.^[20]^[21]^[8]^[22]^[23] In October 2019, the Apache Arrow team announced plans to version the Arrow format and the libraries separately, starting with the planned 1.0 release.^[24]

References

[1] github.com.
[2] Origin and History of Apache Arrow. Retrieved 16 November 2025.
[3] Release 22.0.0. 24 October 2025. Retrieved 11 November 2025.
[4] Apache Arrow and Distributed Compute with Kubernetes. 13 December 2018.
[5] Tony Baer: Apache Arrow: Lining Up The Ducks In A Row... Or Column. In: Seeking Alpha. 17 February 2016.
[6] Tony Baer: Apache Arrow: The little data accelerator that could. In: ZDNet. 25 February 2019.
[7] Susan Hall: Apache Arrow's Columnar Layouts of Data Could Accelerate Hadoop, Spark. In: The New Stack. 23 February 2016.
[8] Serdar Yegulalp: Apache Arrow aims to speed access to big data. In: InfoWorld. 27 February 2016.
[9] Tanveer Ahmad: ArrowSAM: In-Memory Genomics Data Processing through Apache Arrow Framework. In: bioRxiv. 2019, p. 741843, doi:10.1101/741843 (biorxiv.org).
[10] Dinsmore T.W.: In-Memory Analytics. In: Disruptive Analytics. Apress, Berkeley, CA, 2016, ISBN 978-1-4842-1312-4, pp. 97–116, doi:10.1007/978-1-4842-1311-7_5.
[11] Versaci F, Pireddu L, Zanetti G: Scalable genomics: from raw data to aligned reads on Apache YARN. In: IEEE International Conference on Big Data. 2016, pp. 1232–1241 (biorxiv.org [PDF]).
[12] Maas M, Asanović K, Kubiatowicz J: Return of the runtimes: rethinking the language runtime system for the cloud 3.0 era. In: Proceedings of the 16th Workshop on Hot Topics in Operating Systems (ACM). 2017, pp. 138–143, doi:10.1145/3102980.3103003 (acm.org [PDF]).
[13] Julien LeDem: Apache Arrow and Apache Parquet: Why We Needed Different Projects for Columnar Data, On Disk and In-Memory. In: KDnuggets.
[14] Apache Arrow vs. Parquet and ORC: Do we really need a third Apache project for columnar data representation? 31 October 2017.
[15] PyArrow: Reading and Writing the Apache Parquet Format.
[16] Daniel Abadi. In: Department of Computer Science, University of Maryland.
[17] Prof. Abadi Wins VLDB 10-Year Best Paper Award.
[18] An analysis of the strengths and weaknesses of Apache Arrow. 27 March 2018.
[19] Introducing Apache Arrow. 18 February 2016.
[20] Alexander J. Martin: Apache Foundation rushes out Apache Arrow as top-level project. In: The Register. 17 February 2016.
[21] Big data gets a new open-source project, Apache Arrow: It offers performance improvements of more than 100x on analytical workloads, the foundation says. 17 February 2016.
[22] Julien LeDem: The first release of Apache Arrow. In: SD Times. 28 November 2016.
[23] Julien Le Dem on the Future of Column-Oriented Data Processing with Apache Arrow.
[24] pmc: Apache Arrow 0.15.0 Release. In: Apache Arrow. 6 October 2019. Retrieved 18 December 2019 (American English).

External links

Apache Arrow project web site
Apache Arrow GitHub project source code

