Apache Arrow

Apache Arrow
Developer	Apache Software Foundation
Initial release	October 10, 2016
Stable release	20.0.0 / April 27, 2025
Repository	github.com/apache/arrow
Written in	C, C++, C#, Go, Java, JavaScript, MATLAB, Python, R, Ruby, Rust
Type	Data format, algorithms
License	Apache License 2.0
Website	arrow.apache.org

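Apache Arrow is a language-agnostic software framework for developing data analytics applications that process columnar data. It includes a standardized column-oriented memory format that can represent flat and hierarchical data for efficient analytic operations on modern CPU and GPU hardware.^[2]^[3]^[4]^[5]^[6] This reduces or eliminates factors that limit the feasibility of working with large data sets, such as the cost, volatility, or physical constraints of dynamic random-access memory.^[7]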

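Interoperability

Arrow can be used with Apache Parquet, Apache Spark, NumPy, PySpark, pandas, and other data processing libraries. The project includes native software libraries written in C, C++, C#, Go, Java, JavaScript, Julia, MATLAB, Python (PyArrow^[8]), R, Ruby, and Rust. Arrow allows zero-copy reads and fast data access and interchange between these languages and systems without serialization overhead.^[2]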

Applications

Arrow has been used in a variety of domains, including analytics,^[9] genomics,^[10]^[7] and cloud computing.^[11]

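Comparison to Apache Parquet and ORC

Apache Parquet and Apache ORC are popular examples of on-disk columnar data formats. Arrow is designed to complement these formats for in-memory data processing.^[12] The hardware resource engineering trade-offs for in-memory processing differ from those associated with on-disk storage.^[13] The Arrow and Parquet projects include libraries that read and write data between the two formats.^[14]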

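Governance

Apache Arrow was announced by the Apache Software Foundation on February 17, 2016,^[15] with development led by a coalition of developers from other open-source data analytics projects.^[16]^[17]^[6]^[18]^[19] The initial codebase and Java library were seeded by code from Apache Drill.^[15]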

References

[1] “Release Apache Arrow 20.0.0”. April 27, 2025. Retrieved May 7, 2025.
[2] “Apache Arrow and Distributed Compute with Kubernetes”. December 13, 2018.
[3] Baer, Tony (February 17, 2016). “Apache Arrow: Lining Up The Ducks In A Row... Or Column”. Seeking Alpha.
[4] Baer, Tony (February 25, 2019). “Apache Arrow: The little data accelerator that could”. ZDNet.
[5] Hall, Susan (February 23, 2016). “Apache Arrow's Columnar Layouts of Data Could Accelerate Hadoop, Spark”. The New Stack.
[6] Yegulalp, Serdar (February 27, 2016). “Apache Arrow aims to speed access to big data”. InfoWorld.
[7] Tanveer Ahmad (2019). “ArrowSAM: In-Memory Genomics Data Processing through Apache Arrow Framework”. BioRxiv. p. 741843. doi:10.1101/741843.
[8] “Python — Apache Arrow v20.0.0”.
[9] Dinsmore T.W. (2016). “In-Memory Analytics: Satisfying the Need for Speed”. Disruptive Analytics. Apress, Berkeley, CA. pp. 97–116. doi:10.1007/978-1-4842-1311-7_5. ISBN 978-1-4842-1312-4.
[10] Versaci F, Pireddu L, Zanetti G (2016). “Scalable genomics: from raw data to aligned reads on Apache YARN” (PDF). IEEE International Conference on Big Data. pp. 1232–1241.
[11] Maas M, Asanović K, Kubiatowicz J (2017). “Return of the Runtimes: Rethinking the Language Runtime System for the Cloud 3.0 Era”. Proceedings of the 16th Workshop on Hot Topics in Operating Systems. pp. 138–143. doi:10.1145/3102980.3103003. ISBN 978-1-4503-5068-6.
[12] Le Dem, Julien. “Apache Arrow and Apache Parquet: Why We Needed Different Projects for Columnar Data, On Disk and In-Memory”. KDnuggets.
[13] “Apache Arrow vs. Parquet and ORC: Do we really need a third Apache project for columnar data representation?”. October 31, 2017.
[14] “PyArrow:Reading and Writing the Apache Parquet Format”.
[15] “The Apache® Software Foundation Announces Apache Arrow™ as a Top-Level Project”. The Apache Software Foundation Blog. February 17, 2016. Archived from the original on March 13, 2016.
[16] Martin, Alexander J. (February 17, 2016). “Apache Foundation rushes out Apache Arrow as top-level project”. The Register.
[17] “Big data gets a new open-source project, Apache Arrow: It offers performance improvements of more than 100x on analytical workloads, the foundation says.”. February 17, 2016. Archived from the original on July 27, 2016. Retrieved January 31, 2018.
[18] Le Dem, Julien (November 28, 2016). “The first release of Apache Arrow”. SD Times.
[19] “Julien Le Dem on the Future of Column-Oriented Data Processing with Apache Arrow.”.

External links

Apache Arrow project website
Apache Arrow GitHub project source code

