llama.cpp

GGUF
Filename extension	.gguf
Magic number	0x47 0x47 0x55 0x46
Developed by	Georgi Gerganov and community
Initial release	August 22, 2023
Latest release	v3
Type of format	Machine learning tensors

llama.cpp
Original author	Georgi Gerganov
Developers	Georgi Gerganov and community
Initial release	March 10, 2023
Repository	github.com/ggerganov/llama.cpp
Written in	C++, C
Type	Library for large language models
License	MIT License

llama.cpp is an open-source software library that performs inference on various large language models, such as LLaMA.^[3] It was co-developed alongside the GGML project, a general-purpose tensor library.^[4]

The library includes command-line tools together with a server that has a simple web interface.^[5]^[6]^[7]

GGUF file format

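The GGUF (GGML Universal File)^[10] file format is a binary format that stores both tensors and metadata in a single file, and is designed so that model data can be saved and loaded quickly.^[11] The llama.cpp project introduced it in August 2023 to better maintain backward compatibility as support for other model architectures was added.^[12]^[13] It superseded earlier formats used by the project, such as GGML.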
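As a rough illustration of the "tensors and metadata in a single file" layout, the following minimal sketch checks the magic number and reads the fixed-size fields at the start of a GGUF file. The article itself only gives the magic number and the version (v3); the rest of the header layout used here (a little-endian 32-bit version followed by 64-bit tensor and metadata key-value counts) is an assumption taken from the GGUF specification for version 2 and later, and the function name `read_gguf_header` is hypothetical.

```python
import struct

# Magic number from the infobox: bytes 0x47 0x47 0x55 0x46, i.e. ASCII "GGUF".
GGUF_MAGIC = b"GGUF"
# Assumed header size: 4-byte magic + uint32 version + two uint64 counts.
HEADER_SIZE = 4 + struct.calcsize("<IQQ")


def read_gguf_header(data: bytes) -> dict:
    """Parse the fixed-size prefix of a GGUF file (assumed v2+ layout)."""
    if len(data) < HEADER_SIZE:
        raise ValueError("too short to contain a GGUF header")
    if data[:4] != GGUF_MAGIC:
        raise ValueError("not a GGUF file: bad magic number")
    # Assumption: little-endian uint32 version, then uint64 tensor count
    # and uint64 metadata key-value count.
    version, tensor_count, kv_count = struct.unpack_from("<IQQ", data, 4)
    return {
        "version": version,
        "tensor_count": tensor_count,
        "metadata_kv_count": kv_count,
    }


if __name__ == "__main__":
    # Build a synthetic header in memory instead of relying on a real model file.
    sample = GGUF_MAGIC + struct.pack("<IQQ", 3, 2, 5)
    print(read_gguf_header(sample))
```

For a real file, the same function could be applied to the first `HEADER_SIZE` bytes read with `open(path, "rb")`. The metadata key-value pairs and tensor descriptions that follow the header are not parsed in this sketch.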

GGUF files are typically created by converting models developed with a different machine learning library, such as PyTorch.^[11]

Supported models

References

[1] “Initial release · ggerganov/llama.cpp@26c0846”. GitHub. Retrieved May 15, 2024.
[2] “llama.cpp/LICENSE at master · ggerganov/llama.cpp”. GitHub.
[3] Connatser, Matthew. “How this open source LLM chatbot runner hit the gas on x86, Arm CPUs”. theregister.com. Retrieved April 15, 2024.
[4] Gerganov, Georgi (May 17, 2024). “ggerganov/ggml”. GitHub.
[5] Mann, Tobias (July 14, 2024). “Honey, I shrunk the LLM! A beginner's guide to quantization – and testing it”. theregister.
[6] Alden, Daroc. “Portable LLMs with llamafile [LWN.net]”. lwn.net. Retrieved July 30, 2024.
[7] Mann, Tobias (December 15, 2024). “Intro to speculative decoding: Cheat codes for faster LLMs”. theregister.
[8] “GGUF by ggerganov · Pull Request #2398 · ggerganov/llama.cpp”. GitHub.
[9] “ggml/docs/gguf.md at master · ggerganov/ggml”. GitHub.
[10] “ggerganov/llama.cpp/gguf-py/README.md”. GitHub. Retrieved November 10, 2024.
[11] “GGUF”. huggingface.co. Retrieved May 9, 2024.
[12] Rajput, Saurabhsingh; Sharma, Tushar (June 4, 2024). “Benchmarking Emerging Deep Learning Quantization Methods for Energy Efficiency”. 2024 IEEE 21st International Conference on Software Architecture Companion (ICSA-C). pp. 238–242. doi:10.1109/ICSA-C63560.2024.00049. ISBN 979-8-3503-6625-9.
[13] Mucci, Tim (July 3, 2024). “GGUF versus GGML”. www.ibm.com. Retrieved July 26, 2024.
