

Apache TVM


Apache TVM (Tensor Virtual Machine) is an open-source machine learning compiler framework for the optimization and execution of machine learning models across a variety of computing platforms, including central processing units (CPUs), graphics processing units (GPUs), and specialized accelerators such as field-programmable gate arrays (FPGAs). It provides an end-to-end compilation stack that lowers high-level computational graphs from frameworks such as TensorFlow, PyTorch, and ONNX into optimized machine code, using a modular and extensible architecture.
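A typical end-to-end flow imports a trained model, compiles it for a target, and runs it through TVM's runtime. A minimal sketch, assuming an ONNX model and the Relay ONNX frontend (the file name and input shape are illustrative):

import onnx
import tvm
from tvm import relay
from tvm.contrib import graph_executor

# Load a trained model and convert it into a Relay module
onnx_model = onnx.load("model.onnx")  # hypothetical model file
mod, params = relay.frontend.from_onnx(onnx_model, shape={"input": (1, 3, 224, 224)})

# Compile the module into machine code for a CPU target
with tvm.transform.PassContext(opt_level=3):
    lib = relay.build(mod, target="llvm", params=params)

# Execute the compiled model with the graph runtime
dev = tvm.cpu()
module = graph_executor.GraphModule(lib["default"](dev))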

History


TVM originated in 2017 as an academic project of the SAMPL group at the University of Washington's Paul G. Allen School of Computer Science and Engineering, led by Tianqi Chen. The software was formally introduced at OSDI 2018.[1] In March 2019, the project joined the Apache Incubator.

Architecture


TVM is composed of several major components:

Relay IR: Relay IR is a high-level functional intermediate representation (IR) for representing and transforming neural networks prior to low-level optimization and code generation. Introduced as the successor to NNVM IR, Relay encodes computation graphs as abstract syntax trees (ASTs) and extends them with language features such as first-class functions, recursion, and a dependent-like type system that supports shape-aware tensor types. Relay represents a neural network as a program composed of nested expressions: each operator invocation is a CallNode, and the entire graph is structured as a series of expressions and bindings. Relay uses an SSA-like structure in which temporary identifiers (e.g., %1, %2) correspond to let-bound expressions.

Relay's core modules include:

  • A Python interface that enables users to interact with the compiler system. The frontend includes a Python library containing standard deep learning operators and Relay-specific functions, as in the example below:
import tvm
from tvm import relay

# Define a simple function using the Relay operator library
def simple_addition(x, y):
    return relay.add(x, y)

# Create Relay variables typed as 3x3 float32 tensors
x = relay.var("x", relay.TensorType((3, 3), dtype="float32"))
y = relay.var("y", relay.TensorType((3, 3), dtype="float32"))

# Build the call expression and wrap it in a Relay function
add_expr = simple_addition(x, y)
add_fn = relay.Function([x, y], add_expr)

# Printing the module shows the textual form of the Relay IR
print(tvm.IRModule.from_expr(add_fn))
  • Relay IR also supports reverse-mode automatic differentiation by transforming functions so that they compute both their values and the corresponding partial derivatives, using a functional-programming approach based on dual numbers and dynamic closures for backpropagation. Each function is transformed to calculate its result alongside the partial derivatives, which are propagated upstream via references. Relay can therefore support higher-order functions and closures, enabling efficient differentiation even for programs with complex control flow and higher-order constructs (see the sketch following this list).[2]
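
A minimal sketch of this transformation, assuming a TVM release that exposes the relay.transform.gradient pass (the workload, shapes, and mode argument here are illustrative):

import tvm
from tvm import relay

# f(x) = x * x, elementwise on a 3x3 tensor
x = relay.var("x", shape=(3, 3), dtype="float32")
fn = relay.Function([x], relay.multiply(x, x))

# Type inference is required before differentiation
mod = tvm.IRModule.from_expr(fn)
mod = relay.transform.InferType()(mod)

# Produce a function returning the value alongside its gradients via reverse mode
grad_fn = relay.transform.gradient(mod["main"], mode="higher_order")
print(tvm.IRModule.from_expr(grad_fn))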

TE (Tensor Expression) language: A domain-specific language for defining tensor computations and applying optimizations such as loop transformations, memory layout adjustments, and parallel execution.
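A minimal sketch of the TE workflow, assuming a classic TVM release that exposes the te scheduling API (the vector-add workload and split factor are illustrative):

import tvm
from tvm import te

# Declare a vector-add computation over tensors of length n
n = te.var("n")
A = te.placeholder((n,), name="A")
B = te.placeholder((n,), name="B")
C = te.compute(A.shape, lambda i: A[i] + B[i], name="C")

# Create a schedule, then apply loop splitting and parallelization
s = te.create_schedule(C.op)
outer, inner = s[C].split(C.op.axis[0], factor=32)
s[C].parallel(outer)

# Lower to inspect the transformed loop nest
print(tvm.lower(s, [A, B, C], simple_mode=True))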

AutoTVM and Ansor: AutoTVM and Ansor are TVM's auto-tuning engines, which search for the most efficient scheduling parameters for tensor computations. AutoTVM combines machine learning with statistical cost models to explore a range of possible optimizations, evaluating the performance of candidate schedules across various hardware targets. Ansor, a more recent addition, builds on AutoTVM's capabilities by using search algorithms and model-based techniques to explore the configuration space automatically (see the sketch below).[3][4]
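A minimal auto-scheduling sketch using Ansor's tvm.auto_scheduler interface, assuming a classic TVM release (the matmul workload, sizes, trial budget, and log-file name are illustrative):

import tvm
from tvm import te, auto_scheduler

# Register the workload so the auto-scheduler can rebuild it from its arguments
@auto_scheduler.register_workload
def matmul(N, M, K):
    A = te.placeholder((N, K), name="A")
    B = te.placeholder((K, M), name="B")
    k = te.reduce_axis((0, K), name="k")
    C = te.compute((N, M), lambda i, j: te.sum(A[i, k] * B[k, j], axis=k), name="C")
    return [A, B, C]

# Define the search task and a small tuning budget
task = auto_scheduler.SearchTask(func=matmul, args=(128, 128, 128), target=tvm.target.Target("llvm"))
options = auto_scheduler.TuningOptions(
    num_measure_trials=64,  # illustrative; real searches use many more trials
    measure_callbacks=[auto_scheduler.RecordToFile("matmul_tuning.json")],
)

# Search for a schedule, then apply the best record found
task.tune(options)
sch, args = task.apply_best("matmul_tuning.json")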

BYOC (Bring Your Own Codegen): A plugin mechanism allowing hardware vendors to integrate their own code generation backends or libraries. This enables the use of hardware-specific instruction sets, specialized libraries, and custom optimization routines that are tailored to the needs of proprietary or non-standard hardware architectures.
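A sketch of the typical BYOC partitioning flow, assuming an existing Relay module mod and an external codegen registered under the name "dnnl" (both illustrative):

import tvm
from tvm import relay

# mod: an existing Relay IRModule (assumed to be built earlier)
seq = tvm.transform.Sequential([
    relay.transform.AnnotateTarget("dnnl"),   # tag operators the external codegen supports
    relay.transform.MergeCompilerRegions(),   # merge adjacent tagged regions
    relay.transform.PartitionGraph(),         # split regions into external functions
])
partitioned = seq(mod)

# The partitioned module compiles as usual; external regions go to the vendor codegen
lib = relay.build(partitioned, target="llvm")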

Applications


Apache TVM has been applied in embedded systems, data center inference workloads, and edge computing devices. Cloud providers and hardware vendors including AWS, AMD, ARM, and Qualcomm have contributed to or adopted TVM for compiling deep learning workloads to run efficiently on their hardware.[5][6][7] Research applications of TVM include automatic scheduling, hardware-aware neural architecture search, and integration with compiler infrastructures such as LLVM and MLIR.

References

  1. ^ Chen, Tianqi, et al. "TVM: An Automated End-to-End Optimizing Compiler for Deep Learning." OSDI '18: Proceedings of the 13th USENIX Conference on Operating Systems Design and Implementation, 2018.
  2. ^ Roesch, Jared, et al. "Relay: A New IR for Machine Learning Frameworks." Proceedings of the 2nd ACM SIGPLAN International Workshop on Machine Learning and Programming Languages (MAPL '18), 2018.
  3. ^ Zheng, Lianmin, et al. "Ansor: Generating High-Performance Tensor Programs for Deep Learning." Proceedings of the 14th USENIX Symposium on Operating Systems Design and Implementation (OSDI '20), 2020.
  4. ^ Kwon, Donggyu, et al. "Learning to Optimize Tensor Programs with a Graph-Based Approach." ICLR 2021.
  5. ^ AWS Labs. "AWS Neuron SDK and Apache TVM." GitHub repository.
  6. ^ AMD. "Accelerate PyTorch Models using torch.compile on AMD GPUs with ROCm." AMD Developer Blog, 2023.
  7. ^ Arm Developer. "Resources for Ethos-U."