This joint whitepaper from Intel and MatLogica provides an in-depth technical analysis of AADC’s performance characteristics on Intel Xeon Scalable processors. It covers the full pipeline from DAG extraction through JIT compilation to kernel execution, with detailed benchmark results on representative financial workloads.
DAG Extraction and Compilation
The whitepaper explains how AADC’s type system captures the mathematical operations performed by a model (whether written in Python or C++) and constructs a Directed Acyclic Graph representing the pure computation. The JIT compiler then transforms this DAG into optimized native code, applying SIMD vectorization (AVX2/AVX-512), loop unrolling, constant propagation, and register allocation tuned for the target Xeon microarchitecture.
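The capture step described above can be illustrated with a minimal sketch. It assumes nothing about AADC's actual API: `Tape`, `Traced`, `traced_exp`, `record`, and `replay` are hypothetical names invented for this example. A stand-in number type records each arithmetic operation as a DAG node, and the recorded graph is then replayed with a forward sweep and a reverse (adjoint) sweep. A real JIT compiler would emit vectorized native code from the graph; this sketch merely interprets it.

```python
import math


class Tape:
    """Holds the recorded DAG as a list of (op, operand_indices, payload) nodes."""

    def __init__(self):
        self.nodes = []

    def add_node(self, op, args=(), payload=None):
        self.nodes.append((op, args, payload))
        return len(self.nodes) - 1


class Traced:
    """Hypothetical stand-in number type: arithmetic records nodes instead of computing."""

    def __init__(self, tape, idx):
        self.tape, self.idx = tape, idx

    def _lift(self, other):
        # Plain Python numbers become constant nodes in the graph.
        if isinstance(other, Traced):
            return other
        return Traced(self.tape, self.tape.add_node("const", payload=float(other)))

    def __add__(self, other):
        o = self._lift(other)
        return Traced(self.tape, self.tape.add_node("add", (self.idx, o.idx)))

    __radd__ = __add__

    def __mul__(self, other):
        o = self._lift(other)
        return Traced(self.tape, self.tape.add_node("mul", (self.idx, o.idx)))

    __rmul__ = __mul__


def traced_exp(x):
    return Traced(x.tape, x.tape.add_node("exp", (x.idx,)))


def record(fn, n_inputs):
    """Run fn once on Traced inputs; return the tape and the output node index."""
    tape = Tape()
    inputs = [Traced(tape, tape.add_node("input", payload=i)) for i in range(n_inputs)]
    return tape, fn(*inputs).idx


def replay(tape, out_idx, xs):
    """Forward sweep for the value, then reverse sweep for all input adjoints."""
    vals = [0.0] * len(tape.nodes)
    for i, (op, args, payload) in enumerate(tape.nodes):
        if op == "input":
            vals[i] = xs[payload]
        elif op == "const":
            vals[i] = payload
        elif op == "add":
            vals[i] = vals[args[0]] + vals[args[1]]
        elif op == "mul":
            vals[i] = vals[args[0]] * vals[args[1]]
        elif op == "exp":
            vals[i] = math.exp(vals[args[0]])

    adj = [0.0] * len(tape.nodes)
    adj[out_idx] = 1.0
    for i in reversed(range(len(tape.nodes))):
        op, args, _ = tape.nodes[i]
        if op == "add":
            adj[args[0]] += adj[i]
            adj[args[1]] += adj[i]
        elif op == "mul":
            adj[args[0]] += adj[i] * vals[args[1]]
            adj[args[1]] += adj[i] * vals[args[0]]
        elif op == "exp":
            adj[args[0]] += adj[i] * vals[i]

    # Input nodes are created first, so they occupy indices 0..n_inputs-1 in order.
    grads = [adj[i] for i, (op, _, _) in enumerate(tape.nodes) if op == "input"]
    return vals[out_idx], grads


# Record once, then replay for any input values.
tape, out = record(lambda x, y: x * y + traced_exp(x) * 2.0, 2)
value, grads = replay(tape, out, [1.0, 3.0])
```

The point of the design is that the model function runs only once, during recording. Every later evaluation works on the flat node list, which is the kind of structure a compiler can vectorize and optimize.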
Benchmark Methodology
The benchmarks use standard quantitative finance workloads: Monte Carlo option pricing, portfolio risk aggregation, and sensitivity computation via adjoint mode. Each benchmark is run on multiple Xeon generations to characterize how AADC leverages wider SIMD lanes and improved memory subsystems. The whitepaper provides reproducible configurations and methodology for independent verification.
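As a rough illustration of the first and third workload types, the sketch below prices a European call by Monte Carlo under Black-Scholes dynamics and computes its delta pathwise. It is not one of the whitepaper's benchmark configurations: the model, the parameters, and the function names are assumptions chosen for the example. The pathwise delta is written out by hand here; for this payoff it is the same quantity an adjoint sweep with respect to the spot price would return.

```python
import math
import random


def mc_call_price_and_delta(s0, strike, rate, vol, maturity, n_paths, seed=0):
    """Monte Carlo price and pathwise delta of a European call (illustrative only)."""
    rng = random.Random(seed)
    disc = math.exp(-rate * maturity)
    drift = (rate - 0.5 * vol * vol) * maturity
    diffusion = vol * math.sqrt(maturity)
    price_sum = 0.0
    delta_sum = 0.0
    for _ in range(n_paths):
        z = rng.gauss(0.0, 1.0)
        growth = math.exp(drift + diffusion * z)  # S_T / S_0
        s_t = s0 * growth
        if s_t > strike:
            price_sum += s_t - strike
            # d(payoff)/d(S_0) = 1{S_T > K} * S_T / S_0
            delta_sum += growth
    return disc * price_sum / n_paths, disc * delta_sum / n_paths


def bs_call_price_and_delta(s0, strike, rate, vol, maturity):
    """Closed-form Black-Scholes reference values for checking the estimator."""
    sqrt_t = math.sqrt(maturity)
    d1 = (math.log(s0 / strike) + (rate + 0.5 * vol * vol) * maturity) / (vol * sqrt_t)
    d2 = d1 - vol * sqrt_t

    def cdf(x):
        return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

    price = s0 * cdf(d1) - strike * math.exp(-rate * maturity) * cdf(d2)
    return price, cdf(d1)


# Hypothetical parameters: at-the-money call, one year to maturity.
mc_price, mc_delta = mc_call_price_and_delta(100.0, 100.0, 0.05, 0.2, 1.0, 100_000)
ref_price, ref_delta = bs_call_price_and_delta(100.0, 100.0, 0.05, 0.2, 1.0)
```

Comparing the Monte Carlo estimates against the closed-form values is a simple way to validate both the price and the sensitivity before timing anything.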
Results
The results demonstrate that AADC-compiled kernels achieve throughput within 10-20% of the theoretical FLOP ceiling for the given instruction set, that is, roughly 80-90% of peak. That level of hardware utilization is extremely difficult to achieve with hand-written code and essentially impossible with interpreted Python. The whitepaper quantifies the speedup over NumPy, hand-optimized C++, and competing AD tools.
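For readers unfamiliar with how a theoretical FLOP ceiling is usually derived, the following sketch shows the standard arithmetic: cores times clock rate times FMA units per core times SIMD lanes times two (a fused multiply-add counts as two floating-point operations). The core count, clock rate, and FMA unit count below are hypothetical and are not taken from the whitepaper. Sustained clocks under heavy AVX load are often lower than the nominal frequency, so real ceilings may be lower than this simple product suggests.

```python
def peak_gflops(cores, clock_ghz, simd_bits, fma_units_per_core, dtype_bits=64):
    """Theoretical peak in GFLOP/s for a given SIMD width and data type."""
    lanes = simd_bits // dtype_bits  # 8 doubles for AVX-512, 4 for AVX2
    return cores * clock_ghz * fma_units_per_core * lanes * 2


def utilization(achieved_gflops, peak):
    """Fraction of the theoretical ceiling actually reached."""
    return achieved_gflops / peak


# Hypothetical 32-core part at 2.0 GHz with two FMA units per core.
avx512_peak = peak_gflops(32, 2.0, 512, 2)
avx2_peak = peak_gflops(32, 2.0, 256, 2)

# "Within 10-20% of the ceiling" corresponds to a utilization of 0.8 to 0.9.
example_utilization = utilization(0.85 * avx512_peak, avx512_peak)
```

Under these assumptions, doubling the SIMD width doubles the ceiling, which is why the same kernel is expected to behave differently across Xeon generations.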
Published by MatLogica. Implemented using AADC, a commercial adjoint AD compiler (matlogica.com).