Interactive Benchmark

AAD Tools Comparison

Comparing tape-based and source-transform AAD implementations for Monte Carlo Greeks

Performance Comparison

Trades: 10 / 100 / 1K
Scenarios: 10K / 100K / 500K

Greeks Computation Overhead

How much longer does Greeks computation take vs price-only? Lower is better.

Library     Forward Overhead   Reverse Pass   Total Overhead
Enzyme-AD   ~1.0x              +3.0s          1.9x
CoDiPack    ~1.0x              +2.9s          1.5x
Adept       ~1.0x              +24.4s         5.2x
CppAD*      ~1.0x              +11.6s         5.3x
autodiff*   ~1.1x              +15.0s         90x

Greeks Overhead = (Greeks Time / Price-Only Time). This measures how much longer it takes to compute sensitivities (Delta, Rho, Vega) compared to just computing the price. Lower is better. AADC achieves low overhead through kernel recording and native SIMD optimizations. *CppAD and autodiff timed out at larger scales (100K+ scenarios).

Why Integration Matters

Performance is only part of the story for large codebases

Enzyme-AD

  • Requires LLVM/Clang — most quant systems use MSVC/GCC
  • Cannot differentiate precompiled libraries
  • Build system overhaul required

Tape-based (CoDiPack, CppAD, Adept, autodiff)

  • Type replacement throughout codebase (double → AD<double>)
  • Memory scales with computation graph size
  • Compilation time explosion with templates

AADC: a different approach

  • No type changes — works with existing double
  • Compiler agnostic (GCC, Clang, MSVC)
  • Records at runtime — handles external dependencies
  • <1% code changes for integration

Benchmark Environment

All benchmarks were executed on enterprise-grade server hardware.

System Configuration

CPU: 2x Intel Xeon Platinum 8280L @ 2.70GHz
Cores: 56 physical (28 per socket), 112 threads
Architecture: x86_64, Cascade Lake
L3 Cache: 77 MiB (38.5 MiB per socket)
RAM: 283 GB DDR4
OS: Linux kernel 6.1.0-13-amd64 (Debian)
CPU Features: AVX-512, AVX2, FMA, AES-NI

Test Configuration

Model: Asian Option Monte Carlo
Dynamics: Geometric Brownian Motion (GBM)
Timesteps: 252 (daily over 1 year)
Greeks: Delta, Rho, Vega (3 sensitivities)
Threads: 8 (configurable)
SIMD: AVX2 (4 doubles/instruction)
Note: AVX-512 (8 doubles/instruction) provides ~1.7x additional speedup on supported hardware

Compilers & Versions

GCC: 12.2.0 (Debian)
Clang: 14.0.6 (Debian)
Python: 3.11.2
NumPy: 1.26.x
AADC: 2.0.0
C++ compiled with -O3 -march=native -std=c++17