Interactive Benchmark

AAD Tools Comparison

Comparing tape-based and source-transform AAD implementations for Monte Carlo Greeks

Performance Comparison

Trades: 10 / 100 / 1K
Scenarios: 10K / 100K / 500K

Greeks Computation Overhead

How much longer does Greeks computation take vs price-only? Lower is better.

Library     Forward Overhead   Reverse Pass   Total Overhead
Enzyme-AD   ~1.0x              +3.0s          1.9x
CoDiPack    ~1.0x              +2.9s          1.5x
Adept       ~1.0x              +24.4s         5.2x
CppAD*      ~1.0x              +11.6s         5.3x
autodiff*   ~1.1x              +15.0s         90x

Greeks Overhead = (Greeks Time / Price-Only Time). This measures how much longer it takes to compute sensitivities (Delta, Rho, Vega) compared to just computing the price. Lower is better. AADC achieves low overhead through kernel recording and native SIMD optimizations. *CppAD and autodiff timed out at larger scales (100K+ scenarios).

Why Integration Matters

Performance is only part of the story for large codebases

Enzyme-AD

  • Requires LLVM/Clang — most quant systems use MSVC/GCC
  • Cannot differentiate precompiled libraries
  • Build system overhaul required

Tape-based (CoDiPack, CppAD, Adept, autodiff)

  • Type replacement throughout codebase (double → AD<double>)
  • Memory scales with computation graph size
  • Compilation time explosion with templates

AADC: a different approach

  • No type changes — works with existing double
  • Compiler agnostic (GCC, Clang, MSVC)
  • Records at runtime — handles external dependencies
  • <1% code changes for integration

Benchmark Environment

All benchmarks were executed on enterprise-grade server hardware.

System Configuration

CPU: 2x Intel Xeon Platinum 8280L @ 2.70GHz
Cores: 56 physical (28 per socket), 112 threads
Architecture: x86_64, Cascade Lake
L3 Cache: 77 MiB (38.5 MiB per socket)
RAM: 283 GB DDR4
OS: Linux kernel 6.1.0-13-amd64 (Debian)
CPU Features: AVX-512, AVX2, FMA, AES-NI

Test Configuration

Model: Asian Option Monte Carlo
Dynamics: Geometric Brownian Motion (GBM)
Timesteps: 252 (daily over 1 year)
Greeks: Delta, Rho, Vega (3 sensitivities)
Threads: 8 (configurable)
SIMD: AVX2 (4 doubles/instruction)
Note: AVX-512 (8 doubles/instruction) provides ~1.7x additional speedup on supported hardware

Compilers & Versions

GCC: 12.2.0 (Debian)
Clang: 14.0.6 (Debian)
Python: 3.11.2
NumPy: 1.26.x
AADC: 2.0.0
C++ compiled with -O3 -march=native -std=c++17