Comparing tape-based and source-transform AAD implementations for Monte Carlo Greeks
How much longer does Greeks computation take vs price-only? Lower is better.
| Library | Forward Overhead | Reverse Pass | Total Overhead |
|---|---|---|---|
| AADC C++ | ~1.0x | +0.3s | 1.5x |
| AADC Python | ~1.0x | +0.2s | 1.5x |
| Enzyme-AD | ~1.0x | +3.0s | 1.9x |
| CoDiPack | ~1.0x | +2.9s | 1.5x |
| Adept | ~1.0x | +24.4s | 5.2x |
| CppAD* | ~1.0x | +11.6s | 5.3x |
| autodiff* | ~1.1x | +15.0s | 90x |
Greeks Overhead = Greeks Time / Price-Only Time. It measures how much longer it takes to compute the sensitivities (Delta, Rho, Vega) than to compute the price alone. AADC achieves its low overhead through kernel recording and native SIMD optimizations.

*CppAD and autodiff timed out at larger scales (100K+ scenarios).
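The overhead ratio defined above can be computed from two wall-clock measurements. The following is a minimal sketch, not the harness used for these benchmarks: the helper names `time_call` and `greeks_overhead` are hypothetical, and the timings in the usage line are made-up values rather than figures from the table.

```python
import time
from typing import Callable


def time_call(fn: Callable[[], object], repeats: int = 3) -> float:
    """Return the best wall-clock time in seconds over `repeats` calls of fn."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best


def greeks_overhead(price_only_time: float, greeks_time: float) -> float:
    """Greeks Overhead = Greeks Time / Price-Only Time (lower is better)."""
    if price_only_time <= 0.0:
        raise ValueError("price_only_time must be positive")
    return greeks_time / price_only_time


# Hypothetical timings in seconds, not taken from the benchmark table.
print(greeks_overhead(price_only_time=2.0, greeks_time=3.0))  # prints 1.5
```

In practice `time_call` would wrap the price-only run and the price-plus-Greeks run with identical scenario counts and thread settings, so that the ratio reflects only the cost of the sensitivities.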
Performance is only part of the story for large codebases
All benchmarks were executed on enterprise-grade server hardware.
| Hardware | Specification |
|---|---|
| CPU | 2x Intel Xeon Platinum 8280L @ 2.70GHz |
| Cores | 56 physical (28 per socket), 112 threads |
| Architecture | x86_64, Cascade Lake |
| L3 Cache | 77 MiB (38.5 MiB per socket) |
| RAM | 283 GB DDR4 |
| OS | Linux kernel 6.1.0-13-amd64 (Debian) |

| Benchmark Parameter | Value |
|---|---|
| Model | Asian Option Monte Carlo |
| Dynamics | Geometric Brownian Motion (GBM) |
| Timesteps | 252 (daily over 1 year) |
| Greeks | Delta, Rho, Vega (3 sensitivities) |
| Threads | 8 (configurable) |
| SIMD | AVX2 (4 doubles/instruction) |

| Software | Version |
|---|---|
| GCC | 12.2.0 (Debian) |
| Clang | 14.0.6 (Debian) |
| Python | 3.11.2 |
| NumPy | 1.26.x |
| AADC | 2.0.0 |
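
For orientation, the workload listed above (an Asian option under GBM with 252 daily steps and the three Greeks Delta, Vega and Rho) can be sketched in NumPy. This is an illustrative sketch under stated assumptions, not the benchmark code: it assumes an arithmetic-average call payoff, the function name and the parameter values (`s0=100`, `k=100`, `r=0.05`, `sigma=0.2`) are hypothetical, and the hand-derived pathwise derivatives stand in for what an AAD tool would produce automatically in its reverse pass.

```python
import numpy as np


def asian_call_price_and_greeks(s0, k, r, sigma, t=1.0, n_steps=252,
                                n_paths=10_000, seed=0):
    """Monte Carlo price, Delta, Vega and Rho of an arithmetic Asian call.

    Assumption: the payoff is exp(-r*t) * max(mean(S) - k, 0), with S
    sampled at n_steps equally spaced dates under GBM.
    """
    rng = np.random.default_rng(seed)
    dt = t / n_steps
    times = dt * np.arange(1, n_steps + 1)           # t_1 .. t_n
    z = rng.standard_normal((n_paths, n_steps))
    w = np.cumsum(np.sqrt(dt) * z, axis=1)           # Brownian paths W_{t_i}

    # Exact GBM solution at each sampling date.
    s = s0 * np.exp((r - 0.5 * sigma**2) * times + sigma * w)
    avg = s.mean(axis=1)                             # arithmetic average per path
    payoff = np.maximum(avg - k, 0.0)
    itm = (avg > k).astype(float)                    # derivative of max(., 0)
    disc = np.exp(-r * t)

    price = disc * payoff.mean()
    # Pathwise derivatives of the discounted payoff:
    #   dS_i/ds0    = S_i / s0
    #   dS_i/dsigma = S_i * (W_i - sigma * t_i)
    #   dS_i/dr     = S_i * t_i, plus the discount-factor term -t * payoff
    delta = disc * (itm * avg / s0).mean()
    vega = disc * (itm * (s * (w - sigma * times)).mean(axis=1)).mean()
    rho = disc * (itm * (s * times).mean(axis=1) - t * payoff).mean()
    return price, delta, vega, rho


# Hypothetical at-the-money contract; values are illustrative only.
price, delta, vega, rho = asian_call_price_and_greeks(100.0, 100.0, 0.05, 0.2)
print(f"price={price:.4f} delta={delta:.4f} vega={vega:.4f} rho={rho:.4f}")
```

Because the seed fixes the random draws, the pathwise Greeks can be checked against central finite differences of the price computed with the same seed. An AAD library obtains the same three sensitivities from a single reverse sweep without any hand-written derivative code.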