C++ AAD Benchmark

Multi-Threaded Bump Beats Open-Source AAD

Open-source C++ AAD libraries introduce significant overhead. For Monte Carlo workloads, multi-threaded bump-and-revalue (75ms) outperforms all tested AAD libraries.

MatLogica AADC provides the optimal solution: AAD speed with native SIMD vectorization and multi-threading support.

Best Greeks time (16T Bump): 75ms
Best open-source AAD (CoDiPack): 581ms
Worst AAD (autodiff): 132s
Memory (all): 1.9MB

Important Finding: For Monte Carlo workloads, open-source C++ AAD libraries can be slower than bump-and-revalue, especially when the inner loop is tight. Multi-threaded bump-and-revalue (75ms) beats all tested AAD libraries for Greeks computation.

Greeks Computation Time

100 trades × 1,000 scenarios × 252 timesteps

OpenMP Bump (16T): 75ms, 13x (best)
Naive Bump (1T): 563ms, 1.7x
CoDiPack AAD: 581ms, 1.7x (best open-source AAD)
OpenMP Bump (1T): 986ms, 1x
Adept AAD: 2.0s, 0.5x
CppAD AAD: 12.6s, 0.08x
autodiff AAD: 132s, 0.007x

Speedup relative to single-threaded optimised bump-and-revalue (986ms baseline)


Price-Only Performance

Without Greeks computation, all single-threaded variants perform similarly

OpenMP (16T): 30ms
Naive C++: 140ms
CoDiPack: 140ms
Adept: 142ms
CppAD: 144ms
autodiff: 149ms

Key insight: For price-only computation, AAD libraries add minimal overhead (140-149ms versus 140ms for naive C++). The cost appears only when Greeks are computed.

AAD Overhead Breakdown

Why open-source AAD is slow for Monte Carlo

OpenMP Bump

75ms Greeks time (16 threads)
  • Native SIMD vectorization
  • Near-linear thread scaling (13x on 16 threads)
  • 7 evaluations total
  • No tape overhead
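
The "7 evaluations" figure is consistent with central differences on three inputs (one base run plus an up and a down bump for each), although the page does not spell this out. Below is a minimal sketch of such a driver. The names `Greeks` and `bump_greeks`, the relative bump size, and the choice to parallelize over the seven evaluations are all assumptions made for illustration; the benchmark's actual threading strategy is not described, and one could equally parallelize over trades or scenarios.

```cpp
#include <array>
#include <cassert>
#include <cmath>

// Hypothetical result type for this sketch.
struct Greeks {
    double price, delta, rho, vega;
};

// Central-difference bump-and-revalue for three inputs: 7 pricing runs.
// `price` must be safe to call concurrently when OpenMP is enabled.
template <class Pricer>
Greeks bump_greeks(const Pricer& price, double spot, double rate, double vol,
                   double rel_bump = 1e-4) {
    assert(rel_bump > 0.0);
    const double hs = spot * rel_bump;  // spot bump, scaled to the spot level
    const double hr = rel_bump;         // absolute rate bump
    const double hv = rel_bump;         // absolute vol bump

    // Base case, then up/down bumps of spot, rate and vol.
    const std::array<std::array<double, 3>, 7> in = {{
        {spot, rate, vol},
        {spot + hs, rate, vol}, {spot - hs, rate, vol},
        {spot, rate + hr, vol}, {spot, rate - hr, vol},
        {spot, rate, vol + hv}, {spot, rate, vol - hv}}};
    std::array<double, 7> out{};

    // Without -fopenmp the pragma is ignored and the loop runs serially.
    #pragma omp parallel for
    for (int i = 0; i < 7; ++i)
        out[i] = price(in[i][0], in[i][1], in[i][2]);

    return {out[0],
            (out[1] - out[2]) / (2.0 * hs),
            (out[3] - out[4]) / (2.0 * hr),
            (out[5] - out[6]) / (2.0 * hv)};
}
```

For Monte Carlo pricers, each of the seven runs would normally reuse the same random seed so that the differences reflect the bump rather than sampling noise.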

Tape-Based AAD

581ms+ Greeks time (best case)
  • Each operation logged to tape
  • Dynamic memory allocation
  • No SIMD vectorization
  • Single-threaded evaluation
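
To make "each operation logged to tape" concrete, here is a toy reverse-mode tape. It is a deliberately simplified sketch and not the implementation or API of CoDiPack, Adept, CppAD or autodiff; `Tape`, `Var`, `make_input` and `gradient` are hypothetical names. Every arithmetic operation appends a node to a growing vector, which is where the per-operation bookkeeping and dynamic allocation come from.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy tape: one node per recorded operation, holding the indices of up to
// two parent nodes and the local partial derivatives with respect to them.
struct Tape {
    struct Node {
        std::size_t parent[2];
        double partial[2];
    };
    std::vector<Node> nodes;  // grows (and may reallocate) on every operation

    std::size_t push(std::size_t p0, double d0, std::size_t p1, double d1) {
        nodes.push_back(Node{{p0, p1}, {d0, d1}});
        return nodes.size() - 1;
    }
};

// Active scalar: a value plus the index of its node on the tape.
struct Var {
    Tape* tape;
    std::size_t index;
    double value;
};

// Inputs have no parents; they point at themselves with zero partials.
Var make_input(Tape& t, double v) {
    const std::size_t i = t.nodes.size();
    t.push(i, 0.0, i, 0.0);
    return {&t, i, v};
}

Var operator+(Var a, Var b) {
    return {a.tape, a.tape->push(a.index, 1.0, b.index, 1.0), a.value + b.value};
}

Var operator*(Var a, Var b) {
    return {a.tape, a.tape->push(a.index, b.value, b.index, a.value),
            a.value * b.value};
}

// Reverse sweep: walk the tape backwards, pushing adjoints to the parents.
std::vector<double> gradient(const Tape& t, Var output) {
    assert(output.index < t.nodes.size());
    std::vector<double> adj(t.nodes.size(), 0.0);
    adj[output.index] = 1.0;
    for (std::size_t i = t.nodes.size(); i-- > 0;) {
        const Tape::Node& n = t.nodes[i];
        adj[n.parent[0]] += n.partial[0] * adj[i];
        adj[n.parent[1]] += n.partial[1] * adj[i];
    }
    return adj;
}
```

A single reverse sweep yields the derivative with respect to every input at once, which is the property that makes AAD attractive when many sensitivities are needed.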

Inefficient AAD

132s Greeks time (autodiff)
  • Excessive allocation overhead
  • Poor cache locality
  • 1,760x slower than 16-thread OpenMP bump
  • Not production viable

AAD Reverse Pass Overhead

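Total Greeks time as a multiple of the ~140ms price-only time, with the added time in parentheses:

CoDiPack: 4.2x total (+441ms)
Adept: 14.6x total (+1,899ms)
CppAD: 90x total (+12,480ms)
autodiff: 943x total (+132,026ms)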

Integration Effort

Code changes required for each approach

Bump-and-Revalue: add bump loops (~20 lines)
CoDiPack: double → Real, tape management (~40 lines)
Adept: double → adouble, stack management (~45 lines)
CppAD: double → AD<double> (~50 lines)
autodiff: double → dual (~35 lines)
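
The "double → Real" style of change usually means writing the pricing code once against a generic scalar type. The sketch below illustrates the idea with a toy forward-mode `Dual` type, which is a hypothetical stand-in and not the type of any library listed above; `gbm_step` is likewise an illustrative name.

```cpp
#include <cassert>
#include <cmath>

// Toy forward-mode dual number (hypothetical, not a library type):
// a value plus its derivative with respect to one chosen input.
struct Dual {
    double val;
    double der;
};

Dual operator+(Dual a, Dual b) { return {a.val + b.val, a.der + b.der}; }
Dual operator*(Dual a, Dual b) {
    return {a.val * b.val, a.der * b.val + a.val * b.der};  // product rule
}
Dual exp(Dual a) {
    const double e = std::exp(a.val);
    return {e, e * a.der};  // chain rule
}

// One GBM step written against a generic scalar: works for double and Dual.
template <class Real>
Real gbm_step(Real spot, Real drift, Real diffusion, Real z) {
    using std::exp;  // double picks std::exp; Dual finds exp(Dual) via ADL
    return spot * exp(drift + diffusion * z);
}
```

Seeding `spot` with derivative 1 and the other arguments with derivative 0 makes the `der` field of the result the sensitivity to spot, while calling the same template with plain `double` arguments reproduces the original pricing code.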

Thread Scaling

OpenMP bump-and-revalue scales near-linearly with threads

1 Thread: 986ms, 1x
16 Threads: 75ms, 13x (best)
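
The scaling figures can be checked with two small helpers. This is only arithmetic on the timings quoted above; the function names are illustrative.

```cpp
#include <cassert>

// Speedup of the parallel run relative to the single-threaded run.
double speedup(double t1_ms, double tn_ms) {
    assert(t1_ms > 0.0 && tn_ms > 0.0);
    return t1_ms / tn_ms;
}

// Parallel efficiency: the fraction of ideal linear speedup achieved.
double efficiency(double t1_ms, double tn_ms, int threads) {
    assert(threads > 0);
    return speedup(t1_ms, tn_ms) / threads;
}
```

With 986ms on one thread and 75ms on 16 threads, the speedup is about 13.1x and the efficiency about 82%, which is near-linear rather than perfectly linear.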

When AAD Beats Bump-and-Revalue:

  • Many sensitivities: 10+ Greeks per pricing
  • Second-order Greeks: Gamma, cross-gammas
  • High-dimensional inputs: 100s of parameters
  • Tape reuse: Same model, different inputs

For standard Delta/Rho/Vega (3 Greeks), multi-threaded bump-and-revalue is often faster.
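
The trade-off can be sketched as a rough cost model. It rests on assumptions that the page does not verify: bump-and-revalue costs one pricing run per evaluation, the AAD run costs a fixed multiple of one pricing run regardless of the number of Greeks, and any thread speedup applies to the bump side only. The function names are hypothetical.

```cpp
#include <cassert>

// Pricing runs needed by bump-and-revalue for n first-order Greeks:
// one base run plus one (one-sided) or two (central) bumps per input.
int bump_evaluations(int n_greeks, bool central) {
    assert(n_greeks >= 0);
    return 1 + (central ? 2 : 1) * n_greeks;
}

// Smallest number of Greeks for which AAD, costing `aad_factor` pricing
// runs in total, is cheaper than bump-and-revalue sped up by
// `bump_speedup` through threading (1.0 means single-threaded).
int break_even_greeks(double aad_factor, bool central, double bump_speedup = 1.0) {
    assert(aad_factor > 0.0 && bump_speedup > 0.0);
    int n = 1;
    while (bump_evaluations(n, central) / bump_speedup <= aad_factor)
        ++n;
    return n;
}
```

Under these assumptions the break-even point moves out quickly as the bump side gains threads, which matches the qualitative message above: AAD pays off when the number of sensitivities is large. Measured break-even points depend on the model and the library and will differ from this estimate.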

Library Comparison

Strengths and considerations for each AAD library

#3: Adept (compiled library)

2.0s Greeks time
~45 lines changed

Strengths

  • Compiled library (smaller binaries)
  • Array expression support
  • Used in meteorology (ECMWF)

Considerations

  • 3.5x slower than CoDiPack
  • 27x slower than 16-thread bump
  • Requires library linking
  • Less active development

#4: CppAD (COIN-OR project)

12.6s Greeks time
~50 lines changed

Strengths

  • Mature project (COIN-OR)
  • JIT compilation support (CppADCodeGen)
  • Good for small problems

Considerations

  • 21x slower than CoDiPack
  • 168x slower than 16-thread bump
  • Excessive memory allocation
  • Not suitable for Monte Carlo

Avoid: autodiff (modern C++17, too slow)

132s Greeks time
~35 lines changed

Strengths

  • Modern C++17 design
  • Clean API with dual numbers
  • Header-only

Considerations

  • Catastrophically slow: 132 seconds!
  • 227x slower than CoDiPack
  • 1,760x slower than 16-thread bump
  • Not suitable for production use

When to Use Each Approach

MatLogica AADC

  • Production Monte Carlo
  • Native SIMD + multi-threading + AAD
  • Best of both worlds
  • O(1) scaling with # of Greeks

OpenMP Bump

  • Standard Greeks (Delta/Rho/Vega)
  • Simple to implement
  • Fastest for 3-7 Greeks
  • No library dependencies

CoDiPack

  • Research / Prototyping
  • When AAD is required
  • Open-source projects
  • Reasonable performance

Avoid for Production (slowdowns relative to single-threaded bump-and-revalue at 986ms):

  • CppAD: 13x slower than bump-and-revalue
  • autodiff: 134x slower, not suitable for any real workload
  • Adept: 2x slower than bump, marginal AAD benefit

Need Production-Grade AAD Performance?

MatLogica AADC combines AAD correctness with native SIMD vectorization and multi-threading - the best of both worlds for Monte Carlo Greeks.

Hardware Specification

All benchmarks executed on identical hardware for fair comparison

System Configuration

CPU: Intel Xeon Platinum 8280L @ 2.70GHz
Cores: 28 cores/socket, 2 sockets, 112 threads
Compiler: GCC 12.2.0 with -O3 -std=c++17
OS: Linux 6.1.0-13-amd64 (Debian)

Test Parameters

Trades: 100
Scenarios: 1,000
Timesteps: 252
Threads: 1 (baseline), 16 (parallel)

Asian Option Monte Carlo with GBM dynamics
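
For orientation, a pricing kernel of this kind might look like the sketch below. The benchmark source is not shown on this page, so the arithmetic-average call payoff, the function name `asian_call_mc`, the parameter list and the fixed seed are all assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <random>

// Arithmetic-average Asian call under GBM, priced by plain Monte Carlo.
// A fixed seed keeps repeated runs (e.g. bumped revaluations) comparable.
double asian_call_mc(double spot, double strike, double rate, double vol,
                     double maturity, int steps, int paths, unsigned seed = 42) {
    assert(steps > 0 && paths > 0);
    std::mt19937_64 rng(seed);
    std::normal_distribution<double> normal(0.0, 1.0);

    const double dt = maturity / steps;
    const double drift = (rate - 0.5 * vol * vol) * dt;  // log-drift per step
    const double diffusion = vol * std::sqrt(dt);        // log-vol per step

    double payoff_sum = 0.0;
    for (int p = 0; p < paths; ++p) {
        double s = spot;
        double running = 0.0;  // sum of the path's prices for the average
        for (int t = 0; t < steps; ++t) {
            s *= std::exp(drift + diffusion * normal(rng));
            running += s;
        }
        payoff_sum += std::max(running / steps - strike, 0.0);
    }
    return std::exp(-rate * maturity) * payoff_sum / paths;
}
```

The inner loop is a handful of floating-point operations per timestep, which is the "tight inner loop" setting in which per-operation taping overhead is most visible.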