Python AAD Benchmark

127-450x Faster Greeks than Bump-and-Revalue

MatLogica AADC delivers the fastest Greeks computation in this Python Monte Carlo benchmark, outperforming JAX, PyTorch, Autograd, XAD, and Enzyme-AD.

All benchmarks use 100 trades × 1,000 scenarios × 252 timesteps on identical hardware. AADC scales to 450x+ speedup with 16 threads.

127-450x faster than bump-and-revalue
3-300x faster than other AAD libraries
3-5x less memory usage
~10 lines of code changed

Developer Productivity vs Runtime Performance

Asian Option Monte Carlo - 100 trades × 1,000 scenarios × 252 timesteps

[Chart: developer productivity (lines of code changed) vs runtime performance]

Library                     Greeks time   Code changes
AADC                        348 ms        ~10 lines
JAX                         1.07 s        ~50 lines
Enzyme-AD                   1.05 s        ~50 lines
PyTorch                     2.04 s        ~60 lines
Autograd                    5.53 s        ~10 lines
XAD                         106.33 s      ~15 lines
Bump & Revalue (baseline)   44.19 s       no rewrite

Greeks Computation Time

100 trades × 1,000 scenarios × 252 timesteps (single-threaded)

Library                     Compilation   Execution   Total      Speedup
AADC (best)                 45 ms         172 ms      348 ms     127x
JAX                         230 ms        710 ms      1.07 s     41x
JAX (CPU)                   1.47 s        1.03 s      2.63 s     17x
Enzyme-AD                   220 ms        701 ms      1.05 s     42x
PyTorch                     1.91 s        n/a         2.04 s *   22x
Autograd                    n/a           5.40 s      5.53 s     8x
XAD                         n/a           106 s       106 s      0.4x
Bump & Revalue (baseline)   n/a           n/a         44 s       1x

* PyTorch vectorized version; per-path AD takes 1361s. Times shown as compilation + execution.
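For context, the bump-and-revalue baseline computes each Greek by finite differences, repricing the full simulation once per bumped input, so its cost grows linearly with the number of risk factors. A minimal NumPy sketch of the approach (toy payoff and parameters, not the benchmark model):

```python
import numpy as np

def price(spot, vol, strike=90.0, n_paths=1000, n_steps=252, seed=0):
    """Toy Asian-option Monte Carlo price (illustrative only)."""
    dt = 1.0 / n_steps
    z = np.random.default_rng(seed).standard_normal((n_paths, n_steps))
    log_paths = np.cumsum((-0.5 * vol ** 2) * dt + vol * np.sqrt(dt) * z, axis=1)
    avg = (spot * np.exp(log_paths)).mean(axis=1)  # arithmetic average price
    return np.maximum(avg - strike, 0.0).mean()

def bump_and_revalue_delta(spot, vol, h=1e-2):
    # Two full repricings for ONE Greek; every additional risk factor
    # costs two more, which is why this baseline scales so poorly.
    return (price(spot + h, vol) - price(spot - h, vol)) / (2.0 * h)

delta = bump_and_revalue_delta(100.0, 0.2)
```

Reverse-mode AD, by contrast, returns all input sensitivities from a single augmented pass, which is the source of the speedups reported above.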

AADC Thread Scaling

AADC demonstrates near-linear scaling with thread count

Threads   Total time   Speedup vs bump-and-revalue
1         348 ms       127x
4         118 ms       374x
8         87 ms        508x
16        78 ms        566x

Key Insight

AADC compilation time (45ms) is constant regardless of thread count. Execution time scales near-linearly, providing 4.5x additional speedup from 1→16 threads.

Speedup range: 127x → 566x
Total time: 348 ms → 78 ms
Multi-thread boost: 4.5x

Thread scaling measured on an Intel Core i9-12900K. Parallel efficiency = actual speedup / (single-thread speedup × threads) × 100%.
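The split between fixed compilation cost and parallel execution can be checked directly from the scaling table (all numbers copied from above):

```python
# Subtract the fixed 45 ms compilation from each total to see how the
# execution phase alone scales with threads (numbers from the table).
compile_ms = 45
totals = {1: 348, 4: 118, 8: 87, 16: 78}  # total Greeks time in ms

base_exec = totals[1] - compile_ms  # 303 ms of parallelizable work
for threads, total_ms in sorted(totals.items()):
    exec_ms = total_ms - compile_ms
    print(f"{threads:2d} threads: execution {exec_ms} ms "
          f"({base_exec / exec_ms:.1f}x vs 1 thread)")
```

Execution alone speeds up roughly 4.2x at 4 threads, 7.2x at 8, and 9.2x at 16; total time flattens sooner because the fixed 45 ms compilation becomes an ever-larger share of the 78 ms total.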

Integration Effort

Lines of code changed and model rewrite requirements

Library     Model rewrite required   Lines changed
AADC        No                       ~10
JAX         Yes (complete)           ~50
Enzyme-AD   Yes (complete)           ~50
PyTorch     Yes (vectorized)         ~60
Autograd    Partial                  ~10
XAD         No                       ~15

Key JAX/PyTorch Integration Issue

JAX does not support native Python control flow over traced values inside JIT-compiled functions:

  • for loops are unrolled at trace time and should be replaced with jax.lax.fori_loop
  • if/else on traced values must be replaced with jax.lax.cond
  • Existing models therefore require significant restructuring
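A minimal sketch of the kind of restructuring involved, using a hypothetical toy path-dependent payoff (function names and parameters are illustrative, not the benchmark code):

```python
import jax
import jax.numpy as jnp

# Eager version with native Python control flow: runs as-is, but fails
# under jax.jit because `avg > strike` would be a traced value.
def asian_payoff_python(spot, drift, vol, z, strike):
    s, path_sum = spot, 0.0
    for zi in z:                       # plain Python for loop
        s = s * jnp.exp(drift + vol * zi)
        path_sum = path_sum + s
    avg = path_sum / z.shape[0]
    if avg > strike:                   # plain Python if/else
        return avg - strike
    return 0.0

# JIT-compatible rewrite: fori_loop replaces the loop, lax.cond the branch.
@jax.jit
def asian_payoff_jax(spot, drift, vol, z, strike):
    def body(i, carry):
        s, path_sum = carry
        s = s * jnp.exp(drift + vol * z[i])
        return s, path_sum + s

    init = (jnp.asarray(spot, dtype=z.dtype), jnp.zeros((), z.dtype))
    _, path_sum = jax.lax.fori_loop(0, z.shape[0], body, init)
    avg = path_sum / z.shape[0]
    return jax.lax.cond(avg > strike,
                        lambda a: a - strike,   # in-the-money branch
                        lambda a: a * 0.0,      # out-of-the-money branch
                        avg)

# Reverse-mode AD then gives delta directly:
delta = jax.grad(asian_payoff_jax)(100.0, 0.0, 0.2, jnp.zeros(4), 90.0)
```

Both versions price identically, but only the second can be traced, compiled, and differentiated; this per-construct translation is where most of the ~50 changed lines come from.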

Memory Efficiency vs Runtime Performance

Asian Option Monte Carlo - 100 trades × 1,000 scenarios × 252 timesteps

[Chart: memory efficiency vs runtime performance]

Library     Greeks time   Peak memory
AADC        348 ms        52 MB
JAX         1.07 s        187 MB
Enzyme-AD   1.05 s        259 MB
PyTorch     2.04 s        241 MB
Autograd    5.53 s        46 MB
XAD         106.33 s      32 MB

Memory Usage

Peak memory consumption during Greeks computation

XAD         32 MB
Autograd    46 MB
AADC        52 MB
JAX         187 MB
PyTorch     241 MB
Enzyme-AD   259 MB

Key insight: AADC uses 3-5x less memory than JIT-based alternatives (JAX, PyTorch, Enzyme-AD) while delivering faster performance.
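The page does not state how peak memory was measured; one common approach in Python is the standard-library tracemalloc module (NumPy ≥ 1.22 routes array allocations through it). A hedged sketch with a stand-in workload:

```python
import tracemalloc
import numpy as np

tracemalloc.start()

# Stand-in workload: one vectorized toy pricing pass (illustrative numbers,
# not the benchmark model).
rng = np.random.default_rng(0)
z = rng.standard_normal((1000, 252))
paths = 100.0 * np.exp(np.cumsum(0.2 * np.sqrt(1 / 252) * z, axis=1))
payoff = np.maximum(paths.mean(axis=1) - 90.0, 0.0).mean()

current_bytes, peak_bytes = tracemalloc.get_traced_memory()
tracemalloc.stop()
print(f"payoff {payoff:.2f}, peak traced memory {peak_bytes / 1e6:.1f} MB")
```

tracemalloc only sees Python-level allocations, so figures from it are comparable across pure-Python runs but may undercount native-library memory such as compiled AAD tapes.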

Library Comparison

Key factors for each AAD library

Google JAX

Loops require rewrite to jax.lax.fori_loop

1.07s Greeks time
~50 lines changed
187 MB memory

Factors

  • Loops require rewrite to jax.lax.fori_loop
  • NumPy code must convert to jax.numpy
  • No commercial support or SLAs
  • Runtime changes trigger recompilation
  • ML-focused; limited regulatory traceability

Enzyme-AD

Clang-only; requires JAX 0.4.30 (version locked)

1.05s Greeks time
~50 lines changed
259 MB memory

Factors

  • Clang-only; no MSVC or GCC support
  • Requires JAX 0.4.30 (not compatible with JAX 0.8.x)
  • 18-36 month integration; complex LLVM plugin setup
  • External libraries must be recompiled
  • Cryptic LLVM IR-level error messages
  • Experimental; poor regulatory compliance

PyTorch

Heavy compilation overhead; requires tensor rewrite

2.04s Greeks time
~60 lines changed
241 MB memory

Factors

  • Math must be rewritten to torch ops (e.g. torch.exp()); inputs need requires_grad=True
  • 1.9s compilation overhead per model change
  • 2GB+ install; heavy deployment footprint
  • No finance-focused commercial support
  • Built for deep learning, not quant finance
  • Moderate regulatory audit suitability
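The ~60-line PyTorch rewrite mostly means batching every path and timestep into tensors and marking inputs differentiable; a toy sketch of that pattern (illustrative names and parameters, not the benchmark code):

```python
import torch

def asian_greeks(spot0, strike, vol, n_paths=1000, n_steps=252):
    dt = 1.0 / n_steps
    torch.manual_seed(0)
    spot = torch.tensor(spot0, requires_grad=True)   # differentiable input
    z = torch.randn(n_paths, n_steps)                # all paths/steps at once
    # torch ops replace math/NumPy calls (torch.exp, torch.cumsum, ...)
    log_paths = torch.cumsum((-0.5 * vol ** 2) * dt + vol * dt ** 0.5 * z, dim=1)
    avg = (spot * torch.exp(log_paths)).mean(dim=1)
    price = torch.clamp(avg - strike, min=0.0).mean()
    price.backward()                                 # reverse-mode AD
    return price.item(), spot.grad.item()            # price and delta

price, delta = asian_greeks(100.0, 90.0, 0.2)
```

The vectorized form is what keeps PyTorch at 2.04 s in the table above; differentiating path-by-path in a Python loop is what produces the 1,361 s pathological case noted earlier.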

Harvard Autograd

Last release 2022; effectively unmaintained

5.53s Greeks time
~10 lines changed
46 MB memory

Factors

  • Last release 2022; effectively unmaintained
  • No multithreading support
  • No checkpointing or model serialization
  • Poor regulatory and XVA/CVA suitability
  • Small community; limited production examples
  • No commercial support available

Ready to Accelerate Your Greeks?

See how AADC can deliver 127-450x faster Greeks computation for your Python Monte Carlo models with minimal code changes.

Hardware Specification

All benchmarks executed on identical hardware for fair comparison

CPU Intel Core i9-12900K (16 cores / 24 threads)
RAM 64 GB DDR5
OS Ubuntu 22.04 LTS
Python 3.11.4
NumPy 1.24.3
Threads 1, 4, 8, 16 (for AADC scaling tests)