Python AAD Benchmark

127-450x Faster Greeks than Bump-and-Revalue

MatLogica AADC delivers the fastest Greeks computation in this Python Monte Carlo benchmark, outperforming JAX, PyTorch, Autograd, XAD, and Enzyme-AD.

All benchmarks use 100 trades × 1,000 scenarios × 252 timesteps on identical hardware. AADC scales to 450x+ speedup with 16 threads.

127-450x faster than bump-and-revalue
3-300x faster than other AAD libraries
3-5x less memory usage
~10 lines of code changed

Developer Productivity vs Runtime Performance

Asian Option Monte Carlo - 100 trades × 1,000 scenarios × 252 timesteps

[Chart: developer productivity (lines of code changed) vs runtime performance]

Library                     Greeks time   Code changes
AADC                        348 ms        ~10 lines
JAX                         1.07 s        ~50 lines
Enzyme-AD                   1.05 s        ~50 lines
PyTorch                     2.04 s        ~60 lines
Autograd                    5.53 s        ~10 lines
XAD                         106.33 s      ~15 lines
Bump & Revalue (baseline)   44.19 s       no rewrite

Greeks Computation Time

100 trades × 1,000 scenarios × 252 timesteps (single-threaded)

Library                     Compilation   Execution   Total      Speedup
AADC (best)                 45 ms         172 ms      348 ms     127x
JAX                         230 ms        710 ms      1.07 s     41x
JAX (CPU)                   1.47 s        1.03 s      2.63 s     17x
Enzyme-AD                   220 ms        701 ms      1.05 s     42x
PyTorch                     1.91 s        n/a         2.04 s *   22x
Autograd                    n/a           5.40 s      5.53 s     8x
XAD                         n/a           106 s       106 s      0.4x
Bump & Revalue (baseline)   n/a           n/a         44 s       1x

* PyTorch vectorized version; per-path AD takes 1361s. Times shown as compilation + execution.
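For context, the bump-and-revalue baseline computes each Greek by finite differences, repricing the full simulation once per bumped input, so its cost grows linearly with the number of risk factors. A minimal NumPy sketch of the approach (toy payoff and parameters, not the benchmark model):

```python
import numpy as np

def price(spot, vol, strike=90.0, n_paths=1000, n_steps=252, seed=0):
    """Toy Asian-option Monte Carlo price (illustrative only)."""
    dt = 1.0 / n_steps
    z = np.random.default_rng(seed).standard_normal((n_paths, n_steps))
    log_paths = np.cumsum((-0.5 * vol ** 2) * dt + vol * np.sqrt(dt) * z, axis=1)
    avg = (spot * np.exp(log_paths)).mean(axis=1)  # arithmetic average price
    return np.maximum(avg - strike, 0.0).mean()

def bump_and_revalue_delta(spot, vol, h=1e-2):
    # Two full repricings for ONE Greek; every additional risk factor
    # costs two more, which is why this baseline scales so poorly.
    return (price(spot + h, vol) - price(spot - h, vol)) / (2.0 * h)

delta = bump_and_revalue_delta(100.0, 0.2)
```

Reverse-mode AD, by contrast, returns all input sensitivities from a single augmented pass, which is the source of the speedups reported above.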

AADC Thread Scaling

AADC demonstrates near-linear scaling with thread count

Threads   Total time   Speedup vs bump-and-revalue
1         348 ms       127x
4         118 ms       374x
8         87 ms        508x
16        78 ms        566x

Key Insight

AADC compilation time (45ms) is constant regardless of thread count. Execution time scales near-linearly, providing 4.5x additional speedup from 1→16 threads.

Speedup range: 127x → 566x
Total time: 348 ms → 78 ms
Multi-thread boost: 4.5x

Thread scaling measured on an Intel Core i9-12900K. Parallel efficiency = actual speedup / (single-thread speedup × threads) × 100%.
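The split between fixed compilation cost and parallel execution can be checked directly from the scaling table (all numbers copied from above):

```python
# Subtract the fixed 45 ms compilation from each total to see how the
# execution phase alone scales with threads (numbers from the table).
compile_ms = 45
totals = {1: 348, 4: 118, 8: 87, 16: 78}  # total Greeks time in ms

base_exec = totals[1] - compile_ms  # 303 ms of parallelizable work
for threads, total_ms in sorted(totals.items()):
    exec_ms = total_ms - compile_ms
    print(f"{threads:2d} threads: execution {exec_ms} ms "
          f"({base_exec / exec_ms:.1f}x vs 1 thread)")
```

Execution alone speeds up roughly 4.2x at 4 threads, 7.2x at 8, and 9.2x at 16; total time flattens sooner because the fixed 45 ms compilation becomes an ever-larger share of the 78 ms total.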

Integration Effort

Lines of code changed and model rewrite requirements

Library     Model rewrite required   Lines changed
AADC        No                       ~10
JAX         Yes (complete)           ~50
Enzyme-AD   Yes (complete)           ~50
PyTorch     Yes (vectorized)         ~60
Autograd    Partial                  ~10
XAD         No                       ~15

Key JAX/PyTorch Integration Issue

JAX does not support native Python control flow over traced values inside JIT-compiled functions:

  • for loops are unrolled at trace time and should be replaced with jax.lax.fori_loop
  • if/else on traced values must be replaced with jax.lax.cond
  • Existing models therefore require significant restructuring
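A minimal sketch of the kind of restructuring involved, using a hypothetical toy path-dependent payoff (function names and parameters are illustrative, not the benchmark code):

```python
import jax
import jax.numpy as jnp

# Eager version with native Python control flow: runs as-is, but fails
# under jax.jit because `avg > strike` would be a traced value.
def asian_payoff_python(spot, drift, vol, z, strike):
    s, path_sum = spot, 0.0
    for zi in z:                       # plain Python for loop
        s = s * jnp.exp(drift + vol * zi)
        path_sum = path_sum + s
    avg = path_sum / z.shape[0]
    if avg > strike:                   # plain Python if/else
        return avg - strike
    return 0.0

# JIT-compatible rewrite: fori_loop replaces the loop, lax.cond the branch.
@jax.jit
def asian_payoff_jax(spot, drift, vol, z, strike):
    def body(i, carry):
        s, path_sum = carry
        s = s * jnp.exp(drift + vol * z[i])
        return s, path_sum + s

    init = (jnp.asarray(spot, dtype=z.dtype), jnp.zeros((), z.dtype))
    _, path_sum = jax.lax.fori_loop(0, z.shape[0], body, init)
    avg = path_sum / z.shape[0]
    return jax.lax.cond(avg > strike,
                        lambda a: a - strike,   # in-the-money branch
                        lambda a: a * 0.0,      # out-of-the-money branch
                        avg)

# Reverse-mode AD then gives delta directly:
delta = jax.grad(asian_payoff_jax)(100.0, 0.0, 0.2, jnp.zeros(4), 90.0)
```

Both versions price identically, but only the second can be traced, compiled, and differentiated; this per-construct translation is where most of the ~50 changed lines come from.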

Memory Efficiency vs Runtime Performance

Asian Option Monte Carlo - 100 trades × 1,000 scenarios × 252 timesteps

[Chart: memory efficiency vs runtime performance]

Library     Greeks time   Peak memory
AADC        348 ms        52 MB
JAX         1.07 s        187 MB
Enzyme-AD   1.05 s        259 MB
PyTorch     2.04 s        241 MB
Autograd    5.53 s        46 MB
XAD         106.33 s      32 MB

Memory Usage

Peak memory consumption during Greeks computation

XAD         32 MB
Autograd    46 MB
AADC        52 MB
JAX         187 MB
PyTorch     241 MB
Enzyme-AD   259 MB

Key insight: AADC uses 3-5x less memory than JIT-based alternatives (JAX, PyTorch, Enzyme-AD) while delivering faster performance.
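The page does not state how peak memory was measured; one common approach in Python is the standard-library tracemalloc module (NumPy ≥ 1.22 routes array allocations through it). A hedged sketch with a stand-in workload:

```python
import tracemalloc
import numpy as np

tracemalloc.start()

# Stand-in workload: one vectorized toy pricing pass (illustrative numbers,
# not the benchmark model).
rng = np.random.default_rng(0)
z = rng.standard_normal((1000, 252))
paths = 100.0 * np.exp(np.cumsum(0.2 * np.sqrt(1 / 252) * z, axis=1))
payoff = np.maximum(paths.mean(axis=1) - 90.0, 0.0).mean()

current_bytes, peak_bytes = tracemalloc.get_traced_memory()
tracemalloc.stop()
print(f"payoff {payoff:.2f}, peak traced memory {peak_bytes / 1e6:.1f} MB")
```

tracemalloc only sees Python-level allocations, so figures from it are comparable across pure-Python runs but may undercount native-library memory such as compiled AAD tapes.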

Library Comparison

Key factors for each AAD library

Google JAX

Loops require rewrite to jax.lax.fori_loop

1.07s Greeks time
~50 lines changed
187 MB memory

Factors

  • Loops require rewrite to jax.lax.fori_loop
  • NumPy code must convert to jax.numpy
  • No commercial support or SLAs
  • Runtime changes trigger recompilation
  • ML-focused; limited regulatory traceability

Enzyme-AD

Clang-only; requires JAX 0.4.30 (version locked)

1.05s Greeks time
~50 lines changed
259 MB memory

Factors

  • Clang-only; no MSVC or GCC support
  • Requires JAX 0.4.30 (not compatible with JAX 0.8.x)
  • 18-36 month integration; complex LLVM plugin setup
  • External libraries must be recompiled
  • Cryptic LLVM IR-level error messages
  • Experimental; poor regulatory compliance

PyTorch

Heavy compilation overhead; requires tensor rewrite

2.04s Greeks time
~60 lines changed
241 MB memory

Factors

  • Math must be rewritten to torch ops (e.g. torch.exp()); inputs need requires_grad=True
  • 1.9s compilation overhead per model change
  • 2GB+ install; heavy deployment footprint
  • No finance-focused commercial support
  • Built for deep learning, not quant finance
  • Moderate regulatory audit suitability
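The ~60-line PyTorch rewrite mostly means batching every path and timestep into tensors and marking inputs differentiable; a toy sketch of that pattern (illustrative names and parameters, not the benchmark code):

```python
import torch

def asian_greeks(spot0, strike, vol, n_paths=1000, n_steps=252):
    dt = 1.0 / n_steps
    torch.manual_seed(0)
    spot = torch.tensor(spot0, requires_grad=True)   # differentiable input
    z = torch.randn(n_paths, n_steps)                # all paths/steps at once
    # torch ops replace math/NumPy calls (torch.exp, torch.cumsum, ...)
    log_paths = torch.cumsum((-0.5 * vol ** 2) * dt + vol * dt ** 0.5 * z, dim=1)
    avg = (spot * torch.exp(log_paths)).mean(dim=1)
    price = torch.clamp(avg - strike, min=0.0).mean()
    price.backward()                                 # reverse-mode AD
    return price.item(), spot.grad.item()            # price and delta

price, delta = asian_greeks(100.0, 90.0, 0.2)
```

The vectorized form is what keeps PyTorch at 2.04 s in the table above; differentiating path-by-path in a Python loop is what produces the 1,361 s pathological case noted earlier.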

Harvard Autograd

Last release 2022; effectively unmaintained

5.53s Greeks time
~10 lines changed
46 MB memory

Factors

  • Last release 2022; effectively unmaintained
  • No multithreading support
  • No checkpointing or model serialization
  • Poor regulatory and XVA/CVA suitability
  • Small community; limited production examples
  • No commercial support available

Ready to Accelerate Your Greeks?

See how AADC can deliver 127-450x faster Greeks computation for your Python Monte Carlo models with minimal code changes.

Hardware Specification

All benchmarks executed on identical hardware for fair comparison

CPU Intel Core i9-12900K (16 cores / 24 threads)
RAM 64 GB DDR5
OS Ubuntu 22.04 LTS
Python 3.11.4
NumPy 1.24.3
Threads 1, 4, 8, 16 (for AADC scaling tests)