AADC vs Python, NumPy, C++

5.4x Faster Greeks Than Hand-Optimised C++.
Just Python - Accelerated.

MatLogica AADC delivers 420x faster performance than Basic Python, 73x faster Greeks than NumPy, and 5.4x faster Greeks than hand-optimised C++ - while requiring minimal code changes.

For computationally-heavy workloads (which is most of quant finance), AADC delivers superior performance with dramatically less complexity. No low-level expertise needed - AADC handles optimisations and computation of derivatives (AAD) automatically.

All benchmarks are transparent, reproducible, and verifiable. Run the code yourself and confirm the results.

  • 420x faster than Basic Python
  • 73x faster Greeks than NumPy
  • 5.4x faster Greeks than C++
  • +32% Greeks overhead (vs +582%)
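
The headline multiples can be cross-checked against the wall-clock timings quoted on this page (~16 hours, ~2.75 hours, 731s and 136s). A minimal sketch, assuming the two Python timings are the approximate figures shown; the dictionary keys and the helper name are illustrative only:

```python
# Timings quoted on this page, converted to seconds.
# The two Python figures are approximate (marked "~" in the source).
timings_s = {
    "basic_python": 16 * 3600,   # ~16 hours
    "numpy": 2.75 * 3600,        # ~2.75 hours
    "cpp_avx2": 731,
    "aadc_python": 136,
}

def speedup_vs_aadc(name):
    """Ratio of a baseline's runtime to the AADC Python runtime."""
    return timings_s[name] / timings_s["aadc_python"]

for name in ("basic_python", "numpy", "cpp_avx2"):
    print(f"{name}: {speedup_vs_aadc(name):.1f}x")
```

The ratios come out at roughly 424x, 73x and 5.4x, in line with the rounded figures above.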

AADC Breaks the Trade-off

Traditionally, you choose productivity OR performance. AADC gives you both - and more.

Developer Productivity vs Runtime Performance
Asian Option Monte Carlo - 1000 trades x 500K scenarios - Price + Greeks

[Chart: Developer Productivity plotted against Runtime Performance for the four approaches]
  • Basic Python: ~16 hours* - 739 lines
  • NumPy Vectorized: ~2.75 hours* - 745 lines
  • C++ Optimised (AVX2): 731s - 877 lines
  • AADC Python: 136s - 822 lines

Four Approaches to Monte Carlo Pricing

1,000 trades x 500K scenarios x 252 timesteps - Price + Greeks

Basic Python cannot scale for production Monte Carlo. With AADC, your existing Python code runs at C++ speeds with zero SIMD expertise required.

For risk-heavy workloads (which is most of quant finance), AADC delivers superior performance with dramatically less complexity.

Level 1

Basic Python

Simple but cannot scale

Execution time: ~16 hours
Total lines: 739
  • Simple, readable Python code
  • Bump & revalue for Greeks (4x)
  • No vectorization
  • Greeks add +302% overhead
Level 2

NumPy Vectorized

~6x faster than Basic

Execution time: ~2.75 hours
Total lines: 745
  • NumPy vectorization across scenarios
  • Bump & revalue for Greeks (4x)
  • No Python loops in hot path
  • Greeks add +306% overhead
Level 3

C++ Optimised

Fast but complex

Execution time: 731s
Total lines: 877
  • Manual AVX2 SIMD intrinsics
  • Bump & revalue for Greeks (7x)
  • Weeks of expert work
  • Greeks add +582% overhead
See the Difference

The Same Logic, Four Ways

Browse functions like files in VSCode - or ask for source code to run yourself

  • gbm_constants
  • simulate_path
  • price_asian_option
  • price_with_greeks
basic.py - gbm_constants
~16 hours 739 total lines
import math

def gbm_constants(r, sigma, T, num_timesteps):
    """Compute GBM simulation constants."""
    dt = T / num_timesteps
    sqrt_dt = math.sqrt(dt)
    drift = (r - 0.5 * sigma * sigma) * dt
    vol_sqrt_dt = sigma * sqrt_dt
    return dt, sqrt_dt, drift, vol_sqrt_dt
basic.py - simulate_path
~16 hours 739 total lines
def simulate_path(S0, K, r, T, drift, vol_sqrt_dt, Z_path, num_timesteps):
    """Simulate GBM path and compute discounted payoff."""
    price = S0
    running_sum = 0.0
    for t in range(num_timesteps):
        price = price * math.exp(drift + vol_sqrt_dt * Z_path[t])
        running_sum += price

    average = running_sum / num_timesteps
    payoff = max(average - K, 0.0)
    discount = math.exp(-r * T)
    discounted_payoff = discount * payoff
    return discounted_payoff
basic.py - price_asian_option
~16 hours 739 total lines
def price_asian_option(S0, K, r, sigma, T, Z, num_scenarios, num_timesteps):
    """Price Asian option - loop over all scenarios."""
    dt, sqrt_dt, drift, vol_sqrt_dt = gbm_constants(r, sigma, T, num_timesteps)

    payoff_sum = 0.0
    for scenario in range(num_scenarios):
        payoff_sum += simulate_path(S0, K, r, T, drift, vol_sqrt_dt,
                                    Z[scenario], num_timesteps)

    return payoff_sum / num_scenarios
basic.py - price_with_greeks
~16 hours 739 total lines
def price_with_greeks(S0, K, r, sigma, T, Z, num_scenarios, num_timesteps):
    """Compute price and Greeks via bump-and-revalue (4 pricings)."""
    bump = 1e-6
    price = price_asian_option(S0, K, r, sigma, T, Z, num_scenarios, num_timesteps)
    delta = (price_asian_option(S0 + bump, K, r, sigma, T, Z, num_scenarios, num_timesteps) - price) / bump
    rho   = (price_asian_option(S0, K, r + bump, sigma, T, Z, num_scenarios, num_timesteps) - price) / bump
    vega  = (price_asian_option(S0, K, r, sigma + bump, T, Z, num_scenarios, num_timesteps) - price) / bump
    return price, delta, rho, vega

# Greeks require 4 full pricings - 4x the compute cost!
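
To try the bump-and-revalue approach above at a small scale, a driver along the following lines could be used. This is a sketch only: the names `asian_price` and `bump_greeks`, the parameter values, the seed and the scenario count are illustrative assumptions rather than the benchmark configuration, and the pricing logic is condensed into one function so the block runs on its own with the standard library.

```python
import math
import random

def asian_price(S0, K, r, sigma, T, Z):
    """Arithmetic-average Asian call under GBM, looping over every scenario."""
    n_steps = len(Z[0])
    dt = T / n_steps
    drift = (r - 0.5 * sigma * sigma) * dt
    vol_sqrt_dt = sigma * math.sqrt(dt)
    total = 0.0
    for z_path in Z:
        price, running_sum = S0, 0.0
        for z in z_path:
            price *= math.exp(drift + vol_sqrt_dt * z)
            running_sum += price
        total += max(running_sum / n_steps - K, 0.0)
    return math.exp(-r * T) * total / len(Z)

def bump_greeks(S0, K, r, sigma, T, Z, bump=1e-6):
    """Forward differences: one base pricing plus one extra pricing per Greek."""
    base = asian_price(S0, K, r, sigma, T, Z)
    delta = (asian_price(S0 + bump, K, r, sigma, T, Z) - base) / bump
    rho = (asian_price(S0, K, r + bump, sigma, T, Z) - base) / bump
    vega = (asian_price(S0, K, r, sigma + bump, T, Z) - base) / bump
    return base, delta, rho, vega

# Small, made-up test case: 2,000 scenarios x 16 timesteps, fixed seed.
rng = random.Random(42)
Z = [[rng.gauss(0.0, 1.0) for _ in range(16)] for _ in range(2000)]
price, delta, rho, vega = bump_greeks(100.0, 100.0, 0.05, 0.2, 1.0, Z)
print(f"price={price:.4f} delta={delta:.4f} rho={rho:.4f} vega={vega:.4f}")
```

The same random draws `Z` are reused for every bumped pricing, as in the functions above, so the finite differences are not swamped by Monte Carlo noise.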
numpy_pricer.py - gbm_constants
~2.75 hours 745 total lines
import numpy as np

def gbm_constants(r, sigma, T, num_timesteps):
    """Compute GBM simulation constants."""
    dt = T / num_timesteps
    sqrt_dt = np.sqrt(dt)
    drift = (r - 0.5 * sigma * sigma) * dt
    vol_sqrt_dt = sigma * sqrt_dt
    return dt, sqrt_dt, drift, vol_sqrt_dt

# Same as basic - using np.sqrt instead of math.sqrt
numpy_pricer.py - simulate_path
~2.75 hours 745 total lines
def simulate_paths_vectorized(S0, K, r, T, drift, vol_sqrt_dt, Z, num_scenarios, num_timesteps):
    """Simulate all GBM paths at once using NumPy vectorization."""
    # Z is (num_scenarios, num_timesteps)
    log_increments = drift + vol_sqrt_dt * Z  # Vectorized across all scenarios
    log_prices = np.cumsum(log_increments, axis=1)
    prices = S0 * np.exp(log_prices)

    # Running average for Asian option
    running_sum = np.cumsum(prices, axis=1)
    averages = running_sum[:, -1] / num_timesteps

    payoffs = np.maximum(averages - K, 0.0)
    discount = np.exp(-r * T)
    return discount * payoffs

# Vectorized across scenarios - ~6x faster than Basic Python
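
The vectorised version replaces the step-by-step product of exponentials with a cumulative sum of log increments, relying on exp(a) * exp(b) = exp(a + b). A small standard-library check of that identity for a single path, using made-up increment values and illustrative variable names:

```python
import math
from itertools import accumulate

S0 = 100.0
# Hypothetical per-step log increments (drift + vol_sqrt_dt * Z[t]) for one path.
log_increments = [0.01, -0.02, 0.015, 0.003]

# Loop form, as in basic.py: multiply by exp(increment) at each step.
loop_prices = []
price = S0
for inc in log_increments:
    price *= math.exp(inc)
    loop_prices.append(price)

# Cumulative-sum form, as in numpy_pricer.py: S0 * exp(cumsum(increments)).
cumsum_prices = [S0 * math.exp(s) for s in accumulate(log_increments)]

# The two forms agree up to floating-point rounding.
max_diff = max(abs(a - b) for a, b in zip(loop_prices, cumsum_prices))
print(max_diff)
```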
numpy_pricer.py - price_asian_option
~2.75 hours 745 total lines
def price_asian_option(S0, K, r, sigma, T, Z, num_scenarios, num_timesteps):
    """Price Asian option using NumPy vectorization."""
    dt, sqrt_dt, drift, vol_sqrt_dt = gbm_constants(r, sigma, T, num_timesteps)

    # Single vectorized call - no Python loop over scenarios
    discounted_payoffs = simulate_paths_vectorized(
        S0, K, r, T, drift, vol_sqrt_dt, Z, num_scenarios, num_timesteps
    )

    return np.mean(discounted_payoffs)

# ~6x faster than Basic Python loops
numpy_pricer.py - price_with_greeks
~2.75 hours 745 total lines
def price_with_greeks(S0, K, r, sigma, T, Z, num_scenarios, num_timesteps):
    """Compute price and Greeks via bump-and-revalue (4 pricings)."""
    bump = 1e-6
    price = price_asian_option(S0, K, r, sigma, T, Z, num_scenarios, num_timesteps)
    delta = (price_asian_option(S0 + bump, K, r, sigma, T, Z, num_scenarios, num_timesteps) - price) / bump
    rho   = (price_asian_option(S0, K, r + bump, sigma, T, Z, num_scenarios, num_timesteps) - price) / bump
    vega  = (price_asian_option(S0, K, r, sigma + bump, T, Z, num_scenarios, num_timesteps) - price) / bump
    return price, delta, rho, vega

# Still 4 pricings needed - Greeks add +306% overhead
aadc_pricer.py - gbm_constants
136s 822 total lines
import aadc
import numpy as np

def gbm_constants(r, sigma, T, num_timesteps):
    """Compute GBM simulation constants."""
    dt = T / num_timesteps
    sqrt_dt = np.sqrt(dt)
    drift = (r - 0.5 * sigma * sigma) * dt
    vol_sqrt_dt = sigma * sqrt_dt
    return dt, sqrt_dt, drift, vol_sqrt_dt

# Identical to the NumPy version - AADC works with regular Python code!
aadc_pricer.py - simulate_path
136s 822 total lines
def simulate_path(S0, K, r, T, drift, vol_sqrt_dt, Z_vals, num_timesteps):
    """Simulate GBM path and compute discounted payoff."""
    price = S0
    running_sum = aadc.idouble(0.0)                                  # AADC
    for t in range(num_timesteps):
        price = price * np.exp(drift + vol_sqrt_dt * Z_vals[t])
        running_sum = running_sum + price

    average = running_sum / num_timesteps
    payoff = np.maximum(average - K, 0.0)
    discount = np.exp(-r * T)
    discounted_payoff = discount * payoff
    return discounted_payoff

# Main change vs basic.py: aadc.idouble for running_sum (np.exp/np.maximum replace math.exp/max) - enables AAD!
aadc_pricer.py - price_asian_option
136s 822 total lines
def price_asian_option(S0, K, r, sigma, T, Z, num_scenarios, num_timesteps, num_threads=4):
    """Price Asian option using AADC - loop over all scenarios."""
    workers = aadc.ThreadPool(num_threads)                           # AADC

    # --- Record computation graph ---
    funcs = aadc.Functions()                                         # AADC
    funcs.start_recording()                                          # AADC

    # Active variables (use idouble instead of float)
    S0_v    = aadc.idouble(S0);    S0_arg    = S0_v.mark_as_input()   # AADC
    r_v     = aadc.idouble(r);     r_arg     = r_v.mark_as_input()    # AADC
    sigma_v = aadc.idouble(sigma); sigma_arg = sigma_v.mark_as_input()# AADC
    K_v     = aadc.idouble(K);     K_arg     = K_v.mark_as_input_no_diff()  # AADC
    T_v     = aadc.idouble(T);     T_arg     = T_v.mark_as_input_no_diff()  # AADC

    # ... record path simulation ...

    payoff_res = discounted_payoff.mark_as_output()                  # AADC
    funcs.stop_recording()                                           # AADC

    # Evaluate vectorized across scenarios
    request = {payoff_res: [S0_arg, r_arg, sigma_arg]}               # AADC
    results = aadc.evaluate(funcs, request, inputs, workers)         # AADC

    return results, payoff_res, S0_arg, r_arg, sigma_arg
aadc_pricer.py - price_with_greeks
136s 822 total lines
def price_with_greeks(S0, K, r, sigma, T, Z, num_scenarios, num_timesteps, num_threads=4):
    """Compute price and Greeks via AAD (1 forward + 1 adjoint pass)."""
    results, payoff_res, S0_arg, r_arg, sigma_arg = price_asian_option(
        S0, K, r, sigma, T, Z, num_scenarios, num_timesteps, num_threads
    )

    # --- Extract results ---                                        # AADC
    discounted_payoffs = results[0][payoff_res]                      # AADC
    price = float(np.mean(discounted_payoffs))                       # AADC

    # Greeks from single adjoint pass (no extra pricings needed!)    # AADC
    delta = float(np.mean(results[1][payoff_res][S0_arg]))           # AADC
    rho   = float(np.mean(results[1][payoff_res][r_arg]))            # AADC
    vega  = float(np.mean(results[1][payoff_res][sigma_arg]))        # AADC

    return price, delta, rho, vega

# All Greeks computed in ONE pass - +32% overhead vs +582%!
optimised.cpp - gbm_constants
731s 877 total lines
void gbm_constants(double r, double vol, double T, size_t num_timesteps,
                   double& dt, double& sqrt_dt, double& drift, double& vol_sqrt_dt) {
    /**Compute GBM simulation constants.*/
    dt = T / static_cast<double>(num_timesteps);
    sqrt_dt = std::sqrt(dt);
    drift = (r - 0.5 * vol * vol) * dt;
    vol_sqrt_dt = vol * sqrt_dt;
}
optimised.cpp - simulate_path
731s 877 total lines
// Scalar version (portable, works on ARM/Apple Silicon)
double simulate_path_scalar(double S0, double K, double drift, double vol_sqrt_dt,
                            const double* Z_row, size_t num_timesteps) {
    /**Simulate GBM path and compute payoff (scalar version).*/
    double price = S0;
    double running_sum = 0.0;

    for (size_t t = 0; t < num_timesteps; ++t) {
        price = price * std::exp(drift + vol_sqrt_dt * Z_row[t]);
        running_sum += price;
    }

    double average = running_sum / static_cast<double>(num_timesteps);
    double payoff = std::max(average - K, 0.0);
    return payoff;
}

// AVX2 SIMD version also available for x86-64 (~2x faster)
optimised.cpp - price_asian_option
731s 877 total lines
// Core pricing function (auto-selects AVX2 or scalar)
double price_asian_option(
    double S0, double K, double r, double vol, double T,
    const double* Z, size_t num_scenarios, size_t num_timesteps
) {
    double dt, sqrt_dt, drift, vol_sqrt_dt;
    gbm_constants(r, vol, T, num_timesteps, dt, sqrt_dt, drift, vol_sqrt_dt);

#if USE_AVX2
    // Broadcast constants to SIMD registers
    const __m256d drift_vec = _mm256_set1_pd(drift);
    const __m256d vol_sqrt_dt_vec = _mm256_set1_pd(vol_sqrt_dt);
    const __m256d S0_vec = _mm256_set1_pd(S0);
    size_t row_stride = (num_timesteps + SIMD_WIDTH - 1) & ~(SIMD_WIDTH - 1);
#else
    size_t row_stride = num_timesteps;
#endif

    double payoff_sum = 0.0;
    for (size_t scenario = 0; scenario < num_scenarios; ++scenario) {
        const double* Z_row = Z + scenario * row_stride;
#if USE_AVX2
        payoff_sum += simulate_path_avx(S0, K, drift, vol_sqrt_dt, Z_row, num_timesteps,
                                        drift_vec, vol_sqrt_dt_vec, S0_vec);
#else
        payoff_sum += simulate_path_scalar(S0, K, drift, vol_sqrt_dt, Z_row, num_timesteps);
#endif
    }

    return std::exp(-r * T) * (payoff_sum / static_cast<double>(num_scenarios));
}
optimised.cpp - price_with_greeks
731s 877 total lines
// Greeks via bump-and-revalue (requires 4 full pricings)
constexpr double BUMP_SIZE = 1e-6;

void price_with_greeks(
    double S0, double K, double r, double vol, double T,
    const double* Z, size_t num_scenarios, size_t num_timesteps,
    double& price, double& delta, double& rho, double& vega
) {
    price = price_asian_option(S0, K, r, vol, T, Z, num_scenarios, num_timesteps);
    double p_dS = price_asian_option(S0 + BUMP_SIZE, K, r, vol, T, Z, num_scenarios, num_timesteps);
    double p_dr = price_asian_option(S0, K, r + BUMP_SIZE, vol, T, Z, num_scenarios, num_timesteps);
    double p_dv = price_asian_option(S0, K, r, vol + BUMP_SIZE, T, Z, num_scenarios, num_timesteps);
    delta = (p_dS - price) / BUMP_SIZE;
    rho   = (p_dr - price) / BUMP_SIZE;
    vega  = (p_dv - price) / BUMP_SIZE;
}

// Even with AVX2 SIMD, Greeks add +582% overhead (7 evaluations)!

Key Insights

Performance: AADC Python is 420x faster than Basic Python, with 73x faster Greeks than NumPy and 5.4x faster Greeks than hand-optimised C++.

Greeks overhead: Greeks via AAD require just 1 forward + 1 adjoint pass, adding +32% overhead vs +306% for NumPy and +582% for C++ bump-and-revalue.
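
To make "1 forward + 1 adjoint pass" concrete, the sketch below differentiates the Asian payoff for one path by hand. It is not AADC's API or implementation: the adjoint is hand-derived, and the function names (`forward`, `adjoint`, `central_diff`), the fixed path `z` and the parameter values are assumptions chosen purely for illustration. One backward sweep yields delta, rho and vega together, and a central-difference bump is printed alongside as a check.

```python
import math

def forward(S0, K, r, sigma, T, z):
    """Forward pass for one path; returns the discounted payoff and a tape."""
    n = len(z)
    dt = T / n
    drift = (r - 0.5 * sigma * sigma) * dt
    vol_sqrt_dt = sigma * math.sqrt(dt)
    growth = [math.exp(drift + vol_sqrt_dt * zt) for zt in z]
    prices = [S0]
    for g in growth:
        prices.append(prices[-1] * g)
    average = sum(prices[1:]) / n
    payoff = max(average - K, 0.0)
    discount = math.exp(-r * T)
    return discount * payoff, (growth, prices, average, payoff, discount, dt)

def adjoint(K, sigma, T, z, tape):
    """Single backward sweep: d(output)/d(S0), d/d(r), d/d(sigma)."""
    growth, prices, average, payoff, discount, dt = tape
    n = len(z)
    sum_bar = (discount if average > K else 0.0) / n
    price_bar = drift_bar = vol_bar = 0.0
    for t in range(n, 0, -1):
        price_bar += sum_bar            # S_t feeds the running sum
        x_bar = price_bar * prices[t]   # through S_t = S_{t-1} * exp(x_t)
        drift_bar += x_bar
        vol_bar += x_bar * z[t - 1]
        price_bar *= growth[t - 1]      # propagate to S_{t-1}
    delta = price_bar
    rho = -T * discount * payoff + drift_bar * dt
    vega = -sigma * dt * drift_bar + math.sqrt(dt) * vol_bar
    return delta, rho, vega

z = [0.3, -0.1, 0.5, 0.2, -0.4, 0.1]  # one fixed, made-up path of normal draws
args = dict(S0=100.0, K=100.0, r=0.05, sigma=0.2, T=1.0)
value, tape = forward(z=z, **args)
delta, rho, vega = adjoint(args["K"], args["sigma"], args["T"], z, tape)

def central_diff(name, h=1e-5):
    """Bump one input up and down and difference the two forward passes."""
    up = dict(args)
    up[name] += h
    down = dict(args)
    down[name] -= h
    return (forward(z=z, **up)[0] - forward(z=z, **down)[0]) / (2 * h)

for name, aad in (("S0", delta), ("r", rho), ("sigma", vega)):
    print(f"{name}: adjoint={aad:.6f} bump={central_diff(name):.6f}")
```

The backward loop does a fixed amount of work per timestep regardless of how many inputs are differentiated, which is why the cost of the adjoint pass does not grow with the number of Greeks.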

Greeks scaling: NumPy cost grows linearly with each Greek (4 evaluations for 3 Greeks, 11 for 10, 51 for 50), while AADC computes ALL Greeks in constant time - making AADC 73x faster at 3 Greeks, and scaling even better with more Greeks.

Greeks   NumPy Evals   AADC Evals   AADC Advantage
3        4             ~1.3         10x
10       11            ~1.3         27x
50       51            ~1.3         127x

Greeks: Where AADC Shines

AAD vs traditional bump-and-revalue - Risk calculations require Greeks (Delta, Rho, Vega)

Traditional Method: Bump & Revalue

Requires 4-7 full pricings per trade for Delta, Rho, Vega sensitivities.

  • Basic Python: +302%
  • NumPy: +306%
  • C++ Optimised: +582%

C++ with 1000 trades: 731 seconds

vs

AADC Method: Adjoint Differentiation

Requires just 1 forward + 1 adjoint pass - all Greeks in one sweep.

  • AADC Python: +32%

AADC with 1000 trades: 136 seconds - 5.4x faster than C++

* Greeks overhead = additional time beyond price-only calculation
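
Given that definition, each overhead percentage converts directly into a multiple of the price-only time, which can be read as an effective number of pricings. A small sketch using the figures quoted on this page; the helper names are illustrative, and the forward-difference count assumes one bumped pricing per Greek, as in the Python snippets shown earlier:

```python
# Greeks overhead figures quoted on this page (percent on top of price-only time).
overhead_pct = {
    "Basic Python": 302,
    "NumPy": 306,
    "C++ Optimised": 582,
    "AADC Python": 32,
}

def cost_multiple(pct):
    """Price + Greeks time as a multiple of the price-only time."""
    return 1 + pct / 100

def forward_bump_pricings(num_greeks):
    """Forward-difference bump-and-revalue: one base pricing plus one per Greek."""
    return num_greeks + 1

for name, pct in overhead_pct.items():
    print(f"{name}: {cost_multiple(pct):.2f}x price-only time")
print([forward_bump_pricings(n) for n in (3, 10, 50)])
```

The multiples come out at about 4.0x, 4.1x, 6.8x and 1.3x, consistent with the 4 and 7 evaluations quoted for bump-and-revalue and the ~1.3 evaluations quoted for AADC.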

How It Works

From specification to production benchmark in hours

1

Your Model

Use your model or a model developed by AI with your specification. If using AI, be sure to define all parameters, constraints, do's and don'ts.

2

Enable AADC

The AADC-Agent converts your model (Python or C++) to AADC-enabled versions. Works like any Claude task - ask it to fix issues. Choose your language and hardware.

3

Benchmark & Deploy

Get your benchmark in hours, not weeks. See exactly how AADC performs. If AI can't help, MatLogica is always there for expert support.

Note: We don't advise developing quant models using AI, but it's a good way to get started prototyping with AADC.

Note: We don't suggest using AI for production integrations. MatLogica's integration/debugging toolkit should be used for production integrations.

Ready to Try It Yourself?

Get the CLAUDE.md template files and request a demo version of MatLogica AADC. All benchmark code is available to run, validate, and verify.

Hardware Specification

Benchmarks executed on enterprise-grade server hardware

System Configuration

CPU: 2x Intel Xeon Platinum 8280L @ 2.70GHz
Cores: 56 physical (28 per socket), 112 threads
Architecture: x86_64, Cascade Lake
L3 Cache: 77 MiB (38.5 MiB per socket)
RAM: 283 GB
OS: Linux kernel 6.1.0-13-amd64 (Debian 64-bit)
CPU Features: AVX-512, AVX2, FMA, AES-NI

Test Configuration

Trades: 1,000
Scenarios: 500,000
Timesteps: 252
Threads: 16
Total Paths: 500M

Asian Option Monte Carlo with GBM dynamics
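
The size of this workload follows from simple arithmetic on the configuration above. The sketch below assumes a single (scenarios x timesteps) matrix of 8-byte floats for the random draws; how the benchmark actually stores or shares its random numbers across trades is not stated here, so the memory figure is indicative only.

```python
trades = 1_000
scenarios = 500_000
timesteps = 252

total_paths = trades * scenarios          # 500,000,000 paths
total_steps = total_paths * timesteps     # path-steps in one full pricing pass
# Assumption: one (scenarios x timesteps) matrix of 8-byte floats.
z_matrix_bytes = scenarios * timesteps * 8

print(f"{total_paths:,} paths, {total_steps:,} steps, "
      f"{z_matrix_bytes / 1e9:.3f} GB per Z matrix")
```

That is 126 billion path-steps per pricing pass, and each extra bump-and-revalue pricing repeats all of them.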

Compilers

GCC: 12.2.0 (Debian)
Clang: 14.0.6 (Debian)
Python: 3.11.2 + AADC

Dual-socket server with AVX-512 vectorization