Can Python really achieve C++ performance with AADC?

Yes. Python with AADC achieves a 2871x speedup over basic Python, and when computing Greeks it is 1.9x faster than hand-optimised C++ (0.67s vs 1.30s). Teams are building production systems in Python with AADC, iterating at speeds that would normally require hand-optimised C++.

What's the recommended path for evaluating AADC?

Prototype in Python, observe performance on a real model, then harden for production with the toolkit. This approach lets you see actual speedup on your specific models before any commitment. Some teams like this enough that they build production systems this way.

How does AADC compare to hand-optimised C++ for Greeks calculation?

AADC is 1.9x faster than hand-optimised C++ when computing Greeks (0.67s vs 1.30s). C++ may edge out AADC slightly for pricing alone, but AADC pulls well ahead once Greeks are required: bump-and-revalue in C++ adds 495% overhead on top of pricing, while AADC adds only 26%.

Accelerate Python Models 2871x with AADC

How much code change is required for AADC acceleration?

Only 77 additional lines of code (852 vs 775) are needed to reach the 2871x speedup. The AADC integration code handles type annotations, recording setup, and kernel compilation: boilerplate that is straightforward to validate.

"This is the evaluation path we recommend for AADC: prototype in Python, observe performance on a real model, then harden for production with the toolkit. Some teams like this approach enough that they build production systems this way — iterating in Python at speeds that would normally require hand-optimised C++. No rewrites, no hidden assumptions. Same logic — just faster."

Dmitri Goloubentsev

CTO, MatLogica

1. Start with Your Model

No rewrite required

Use your current Python code as-is
Monte Carlo, pricing, or risk models

2. Integrate AADC

Minimal code changes

Add AADC wrapper code
AI tools can help generate this

3. Validate & Benchmark

Same results, faster

Verify results match exactly
See actual speedup on your code
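
A minimal sketch of the validate-and-benchmark step, assuming a small harness of our own: `price_loop`, `price_vectorized` and `validate_and_benchmark` are hypothetical names written for this illustration, modelled on the basic.py and numpy_pricer.py listings further down. The tolerance `rel_tol=1e-9` is also an assumption; pick whatever your model validation requires.

```python
import math
import time

import numpy as np


def price_loop(S0, K, r, sigma, T, Z):
    """Reference pricer: plain Python loops, in the style of basic.py."""
    num_scenarios, num_timesteps = Z.shape
    dt = T / num_timesteps
    drift = (r - 0.5 * sigma * sigma) * dt
    vol_sqrt_dt = sigma * math.sqrt(dt)
    payoff_sum = 0.0
    for Z_path in Z.tolist():
        price, running_sum = S0, 0.0
        for z in Z_path:
            price *= math.exp(drift + vol_sqrt_dt * z)
            running_sum += price
        payoff_sum += max(running_sum / num_timesteps - K, 0.0)
    return math.exp(-r * T) * payoff_sum / num_scenarios


def price_vectorized(S0, K, r, sigma, T, Z):
    """Candidate pricer: NumPy, in the style of numpy_pricer.py."""
    num_timesteps = Z.shape[1]
    dt = T / num_timesteps
    drift = (r - 0.5 * sigma * sigma) * dt
    vol_sqrt_dt = sigma * np.sqrt(dt)
    prices = S0 * np.exp(np.cumsum(drift + vol_sqrt_dt * Z, axis=1))
    payoffs = np.maximum(prices.mean(axis=1) - K, 0.0)
    return float(np.exp(-r * T) * payoffs.mean())


def validate_and_benchmark(reference, candidate, args, rel_tol=1e-9):
    """Run both pricers on identical inputs; report agreement and speedup."""
    start = time.perf_counter()
    ref_value = reference(*args)
    ref_time = time.perf_counter() - start

    start = time.perf_counter()
    cand_value = candidate(*args)
    cand_time = time.perf_counter() - start

    return {
        "reference": ref_value,
        "candidate": cand_value,
        "match": math.isclose(ref_value, cand_value, rel_tol=rel_tol),
        "speedup": ref_time / max(cand_time, 1e-12),
    }


rng = np.random.default_rng(42)
Z = rng.standard_normal((2_000, 252))  # same random draws for both pricers
report = validate_and_benchmark(price_loop, price_vectorized,
                                (100.0, 100.0, 0.05, 0.2, 1.0, Z))
print(report)
```

Both pricers receive the same random draws, so any difference comes from the implementation rather than from Monte Carlo noise. The loop and vectorised versions order their floating-point operations differently, so this sketch compares with a tight tolerance rather than bit-for-bit.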

Performance Results

Version	Lines of Code	Execution Time	Speedup
Basic Python	775	32 min	1x
NumPy	781	18.8s	102x
C++ Optimised	880	1.30s	1483x
Python + AADC	852 (+77)	0.67s	2871x

Benchmark Configuration

10 trades × 100K scenarios × 252 timesteps with 8 threads — all Greeks (Delta, Rho, Vega) computed.
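
The speedup column can be recomputed from the execution times. The sketch below does that arithmetic and also works out the size of the workload. Note that "32 min" is a rounded figure, so the recomputed values land within about 1% of the published 1483x and 2871x rather than matching them digit for digit.

```python
# Execution times from the Performance Results table, in seconds.
times_s = {
    "Basic Python": 32 * 60,   # "32 min" is rounded on the page
    "NumPy": 18.8,
    "C++ Optimised": 1.30,
    "Python + AADC": 0.67,
}

baseline = times_s["Basic Python"]
speedups = {name: baseline / t for name, t in times_s.items()}

# Workload size: trades x scenarios x timesteps.
path_steps = 10 * 100_000 * 252

# Ratio behind the "1.9x faster than hand-optimised C++" claim.
aadc_vs_cpp = times_s["C++ Optimised"] / times_s["Python + AADC"]

for name, speedup in speedups.items():
    print(f"{name:14s} {times_s[name]:9.2f} s  {speedup:7.0f}x")
print(f"simulated path steps per run: {path_steps:,}")
print(f"AADC vs C++ with Greeks: {aadc_vs_cpp:.2f}x")
```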

AADC vs Hand-Optimised C++

For pricing-only, C++ edges out AADC slightly

With Greeks, AADC is 1.9x faster than hand-optimised C++

Traditional bump-and-revalue adds +495% overhead for Greeks

AADC adds only +26% overhead while computing all Greeks
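
To see why bump-and-revalue gets expensive, here is an idealised cost model. It assumes every bumped pricing costs exactly as much as the base pricing, which is a simplification: measured figures such as the +495% and +26% above come from the benchmark and will not equal the model. `bump_overhead_pct` is a hypothetical helper written for this illustration.

```python
def bump_overhead_pct(num_greeks: int, central: bool = False) -> float:
    """Extra cost of bump-and-revalue as a percentage of one pricing.

    Idealised: assumes every bumped pricing costs the same as the base one.
    Forward differences need one extra pricing per Greek, central
    differences need two.
    """
    extra_pricings = 2 * num_greeks if central else num_greeks
    return 100.0 * extra_pricings


# Figure quoted on this page for AADC with Delta, Rho and Vega.
AADC_OVERHEAD_PCT = 26.0

for num_greeks in (3, 10, 100):
    forward = bump_overhead_pct(num_greeks)
    central = bump_overhead_pct(num_greeks, central=True)
    print(f"{num_greeks:3d} Greeks: forward +{forward:.0f}%, central +{central:.0f}%")
print(f"AADC (measured, 3 Greeks): +{AADC_OVERHEAD_PCT:.0f}%")
```

In this model the bump-and-revalue overhead grows linearly with the number of sensitivities, whereas an adjoint pass produces all of them together.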

See the Difference

The Same Logic, Four Ways

Browse functions like files in VSCode - or ask for source code to run yourself

basic.py - gbm_constants 32 min 775 total lines 
 def gbm_constants(r, sigma, T, num_timesteps):
    """Compute GBM simulation constants."""
    dt = T / num_timesteps
    sqrt_dt = math.sqrt(dt)
    drift = (r - 0.5 * sigma * sigma) * dt
    vol_sqrt_dt = sigma * sqrt_dt
    return dt, sqrt_dt, drift, vol_sqrt_dt 

basic.py - simulate_path 32 min 775 total lines 
 def simulate_path(S0, K, r, T, drift, vol_sqrt_dt, Z_path, num_timesteps):
    """Simulate GBM path and compute discounted payoff."""
    price = S0
    running_sum = 0.0
    for t in range(num_timesteps):
        price = price * math.exp(drift + vol_sqrt_dt * Z_path[t])
        running_sum += price

    average = running_sum / num_timesteps
    payoff = max(average - K, 0.0)
    discount = math.exp(-r * T)
    discounted_payoff = discount * payoff
    return discounted_payoff 

basic.py - price_asian_option 32 min 775 total lines 
 def price_asian_option(S0, K, r, sigma, T, Z, num_scenarios, num_timesteps):
    """Price Asian option - loop over all scenarios."""
    dt, sqrt_dt, drift, vol_sqrt_dt = gbm_constants(r, sigma, T, num_timesteps)

    payoff_sum = 0.0
    for scenario in range(num_scenarios):
        payoff_sum += simulate_path(S0, K, r, T, drift, vol_sqrt_dt,
                                    Z[scenario], num_timesteps)

    return payoff_sum / num_scenarios 

basic.py - price_with_greeks 32 min 775 total lines 
 def price_with_greeks(S0, K, r, sigma, T, Z, num_scenarios, num_timesteps):
    """Compute price and Greeks via bump-and-revalue (4 pricings)."""
    bump = 1e-6
    price = price_asian_option(S0, K, r, sigma, T, Z, num_scenarios, num_timesteps)
    delta = (price_asian_option(S0 + bump, K, r, sigma, T, Z, num_scenarios, num_timesteps) - price) / bump
    rho   = (price_asian_option(S0, K, r + bump, sigma, T, Z, num_scenarios, num_timesteps) - price) / bump
    vega  = (price_asian_option(S0, K, r, sigma + bump, T, Z, num_scenarios, num_timesteps) - price) / bump
    return price, delta, rho, vega

# Greeks require 4 full pricings - 4x the compute cost! 

numpy_pricer.py - gbm_constants 18.8s 781 total lines 
 def gbm_constants(r, sigma, T, num_timesteps):
    """Compute GBM simulation constants."""
    dt = T / num_timesteps
    sqrt_dt = np.sqrt(dt)
    drift = (r - 0.5 * sigma * sigma) * dt
    vol_sqrt_dt = sigma * sqrt_dt
    return dt, sqrt_dt, drift, vol_sqrt_dt

# Same as basic - using np.sqrt instead of math.sqrt 

numpy_pricer.py - simulate_path 18.8s 781 total lines 
 def simulate_paths_vectorized(S0, K, r, T, drift, vol_sqrt_dt, Z, num_scenarios, num_timesteps):
    """Simulate all GBM paths at once using NumPy vectorization."""
    # Z is (num_scenarios, num_timesteps)
    log_increments = drift + vol_sqrt_dt * Z  # Vectorized across all scenarios
    log_prices = np.cumsum(log_increments, axis=1)
    prices = S0 * np.exp(log_prices)

    # Running average for Asian option
    running_sum = np.cumsum(prices, axis=1)
    averages = running_sum[:, -1] / num_timesteps

    payoffs = np.maximum(averages - K, 0.0)
    discount = np.exp(-r * T)
    return discount * payoffs

# Vectorized across scenarios - ~102x faster than Basic Python (18.8s vs 32 min)

numpy_pricer.py - price_asian_option 18.8s 781 total lines 
 def price_asian_option(S0, K, r, sigma, T, Z, num_scenarios, num_timesteps):
    """Price Asian option using NumPy vectorization."""
    dt, sqrt_dt, drift, vol_sqrt_dt = gbm_constants(r, sigma, T, num_timesteps)

    # Single vectorized call - no Python loop over scenarios
    discounted_payoffs = simulate_paths_vectorized(
        S0, K, r, T, drift, vol_sqrt_dt, Z, num_scenarios, num_timesteps
    )

    return np.mean(discounted_payoffs)

# ~102x faster than Basic Python loops

numpy_pricer.py - price_with_greeks 18.8s 781 total lines 
 def price_with_greeks(S0, K, r, sigma, T, Z, num_scenarios, num_timesteps):
    """Compute price and Greeks via bump-and-revalue (4 pricings)."""
    bump = 1e-6
    price = price_asian_option(S0, K, r, sigma, T, Z, num_scenarios, num_timesteps)
    delta = (price_asian_option(S0 + bump, K, r, sigma, T, Z, num_scenarios, num_timesteps) - price) / bump
    rho   = (price_asian_option(S0, K, r + bump, sigma, T, Z, num_scenarios, num_timesteps) - price) / bump
    vega  = (price_asian_option(S0, K, r, sigma + bump, T, Z, num_scenarios, num_timesteps) - price) / bump
    return price, delta, rho, vega

# Still 4 pricings needed - Greeks add +298% overhead

aadc_pricer.py - gbm_constants 0.67s 852 total lines 
 def gbm_constants(r, sigma, T, num_timesteps):
    """Compute GBM simulation constants."""
    dt = T / num_timesteps
    sqrt_dt = np.sqrt(dt)
    drift = (r - 0.5 * sigma * sigma) * dt
    vol_sqrt_dt = sigma * sqrt_dt
    return dt, sqrt_dt, drift, vol_sqrt_dt

# Identical to the NumPy version - AADC works with regular Python code!

aadc_pricer.py - simulate_path 0.67s 852 total lines 
 def simulate_path(S0, K, r, T, drift, vol_sqrt_dt, Z_vals, num_timesteps):
    """Simulate GBM path and compute discounted payoff."""
    price = S0
    running_sum = aadc.idouble(0.0)                                  # AADC
    for t in range(num_timesteps):
        price = price * np.exp(drift + vol_sqrt_dt * Z_vals[t])
        running_sum = running_sum + price

    average = running_sum / num_timesteps
    payoff = np.maximum(average - K, 0.0)
    discount = np.exp(-r * T)
    discounted_payoff = discount * payoff
    return discounted_payoff

# Only AADC-specific change: aadc.idouble for running_sum - enables AAD!

aadc_pricer.py - price_asian_option 0.67s 852 total lines 
 def price_asian_option(S0, K, r, sigma, T, Z, num_scenarios, num_timesteps, num_threads=4):
    """Price Asian option using AADC - loop over all scenarios."""
    workers = aadc.ThreadPool(num_threads)                           # AADC

    # --- Record computation graph ---
    funcs = aadc.Functions()                                         # AADC
    funcs.start_recording()                                          # AADC

    # Active variables (use idouble instead of float)
    S0_v    = aadc.idouble(S0);    S0_arg    = S0_v.mark_as_input()   # AADC
    r_v     = aadc.idouble(r);     r_arg     = r_v.mark_as_input()    # AADC
    sigma_v = aadc.idouble(sigma); sigma_arg = sigma_v.mark_as_input()# AADC
    K_v     = aadc.idouble(K);     K_arg     = K_v.mark_as_input_no_diff()  # AADC
    T_v     = aadc.idouble(T);     T_arg     = T_v.mark_as_input_no_diff()  # AADC

    # ... record path simulation ...

    payoff_res = discounted_payoff.mark_as_output()                  # AADC
    funcs.stop_recording()                                           # AADC

    # Evaluate vectorized across scenarios
    request = {payoff_res: [S0_arg, r_arg, sigma_arg]}               # AADC
    results = aadc.evaluate(funcs, request, inputs, workers)         # AADC

    return results, payoff_res, S0_arg, r_arg, sigma_arg 

aadc_pricer.py - price_with_greeks 0.67s 852 total lines 
 def price_with_greeks(S0, K, r, sigma, T, Z, num_scenarios, num_timesteps, num_threads=4):
    """Compute price and Greeks via AAD (1 forward + 1 adjoint pass)."""
    results, payoff_res, S0_arg, r_arg, sigma_arg = price_asian_option(
        S0, K, r, sigma, T, Z, num_scenarios, num_timesteps, num_threads
    )

    # --- Extract results ---                                        # AADC
    discounted_payoffs = results[0][payoff_res]                      # AADC
    price = float(np.mean(discounted_payoffs))                       # AADC

    # Greeks from single adjoint pass (no extra pricings needed!)    # AADC
    delta = float(np.mean(results[1][payoff_res][S0_arg]))           # AADC
    rho   = float(np.mean(results[1][payoff_res][r_arg]))            # AADC
    vega  = float(np.mean(results[1][payoff_res][sigma_arg]))        # AADC

    return price, delta, rho, vega

# All Greeks computed in ONE pass - +26% overhead vs +495% for C++ bump-and-revalue!

optimised.cpp - gbm_constants 1.30s 880 total lines 
 void gbm_constants(double r, double vol, double T, size_t num_timesteps,
                   double& dt, double& sqrt_dt, double& drift, double& vol_sqrt_dt) {
    /**Compute GBM simulation constants.*/
    dt = T / static_cast<double>(num_timesteps);
    sqrt_dt = std::sqrt(dt);
    drift = (r - 0.5 * vol * vol) * dt;
    vol_sqrt_dt = vol * sqrt_dt;
} 

optimised.cpp - simulate_path 1.30s 880 total lines 
 // Scalar version (portable, works on ARM/Apple Silicon)
double simulate_path_scalar(double S0, double K, double drift, double vol_sqrt_dt,
                            const double* Z_row, size_t num_timesteps) {
    /**Simulate GBM path and compute payoff (scalar version).*/
    double price = S0;
    double running_sum = 0.0;

    for (size_t t = 0; t < num_timesteps; ++t) {
        price = price * std::exp(drift + vol_sqrt_dt * Z_row[t]);
        running_sum += price;
    }

    double average = running_sum / static_cast<double>(num_timesteps);
    double payoff = std::max(average - K, 0.0);
    return payoff;
}

// AVX2 SIMD version also available for x86-64 (~2x faster) 

optimised.cpp - price_asian_option 1.30s 880 total lines 
 // Core pricing function (auto-selects AVX2 or scalar)
double price_asian_option(
    double S0, double K, double r, double vol, double T,
    const double* Z, size_t num_scenarios, size_t num_timesteps
) {
    double dt, sqrt_dt, drift, vol_sqrt_dt;
    gbm_constants(r, vol, T, num_timesteps, dt, sqrt_dt, drift, vol_sqrt_dt);

#if USE_AVX2
    // Broadcast constants to SIMD registers
    const __m256d drift_vec = _mm256_set1_pd(drift);
    const __m256d vol_sqrt_dt_vec = _mm256_set1_pd(vol_sqrt_dt);
    const __m256d S0_vec = _mm256_set1_pd(S0);
    size_t row_stride = (num_timesteps + SIMD_WIDTH - 1) & ~(SIMD_WIDTH - 1);
#else
    size_t row_stride = num_timesteps;
#endif

    double payoff_sum = 0.0;
    for (size_t scenario = 0; scenario < num_scenarios; ++scenario) {
        const double* Z_row = Z + scenario * row_stride;
#if USE_AVX2
        payoff_sum += simulate_path_avx(S0, K, drift, vol_sqrt_dt, Z_row, num_timesteps,
                                        drift_vec, vol_sqrt_dt_vec, S0_vec);
#else
        payoff_sum += simulate_path_scalar(S0, K, drift, vol_sqrt_dt, Z_row, num_timesteps);
#endif
    }

    return std::exp(-r * T) * (payoff_sum / static_cast<double>(num_scenarios));
} 

optimised.cpp - price_with_greeks 1.30s 880 total lines 
 // Greeks via bump-and-revalue (requires 4 full pricings)
constexpr double BUMP_SIZE = 1e-6;

void price_with_greeks(
    double S0, double K, double r, double vol, double T,
    const double* Z, size_t num_scenarios, size_t num_timesteps,
    double& price, double& delta, double& rho, double& vega
) {
    price = price_asian_option(S0, K, r, vol, T, Z, num_scenarios, num_timesteps);
    double p_dS = price_asian_option(S0 + BUMP_SIZE, K, r, vol, T, Z, num_scenarios, num_timesteps);
    double p_dr = price_asian_option(S0, K, r + BUMP_SIZE, vol, T, Z, num_scenarios, num_timesteps);
    double p_dv = price_asian_option(S0, K, r, vol + BUMP_SIZE, T, Z, num_scenarios, num_timesteps);
    delta = (p_dS - price) / BUMP_SIZE;
    rho   = (p_dr - price) / BUMP_SIZE;
    vega  = (p_dv - price) / BUMP_SIZE;
}

// Even with AVX2 SIMD, Greeks add +495% overhead (4 pricings)!

Key insight: AADC is 2871x faster than Basic Python and computes Greeks 28x faster than NumPy, outperforming Optimised C++ by 1.9x.

Greeks via AAD: 1 forward + 1 adjoint pass - +26% overhead vs +298% for NumPy bump-and-revalue.
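
To make "1 forward + 1 adjoint pass" concrete, here is a hand-written adjoint for the same Asian payoff in plain Python. This is only an illustration of the underlying technique, not AADC's implementation or API: AADC records and compiles the computation for you, while here the reverse sweep is coded by hand. The function names are hypothetical. The result is cross-checked against central bump-and-revalue.

```python
import math
import random


def value_and_greeks_adjoint(S0, K, r, sigma, T, Z_path):
    """Discounted Asian payoff on one path, plus d/dS0, d/dr, d/dsigma."""
    n = len(Z_path)
    dt = T / n
    sqrt_dt = math.sqrt(dt)
    drift = (r - 0.5 * sigma * sigma) * dt
    vol_sqrt_dt = sigma * sqrt_dt

    # Forward pass: same recursion as simulate_path, keeping each step.
    growth, prices = [], []
    price, running_sum = S0, 0.0
    for z in Z_path:
        g = math.exp(drift + vol_sqrt_dt * z)
        price *= g
        running_sum += price
        growth.append(g)
        prices.append(price)
    average = running_sum / n
    payoff = max(average - K, 0.0)
    discount = math.exp(-r * T)
    value = discount * payoff

    # Adjoint pass: propagate d(value)/d(node) backwards through the loop.
    sum_bar = discount * (1.0 if average > K else 0.0) / n
    price_bar = drift_bar = vol_bar = 0.0
    for z, g, p in zip(reversed(Z_path), reversed(growth), reversed(prices)):
        price_bar += sum_bar       # from running_sum += price
        x_bar = price_bar * p      # d(price_t)/d(exponent_t) = price_t
        drift_bar += x_bar
        vol_bar += x_bar * z
        price_bar *= g             # hand the adjoint to the previous price

    delta = price_bar
    rho = drift_bar * dt - T * value                    # drift and discount
    vega = -sigma * dt * drift_bar + sqrt_dt * vol_bar  # drift and vol term
    return value, delta, rho, vega


def mean_value_and_greeks(S0, K, r, sigma, T, Z):
    """Average value and Greeks over all paths in Z."""
    totals = [0.0, 0.0, 0.0, 0.0]
    for Z_path in Z:
        result = value_and_greeks_adjoint(S0, K, r, sigma, T, Z_path)
        for i, x in enumerate(result):
            totals[i] += x
    return tuple(t / len(Z) for t in totals)


rng = random.Random(0)
Z = [[rng.gauss(0.0, 1.0) for _ in range(52)] for _ in range(500)]
S0, K, r, sigma, T = 100.0, 100.0, 0.05, 0.2, 1.0

price, delta, rho, vega = mean_value_and_greeks(S0, K, r, sigma, T, Z)


def mean_value(S0, K, r, sigma, T):
    return mean_value_and_greeks(S0, K, r, sigma, T, Z)[0]


# Cross-check with central bump-and-revalue: 6 extra pricings for 3 Greeks.
h = 1e-5
fd_delta = (mean_value(S0 + h, K, r, sigma, T) - mean_value(S0 - h, K, r, sigma, T)) / (2 * h)
fd_rho = (mean_value(S0, K, r + h, sigma, T) - mean_value(S0, K, r - h, sigma, T)) / (2 * h)
fd_vega = (mean_value(S0, K, r, sigma + h, T) - mean_value(S0, K, r, sigma - h, T)) / (2 * h)

print(f"price {price:.6f}")
print(f"delta adjoint {delta:.6f}  bump {fd_delta:.6f}")
print(f"rho   adjoint {rho:.6f}  bump {fd_rho:.6f}")
print(f"vega  adjoint {vega:.6f}  bump {fd_vega:.6f}")
```

The adjoint sweep visits each timestep once and returns all three sensitivities together, while the cross-check reprices the whole portfolio six more times.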

Why This Approach Works

AADC doesn't change your computation. It produces an exact, mathematically proven replica of it that simply runs faster. The integration code handles type annotations, recording setup, and kernel compilation, which is exactly the kind of repetitive, pattern-based work that's easy to validate.

For Prototyping

AI-assisted integration is excellent for quickly validating potential speedup on your actual models

For Production

Use MatLogica's AADC Toolkit with debugging support and automated scripts

Technical Details

Arithmetic average Asian option under GBM
Monte Carlo simulation with Greeks (Delta, Rho, Vega)
Basic Python: 775 lines, 32 min
AADC Python: 852 lines, 0.67s
10 trades × 100K scenarios × 252 timesteps
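
For reference, the model that the code listings implement can be written out as follows. The notation is our own: N is the number of timesteps, and the draws Z_i are assumed to be independent standard normals, which the page does not state explicitly.

```latex
\begin{aligned}
S_{t_i} &= S_{t_{i-1}} \exp\!\left( \left(r - \tfrac{1}{2}\sigma^2\right)\Delta t
           + \sigma\sqrt{\Delta t}\, Z_i \right),
           \qquad \Delta t = \frac{T}{N},\quad Z_i \sim \mathcal{N}(0,1) \\
A &= \frac{1}{N} \sum_{i=1}^{N} S_{t_i} \\
V &= e^{-rT}\, \mathbb{E}\!\left[ \max(A - K,\, 0) \right] \\
\Delta &= \frac{\partial V}{\partial S_0}, \qquad
\rho = \frac{\partial V}{\partial r}, \qquad
\mathcal{V} = \frac{\partial V}{\partial \sigma}
\end{aligned}
```

The expectation is estimated by averaging the discounted payoff over all scenarios, as in price_asian_option.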

Business Impact

Rapid model acceleration: integrate AADC in hours, not weeks
Prototype in Python, achieve production performance immediately
Accelerate existing models without rewriting from scratch
Build production systems in Python without performance compromises