Accelerate Python Models 420x with AADC

Prototype in Python, achieve C++ performance. Accelerate your existing Python pricing models 420x with minimal code changes — no rewrites required.

  • 420x speedup with just +83 lines of code
  • Iterate in Python at C++ speeds
  • AADC computes Greeks 5.4x faster than hand-optimised C++

Performance Results

Version          Lines of Code   Execution Time   Speedup
Basic Python     739             ~16 hours        baseline
NumPy            745             ~2.75 hours      6x
C++ Optimised    877             731s             115x
Python + AADC    822 (+83)       136s             420x

Benchmark Configuration

1000 trades × 500K scenarios × 252 timesteps with 8 threads — all Greeks (Delta, Rho, Vega) computed.
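
To get a feel for the scale of this configuration, the sketch below counts the work in a single pricing pass. It assumes every trade is simulated over all scenarios and timesteps and that the random draws are held as 8-byte doubles in one shared matrix; the variable names are illustrative and not taken from the benchmark source.

```python
# Benchmark dimensions quoted above.
num_trades = 1_000
num_scenarios = 500_000
num_timesteps = 252

# Assumption: each trade is simulated on every scenario and timestep,
# so one pricing pass performs one path step per (trade, scenario, timestep).
steps_per_pricing = num_trades * num_scenarios * num_timesteps

# Assumption: one shared (num_scenarios x num_timesteps) matrix of normal
# draws, stored as 8-byte doubles.
z_matrix_bytes = num_scenarios * num_timesteps * 8

print(f"path steps per pricing pass: {steps_per_pricing:,}")
print(f"random-draw matrix: {z_matrix_bytes / 1e9:.3f} GB")
```

Under these assumptions one pass is about 1.26 × 10^11 path steps, so each extra full repricing needed for a bumped Greek is costly.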

AADC vs Hand-Optimised C++

  • For pricing only, C++ edges out AADC slightly
  • With Greeks, AADC is 5.4x faster than hand-optimised C++
  • Traditional bump-and-revalue adds +582% overhead for Greeks
  • AADC adds only +32% overhead while computing all Greeks

See the Difference

The Same Logic, Four Ways

basic.py - gbm_constants
~16 hours 739 total lines
def gbm_constants(r, sigma, T, num_timesteps):
    """Compute GBM simulation constants."""
    dt = T / num_timesteps
    sqrt_dt = math.sqrt(dt)
    drift = (r - 0.5 * sigma * sigma) * dt
    vol_sqrt_dt = sigma * sqrt_dt
    return dt, sqrt_dt, drift, vol_sqrt_dt
basic.py - simulate_path
~16 hours 739 total lines
def simulate_path(S0, K, r, T, drift, vol_sqrt_dt, Z_path, num_timesteps):
    """Simulate GBM path and compute discounted payoff."""
    price = S0
    running_sum = 0.0
    for t in range(num_timesteps):
        price = price * math.exp(drift + vol_sqrt_dt * Z_path[t])
        running_sum += price

    average = running_sum / num_timesteps
    payoff = max(average - K, 0.0)
    discount = math.exp(-r * T)
    discounted_payoff = discount * payoff
    return discounted_payoff
basic.py - price_asian_option
~16 hours 739 total lines
def price_asian_option(S0, K, r, sigma, T, Z, num_scenarios, num_timesteps):
    """Price Asian option - loop over all scenarios."""
    dt, sqrt_dt, drift, vol_sqrt_dt = gbm_constants(r, sigma, T, num_timesteps)

    payoff_sum = 0.0
    for scenario in range(num_scenarios):
        payoff_sum += simulate_path(S0, K, r, T, drift, vol_sqrt_dt,
                                    Z[scenario], num_timesteps)

    return payoff_sum / num_scenarios
basic.py - price_with_greeks
~16 hours 739 total lines
def price_with_greeks(S0, K, r, sigma, T, Z, num_scenarios, num_timesteps):
    """Compute price and Greeks via bump-and-revalue (4 pricings)."""
    bump = 1e-6
    price = price_asian_option(S0, K, r, sigma, T, Z, num_scenarios, num_timesteps)
    delta = (price_asian_option(S0 + bump, K, r, sigma, T, Z, num_scenarios, num_timesteps) - price) / bump
    rho   = (price_asian_option(S0, K, r + bump, sigma, T, Z, num_scenarios, num_timesteps) - price) / bump
    vega  = (price_asian_option(S0, K, r, sigma + bump, T, Z, num_scenarios, num_timesteps) - price) / bump
    return price, delta, rho, vega

# Greeks require 4 full pricings - 4x the compute cost!
numpy_pricer.py - gbm_constants
~2.75 hours 745 total lines
def gbm_constants(r, sigma, T, num_timesteps):
    """Compute GBM simulation constants."""
    dt = T / num_timesteps
    sqrt_dt = np.sqrt(dt)
    drift = (r - 0.5 * sigma * sigma) * dt
    vol_sqrt_dt = sigma * sqrt_dt
    return dt, sqrt_dt, drift, vol_sqrt_dt

# Same as basic - using np.sqrt instead of math.sqrt
numpy_pricer.py - simulate_path
~2.75 hours 745 total lines
def simulate_paths_vectorized(S0, K, r, T, drift, vol_sqrt_dt, Z, num_scenarios, num_timesteps):
    """Simulate all GBM paths at once using NumPy vectorization."""
    # Z is (num_scenarios, num_timesteps)
    log_increments = drift + vol_sqrt_dt * Z  # Vectorized across all scenarios
    log_prices = np.cumsum(log_increments, axis=1)
    prices = S0 * np.exp(log_prices)

    # Running average for Asian option
    running_sum = np.cumsum(prices, axis=1)
    averages = running_sum[:, -1] / num_timesteps

    payoffs = np.maximum(averages - K, 0.0)
    discount = np.exp(-r * T)
    return discount * payoffs

# Vectorized across scenarios - ~6x faster than Basic Python
numpy_pricer.py - price_asian_option
~2.75 hours 745 total lines
def price_asian_option(S0, K, r, sigma, T, Z, num_scenarios, num_timesteps):
    """Price Asian option using NumPy vectorization."""
    dt, sqrt_dt, drift, vol_sqrt_dt = gbm_constants(r, sigma, T, num_timesteps)

    # Single vectorized call - no Python loop over scenarios
    discounted_payoffs = simulate_paths_vectorized(
        S0, K, r, T, drift, vol_sqrt_dt, Z, num_scenarios, num_timesteps
    )

    return np.mean(discounted_payoffs)

# ~6x faster than Basic Python loops
numpy_pricer.py - price_with_greeks
~2.75 hours 745 total lines
def price_with_greeks(S0, K, r, sigma, T, Z, num_scenarios, num_timesteps):
    """Compute price and Greeks via bump-and-revalue (4 pricings)."""
    bump = 1e-6
    price = price_asian_option(S0, K, r, sigma, T, Z, num_scenarios, num_timesteps)
    delta = (price_asian_option(S0 + bump, K, r, sigma, T, Z, num_scenarios, num_timesteps) - price) / bump
    rho   = (price_asian_option(S0, K, r + bump, sigma, T, Z, num_scenarios, num_timesteps) - price) / bump
    vega  = (price_asian_option(S0, K, r, sigma + bump, T, Z, num_scenarios, num_timesteps) - price) / bump
    return price, delta, rho, vega

# Still 4 pricings needed - Greeks add +274% overhead
aadc_pricer.py - gbm_constants
136s 822 total lines
def gbm_constants(r, sigma, T, num_timesteps):
    """Compute GBM simulation constants."""
    dt = T / num_timesteps
    sqrt_dt = np.sqrt(dt)
    drift = (r - 0.5 * sigma * sigma) * dt
    vol_sqrt_dt = sigma * sqrt_dt
    return dt, sqrt_dt, drift, vol_sqrt_dt

# Identical to the NumPy version - AADC works with regular Python code!
aadc_pricer.py - simulate_path
136s 822 total lines
def simulate_path(S0, K, r, T, drift, vol_sqrt_dt, Z_vals, num_timesteps):
    """Simulate GBM path and compute discounted payoff."""
    price = S0
    running_sum = aadc.idouble(0.0)                                  # AADC
    for t in range(num_timesteps):
        price = price * np.exp(drift + vol_sqrt_dt * Z_vals[t])
        running_sum = running_sum + price

    average = running_sum / num_timesteps
    payoff = np.maximum(average - K, 0.0)
    discount = np.exp(-r * T)
    discounted_payoff = discount * payoff
    return discounted_payoff

# Only AADC-specific change: aadc.idouble for running_sum - enables AAD!
# (np.exp and np.maximum replace math.exp and max from the basic version)
aadc_pricer.py - price_asian_option
136s 822 total lines
def price_asian_option(S0, K, r, sigma, T, Z, num_scenarios, num_timesteps, num_threads=4):
    """Price Asian option using AADC - loop over all scenarios."""
    workers = aadc.ThreadPool(num_threads)                           # AADC

    # --- Record computation graph ---
    funcs = aadc.Functions()                                         # AADC
    funcs.start_recording()                                          # AADC

    # Active variables (use idouble instead of float)
    S0_v    = aadc.idouble(S0);    S0_arg    = S0_v.mark_as_input()   # AADC
    r_v     = aadc.idouble(r);     r_arg     = r_v.mark_as_input()    # AADC
    sigma_v = aadc.idouble(sigma); sigma_arg = sigma_v.mark_as_input()# AADC
    K_v     = aadc.idouble(K);     K_arg     = K_v.mark_as_input_no_diff()  # AADC
    T_v     = aadc.idouble(T);     T_arg     = T_v.mark_as_input_no_diff()  # AADC

    # ... record path simulation ...

    payoff_res = discounted_payoff.mark_as_output()                  # AADC
    funcs.stop_recording()                                           # AADC

    # Evaluate vectorized across scenarios
    request = {payoff_res: [S0_arg, r_arg, sigma_arg]}               # AADC
    results = aadc.evaluate(funcs, request, inputs, workers)         # AADC

    return results, payoff_res, S0_arg, r_arg, sigma_arg
aadc_pricer.py - price_with_greeks
136s 822 total lines
def price_with_greeks(S0, K, r, sigma, T, Z, num_scenarios, num_timesteps, num_threads=4):
    """Compute price and Greeks via AAD (1 forward + 1 adjoint pass)."""
    results, payoff_res, S0_arg, r_arg, sigma_arg = price_asian_option(
        S0, K, r, sigma, T, Z, num_scenarios, num_timesteps, num_threads
    )

    # --- Extract results ---                                        # AADC
    discounted_payoffs = results[0][payoff_res]                      # AADC
    price = float(np.mean(discounted_payoffs))                       # AADC

    # Greeks from single adjoint pass (no extra pricings needed!)    # AADC
    delta = float(np.mean(results[1][payoff_res][S0_arg]))           # AADC
    rho   = float(np.mean(results[1][payoff_res][r_arg]))            # AADC
    vega  = float(np.mean(results[1][payoff_res][sigma_arg]))        # AADC

    return price, delta, rho, vega

# All Greeks computed in ONE pass - +32% overhead vs +582% for bump-and-revalue!
optimised.cpp - gbm_constants
731s 877 total lines
void gbm_constants(double r, double vol, double T, size_t num_timesteps,
                   double& dt, double& sqrt_dt, double& drift, double& vol_sqrt_dt) {
    /**Compute GBM simulation constants.*/
    dt = T / static_cast<double>(num_timesteps);
    sqrt_dt = std::sqrt(dt);
    drift = (r - 0.5 * vol * vol) * dt;
    vol_sqrt_dt = vol * sqrt_dt;
}
optimised.cpp - simulate_path
731s 877 total lines
// Scalar version (portable, works on ARM/Apple Silicon)
double simulate_path_scalar(double S0, double K, double drift, double vol_sqrt_dt,
                            const double* Z_row, size_t num_timesteps) {
    /**Simulate GBM path and compute payoff (scalar version).*/
    double price = S0;
    double running_sum = 0.0;

    for (size_t t = 0; t < num_timesteps; ++t) {
        price = price * std::exp(drift + vol_sqrt_dt * Z_row[t]);
        running_sum += price;
    }

    double average = running_sum / static_cast<double>(num_timesteps);
    double payoff = std::max(average - K, 0.0);
    return payoff;
}

// AVX2 SIMD version also available for x86-64 (~2x faster)
optimised.cpp - price_asian_option
731s 877 total lines
// Core pricing function (auto-selects AVX2 or scalar)
double price_asian_option(
    double S0, double K, double r, double vol, double T,
    const double* Z, size_t num_scenarios, size_t num_timesteps
) {
    double dt, sqrt_dt, drift, vol_sqrt_dt;
    gbm_constants(r, vol, T, num_timesteps, dt, sqrt_dt, drift, vol_sqrt_dt);

#if USE_AVX2
    // Broadcast constants to SIMD registers
    const __m256d drift_vec = _mm256_set1_pd(drift);
    const __m256d vol_sqrt_dt_vec = _mm256_set1_pd(vol_sqrt_dt);
    const __m256d S0_vec = _mm256_set1_pd(S0);
    size_t row_stride = (num_timesteps + SIMD_WIDTH - 1) & ~(SIMD_WIDTH - 1);
#else
    size_t row_stride = num_timesteps;
#endif

    double payoff_sum = 0.0;
    for (size_t scenario = 0; scenario < num_scenarios; ++scenario) {
        const double* Z_row = Z + scenario * row_stride;
#if USE_AVX2
        payoff_sum += simulate_path_avx(S0, K, drift, vol_sqrt_dt, Z_row, num_timesteps,
                                        drift_vec, vol_sqrt_dt_vec, S0_vec);
#else
        payoff_sum += simulate_path_scalar(S0, K, drift, vol_sqrt_dt, Z_row, num_timesteps);
#endif
    }

    return std::exp(-r * T) * (payoff_sum / static_cast<double>(num_scenarios));
}
optimised.cpp - price_with_greeks
731s 877 total lines
// Greeks via bump-and-revalue (requires 4 full pricings)
constexpr double BUMP_SIZE = 1e-6;

void price_with_greeks(
    double S0, double K, double r, double vol, double T,
    const double* Z, size_t num_scenarios, size_t num_timesteps,
    double& price, double& delta, double& rho, double& vega
) {
    price = price_asian_option(S0, K, r, vol, T, Z, num_scenarios, num_timesteps);
    double p_dS = price_asian_option(S0 + BUMP_SIZE, K, r, vol, T, Z, num_scenarios, num_timesteps);
    double p_dr = price_asian_option(S0, K, r + BUMP_SIZE, vol, T, Z, num_scenarios, num_timesteps);
    double p_dv = price_asian_option(S0, K, r, vol + BUMP_SIZE, T, Z, num_scenarios, num_timesteps);
    delta = (p_dS - price) / BUMP_SIZE;
    rho   = (p_dr - price) / BUMP_SIZE;
    vega  = (p_dv - price) / BUMP_SIZE;
}

// Even with AVX2 SIMD, Greeks add +582% overhead (7 evaluations)!
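
The overhead percentages quoted for bump-and-revalue can be related to the number of full pricings involved. The sketch below is a back-of-envelope model, assuming overhead is measured against one pricing-only run and that every repricing costs the same; the function names are hypothetical.

```python
def bump_and_revalue_pricings(num_greeks, central=False):
    """Full pricings needed for the price plus num_greeks sensitivities."""
    # Forward differences reuse the base price: one bumped run per Greek.
    # Central differences need an up-bump and a down-bump per Greek.
    bumps_per_greek = 2 if central else 1
    return 1 + bumps_per_greek * num_greeks


def ideal_overhead_percent(num_pricings):
    """Extra cost relative to a single pricing-only run, in percent."""
    return (num_pricings - 1) * 100.0


forward = bump_and_revalue_pricings(3)                # Delta, Rho, Vega
central = bump_and_revalue_pricings(3, central=True)
print(forward, ideal_overhead_percent(forward))
print(central, ideal_overhead_percent(central))
```

In this model, forward differences as written in the listings above need 4 pricings (an ideal +300%), while central differences would need 7 (an ideal +600%). Measured figures can differ from these ideal values, for example when setup work is shared between repricings. With reverse-mode differentiation, by contrast, the cost of the adjoint pass does not grow in this way with the number of input sensitivities.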

Key insight: AADC is 420x faster than Basic Python and computes Greeks 73x faster than NumPy, outperforming hand-optimised C++ by 5.4x.

Greeks via AAD: 1 forward + 1 adjoint pass, at +32% overhead vs +306% for NumPy bump-and-revalue.
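
To make "1 forward + 1 adjoint pass" concrete, here is a hand-written NumPy sketch of the same Asian payoff with a manual backward sweep. It is an illustration of the reverse-mode idea only, not AADC's API or implementation (AADC derives the adjoint from the recorded code); the function names and the small test sizes are assumptions.

```python
import numpy as np


def price_and_greeks_adjoint(S0, K, r, sigma, T, Z):
    """Price, Delta, Rho, Vega from one forward and one backward sweep."""
    num_scenarios, num_timesteps = Z.shape
    dt = T / num_timesteps
    sqrt_dt = np.sqrt(dt)
    drift = (r - 0.5 * sigma * sigma) * dt
    vol_sqrt_dt = sigma * sqrt_dt

    # Forward pass: keep per-step growth factors and prices for the sweep back.
    growth = np.exp(drift + vol_sqrt_dt * Z)
    prices = S0 * np.cumprod(growth, axis=1)
    average = prices.mean(axis=1)
    discount = np.exp(-r * T)
    value = discount * np.maximum(average - K, 0.0)

    # Adjoint pass: propagate d(value)/d(.) backwards through the time loop.
    running_sum_bar = discount * (average > K) / num_timesteps
    price_bar = np.zeros(num_scenarios)
    drift_bar = np.zeros(num_scenarios)
    vol_bar = np.zeros(num_scenarios)
    for t in range(num_timesteps - 1, -1, -1):
        price_bar = price_bar + running_sum_bar   # price_t feeds running_sum
        exponent_bar = price_bar * prices[:, t]   # d price_t / d exponent_t = price_t
        drift_bar += exponent_bar
        vol_bar += exponent_bar * Z[:, t]
        price_bar = price_bar * growth[:, t]      # adjoint of price_{t-1}

    delta = price_bar                              # adjoint of S0
    rho = drift_bar * dt - T * value               # via drift and the discount factor
    vega = vol_bar * sqrt_dt - drift_bar * sigma * dt
    return value.mean(), delta.mean(), rho.mean(), vega.mean()


def price_only(S0, K, r, sigma, T, Z):
    """Plain vectorized pricing, used here only for a bump-and-revalue check."""
    dt = T / Z.shape[1]
    drift = (r - 0.5 * sigma * sigma) * dt
    prices = S0 * np.exp(np.cumsum(drift + sigma * np.sqrt(dt) * Z, axis=1))
    return (np.exp(-r * T) * np.maximum(prices.mean(axis=1) - K, 0.0)).mean()


rng = np.random.default_rng(0)
Z = rng.standard_normal((2_000, 52))   # small illustrative sizes
price, delta, rho, vega = price_and_greeks_adjoint(100.0, 100.0, 0.05, 0.2, 1.0, Z)
print(price, delta, rho, vega)
```

The backward loop visits each timestep once and yields all three sensitivities together, whereas bump-and-revalue repeats the whole forward simulation for every bumped input.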

Why This Approach Works

AADC doesn't change your computation: it produces a mathematically exact replica of it, just accelerated. The integration code handles type annotations, recording setup, and kernel compilation, which is exactly the kind of repetitive, pattern-based work that is easy to validate.
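
The record-once, replay-many idea behind "recording setup" can be illustrated with a toy operator-overloading tape. This is a hypothetical sketch, not AADC's implementation: every class and function name is invented for illustration, the tape is interpreted with NumPy rather than compiled into a kernel, and K, r, sigma and T are baked in as constants for brevity.

```python
import math
import numpy as np

# Hypothetical toy recorder, for illustration only.
OPS = {"add": np.add, "sub": np.subtract, "mul": np.multiply,
       "div": np.divide, "max": np.maximum, "exp": np.exp}


class Tape:
    def __init__(self):
        self.instructions = []   # (op name, output slot, input slots)
        self.constants = {}      # slot -> value fixed at recording time
        self.num_slots = 0

    def _new_slot(self):
        self.num_slots += 1
        return self.num_slots - 1

    def input(self):
        return Traced(self, self._new_slot())

    def lift(self, x):
        """Turn a plain number into a constant slot; pass traced values through."""
        if isinstance(x, Traced):
            return x
        slot = self._new_slot()
        self.constants[slot] = float(x)
        return Traced(self, slot)

    def emit(self, op, *args):
        arg_slots = tuple(self.lift(a).slot for a in args)
        out = self._new_slot()
        self.instructions.append((op, out, arg_slots))
        return Traced(self, out)

    def replay(self, inputs, output):
        """Run the recorded instructions; inputs may be scalars or arrays."""
        slots = [None] * self.num_slots
        for slot, value in self.constants.items():
            slots[slot] = value
        for var, value in inputs.items():
            slots[var.slot] = np.asarray(value, dtype=float)
        for op, out, arg_slots in self.instructions:
            slots[out] = OPS[op](*(slots[s] for s in arg_slots))
        return slots[output.slot]


class Traced:
    def __init__(self, tape, slot):
        self.tape, self.slot = tape, slot
    def __add__(self, o): return self.tape.emit("add", self, o)
    def __radd__(self, o): return self.tape.emit("add", o, self)
    def __sub__(self, o): return self.tape.emit("sub", self, o)
    def __mul__(self, o): return self.tape.emit("mul", self, o)
    def __rmul__(self, o): return self.tape.emit("mul", o, self)
    def __truediv__(self, o): return self.tape.emit("div", self, o)


def record_asian_payoff(K, r, sigma, T, num_timesteps):
    """Run the scalar path loop once, recording it instead of evaluating it."""
    tape = Tape()
    S0 = tape.input()
    Z = [tape.input() for _ in range(num_timesteps)]
    dt = T / num_timesteps
    drift = (r - 0.5 * sigma * sigma) * dt
    vol_sqrt_dt = sigma * math.sqrt(dt)
    price, running_sum = S0, tape.lift(0.0)
    for t in range(num_timesteps):
        price = price * tape.emit("exp", drift + vol_sqrt_dt * Z[t])
        running_sum = running_sum + price
    payoff = tape.emit("max", running_sum / num_timesteps - K, 0.0)
    return tape, S0, Z, math.exp(-r * T) * payoff


tape, S0_in, Z_in, out = record_asian_payoff(100.0, 0.05, 0.2, 1.0, 12)
draws = np.random.default_rng(0).standard_normal((1_000, 12))
inputs = {S0_in: 100.0, **{z: draws[:, t] for t, z in enumerate(Z_in)}}
payoffs = tape.replay(inputs, out)   # one replay evaluates all 1,000 scenarios
print(payoffs.mean())
```

The scalar model code runs once during recording; the replay then applies the same arithmetic to whole arrays of scenarios, which is why the results match the original computation.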

For Prototyping

AI-assisted integration is a quick way to validate the potential speedup on your actual models.

For Production

Use MatLogica's AADC Toolkit, which adds debugging support and automated scripts.

Technical Details

  • Arithmetic average Asian option under GBM
  • Monte Carlo simulation with Greeks (Delta, Rho, Vega)
  • Basic Python: 739 lines, ~16 hours
  • AADC Python: 822 lines, 136s
  • 1000 trades × 500K scenarios × 252 timesteps

Business Impact

  • Rapid model acceleration: integrate AADC in hours, not weeks
  • Prototype in Python, achieve production performance immediately
  • Accelerate existing models without rewriting from scratch
  • Build production systems in Python without performance compromises

Ready to Accelerate Your Python Models?

See what AADC can do for your specific use case. Schedule a demo or get the Claude configuration files to try it yourself.

Tags: Python, acceleration, Monte Carlo, AADC integration, performance, Greeks, prototyping

Frequently Asked Questions

Can Python really achieve C++ performance with AADC?
Yes. AADC Python achieves a 420x speedup over basic Python, and when computing Greeks, AADC is 5.4x faster than hand-optimised C++. Teams are building production systems in Python with AADC, iterating at speeds that would normally require hand-optimised C++.

What's the recommended path for evaluating AADC?
Prototype in Python, observe performance on a real model, then harden for production with the toolkit. This approach lets you see the actual speedup on your specific models before any commitment. Some teams like this enough that they build production systems this way.

How much code change is required for AADC acceleration?
Only +83 lines of code are needed to achieve the 420x speedup. AADC integration handles type annotations, recording setup, and kernel compilation: the boilerplate work that's straightforward to validate.

How does AADC compare to hand-optimised C++ for Greeks calculation?
AADC is 5.4x faster than hand-optimised C++ for computing Greeks (136s vs 731s). While C++ may edge out AADC slightly for pricing only, AADC dramatically outperforms it when Greeks are required. Traditional C++ adds +582% overhead for Greeks via bump-and-revalue, while AADC adds only +32%.