Asian Option Monte Carlo Benchmark

The Key Finding

AADC Breaks the Trade-off

Traditionally, you choose productivity OR performance. AADC gives you both - and more.

Developer Productivity vs Runtime Performance

Asian Option Monte Carlo - 1000 trades x 500K scenarios - Price + Greeks

Approach	Runtime	Lines of code
Basic Python	~16 hours*	739
NumPy Vectorized	~2.75 hours*	745
C++ Optimised (AVX2)	731s	877
AADC Python	136s	822

The Journey

Four Approaches to Monte Carlo Pricing

1,000 trades x 500K scenarios x 252 timesteps - Price + Greeks

Basic Python cannot scale for production Monte Carlo. With AADC, your existing Python code runs at C++ speeds with zero SIMD expertise required.

For risk-heavy workloads (which is most of quant finance), AADC delivers superior performance with dramatically less complexity.

Level 1

Basic Python

Simple but cannot scale

Execution time: ~16 hours

Total lines: 739

Simple, readable Python code
Bump & revalue for Greeks (4x)
No vectorization
Greeks add +302% overhead

Level 2

NumPy Vectorized

~6x faster than Basic

Execution time: ~2.75 hours

Total lines: 745

NumPy vectorization across scenarios
Bump & revalue for Greeks (4x)
No Python loops in hot path
Greeks add +306% overhead

Level 3

C++ Optimised

Fast but complex

Execution time: 731s

Total lines: 877

Manual AVX2 SIMD intrinsics
Bump & revalue for Greeks (7x)
Weeks of expert work
Greeks add +582% overhead

Level 4 ✓

AADC Python

Fast AND simple

Execution time: 136s

Total lines: 822

+83 lines added to Basic Python
AAD: 1 forward + 1 adjoint pass
5.4x faster Greeks than C++
Greeks add only +32% overhead

See the Difference

The Same Logic, Four Ways

Browse functions like files in VSCode - or ask for source code to run yourself

basic.py - gbm_constants ~16 hours 739 total lines 
import math

def gbm_constants(r, sigma, T, num_timesteps):
    """Compute GBM simulation constants."""
    dt = T / num_timesteps
    sqrt_dt = math.sqrt(dt)
    drift = (r - 0.5 * sigma * sigma) * dt
    vol_sqrt_dt = sigma * sqrt_dt
    return dt, sqrt_dt, drift, vol_sqrt_dt 

basic.py - simulate_path ~16 hours 739 total lines 
def simulate_path(S0, K, r, T, drift, vol_sqrt_dt, Z_path, num_timesteps):
    """Simulate GBM path and compute discounted payoff."""
    price = S0
    running_sum = 0.0
    for t in range(num_timesteps):
        price = price * math.exp(drift + vol_sqrt_dt * Z_path[t])
        running_sum += price

    average = running_sum / num_timesteps
    payoff = max(average - K, 0.0)
    discount = math.exp(-r * T)
    discounted_payoff = discount * payoff
    return discounted_payoff 
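For reference, the constants and the path loop above correspond to the standard log-Euler discretisation of geometric Brownian motion with an arithmetic-average payoff. The symbols $S_k$, $Z_k$, $N$ and $V$ are notation introduced here for illustration and are not names from the benchmark code; `drift` is $(r - \tfrac{1}{2}\sigma^2)\Delta t$ and `vol_sqrt_dt` is $\sigma\sqrt{\Delta t}$.

```latex
\Delta t = \frac{T}{N}, \qquad
S_k = S_{k-1}\,\exp\!\Big(\big(r - \tfrac{1}{2}\sigma^{2}\big)\Delta t
      + \sigma\sqrt{\Delta t}\,Z_k\Big),
\quad k = 1,\dots,N, \quad Z_k \sim \mathcal{N}(0,1)

V = e^{-rT}\,\max\!\Big(\frac{1}{N}\sum_{k=1}^{N} S_k - K,\; 0\Big)
```

The Monte Carlo price is the mean of $V$ over all scenarios, which is what `price_asian_option` returns.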

basic.py - price_asian_option ~16 hours 739 total lines 
def price_asian_option(S0, K, r, sigma, T, Z, num_scenarios, num_timesteps):
    """Price Asian option - loop over all scenarios."""
    dt, sqrt_dt, drift, vol_sqrt_dt = gbm_constants(r, sigma, T, num_timesteps)

    payoff_sum = 0.0
    for scenario in range(num_scenarios):
        payoff_sum += simulate_path(S0, K, r, T, drift, vol_sqrt_dt,
                                    Z[scenario], num_timesteps)

    return payoff_sum / num_scenarios 

basic.py - price_with_greeks ~16 hours 739 total lines 
def price_with_greeks(S0, K, r, sigma, T, Z, num_scenarios, num_timesteps):
    """Compute price and Greeks via bump-and-revalue (4 pricings)."""
    bump = 1e-6
    price = price_asian_option(S0, K, r, sigma, T, Z, num_scenarios, num_timesteps)
    delta = (price_asian_option(S0 + bump, K, r, sigma, T, Z, num_scenarios, num_timesteps) - price) / bump
    rho   = (price_asian_option(S0, K, r + bump, sigma, T, Z, num_scenarios, num_timesteps) - price) / bump
    vega  = (price_asian_option(S0, K, r, sigma + bump, T, Z, num_scenarios, num_timesteps) - price) / bump
    return price, delta, rho, vega

# Greeks require 4 full pricings - 4x the compute cost! 
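To try the bump-and-revalue pattern at small scale, the sketch below condenses the basic.py functions into one self-contained script. The market parameters, seed and scenario count are hypothetical values chosen for a quick run; they are not the benchmark configuration, and the condensed function signatures differ slightly from the originals.

```python
import math
import random

def gbm_constants(r, sigma, T, num_timesteps):
    """Per-step drift and volatility terms of the log-Euler GBM scheme."""
    dt = T / num_timesteps
    drift = (r - 0.5 * sigma * sigma) * dt
    vol_sqrt_dt = sigma * math.sqrt(dt)
    return drift, vol_sqrt_dt

def price_asian_option(S0, K, r, sigma, T, Z):
    """Loop-based pricer, condensed from the basic.py functions."""
    num_timesteps = len(Z[0])
    drift, vol_sqrt_dt = gbm_constants(r, sigma, T, num_timesteps)
    discount = math.exp(-r * T)
    payoff_sum = 0.0
    for Z_path in Z:
        price, running_sum = S0, 0.0
        for z in Z_path:
            price *= math.exp(drift + vol_sqrt_dt * z)
            running_sum += price
        payoff_sum += discount * max(running_sum / num_timesteps - K, 0.0)
    return payoff_sum / len(Z)

def price_with_greeks(S0, K, r, sigma, T, Z, bump=1e-6):
    """Forward differences: one base pricing plus one pricing per bumped input."""
    price = price_asian_option(S0, K, r, sigma, T, Z)
    delta = (price_asian_option(S0 + bump, K, r, sigma, T, Z) - price) / bump
    rho = (price_asian_option(S0, K, r + bump, sigma, T, Z) - price) / bump
    vega = (price_asian_option(S0, K, r, sigma + bump, T, Z) - price) / bump
    return price, delta, rho, vega

# Hypothetical small test case (not the benchmark configuration).
rng = random.Random(42)
num_scenarios, num_timesteps = 2000, 12
Z = [[rng.gauss(0.0, 1.0) for _ in range(num_timesteps)]
     for _ in range(num_scenarios)]

price, delta, rho, vega = price_with_greeks(100.0, 100.0, 0.05, 0.2, 1.0, Z)
print(f"price={price:.4f} delta={delta:.4f} rho={rho:.4f} vega={vega:.4f}")
```

Reusing the same random draws `Z` for the base and bumped pricings keeps the finite differences free of sampling noise, which is why such a small bump is usable here.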

numpy_pricer.py - gbm_constants ~2.75 hours 745 total lines 
import numpy as np

def gbm_constants(r, sigma, T, num_timesteps):
    """Compute GBM simulation constants."""
    dt = T / num_timesteps
    sqrt_dt = np.sqrt(dt)
    drift = (r - 0.5 * sigma * sigma) * dt
    vol_sqrt_dt = sigma * sqrt_dt
    return dt, sqrt_dt, drift, vol_sqrt_dt

# Same as basic - using np.sqrt instead of math.sqrt 

numpy_pricer.py - simulate_path ~2.75 hours 745 total lines 
def simulate_paths_vectorized(S0, K, r, T, drift, vol_sqrt_dt, Z, num_scenarios, num_timesteps):
    """Simulate all GBM paths at once using NumPy vectorization."""
    # Z is (num_scenarios, num_timesteps)
    log_increments = drift + vol_sqrt_dt * Z  # Vectorized across all scenarios
    log_prices = np.cumsum(log_increments, axis=1)
    prices = S0 * np.exp(log_prices)

    # Arithmetic average over timesteps for the Asian payoff
    averages = prices.mean(axis=1)

    payoffs = np.maximum(averages - K, 0.0)
    discount = np.exp(-r * T)
    return discount * payoffs

# Vectorized across scenarios - ~6x faster than Basic Python 
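A quick way to confirm that the vectorized version computes the same payoffs as the scalar loop is to run both on the same random draws. The sketch below is a self-contained check on hypothetical small inputs; `simulate_paths_loop` is a reference helper written for this comparison and is not part of the benchmark files.

```python
import numpy as np

def simulate_paths_loop(S0, K, r, T, drift, vol_sqrt_dt, Z):
    """Scalar reference: one Python loop per scenario and per timestep."""
    num_scenarios, num_timesteps = Z.shape
    out = np.empty(num_scenarios)
    for i in range(num_scenarios):
        price, running_sum = S0, 0.0
        for t in range(num_timesteps):
            price *= np.exp(drift + vol_sqrt_dt * Z[i, t])
            running_sum += price
        out[i] = np.exp(-r * T) * max(running_sum / num_timesteps - K, 0.0)
    return out

def simulate_paths_vectorized(S0, K, r, T, drift, vol_sqrt_dt, Z):
    """Vectorized version: cumulative sum of log-increments along the time axis."""
    log_prices = np.cumsum(drift + vol_sqrt_dt * Z, axis=1)
    prices = S0 * np.exp(log_prices)
    averages = prices.mean(axis=1)
    return np.exp(-r * T) * np.maximum(averages - K, 0.0)

# Hypothetical small inputs, chosen only for a quick check.
S0, K, r, sigma, T = 100.0, 100.0, 0.05, 0.2, 1.0
num_scenarios, num_timesteps = 500, 20
dt = T / num_timesteps
drift = (r - 0.5 * sigma**2) * dt
vol_sqrt_dt = sigma * np.sqrt(dt)
Z = np.random.default_rng(0).standard_normal((num_scenarios, num_timesteps))

loop_payoffs = simulate_paths_loop(S0, K, r, T, drift, vol_sqrt_dt, Z)
vec_payoffs = simulate_paths_vectorized(S0, K, r, T, drift, vol_sqrt_dt, Z)
print("max abs difference:", np.max(np.abs(loop_payoffs - vec_payoffs)))
```

The two results agree to floating-point tolerance; the small residual comes from multiplying exponentials step by step versus exponentiating a cumulative sum.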

numpy_pricer.py - price_asian_option ~2.75 hours 745 total lines 
def price_asian_option(S0, K, r, sigma, T, Z, num_scenarios, num_timesteps):
    """Price Asian option using NumPy vectorization."""
    dt, sqrt_dt, drift, vol_sqrt_dt = gbm_constants(r, sigma, T, num_timesteps)

    # Single vectorized call - no Python loop over scenarios
    discounted_payoffs = simulate_paths_vectorized(
        S0, K, r, T, drift, vol_sqrt_dt, Z, num_scenarios, num_timesteps
    )

    return np.mean(discounted_payoffs)

# ~6x faster than Basic Python loops 

numpy_pricer.py - price_with_greeks ~2.75 hours 745 total lines 
def price_with_greeks(S0, K, r, sigma, T, Z, num_scenarios, num_timesteps):
    """Compute price and Greeks via bump-and-revalue (4 pricings)."""
    bump = 1e-6
    price = price_asian_option(S0, K, r, sigma, T, Z, num_scenarios, num_timesteps)
    delta = (price_asian_option(S0 + bump, K, r, sigma, T, Z, num_scenarios, num_timesteps) - price) / bump
    rho   = (price_asian_option(S0, K, r + bump, sigma, T, Z, num_scenarios, num_timesteps) - price) / bump
    vega  = (price_asian_option(S0, K, r, sigma + bump, T, Z, num_scenarios, num_timesteps) - price) / bump
    return price, delta, rho, vega

# Still 4 pricings needed - Greeks add +306% overhead 

aadc_pricer.py - gbm_constants 136s 822 total lines 
import numpy as np
import aadc

def gbm_constants(r, sigma, T, num_timesteps):
    """Compute GBM simulation constants."""
    dt = T / num_timesteps
    sqrt_dt = np.sqrt(dt)
    drift = (r - 0.5 * sigma * sigma) * dt
    vol_sqrt_dt = sigma * sqrt_dt
    return dt, sqrt_dt, drift, vol_sqrt_dt

# Identical to the NumPy version - AADC works with regular Python code! 

aadc_pricer.py - simulate_path 136s 822 total lines 
def simulate_path(S0, K, r, T, drift, vol_sqrt_dt, Z_vals, num_timesteps):
    """Simulate GBM path and compute discounted payoff."""
    price = S0
    running_sum = aadc.idouble(0.0)                                  # AADC
    for t in range(num_timesteps):
        price = price * np.exp(drift + vol_sqrt_dt * Z_vals[t])
        running_sum = running_sum + price

    average = running_sum / num_timesteps
    payoff = np.maximum(average - K, 0.0)
    discount = np.exp(-r * T)
    discounted_payoff = discount * payoff
    return discounted_payoff

# Changes from basic.py: aadc.idouble for running_sum (enables AAD), plus np.exp / np.maximum in place of math.exp / max 

aadc_pricer.py - price_asian_option 136s 822 total lines 
def price_asian_option(S0, K, r, sigma, T, Z, num_scenarios, num_timesteps, num_threads=4):
    """Record the Asian option payoff once with AADC, then evaluate it across all scenarios."""
    workers = aadc.ThreadPool(num_threads)                           # AADC

    # --- Record computation graph ---
    funcs = aadc.Functions()                                         # AADC
    funcs.start_recording()                                          # AADC

    # Active variables (use idouble instead of float)
    S0_v    = aadc.idouble(S0);    S0_arg    = S0_v.mark_as_input()   # AADC
    r_v     = aadc.idouble(r);     r_arg     = r_v.mark_as_input()    # AADC
    sigma_v = aadc.idouble(sigma); sigma_arg = sigma_v.mark_as_input()# AADC
    K_v     = aadc.idouble(K);     K_arg     = K_v.mark_as_input_no_diff()  # AADC
    T_v     = aadc.idouble(T);     T_arg     = T_v.mark_as_input_no_diff()  # AADC

    # ... record path simulation ...

    payoff_res = discounted_payoff.mark_as_output()                  # AADC
    funcs.stop_recording()                                           # AADC

    # Evaluate vectorized across scenarios
    request = {payoff_res: [S0_arg, r_arg, sigma_arg]}               # AADC
    results = aadc.evaluate(funcs, request, inputs, workers)         # AADC

    return results, payoff_res, S0_arg, r_arg, sigma_arg 

aadc_pricer.py - price_with_greeks 136s 822 total lines 
def price_with_greeks(S0, K, r, sigma, T, Z, num_scenarios, num_timesteps, num_threads=4):
    """Compute price and Greeks via AAD (1 forward + 1 adjoint pass)."""
    results, payoff_res, S0_arg, r_arg, sigma_arg = price_asian_option(
        S0, K, r, sigma, T, Z, num_scenarios, num_timesteps, num_threads
    )

    # --- Extract results ---                                        # AADC
    discounted_payoffs = results[0][payoff_res]                      # AADC
    price = float(np.mean(discounted_payoffs))                       # AADC

    # Greeks from single adjoint pass (no extra pricings needed!)    # AADC
    delta = float(np.mean(results[1][payoff_res][S0_arg]))           # AADC
    rho   = float(np.mean(results[1][payoff_res][r_arg]))            # AADC
    vega  = float(np.mean(results[1][payoff_res][sigma_arg]))        # AADC

    return price, delta, rho, vega

# All Greeks computed in ONE pass - +32% overhead vs +582% for C++ bump-and-revalue! 
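To see why one forward pass plus one derivative pass is enough for all three Greeks, the sketch below derives the path-wise sensitivities of the Asian payoff by hand in plain NumPy and checks them against bump-and-revalue. This is not AADC's implementation: AADC obtains the adjoints automatically from the recorded computation, whereas here the derivatives are written out manually. The function names, parameters and seed are hypothetical.

```python
import numpy as np

def forward_pass(S0, K, r, sigma, T, Z):
    """Discounted payoffs plus the intermediates reused by the derivative pass."""
    num_timesteps = Z.shape[1]
    dt = T / num_timesteps
    times = dt * np.arange(1, num_timesteps + 1)      # t_k = k * dt
    W = np.sqrt(dt) * np.cumsum(Z, axis=1)            # Brownian path at t_k
    S = S0 * np.exp((r - 0.5 * sigma**2) * times + sigma * W)
    average = S.mean(axis=1)
    discount = np.exp(-r * T)
    payoffs = discount * np.maximum(average - K, 0.0)
    return payoffs, (S, W, times, average, discount)

def price_and_greeks(S0, K, r, sigma, T, Z):
    """Price, delta, rho and vega from a single simulation of the paths."""
    payoffs, (S, W, times, average, discount) = forward_pass(S0, K, r, sigma, T, Z)
    in_money = (average > K).astype(float)            # derivative of max(x, 0)
    d_avg_dS0 = average / S0
    d_avg_dr = (S * times).mean(axis=1)
    d_avg_dsigma = (S * (W - sigma * times)).mean(axis=1)
    delta = discount * in_money * d_avg_dS0
    rho = -T * payoffs + discount * in_money * d_avg_dr
    vega = discount * in_money * d_avg_dsigma
    return payoffs.mean(), delta.mean(), rho.mean(), vega.mean()

# Hypothetical inputs for the comparison.
params = dict(S0=100.0, K=100.0, r=0.05, sigma=0.2, T=1.0)
Z = np.random.default_rng(1).standard_normal((20_000, 50))

price, delta, rho, vega = price_and_greeks(Z=Z, **params)

def bumped_price(name, bump=1e-6):
    """Re-price with one input bumped (one extra full pricing per Greek)."""
    bumped = dict(params, **{name: params[name] + bump})
    return forward_pass(Z=Z, **bumped)[0].mean()

fd_delta = (bumped_price("S0") - price) / 1e-6
fd_rho = (bumped_price("r") - price) / 1e-6
fd_vega = (bumped_price("sigma") - price) / 1e-6
print(f"delta {delta:.6f} vs {fd_delta:.6f}")
print(f"rho   {rho:.6f} vs {fd_rho:.6f}")
print(f"vega  {vega:.6f} vs {fd_vega:.6f}")
```

The derivative pass reuses the simulated paths, so adding more inputs adds a few array operations rather than another full pricing. That is the cost profile the AAD figures on this page describe.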

optimised.cpp - gbm_constants 731s 877 total lines 
 void gbm_constants(double r, double vol, double T, size_t num_timesteps,
                   double& dt, double& sqrt_dt, double& drift, double& vol_sqrt_dt) {
    /**Compute GBM simulation constants.*/
    dt = T / static_cast<double>(num_timesteps);
    sqrt_dt = std::sqrt(dt);
    drift = (r - 0.5 * vol * vol) * dt;
    vol_sqrt_dt = vol * sqrt_dt;
} 

optimised.cpp - simulate_path 731s 877 total lines 
 // Scalar version (portable, works on ARM/Apple Silicon)
double simulate_path_scalar(double S0, double K, double drift, double vol_sqrt_dt,
                            const double* Z_row, size_t num_timesteps) {
    /**Simulate GBM path and compute payoff (scalar version).*/
    double price = S0;
    double running_sum = 0.0;

    for (size_t t = 0; t < num_timesteps; ++t) {
        price = price * std::exp(drift + vol_sqrt_dt * Z_row[t]);
        running_sum += price;
    }

    double average = running_sum / static_cast<double>(num_timesteps);
    double payoff = std::max(average - K, 0.0);
    return payoff;
}

// AVX2 SIMD version also available for x86-64 (~2x faster) 

optimised.cpp - price_asian_option 731s 877 total lines 
 // Core pricing function (auto-selects AVX2 or scalar)
double price_asian_option(
    double S0, double K, double r, double vol, double T,
    const double* Z, size_t num_scenarios, size_t num_timesteps
) {
    double dt, sqrt_dt, drift, vol_sqrt_dt;
    gbm_constants(r, vol, T, num_timesteps, dt, sqrt_dt, drift, vol_sqrt_dt);

#if USE_AVX2
    // Broadcast constants to SIMD registers
    const __m256d drift_vec = _mm256_set1_pd(drift);
    const __m256d vol_sqrt_dt_vec = _mm256_set1_pd(vol_sqrt_dt);
    const __m256d S0_vec = _mm256_set1_pd(S0);
    size_t row_stride = (num_timesteps + SIMD_WIDTH - 1) & ~(SIMD_WIDTH - 1);
#else
    size_t row_stride = num_timesteps;
#endif

    double payoff_sum = 0.0;
    for (size_t scenario = 0; scenario < num_scenarios; ++scenario) {
        const double* Z_row = Z + scenario * row_stride;
#if USE_AVX2
        payoff_sum += simulate_path_avx(S0, K, drift, vol_sqrt_dt, Z_row, num_timesteps,
                                        drift_vec, vol_sqrt_dt_vec, S0_vec);
#else
        payoff_sum += simulate_path_scalar(S0, K, drift, vol_sqrt_dt, Z_row, num_timesteps);
#endif
    }

    return std::exp(-r * T) * (payoff_sum / static_cast<double>(num_scenarios));
} 
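The `row_stride` expression in the AVX2 branch rounds the row length up to a multiple of the SIMD width so each scenario's row can be processed in whole vector registers. The sketch below reproduces that bit trick in Python. `SIMD_WIDTH` is not defined in the excerpt, so the value 4 (doubles per 256-bit AVX2 register) is an assumption, and `padded_row_stride` is a hypothetical helper name.

```python
def padded_row_stride(num_timesteps: int, simd_width: int) -> int:
    """Round num_timesteps up to the next multiple of simd_width.

    The mask trick only works when simd_width is a power of two.
    """
    return (num_timesteps + simd_width - 1) & ~(simd_width - 1)

# 4 doubles fit in a 256-bit AVX2 register (assumed SIMD_WIDTH);
# 8 would correspond to a 512-bit register.
for width in (4, 8):
    print(f"width={width}: 252 timesteps -> stride {padded_row_stride(252, width)}")
```

With 252 timesteps and a width of 4 no padding is needed, since 252 is already a multiple of 4.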

optimised.cpp - price_with_greeks 731s 877 total lines 
 // Greeks via bump-and-revalue (requires 4 full pricings)
constexpr double BUMP_SIZE = 1e-6;

void price_with_greeks(
    double S0, double K, double r, double vol, double T,
    const double* Z, size_t num_scenarios, size_t num_timesteps,
    double& price, double& delta, double& rho, double& vega
) {
    price = price_asian_option(S0, K, r, vol, T, Z, num_scenarios, num_timesteps);
    double p_dS = price_asian_option(S0 + BUMP_SIZE, K, r, vol, T, Z, num_scenarios, num_timesteps);
    double p_dr = price_asian_option(S0, K, r + BUMP_SIZE, vol, T, Z, num_scenarios, num_timesteps);
    double p_dv = price_asian_option(S0, K, r, vol + BUMP_SIZE, T, Z, num_scenarios, num_timesteps);
    delta = (p_dS - price) / BUMP_SIZE;
    rho   = (p_dr - price) / BUMP_SIZE;
    vega  = (p_dv - price) / BUMP_SIZE;
}

// Even with AVX2 SIMD, Greeks add +582% overhead (7 evaluations)! 

Key insight: AADC Python runs 420x faster than Basic Python and computes Greeks 73x faster than NumPy, outperforming hand-optimised C++ by 5.4x.

Greeks via AAD: 1 forward + 1 adjoint pass - +32% overhead vs +306% for NumPy bump-and-revalue.

Key Insights

Performance: AADC Python runs 420x faster than Basic Python, and computes Greeks 73x faster than NumPy and 5.4x faster than hand-optimised C++.
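The speedup figures follow directly from the reported runtimes. A minimal check, assuming the approximate hour figures convert exactly to seconds:

```python
# Reported Price + Greeks runtimes, converted to seconds.
runtimes_s = {
    "Basic Python": 16 * 3600,         # ~16 hours
    "NumPy Vectorized": 2.75 * 3600,   # ~2.75 hours
    "C++ Optimised": 731,
    "AADC Python": 136,
}

# Speedup of AADC Python relative to each approach.
speedups = {name: t / runtimes_s["AADC Python"] for name, t in runtimes_s.items()}
for name, factor in speedups.items():
    print(f"{name:>18}: {factor:6.1f}x")
```

Because the Basic Python and NumPy runtimes are approximate, the first two ratios are best read as rounded figures.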

Greeks overhead: Greeks via AAD require just 1 forward + 1 adjoint pass — +32% overhead vs +306% for NumPy and +582% for C++ bump-and-revalue.
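Since the overhead is the extra time beyond a price-only run, each percentage can be read as a multiple of one pricing. The short sketch below does that conversion using only the figures quoted above:

```python
# Greeks overhead as reported, in percent of the price-only runtime.
overhead_pct = {
    "Basic Python": 302,
    "NumPy": 306,
    "C++ Optimised": 582,
    "AADC Python": 32,
}

# Total cost of price + Greeks, expressed in units of one price-only run.
cost_multiple = {name: 1 + pct / 100 for name, pct in overhead_pct.items()}
for name, multiple in cost_multiple.items():
    print(f"{name:>14}: {multiple:.2f}x a single pricing")
```

The Python bump-and-revalue runs land close to 4 pricings, the C++ run close to 7, and the AAD run at about 1.3.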

Greeks scaling: NumPy bump-and-revalue cost grows linearly with the number of Greeks (4 evaluations for 3 Greeks, 11 for 10, 51 for 50), while the AADC adjoint pass returns all Greeks at a roughly constant cost of about 1.3 evaluations. In this benchmark AADC is 73x faster end to end at 3 Greeks, and the gap widens as more Greeks are added.

Greeks	NumPy Evals	AADC Evals	AADC Advantage
3	4	~1.3	10x
10	11	~1.3	27x
50	51	~1.3	127x

The Greeks Tax

Greeks: Where AADC Shines

AAD vs traditional bump-and-revalue - Risk calculations require Greeks (Delta, Rho, Vega)

Traditional Method: Bump & Revalue

Requires 4-7 full pricings per trade for Delta, Rho, Vega sensitivities.

Approach	Greeks overhead*
Basic Python	+302%
NumPy	+306%
C++ Optimised	+582%

C++ with 1000 trades: 731 seconds

vs

AADC Method: Adjoint Differentiation

Requires just 1 forward + 1 adjoint pass - all Greeks in one sweep.

Approach	Greeks overhead*
AADC Python	+32%

AADC with 1000 trades: 136 seconds (5.4x faster than C++)

* Greeks overhead = additional time beyond price-only calculation

The Process

How It Works

From specification to production benchmark in hours

1. Your Model

Use your own model or one developed by AI from your specification. If using AI, be sure to define all parameters, constraints, and do's and don'ts.

2. Enable AADC

The AADC-Agent converts your model (Python or C++) to an AADC-enabled version. It works like any Claude task: ask it to fix issues as they come up. Choose your language and hardware.

3. Benchmark & Deploy

Get your benchmark in hours, not weeks, and see exactly how AADC performs. If AI can't help, MatLogica is always there for expert support.

Note: We don't advise developing quant models using AI, but it's a good way to get started prototyping with AADC.

Note: We don't suggest using AI for production integrations. MatLogica's integration/debugging toolkit should be used for production integrations.

Ready to Try It Yourself?

Get the CLAUDE.md template files and request a demo version of MatLogica AADC. All benchmark code is available to run, validate, and verify.


Test Environment

Hardware Specification

Benchmarks executed on enterprise-grade server hardware

System Configuration

CPU	2x Intel Xeon Platinum 8280L @ 2.70GHz
Cores	56 physical (28 per socket), 112 threads
Architecture	x86_64, Cascade Lake
L3 Cache	77 MiB (38.5 MiB per socket)
RAM	283 GB
OS	Linux kernel 6.1.0-13-amd64 (Debian 64-bit)

CPU Features: AVX-512, AVX2, FMA, AES-NI

Test Configuration

Trades	1,000
Scenarios	500,000
Timesteps	252
Threads	16
Total Paths	500M

Asian Option Monte Carlo with GBM dynamics
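The workload size implied by this configuration can be worked out with a few multiplications. The memory figure below assumes the random draws for one trade are held as a full float64 matrix of shape scenarios x timesteps; the benchmark's actual storage layout is not stated here.

```python
trades, scenarios, timesteps = 1_000, 500_000, 252

total_paths = trades * scenarios                 # matches "Total Paths 500M"
total_path_steps = total_paths * timesteps       # GBM updates per price-only run

# Assumption: one float64 (8 bytes) per random draw, one matrix per trade.
z_bytes_per_trade = scenarios * timesteps * 8

print(f"total paths:      {total_paths:,}")
print(f"total path steps: {total_path_steps:,}")
print(f"Z per trade:      {z_bytes_per_trade / 1e9:.3f} GB")
```

Each bump-and-revalue Greek repeats all of the path steps once more, which is where the multi-hour runtimes of the loop-based versions come from.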

Compilers

GCC	12.2.0 (Debian)
Clang	14.0.6 (Debian)
Python	3.11.2 + AADC

Dual-socket server with AVX-512 vectorization

5.4x Faster Greeks Than Hand-Optimised C++. Just Python - Accelerated.
