Benchmark on interest rate swap portfolios under a Hull-White model with CSA collateral rules. AADC computes all 535 CVA/DVA sensitivities in the time it takes bump & revalue to compute fewer than 5.
| Parameter | Value |
|---|---|
| Model | Hull-White 1-factor with mean reversion |
| Instruments | Interest rate swaps (fixed + floating legs) |
| Collateral | CSA rules with threshold and minimum transfer amounts |
| Time horizon | 30 years (10,950 days) |
| Model step | 30 days (365 steps) |
| Pricing frequency | Every 3rd model step (122 pricing times) |
| Risk factors | 535 (r0, sigma, 251 MR curve, 141+141 survival curves) |
| Hardware | CPU with AVX-256 support, 16 threads |
Note: This is an isolated benchmark example using a single-factor Hull-White model with interest rate swaps. Production XVA libraries typically involve additional complexity — multi-factor models, exotic instruments, cross-currency exposures, MVA/KVA, and regulatory constraints — all of which are well-understood by MatLogica. Production deployments also benefit from additional optimisations (kernel partitioning, incremental recompilation, warm caching across trading desks) not exercised here. As a result, real-world speedups vary by implementation, but remain comfortably above 10× even for the most complex portfolios.
Full sensitivity set — 535 risk factors, all computed simultaneously
N+1 full MC re-simulations for N risk factors. Each sensitivity requires a complete Monte Carlo reprice.
1 forward + 1 reverse sweep per MC path. All 535 gradients computed simultaneously — cost independent of N.
Log scale — includes all CVA + DVA Greeks
| Phase | 5 Trades | 50 Trades | 200 Trades |
|---|---|---|---|
| Kernel compilation (one-time) | 0.77s | 1.87s | 8.10s |
| Evaluation (prices + 535 Greeks) | 0.024s | ~0.20s | ~0.88s |
| Relative perf vs primal | 18.75% | 0.78% | 0.70% |
Relative performance = (AADC forward + 2 reverse sweeps) / (single double forward pass). Values below 100% mean the full gradient computation costs less than a single repricing.
| Portfolio | Primal (1 run) | 535 Bumps |
|---|---|---|
| 5 trades | 0.129s | ~69s |
| 50 trades | 25.8s | ~3.8 hr |
| 200 trades | 125.0s | ~18.6 hr |
Total time = (N + 1) × single MC time. Cost grows linearly with the number of risk factors.
Bump & revalue cost grows linearly — AADC stays constant at ~2× primal
Warm run: When only market data changes (rate curves, survival probabilities), AADC reuses the compiled kernel — 0s compilation overhead. For 50 trades at 10K paths: cold start 18.2s → warm run 13.9s (1.3× faster).
The O(N) vs O(1) scaling difference at production scale
AADC's full-year cloud spend is less than the cost of a single bump & revalue run.
| Risk Factors (N) | B&R Annual (spot) | AADC Annual (spot) | B&R / AADC |
|---|---|---|---|
| 50 | ~$1.1M | ~$4.8K | 229× |
| 535 | ~$11.7M | ~$4.8K | 2,438× |
| 2,000 | ~$43.7M | ~$4.8K | 9,104× |
Estimates assume linear scaling of B&R compute with risk factor count. AADC evaluation cost is independent of N. Infrastructure overhead excluded. Actual costs vary with instance selection, region, and reserved pricing. GPU instances can reduce B&R wall-clock time but do not change the O(N) scaling.
AADC produces mathematically exact derivatives — to machine precision
| Metric | AADC | Bump & Revalue (1e-8) |
|---|---|---|
| CVA match to primal | Exact (< 1e-14) | N/A |
| DVA match to primal | Exact (< 1e-14) | N/A |
| dCVA/d(sigma) agreement | Reference | ~6 digits |
| dCVA/d(r0) agreement | Reference | ~6 digits |
Compute CVA/DVA + all 535 sensitivities for the entire portfolio.
Rate curves or survival probabilities change — re-compute CVA + sensitivities.
A new trade of a type already in the portfolio is added.
aadc::idouble replaces double directly — no templates required
```cpp
double rate = 0.05;
double price = notional * exp(-rate * T);
double cva = max(price, 0.0) * (1.0 - survival_prob);
```
```cpp
aadc::idouble rate = 0.05;
aadc::idouble price = notional * exp(-rate * T);
aadc::idouble cva = max(price, 0.0) * (1.0 - survival_prob);
```
double → aadc::idouble in the valuation path
startRecording() / stopRecording() around computation
markAsDiff() for inputs, markAsOutput() for outputs
forward() + reverse() → all sensitivities
No refactoring needed. aadc::idouble overloads all arithmetic operators and math functions.
The template approach (XVAProblem<T>) used in this benchmark is an engineering convenience for maintaining both code paths side-by-side — it is not an AADC requirement.
Full source from the benchmark — same business logic runs with double or aadc::idouble
```cpp
// XVAProblem.h — same code for both double and aadc::idouble
template<typename T>
class XVAProblem {
public:
    void simulatePath(const std::vector<T>& random_vec) {
        m_model->initT0(m_model_times[0]);
        HullWhiteIrMarket<T> market = m_model->getIrMarket();
        m_CSA_object->initCollateral();
        int price_time_index = 0;
        for (int i = 0; i < m_model_times.size(); i++) {
            if (i > 0) m_model->nextQT(random_vec[i], m_model_times[i]);
            if (m_is_pricing[i]) {
                double t = m_model_times[i];
                market = m_model->getIrMarket();
                m_total_price = 0;
                for (int ti = 0; ti < numTrades; ++ti) {
                    m_total_price += m_fixed_leg[ti]->getPrice(market, t);
                    m_total_price += m_float_leg[ti]->getPrice(market, t);
                }
                m_CSA_object->next(m_total_price);
                T CSA_total_price = m_total_price - m_CSA_object->getCollateral();
                m_PEE[price_time_index] += max(CSA_total_price, 0.);
                m_NEE[price_time_index] += min(0., CSA_total_price);
                ++price_time_index;
            }
        }
    }

    void computeXVAMeasures() {
        // Trapezoidal rule for CVA/DVA integration over pricing times
        for (int i = 0; i < i_end - 1; i++) {
            m_CVA += (m_PEE[i] + m_PEE[i+1]) * (ctrparty_prev - ctrparty_next) * 0.5;
            m_DVA += (m_NEE[i] + m_NEE[i+1]) * (company_prev - company_next) * 0.5;
        }
    }
};
```
The template parameter T is the only difference. Supporting classes (HullWhiteMarketModel, CSARules, FixedLeg, FloatLeg) are all templated on T.
```cpp
// XVAJobRequest.h — bumpAndRevalue()
void bumpAndRevalue(const json& request_data) {
    double bump_size = 1e-8;

    // For EACH mean reversion curve point (251 points):
    for (int i = 0; i < xva.getModel()->getMeanRev()->getVals().size(); i++) {
        json bump_data(request_data);
        bump_data["Currencies"]["EUR"]["HWMeanReversionCurve"]["bump_index"] = i;
        bump_data["Currencies"]["EUR"]["HWMeanReversionCurve"]["bump_size"] = bump_size;
        XVAProblem<double> bumped_xva;
        bumped_xva.initData(bump_data);  // Re-initialize with bumped parameter
        // Run FULL Monte Carlo simulation again
        calcRiskByBump(risk_results, base_results, bumped_xva,
                       "MeanRev", i, bump_size, mc_iterations, randoms);
    }

    // For EACH survival curve point (141 + 141 points):
    for (int i = 0; i < xva.getCtrpSurvCurv()->getZeroRatesVector().size(); i++) {
        json bump_data(request_data);
        bump_data["CounterPartySurvivalCurve"]["bump_index"] = i;
        bump_data["CounterPartySurvivalCurve"]["bump_size"] = bump_size;
        // ... full MC re-simulation ...
    }
    // ... same for company survival, sigma, r0 ...
}

// Each bump requires the full MC loop:
void calcRiskByBump(json& risk_results, const json& base_results,
                    XVAProblem<double>& XVA, ...) {
    XVA.prepareSimulations();
    for (int mc_i = 0; mc_i < mc_iterations; mc_i++) {
        XVA.simulatePath(randoms[mc_i]);  // Full path simulation
    }
    XVA.computeXVAMeasures();
    // Finite difference: (bumped_CVA - base_CVA) / bump_size
    risk_results["CVA"][risk_id][risk_index] =
        (XVA.getCVA() / mc_iterations - base_results["CVA"].get<double>()) / bump_size;
}
```
Total work: (535 + 1) full MC simulations = 536× the cost of a single pricing run. Each sensitivity is independent but requires a complete reprice.
```cpp
// XVAJobRequest.h — compileAADFunction()
void compileAADFunction(const json& request_data) {
    // Random variables as input (no derivatives needed for these)
    std::vector<idouble> aad_random_vec(XVA.numberOfRandomVars());

    m_aad_funcs->startRecording();

    // Mark random inputs
    markVectorAsInput(m_random_arg, aad_random_vec, false);
    // Mark market data as variable inputs (can change without recompilation)
    markVariableInputs(getArgumentsMap(), true);

    // Create the XVA problem with aadc::idouble — same code as double version
    XVAProblem<idouble> aad_XVA;
    aad_XVA.initData(m_data);
    idouble::CheckPoint();

    // Mark ALL risk factors for differentiation (one-time setup)
    m_xva_diff_args.ir_crvs[0].r0 = aad_XVA.getModel()->getR0().markAsDiff();
    m_xva_diff_args.ir_crvs[0].sigma = aad_XVA.getModel()->getSigma().markAsDiff();
    for (int i = 0; i < meanRevSize; i++)
        m_xva_diff_args.ir_crvs[0].mr_crv.push_back(
            aad_XVA.getModel()->getMeanRev()->getVals()[i].markAsDiff());
    for (int i = 0; i < survivalCurveSize; i++)
        m_xva_diff_args.company_surv_crv.default_rates.push_back(
            aad_XVA.getCompSurvCurv()->getZeroRatesVector()[i].markAsDiff());
    // ... counterparty survival curve ...

    // Record ONE forward pass (same simulatePath + computeXVAMeasures)
    aad_XVA.prepareSimulations();
    aad_XVA.simulatePath(aad_random_vec);
    aad_XVA.computeXVAMeasures();

    // Mark outputs
    m_res_args.CVA = aad_XVA.getCVA().markAsOutput();
    m_res_args.DVA = aad_XVA.getDVA().markAsOutput();

    m_aad_funcs->stopRecording();
    // Kernel is now compiled to optimized AVX2/AVX-512 machine code
}
```
markAsDiff() tells AADC which inputs need gradients. All 535 risk factors are marked once during recording — the compiled kernel computes all their gradients on every reverse sweep.
```cpp
// XVAJobRequest.h — aADExecution()
void aADExecution(const json& request_data, int threads_num) {
    auto threadWorker = [&](int th_i) {
        // Each thread gets its own AADC workspace
        auto ws = m_aad_funcs->createWorkSpace();

        // Set market data inputs (can update without recompilation)
        for (auto& [path, arg] : request_variable_inputs)
            ws->val(arg.second) = mmSetConst<mmType>(request_data[path]);

        m_aad_funcs->forward(*ws, 0, 0);  // Initialize once

        for (int mc_i = 0; mc_i < AVX_iterations; mc_i++) {
            // Set random numbers (4 paths at once with AVX-256)
            setAVXVector(*ws, m_random_arg, mm_randoms[mc_i + AVX_iterations * th_i]);

            // Forward sweep: computes prices for 4 MC paths simultaneously
            m_aad_funcs->forward(*ws, 1, -1);

            // Reverse sweep for CVA sensitivities (ALL 535 gradients at once)
            ws->resetDiff();
            ws->diff(m_res_args.CVA) = mmSetConst<mmType>(1);
            m_aad_funcs->reverse(*ws, 1, -1);

            // Reverse sweep for DVA sensitivities (ALL 535 gradients at once)
            ws->resetDiff();
            ws->diff(m_res_args.DVA) = mmSetConst<mmType>(1);
            m_aad_funcs->reverse(*ws, 1, -1);
        }
    };

    // Launch threads — each processes its share of MC paths
    for (int i = 0; i < threads_num; i++)
        threads.push_back(new std::thread(threadWorker, i));
    for (auto& t : threads) t->join();
}

// Kernel caching: zero-cost market data updates
void processRequest(const json& request_data, ...) {
    auto cached = find_in_cache(m_data);
    if (cached != cache.end()) {
        m_aad_funcs = cached->aad_funcs;   // REUSE: ~0ms
    } else {
        compileAADFunction(request_data);  // COMPILE: ~1-8s
        cache.push_back(new_kernel);
    }
    aADExecution(request_data, threads_num);
}
```
AVX-256 processes 4 MC paths per SIMD instruction. Each thread works independently on its share of paths. Market data updates reuse the cached kernel with zero recompilation overhead.
| Aspect | Bump & Revalue | AADC |
|---|---|---|
| Type change | None (native double) | double → aadc::idouble |
| Business logic | Unchanged | Unchanged |
| Sensitivity loop | 536 full MC re-runs | 0 (reverse sweep) |
| One-time setup | None | Tape + compile (~1-8s) |
| Per-path work | 1 forward pass | 1 fwd + 2 reverse |
| Parallelism | Single-threaded | Multi-threaded + AVX |
| Market data update | Re-run everything | No recompilation |
AADC reverse sweeps — cost independent of N risk factors
Over bump & revalue for 535 parameters
Machine precision vs ~6 digits for finite differences
Cloud cost reduction at 500K-trade production scale
| Dimension | Bump & Revalue | AADC |
|---|---|---|
| Code changes | None (native double) | double → aadc::idouble + tape recording |
| Sensitivity cost | O(N) full MC re-simulations | O(1) reverse sweeps |
| Speedup (535 params) | Baseline | 360–3,300× |
| Accuracy | ~6 digits (bump-dependent) | Machine precision (<1e-14) |
| Market data update | Full re-computation | Kernel reuse (0s compilation) |
| Cloud cost (500K trades) | ~$11.7M/yr (spot) | ~$4.8K/yr (spot) |
| SIMD vectorization | Not applicable | AVX-256/512 (4–8 paths/cycle) |
| Multi-threading | Parallel but N× work | Parallel with 1× work |
AADC transforms XVA risk computation from an O(N) problem into an O(1) problem with respect to the number of risk factors, while maintaining mathematical exactness and requiring minimal code changes to the existing valuation library.
| Risk Factor | Count | Description |
|---|---|---|
| r0 (short rate) | 1 | Hull-White initial rate |
| sigma (volatility) | 1 | Hull-White volatility |
| Mean reversion curve | 251 | Piecewise-linear, 50yr horizon, 0.2yr step |
| Counterparty survival curve | 141 | 14,000-day horizon, 100-day step |
| Company survival curve | 141 | 14,000-day horizon, 100-day step |
| Total | 535 | |
The benchmark: Hull-White 1-factor model with IR swaps, CSA collateral rules, 535 risk factors (r0, sigma, mean reversion curve, counterparty and company survival curves), 30-year horizon.
Important context: This benchmark was produced with Intel in 2019 to demonstrate AADC's capabilities on a clean, well-understood model. It deliberately uses vanilla IR swaps to isolate the AAD performance gains from model complexity. It doesn't cover more complex exotics like Bermudans, TARFs, or multi-factor hybrids.
If there's a specific asset class that would make this more convincing for your book, we'd be happy to extend the benchmark — let us know what would resonate.
Cloud cost estimates extrapolated linearly from benchmark baselines. Actual costs vary with instance selection, region, and pricing model.
AADC achieves a 360–3,300× speedup over bump & revalue for computing all 535 CVA/DVA sensitivities. For 200 trades, AADC computes all sensitivities in ~0.88 seconds vs ~18.6 hours for bump & revalue. The speedup grows with the number of risk factors: for N risk factors and M Monte Carlo paths, AADC cost is O(M) while bump & revalue is O(N × M).
At production scale (500,000 IR swaps, 50K MC paths, 535 risk factors), AADC reduces annual cloud spend from ~$11.7M to ~$4,800 — a 2,438× cost reduction. AADC needs 7 c5.9xlarge nodes to complete within a 4-hour batch window, compared to ~21,100 nodes for bump & revalue.
AADC produces mathematically exact derivatives to machine precision (<1e-14 error vs primal). Bump & revalue achieves approximately 6 digits of accuracy and is sensitive to bump-size selection. AADC eliminates subtractive cancellation errors inherent in finite-difference methods.
AADC is a drop-in replacement: change 'double' to 'aadc::idouble' in the valuation path. No business logic changes are required. Integration steps: replace types, record tape with startRecording()/stopRecording(), mark inputs/outputs, then call forward()+reverse() for all sensitivities.
No. When only market data changes (rate curves, survival probabilities), AADC reuses the compiled kernel with zero compilation overhead. For 50 trades at 10K paths, warm run time drops from 18.2s (cold start) to 13.9s. Only structural changes (new trade types) require recompilation.
The benchmark computes sensitivities to 535 risk factors: r0 (Hull-White initial rate), sigma (volatility), 251 mean-reversion curve points (50yr horizon, 0.2yr step), 141 counterparty survival curve points, and 141 company survival curve points (14,000-day horizon, 100-day step).
AADC evaluation time scales linearly with portfolio size: 0.024s for 5 trades, ~0.20s for 50 trades, ~0.88s for 200 trades. Kernel compilation is a one-time cost (0.77s to 8.1s) amortized across all subsequent evaluations. Relative performance improves with size: from 18.75% at 5 trades to 0.70% at 200 trades.
AADC supports three main XVA workflows: (A) End-of-day full risk reports computing all 535 sensitivities at ~2× primal cost + compilation, (B) Intraday market data updates with kernel reuse (no recompilation), and (C) New trade additions requiring only recompile + evaluation.