This document provides mappings and instructions for updating benchmark data from execution results.
```bash
# Regenerate benchmark_data.json after any changes
python3 scripts/update_benchmarks.py
```
Key Files:
- `scripts/update_benchmarks.py` - Main generator script
- `public/data/benchmarks/benchmark_data.json` - Generated output (DO NOT EDIT)
- `src/components/InteractiveBenchmark.astro` - Display component

IMPORTANT: Always use “AADC Python” (not “Python AADC”)
| Model Name in Execution Log | Display Name | Notes |
|---|---|---|
| gbm_asian_AADC_scalar_py | AADC Python | Main Python implementation (scalar kernel v6.0.0) |
| gbm_asian_AADC_cpp | AADC C++ | C++ implementation |
| gbm_asian_optimised_safe_cpp | Optimised C++ | Hand-optimised C++ baseline |
The system tracks whether data is actual (from execution log) or estimated (extrapolated):
Performance Comparison table:
* Estimate marker

Key Findings section:
* Missing implementations:
```python
# Estimate detection
if not has_actual_data:
    impl["is_estimate"] = True
    impl["estimate_note"] = f"Estimated from {closest_trades}×{closest_scenarios}"
    impl["closest_trades"] = closest_trades
    impl["closest_scenarios"] = closest_scenarios
```
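The closest-configuration lookup itself is not shown above. A hypothetical sketch of how it might work (the real logic lives in `scripts/update_benchmarks.py` and may differ):

```python
import math

# Hypothetical sketch: pick the logged (trades, scenarios) pair nearest
# the requested configuration. Not the actual generator code.
def find_closest_config(target_trades, target_scenarios, available):
    def distance(cfg):
        trades, scenarios = cfg
        # Compare on a log scale so 1K vs 10K counts like 100K vs 1M
        return (abs(math.log10(trades) - math.log10(target_trades))
                + abs(math.log10(scenarios) - math.log10(target_scenarios)))
    return min(available, key=distance)

closest_trades, closest_scenarios = find_closest_config(
    1000, 100_000, [(10, 1_000), (1000, 500_000)])
# closest_trades, closest_scenarios == (1000, 500_000)
```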
Key Insights now update based on selected configuration (no longer static).
| Category | Insights |
|---|---|
| aad_tools | AADC vs Enzyme-AD/CoDiPack speedups, open-source alternative recommendation |
| ml_libraries | AADC vs JAX/PyTorch speedups, JIT overhead warning |
| python_implementations | AADC vs NumPy/Basic Python speedups |
| languages | AADC C++ vs Optimised C++, Python vs C++ ratio |
- Performance: AADC Python (668.7 ms) is 9x faster than Enzyme-AD (6.27s) and 13x faster than CoDiPack (8.85s) at 100K scenarios
- Open-Source Choice: If commercial AADC isn't an option, Enzyme-AD (6.27s) is the fastest open-source alternative but 9x slower than AADC.
AADC C++ and AADC Python have very different batch models, so the component now shows separate explanations:
| Implementation | Batch Size | Reason |
|---|---|---|
| AADC Python | 65,536 scenarios | Large NumPy-style batches |
| AADC C++ | 4-8 scenarios | AVX vector width (AVX2=4, AVX512=8) |
Example at 1000 trades × 100K scenarios:
The total_eval_time_sec column is the MOST IMPORTANT metric for production benchmarks.
```
total_eval_time_sec = eval_time_sec + recording_time_sec
```
| Implementation | eval_time | recording_time | total_eval_time |
|---|---|---|---|
| AADC Python | 2.7s | 0.15s | 2.85s |
| JAX | 6.5s | 12.5s | 19.0s |
Key Insight: JAX’s eval_time (6.5s) is only 2.4× slower than AADC, but total_eval_time is 6.7× slower (19s vs 2.85s) due to JIT compilation overhead.
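The ratios can be reproduced directly from the table's numbers:

```python
# Times in seconds, taken from the eval_time/recording_time table above
aadc_eval, aadc_recording = 2.7, 0.15
jax_eval, jax_recording = 6.5, 12.5

aadc_total = aadc_eval + aadc_recording   # 2.85
jax_total = jax_eval + jax_recording      # 19.0

eval_ratio = jax_eval / aadc_eval         # ~2.4x slower on eval_time alone
total_ratio = jax_total / aadc_total      # ~6.7x slower on total_eval_time
```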
At 10 trades × 1K scenarios, JAX’s 12.5s compilation might seem acceptable. But this isn’t production-representative. In production:
Always use total_eval_time_sec for production performance comparisons.
Always show both eval_time and total_eval_time in benchmark displays.
Before using execution log data, be aware that different model types have different column counts:
| Model Type | Columns | Issue |
|---|---|---|
| AADC Python, NumPy, etc. | 39-40 | ✓ Correct |
| Julia, Haskell | 37 | Missing recording_time, kernel_execution_time |
| Old C++ (adept, codipack, etc.) | 34 | Missing timing and memory breakdown fields |
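An illustrative sanity check for the column-count mismatch (the actual cleanup is performed by `fix_column_mapping.py`; this sketch only detects the problem):

```python
import csv
import io

# Illustrative only: report rows whose column count differs from the header.
def rows_with_bad_column_count(csv_text):
    rows = list(csv.reader(io.StringIO(csv_text)))
    expected = len(rows[0])
    # Return 1-based line numbers of rows narrower/wider than the header
    return [i for i, row in enumerate(rows[1:], start=2) if len(row) != expected]

sample = "model_name,eval_time_sec,memory_mb\na,1.0,52\nb,2.0\n"
rows_with_bad_column_count(sample)  # → [3]
```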
Always use execution_log_clean_16Jan.csv which has been corrected with fix_column_mapping.py.
See /home/natashamanito/Asian Options Benchmark/AI Asian Pricer/data/CLAUDE.md for full details.
New calculated column: total_eval_time_sec = eval_time_sec + recording_time_sec
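An illustrative derivation of the calculated column (field names follow the CSV mapping in this document; the row values are hypothetical examples, and a missing recording time counts as zero):

```python
# Hypothetical example rows; real data comes from the cleaned execution log
rows = [
    {"model_name": "gbm_asian_AADC_scalar_py",
     "eval_time_sec": 2.7, "recording_time_sec": 0.15},
    {"model_name": "gbm_asian_numpy",  # hypothetical model name
     "eval_time_sec": 6.5, "recording_time_sec": None},
]
for row in rows:
    # Implementations without a recording phase count recording as 0
    row["total_eval_time_sec"] = row["eval_time_sec"] + (row["recording_time_sec"] or 0)
```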
There are two data files that must be kept in sync:
1. Scatter Plot Data: `src/content/data/benchmark-scatter-data.md` (used by the BenchmarkScatterPlot.astro component)
2. Asian Option Benchmark Stats: `src/content/resources/benchmarks/asian-option-monte-carlo.md` (used by accelerate-python-models.astro and the CodeViewer component; holds the benchmarkStats object with speedup comparisons and formatted values)

| CSV Field | scatter-data.md Property | Notes |
|---|---|---|
| eval_time_sec | greeks_time | Evaluation time (EXCLUDES recording for AADC) |
| recording_time_sec | recording_time | Kernel recording time (AADC only) |
| steady_state_time_sec | steady_state_time | Steady-state eval time (all kernel calls) |
| total_eval_time_sec | N/A (calculated in component) | eval_time + recording_time |
| memory_mb | memory | Peak memory usage |
| model_total_lines | code_lines | Total lines changed for integration |
| recording_time_sec or kernel_execution_time_sec | compile_time | JIT/kernel compilation time (for scatter plot) |
| N/A (count Greeks: delta, rho, vega) | num_greeks | Usually 3 (Delta, Rho, Vega) |
Important Timing Relationship:

For JIT/Recording implementations (AADC, JAX, Enzyme):
- greeks_time (eval_time) = first_eval + steady_state (excludes recording/JIT)
- total eval = greeks_time + recording_time (or compile_time)

For traditional implementations (NumPy, PyTorch, Basic Python, etc.):
- greeks_time = total evaluation time
Note: The component uses both recording_time and compile_time fields interchangeably.
Implementations with JIT compilation (JAX, Enzyme) have compile_time in scatter data.
`benchmarkStats` mapping:

| CSV Field | benchmarkStats Property | Notes |
|---|---|---|
| eval_time_sec (AADC) | aadcTime | Format as “Xs” or “~X hours” |
| eval_time_sec (C++) | cppTime | Format as “Xs” or “~X hours” |
| eval_time_sec (Basic Python) | basicPythonTime | Format as “~X hours” |
| eval_time_sec (NumPy) | numpyTime | Format as “~X hours” |
| model_total_lines (AADC) | aadcLines | Raw line count |
| model_total_lines (Basic Python) | basicPythonLines | Raw line count |
| model_total_lines (AADC) - model_total_lines (Basic Python) | linesAdded | Format as “+X” |
| CSV Field | benchmarkStats Property |
|---|---|
num_trades | Part of testConfig |
num_scenarios | Part of testConfig |
num_timesteps | Part of testConfig |
num_threads | threads |
testConfig Format: "{num_trades} trades × {num_scenarios/1000}K scenarios × {num_timesteps} timesteps"
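A minimal helper matching this format string (hypothetical helper name; the 252-timestep value is a placeholder, not taken from the benchmark log):

```python
# Hypothetical helper producing the testConfig format above
def format_test_config(num_trades, num_scenarios, num_timesteps):
    return (f"{num_trades} trades × {num_scenarios // 1000}K scenarios"
            f" × {num_timesteps} timesteps")

format_test_config(1000, 500_000, 252)
# → "1000 trades × 500K scenarios × 252 timesteps"
```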
```python
# Speedup vs Baseline
speedup = baseline_time / implementation_time
# Format: "Xx" (e.g., "420x")

aadcVsBasicPython = basicPythonTime / aadcTime
aadcVsNumpy = numpyTime / aadcTime
aadcGreeksVsCpp = cppGreeksTime / aadcGreeksTime
cppVsBasicPython = basicPythonTime / cppTime
numpyVsBasicPython = basicPythonTime / numpyTime
```
```python
# Greeks overhead = how much slower Greeks is vs price-only
# Format: "+X%" (e.g., "+42%")
overhead_pct = ((greeks_time - priceonly_time) / priceonly_time) * 100

# Example:
# AADC: priceonly = 67s, greeks = 95s
# overhead = ((95 - 67) / 67) * 100 = 41.8% → "+42%"
```
The progress bar uses a logarithmic scale to handle wide time ranges:

```js
// All times in same unit (seconds)
const logMin = Math.log10(Math.max(minTime, 0.001));
const logMax = Math.log10(Math.max(maxTime, 0.001));
const logVal = Math.log10(Math.max(currentTime, 0.001));
const logRange = logMax - logMin;

// Colored bar represents time: slowest = 100% fill, fastest = smallest fill
const barWidthPercent = logRange > 0
  ? ((logVal - logMin) / logRange) * 100
  : 50;
```
```python
# Baseline = slowest implementation in the group (typically Basic Python or autodiff)
speedup = slowest_time / implementation_time
# Format: "X.Xx" (e.g., "420.0x", "1.0x" for baseline)
```
Each implementation in benchmark-scatter-data.md:
```yaml
- id: aadc_python          # Unique identifier
  name: "AADC Python"      # Display name
  shortName: "Py AADC"     # Short name for legends
  category: "ml-libraries" # One of: aad-tools, ml-libraries, python-implementations, languages
  color: "#22d3ee"         # Hex color for chart
  language: "Python"       # Programming language
  # Metrics (from CSV)
  greeks_time: 0.348       # eval_time_sec (seconds)
  memory: 52               # memory_mb
  code_lines: 176          # model_total_lines or delta from baseline
  compile_time: 0.095      # recording_time_sec or kernel_execution_time_sec
  num_greeks: 3            # Number of Greeks computed
  recommended: true        # Optional: highlight as recommended
```
The resources/publications/accelerate-python-models.astro page should use the same data from asian-option-monte-carlo.md. Currently it does via:
```js
const benchmark = await getEntry('resources', 'benchmarks/asian-option-monte-carlo');
const benchmarkStats = benchmark?.data?.benchmarkStats;
```
Update asian-option-monte-carlo.md with new benchmarkStats:
- aadcVsBasicPython, aadcVsNumpy, aadcGreeksVsCpp
- aadcTime, cppTime, basicPythonTime, numpyTime
- aadcLines, basicPythonLines, linesAdded
- aadOverhead, bumpOverhead, numpyGreeksOverhead
- testConfig, threads

Update benchmark-scatter-data.md with new metrics:
- greeks_time, memory, code_lines, compile_time

The accelerate-python-models page will automatically reflect the changes because it reads from asian-option-monte-carlo.md.
| Display Location | benchmarkStats Field | Format |
|---|---|---|
| Hero headline | aadcVsBasicPython | “420x” |
| Key findings | aadcVsBasicPython, linesAdded, aadcGreeksVsCpp | “420x speedup with +176 lines” |
| Performance table | basicPythonLines, numpyLines, aadcLines, cppLines | Raw numbers |
| Performance table | basicPythonTime, numpyTime, aadcTime, cppTime | “~X hours” or “Xs” |
| Comparison box | aadcGreeksVsCpp, bumpOverhead, aadOverhead | “5.4x”, “+582%” |
| Scale note | testConfig, threads | “1000 trades × 500K scenarios…” |
| FAQ answers | Various speedups and overheads | Dynamic substitution |
Extract these fields for each implementation:
```
model_name, language, eval_time_sec, memory_mb, model_total_lines,
recording_time_sec, num_trades, num_scenarios, num_timesteps, num_threads
```
```python
# For each implementation pair
speedup = baseline_time / impl_time
overhead = ((greeks_time - price_time) / price_time) * 100
lines_added = aadc_lines - baseline_lines
```
```yaml
implementations:
  - id: aadc_python
    greeks_time: <eval_time_sec>
    memory: <memory_mb>
    code_lines: <model_total_lines>
    compile_time: <recording_time_sec>

benchmarkStats:
  aadcVsBasicPython: "<calculated>x"
  aadcTime: "<eval_time_sec>s"
  linesAdded: "+<calculated>"
  # ... etc
```
Run `npm run build` and verify:
When updating benchmark data:
* Overhead is calculated as (greeks - price) / price * 100
* testConfig matches the actual benchmark configuration
* linesAdded = AADC lines - baseline lines
* Both benchmark-scatter-data.md and asian-option-monte-carlo.md are updated
* Run npm run build to verify no errors

For AADC and similar kernel-recording implementations, the timing fields have specific meanings:
| CSV Field | Display Name | Meaning |
|---|---|---|
| recording_time_sec | Recording | One-time cost to build the computation graph (kernel). Happens once per model. |
| eval_time_sec (greeks_time) | Evaluation | Time to run the recorded kernel. This excludes recording time. |
| (calculated) | Cold Start | recording_time + eval_time. Total time if starting fresh without a recorded kernel. |
IMPORTANT: eval_time_sec (displayed as “Greeks Time” in main table) does NOT include recording time. It measures only the kernel evaluation.
Key Insight:
The “Per Kernel” column shows the time for a SINGLE kernel execution. This is calculated as:
```
Per Kernel = steady_state_time / num_kernel_calls_steady_state
```
Where num_kernel_calls depends on the implementation:
| Implementation | Batch Size | Kernel Calls Calculation |
|---|---|---|
| AADC Python | scenario_batch_size (65536) | (num_trades - 1) × ceil(num_scenarios / batch_size) |
| AADC C++ | vector_size (4=AVX2, 8=AVX512) | (num_trades - 1) × ceil(num_scenarios / vector_size) |
Example (10 trades × 1K scenarios):
| Implementation | Batch Size | Kernel Calls (steady) | Steady State | Per Kernel |
|---|---|---|---|---|
| AADC C++ (AVX512) | 8 | 9 × 125 = 1,125 | 12.8 ms | 11.4 μs |
| AADC Python | 65536 | 9 × 1 = 9 | 35.5 ms | 3.94 ms |
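The kernel-call formula and the table's per-kernel figures can be reproduced directly:

```python
import math

# Sketch of the kernel-call and per-kernel formulas, using the example
# steady-state times (12.8 ms and 35.5 ms) from the table above.
def kernel_calls_steady(num_trades, num_scenarios, batch_size):
    return (num_trades - 1) * math.ceil(num_scenarios / batch_size)

cpp_calls = kernel_calls_steady(10, 1_000, 8)       # 9 × 125 = 1125 (AVX512)
py_calls = kernel_calls_steady(10, 1_000, 65_536)   # 9 × 1 = 9

cpp_per_kernel = 0.0128 / cpp_calls   # ≈ 11.4 μs
py_per_kernel = 0.0355 / py_calls     # ≈ 3.94 ms
```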
Key insight: AADC C++ processes 8 scenarios per kernel call (AVX512), while AADC Python processes up to 65,536 scenarios per call. This explains the large difference in per-kernel time despite similar total evaluation times.
Full timing breakdown (AADC Python at 10 trades × 1K scenarios):
For non-kernel-recording implementations:
| CSV Field | Display Name | Meaning |
|---|---|---|
| first_run_time_sec | JIT Warmup | First-run time including JIT compilation (if applicable) |
| steady_state_time_sec | Evaluation | Steady-state per-run evaluation time |
| eval_time_sec | Total Eval Time | Total wall-clock time |
Traditional implementations re-compute everything on each evaluation, so there is no amortization benefit.
These fields are available but not currently mapped:
| CSV Field | Potential Use |
|---|---|
| portfolio_value | Validation - should match across implementations |
| avg_option_value | Validation |
| avg_delta, avg_rho, avg_vega | Validation - should match across implementations |
| first_run_time_sec | Cold start time (JIT warmup) |
| steady_state_time_sec | Warmed-up performance |
| data_memory_mb, kernel_memory_mb | Memory breakdown |
| batch_size | Vectorization info |
| total_generic_ops, total_exp_ops, etc. | Operation counts for complexity analysis |
| model_math_lines | Lines with mathematical operations |
```
src/
├── content/
│   ├── data/
│   │   └── benchmark-scatter-data.md        # Scatter plot metrics
│   └── resources/
│       ├── benchmarks/
│       │   └── asian-option-monte-carlo.md  # benchmarkStats (single source)
│       └── articles/
│           └── accelerate-python-models.md  # References benchmarkStats
├── pages/
│   ├── technology/
│   │   └── benchmarks/
│   │       └── interactive-benchmarks/
│   │           ├── CLAUDE.md (this file)
│   │           ├── aad-tools.astro
│   │           ├── ml-libraries.astro
│   │           ├── python-implementations.astro
│   │           ├── languages.astro
│   │           └── production.astro
│   └── resources/
│       └── publications/
│           └── accelerate-python-models.astro  # Uses benchmarkStats
└── components/
    └── benchmark/
        └── BenchmarkScatterPlot.astro          # Uses scatter-data.md
```