Embed a neural policy inside your simulation, record it on an AADC tape, and train in seconds. 100-400x faster than reinforcement learning, with all sensitivities from the same backward pass.
You operate a system that evolves over time. At each step you make a decision: inject or withdraw gas, charge or discharge a battery, adjust a drug dosage, allocate between asset classes, release water from a dam, steer a robot arm. Prices move, weather changes, patients respond differently. You need the policy that maximizes (or minimizes) an objective across thousands of possible futures.
- **Dynamic programming:** exponential in state dimensions. 3 coupled assets = 150^3 = 3.4M grid states; 10 assets is impossible.
- **Reinforcement learning:** requires millions of episodes and hours of training, and still produces noisy policies with no sensitivities.
- **Finite differences (bump-and-revalue):** need N+1 simulations to get N sensitivities. 365 daily deltas = 366 full simulations; 1,000 robot control variables = 1,001 simulations.
- **Heuristics:** leave money on the table, with no way to know how much.
We differentiate through the entire simulation in one backward pass and get the exact gradient of the objective with respect to every parameter simultaneously.
A small neural policy sits inside a Monte Carlo simulation. The entire computation is recorded on an AADC tape. One forward pass simulates thousands of scenarios; one backward pass computes exact gradients of the objective with respect to every policy weight and every market/model parameter.
The pattern is the same across every domain. Only the physics changes.
   Record once                       Replay millions of times
┌─────────────────────┐          ┌──────────────────────────────┐
│ Physics model       │          │ Forward pass (price/value)   │
│ + Neural policy     │  ───►    │ + Backward pass (all grads)  │
│ + Constraints       │  AADC    │ + All sensitivities: FREE    │
│ + Objective         │  tape    │ AVX2/512 vectorized          │
└─────────────────────┘          │ Multi-threaded               │
                                 └──────────────────────────────┘
Hard limits (tank full, pipe max pressure, SOC bounds) are replaced with calibrated smooth sigmoids so gradients can flow through every constraint.
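As a concrete (if simplified) example of such a surrogate, here is a softplus-based smooth clamp in NumPy; `smooth_clip`, the sharpness `k`, and the softplus construction are our illustrative choices, not the product's calibrated constraint library:

```python
import numpy as np

def softplus(z, k=50.0):
    # Numerically stable log(1 + exp(k*z)) / k via logaddexp.
    return np.logaddexp(0.0, k * z) / k

def smooth_clip(x, lo, hi, k=50.0):
    """Differentiable surrogate for the hard clamp max(lo, min(x, hi)).

    The hard clamp has zero gradient whenever a constraint saturates, which
    stops learning. This version's derivative, sigmoid(k*(x - lo)) -
    sigmoid(k*(x - hi)), stays strictly positive everywhere; larger k hugs
    the hard limits more tightly.
    """
    return lo + softplus(x - lo, k) - softplus(x - hi, k)

# A tank level constrained to [0, 100] still passes gradient information to
# the controller even when the raw command overshoots either bound.
raw = np.array([-20.0, 50.0, 130.0])
print(smooth_clip(raw, 0.0, 100.0))   # ~[0.0, 50.0, 100.0]
```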
The decision network lives inside the AADC tape, not outside it. d(objective) / d(all policy weights) in one pass.
One backward pass computes gradients with respect to all inputs. Whether you have 77 or 2,237 weights, the cost is the same.
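To make "record once, replay many times, one backward pass" concrete, here is a deliberately tiny tape in plain Python. It is a caricature for intuition only, not AADC's API (AADC records compiled AVX kernels, not a Python op list):

```python
import numpy as np

class Tape:
    """Toy operation tape: record once, replay with new inputs, and get the
    gradient w.r.t. every input from a single backward sweep."""
    def __init__(self, n_inputs):
        self.n_inputs = n_inputs
        self.n = n_inputs          # total value slots allocated so far
        self.ops = []              # recorded primitives: (kind, out, a, b)

    def _new(self, kind, a, b):
        out = self.n
        self.n += 1
        self.ops.append((kind, out, a, b))
        return out

    def add(self, a, b): return self._new("add", a, b)
    def mul(self, a, b): return self._new("mul", a, b)

    def forward(self, x):
        v = np.zeros(self.n)
        v[: self.n_inputs] = x
        for kind, out, a, b in self.ops:          # replay recorded ops
            v[out] = v[a] + v[b] if kind == "add" else v[a] * v[b]
        return v

    def backward(self, v, out):
        g = np.zeros(self.n)
        g[out] = 1.0                              # seed d(out)/d(out) = 1
        for kind, o, a, b in reversed(self.ops):  # one reverse sweep
            if kind == "add":
                g[a] += g[o]; g[b] += g[o]
            else:
                g[a] += g[o] * v[b]; g[b] += g[o] * v[a]
        return g[: self.n_inputs]   # gradient w.r.t. every input, one pass

# Record y = (x0 + x1) * x2 once...
t = Tape(3)
y = t.mul(t.add(0, 1), 2)
# ...then replay with fresh inputs and get all three gradients at once.
v = t.forward(np.array([1.0, 2.0, 4.0]))
print(v[y], t.backward(v, y))       # 12.0 [4. 4. 3.]
```

The backward sweep visits each recorded operation exactly once, which is why the cost of getting every input gradient is comparable to one forward replay, no matter how many inputs there are.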
Every application follows the same three steps:

1. **Record** (60 ms to 4 sec, once). Simulate one forward pass on the AADC tape. Physics, neural policy, constraints, objective: all recorded.
2. **Train** (1.4 to 95 seconds). Run forward + backward through the recorded tape; an Adam optimizer updates the policy weights using exact gradients. Repeat 500-2,000 times. A toy version of this loop is sketched below.
3. **Evaluate** (microseconds per evaluation). Replay the tape with new market/sensor data. All sensitivities come free from the backward pass. AVX-vectorized, multi-threaded.

You write the physics model (50-500 lines of domain-specific code). We provide the tape recording, gradient computation, optimizer, smooth constraint library, and validation framework.
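Step 2 in skeleton form. This is a self-contained sketch: the toy quadratic `tape_forward_backward` stands in for replaying the real recorded simulation, and every name here is illustrative rather than part of any API:

```python
import numpy as np

# Stand-in for replaying a recorded tape: returns the objective and its exact
# gradient w.r.t. the policy weights from one forward+backward pass. Here it
# is a toy least-squares objective over 1,024 Monte Carlo paths.
def tape_forward_backward(w, scenarios):
    residual = scenarios @ w - 1.0                  # per-path shortfall (toy)
    obj = 0.5 * np.mean(residual ** 2)              # objective to minimize
    grad = scenarios.T @ residual / len(scenarios)  # exact gradient, one pass
    return obj, grad

def adam_train(w, scenarios, steps=2000, lr=1e-2, b1=0.9, b2=0.999, eps=1e-8):
    m, v = np.zeros_like(w), np.zeros_like(w)
    for t in range(1, steps + 1):
        obj, g = tape_forward_backward(w, scenarios)  # replay the tape
        m = b1 * m + (1 - b1) * g                     # Adam moment updates
        v = b2 * v + (1 - b2) * g * g
        w = w - lr * (m / (1 - b1**t)) / (np.sqrt(v / (1 - b2**t)) + eps)
    return w

rng = np.random.default_rng(0)
paths = rng.normal(size=(1024, 8))       # 1,024 scenarios, 8 policy inputs
w_star = adam_train(np.zeros(8), paths)
```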
Robotics note: the pattern is the same, but the physics comes from an existing engine (MuJoCo) rather than custom code.
Daily injection/withdrawal decisions for natural gas storage. Maximize value under stochastic prices and contractual demand obligations.
The problem with current methods: dynamic programming works for a single facility with one price factor. Add a second facility or demand obligations and the state space explodes.
| Metric | Our approach | Best alternative | Improvement |
|---|---|---|---|
| Training time | 40 seconds | RL (SAC): 14,163 sec | 363x faster |
| Value (with demand obligations) | $3.95M | DP: $2.36M | +67% |
| 365 forward curve deltas | 0 ms (adjoint, free) | DP bump-and-revalue: 5,110 ms | Infinite |
| 3-facility portfolio | 145 sec (linear) | DP: 150^3 = intractable | Tractable |
| Eval vs PyTorch | 0.1 ms / 1,024 paths | 245 ms | 2,500x |
Key insight: Demand obligations add state dimensions. The neural policy sees demand state as another input, so there is no grid explosion.
Architecture: 6 inputs, two hidden layers of 5, 2 outputs = 77 network weights and biases + 730 daily biases = 807 total parameters.
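A quick check of that arithmetic (reading "730 daily biases" as one learned bias per output per day over the 365-day horizon, which is our interpretation):

```python
# 6 -> 5 -> 5 -> 2 fully connected policy: weights plus per-layer biases.
layers = [(6, 5), (5, 5), (5, 2)]
neural = sum(n_in * n_out + n_out for n_in, n_out in layers)  # 35 + 30 + 12
daily = 365 * 2                       # one bias per output per trading day
print(neural, daily, neural + daily)  # -> 77 730 807
```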
| Application | Our approach | RL (PPO/SAC) | Speedup | DP/SDP | Speedup |
|---|---|---|---|---|---|
| Gas storage | 40 sec | 14,163 sec | 363x | N/A (no policy) | -- |
| Hydropower | 29 sec | Failed (-$5.8M) | -- | FD gradient: 2.4 hr | 298x |
| Boiler NOx | 40 sec | 3.7 hours | 328x | N/A | -- |
| Pharma (7-unit) | 7.2 sec | N/A | -- | FD: 5 min | 44x |
| Weather | 1.4 sec | N/A | -- | N/A | -- |
| Insurance ALM | 93 sec | N/A | -- | N/A | -- |
| Climate NGFS matrix (18 cells) | < 1 sec | N/A | -- | Bump-and-revalue: days | Orders of magnitude |
| Battery (single) | 25 sec | Hours (est.) | ~400x | DP: feasible but no Greeks | -- |
| Robot trajectory (200 iter) | 57 ms | N/A | -- | FD: 9,000 ms | 158x |
| Sepsis policy | 9 sec | N/A | -- | N/A | -- |
| Application | Parameters | Adjoint (1 pass) | Finite differences | Speedup |
|---|---|---|---|---|
| Gas storage | 365 daily deltas | 0 ms (free) | 5,110 ms | Free |
| Hydropower | 5 Greeks | 12 ms | 86 ms | 7.2x |
| Insurance ALM | 200 risk factors | 191 ms | 38,356 ms | 200x |
| Pharma | 70 CPPs | 149 ms | 70 bumped runs | ~70x |
| Climate risk (20 factors) | 20 climate factors | 1 adjoint pass | 20 re-simulations | ~20x |
| Robot (Drake, 1,004 vars) | 1,004 controls | 25.2 ms | 760 ms | 30x |
| Sepsis triage | 7 interventions | Included in training | 7 bumped simulations | Free |
| Dimension | DP complexity | Our approach | Result |
|---|---|---|---|
| 1 facility | 150 states | 807 params, 40 sec | DP feasible, we're faster |
| 3 facilities | 150^3 = 3.4M states | 2,456 params, 145 sec | DP intractable |
| 10 reservoirs (coupled) | Exponential | 797 params, 80.5 sec | Linear scaling |
| 30 reservoirs | Impossible | 2,237 params, 244 sec | Still linear |
| 10 batteries (VPP) | Exponential | 202 params (shared), 25 sec | Linear scaling |
| 1,000 batteries (VPP) | Impossible | 202 params (shared), ~30 sec train | Still linear |
Anything where you have a simulation, controls at each time step, constraints, uncertainty, and an objective. The simulation needs to be differentiable or smoothly approximable. The domain doesn't matter much: gas storage, insulin dosing, hydropower, and climate risk analytics (NGFS stress testing, carbon sensitivity) all use the same machinery.
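In symbols, the problem class is (our notation, not anything domain-specific):

$$
\max_{\theta}\; \mathbb{E}_{\xi}\!\left[\sum_{t=0}^{T} r(s_t, a_t, \xi_t)\right]
\quad \text{with} \quad a_t = \pi_{\theta}(s_t), \qquad s_{t+1} = f(s_t, a_t, \xi_t),
$$

where $f$ is the simulation step, $\xi_t$ the stochastic drivers, and hard limits on $s_t$ and $a_t$ are folded into $f$ and $r$ through the smooth surrogates above, so that $\nabla_{\theta}$ of the whole expectation comes out of one backward pass.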
RL estimates gradients from noisy reward signals over millions of episodes. We compute exact gradients in a single backward pass through the simulation. In practice that means 100-400x faster training, deterministic policies, and all sensitivities from the same computation.
DP discretizes the state space into a grid and works backward through time. That scales exponentially with the number of state variables. We parameterize the policy with a neural network, so adding state dimensions adds inputs to the network but doesn't explode the computation. 10 coupled reservoirs train in 80 seconds.
The adjoint pass computes the gradient of the objective with respect to every input in one backward sweep. 5 risk factors or 365 daily price deltas: same cost, roughly that of one forward evaluation.
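The cost asymmetry is easy to demonstrate on a toy objective (NumPy sketch; the quadratic `value` and its hand-written gradient stand in for the real tape and its adjoint pass):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 365                                     # one delta per daily forward price
A = rng.normal(size=(n, n)) / n
F = rng.normal(loc=20.0, scale=2.0, size=n)

def value(F):                               # toy quadratic "storage value"
    return 0.5 * F @ A @ F

# Adjoint route: all 365 deltas from one evaluation-sized pass.
grad_adjoint = 0.5 * (A + A.T) @ F

# Bump-and-revalue route: N + 1 = 366 full revaluations for the same vector.
h, base, I = 1e-6, value(F), np.eye(n)
grad_fd = np.array([(value(F + h * I[i]) - base) / h for i in range(n)])

print(np.max(np.abs(grad_adjoint - grad_fd)))  # agree up to O(h) FD error
```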
We validate every model with a cohort mirror test: the same computation in standard floating-point and in AADC must agree to machine precision (< 1e-12 relative error). Adjoint gradients are checked against finite differences on independent code paths.
Any modern x86 CPU with AVX2. No GPU. All benchmarks on this page ran on a single CPU. Multi-threading across 8+ cores is supported.
We implemented the same gas storage algorithm in PyTorch. It was 16x slower for training and 2,500x slower for evaluation. The difference is AADC's JIT compilation to AVX machine code vs Python interpreter overhead.
Yes. Write your simulation as a templated C++ function or use AADC's idouble type in Python. We handle the recording, optimization, and sensitivity computation.
No. The neural policy, optimizer, and training loop are pre-built. You write the simulation and define the constraints.
PPO captured 65% of the gas storage optimum after 3.7 hours. SAC failed on hydropower (-$5.8M loss). RL estimates gradients from noisy rewards; we compute them exactly. More training time doesn't close that gap.
Isaac Gym uses simplified rigid-body physics on GPU with forward-mode AD. AADC differentiates through the real MuJoCo C code on CPU using reverse-mode AD. Isaac Gym is faster for massively parallel RL training. AADC is better for trajectory optimization, MPC, and system identification where physics fidelity and exact gradients matter more than throughput. Also runs on any CPU, no GPU clusters required.
JAX re-traces the computation graph on each call. AADC records once and replays the compiled kernel indefinitely. For a 50-step MuJoCo trajectory, AADC completes 200 iterations in 57 ms. JAX-based alternatives (DiffMJX, Brax) get roughly 10x over FD. AADC gets 158x. And AADC differentiates through the real MuJoCo C code; JAX alternatives reimplement the physics.
The entire climate-financial chain goes on one AADC tape: carbon price dynamics, transition risk (PD/LGD from carbon costs), physical risk (temperature-dependent damage), and climate-adjusted XVA. One adjoint pass gives d(expected_loss)/d(carbon_price) and d(expected_loss)/d(temperature) simultaneously. The full 6-scenario x 3-horizon NGFS capital matrix computes in under a second. EIOPA compliance deadline is January 2027.
Five so far: ECG cardiac diagnostics (5-class, 0.914 beat-level AUC with exact adjoint interpretability), EEG seizure detection (per-patient channel selection in 1.3 seconds), ICU sepsis triage (12-state ODE with 7 intervention controls), insulin dosing (99.1% time-in-range), and pharma manufacturing (70 ICH Q8 sensitivities in 149 ms). All train on CPU in seconds to minutes.
If you have a sequential decision problem with a simulation behind it, we can probably make it faster. Let's find out.