Product

Optimal decisions. Exact gradients. One pass.

Embed a neural policy inside your simulation, record it on an AADC tape, and train in seconds. 100-400x faster than reinforcement learning, with all sensitivities from the same backward pass.

363x: training speedup vs RL
2,500x: eval speedup vs PyTorch
0 ms: sensitivity overhead (sensitivities come free)
Solved: curse of dimensionality

Every industry has the same hard problem

You operate a system that evolves over time. At each step you make a decision: inject or withdraw gas, charge or discharge a battery, adjust a drug dosage, allocate between asset classes, release water from a dam, steer a robot arm. Prices move, weather changes, patients respond differently. You need the policy that maximizes (or minimizes) an objective across thousands of possible futures.

Traditional approaches all hit walls:

Dynamic Programming

Exponential in state dimensions. 3 coupled assets = 150^3 = 3.4M states. 10 assets = impossible.

Reinforcement Learning (PPO, SAC)

Requires millions of episodes, hours of training, and still produces noisy policies with no sensitivities.

Finite Differences

Need N+1 simulations to get N sensitivities. 365 daily deltas = 366 full simulations. 1,000 robot control variables = 1,001 simulations.

Heuristic rules

Leave money on the table. No way to know how much.

We differentiate through the entire simulation in one backward pass and get the exact gradient of the objective with respect to every parameter simultaneously.

Differentiable Optimal Control

A small neural policy sits inside a Monte Carlo simulation. The entire computation is recorded on an AADC tape. One forward pass simulates thousands of scenarios; one backward pass computes exact gradients of the objective with respect to every policy weight and every market/model parameter.

The pattern is the same across every domain. Only the physics changes.

      Record once                    Replay millions of times
┌─────────────────────┐         ┌──────────────────────────────┐
│  Physics model      │         │  Forward pass (price/value)  │
│  + Neural policy    │  ───►   │  + Backward pass (all grads) │
│  + Constraints      │  AADC   │  + All sensitivities: FREE   │
│  + Objective        │  tape   │  AVX2/512 vectorized         │
└─────────────────────┘         │  Multi-threaded              │
                                └──────────────────────────────┘
          

Three properties make this work:

1. Smooth constraints

Hard limits (tank full, pipe max pressure, SOC bounds) are replaced with calibrated smooth sigmoids so gradients can flow through every constraint.
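As a toy illustration (plain Python, not the AADC constraint library), a hard clamp has zero gradient once a bound is hit, while a sigmoid-gated version keeps gradients alive everywhere; the sharpness `k` stands in for the calibration knob:

```python
import math

def hard_clip(action, lo, hi):
    # Hard limit: the gradient is exactly zero outside [lo, hi],
    # so the optimizer gets no signal once a bound binds.
    return max(lo, min(hi, action))

def smooth_clip(action, lo, hi, k=10.0):
    # Sigmoid-gated limit: approaches the hard clamp as k grows,
    # but keeps a nonzero gradient through the bounds.
    s = 1.0 / (1.0 + math.exp(-k * (action - lo)))
    t = 1.0 / (1.0 + math.exp(-k * (action - hi)))
    return lo + (action - lo) * s - (action - hi) * t
```

Well inside the band the two agree; outside it, the smooth version saturates toward the bound instead of kinking, so the chain rule can carry information through the constraint.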

2. Neural policy on tape

The decision network lives inside the AADC tape, not outside it. d(objective) / d(all policy weights) in one pass.

3. Adjoint mode

One backward pass computes gradients with respect to all inputs. Whether you have 77 or 2,237 weights, the cost is the same.
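A minimal sketch of that cost asymmetry (plain Python with an invented toy objective, not AADC): one analytic backward pass returns every sensitivity at once, while bump-and-revalue needs one extra simulation per parameter:

```python
def simulate(w, x):
    # toy "simulation": objective = sum_i (w_i * x_i)^2
    return sum((wi * xi) ** 2 for wi, xi in zip(w, x))

def adjoint(w, x):
    # one backward pass: d(objective)/d(w_i) = 2 * w_i * x_i^2, all i at once
    return [2.0 * wi * xi * xi for wi, xi in zip(w, x)]

def finite_differences(w, x, h=1e-6):
    # N + 1 simulations for N sensitivities
    base = simulate(w, x)
    grads = []
    for i in range(len(w)):
        bumped = list(w)
        bumped[i] += h
        grads.append((simulate(bumped, x) - base) / h)
    return grads

w = [0.5, -1.0, 2.0]
x = [1.0, 2.0, 3.0]
exact = adjoint(w, x)               # one pass, whether len(w) is 3 or 3,000
approx = finite_differences(w, x)   # len(w) + 1 passes
```

The adjoint loop's cost does not grow with the number of parameters you differentiate against; the finite-difference loop's cost does.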

The Pattern

Every application follows the same three steps:

1. Record

Simulate one forward pass on AADC tape. Physics, neural policy, constraints, objective: all recorded.

60 ms to 4 sec (once)

2. Train

Run forward + backward through recorded tape. Adam optimizer updates policy weights using exact gradients. Repeat 500-2,000 times.

1.4 to 95 seconds

3. Deploy

Replay the tape with new market/sensor data. All sensitivities come free from the backward pass. AVX-vectorized, multi-threaded.

Microseconds per evaluation

What changes between applications:

The physics model (50-500 lines of domain-specific code).

What stays the same:

The tape recording, gradient computation, optimizer, smooth constraint library, validation framework.
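In miniature, the train step looks like this (a plain-Python sketch with an invented one-weight policy; analytic gradients stand in for the AADC tape, and vanilla gradient ascent stands in for Adam):

```python
import math
import random

# Toy setup: 64 price scenarios, 30 steps each, mean-reverting around MU.
random.seed(0)
MU = 10.0
scenarios = [[random.gauss(MU, 2.0) for _ in range(30)] for _ in range(64)]

def forward_backward(w):
    # Forward pass: simulate profit across all scenarios.
    # Backward pass: exact d(profit)/d(w) via the chain rule.
    profit, grad = 0.0, 0.0
    for path in scenarios:
        for p in path:
            edge = MU - p                # buy-low / sell-high signal
            a = math.tanh(w * edge)      # smooth, bounded action in [-1, 1]
            profit += a * edge
            grad += edge * edge * (1.0 - a * a)
    return profit, grad

w = 0.0
before, _ = forward_backward(w)
for _ in range(200):                     # "train": repeat forward + backward
    _, g = forward_backward(w)
    w += 1e-4 * g                        # gradient ascent on exact gradients
after, _ = forward_backward(w)
```

The real pipeline records this forward/backward pair once on tape and replays the compiled kernel; only the physics inside the loop changes between applications.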

Robotics note: For robotics, the pattern is the same but the physics comes from an existing engine (MuJoCo) rather than custom code.

Applications

Gas Storage Optimization

Energy & Commodities

Daily injection/withdrawal decisions for natural gas storage. Maximize value under stochastic prices and contractual demand obligations.

The problem with current methods: they work for a single facility with one price factor. Add a second facility or demand obligations and the state space explodes.

| Metric | Our approach | Best alternative | Improvement |
|---|---|---|---|
| Training time | 40 seconds | RL (SAC): 14,163 sec | 363x faster |
| Value (with demand obligations) | $3.95M | DP: $2.36M | +67% |
| 365 forward curve deltas | 0 ms (adjoint, free) | DP bump-and-revalue: 5,110 ms | Infinite |
| 3-facility portfolio | 145 sec (linear) | DP: 150^3 = intractable | Tractable |
| Eval vs PyTorch | 0.1 ms / 1,024 paths | 245 ms | 2,500x |

Key insight: Demand obligations add state dimensions. The neural policy sees demand state as another input, so there is no grid explosion.

Architecture: 6 inputs, 5+5 hidden, 2 outputs = 77 neural weights + 730 daily biases = 807 total parameters.

Performance Summary

Training Speed (vs Alternatives)

| Application | Our approach | RL (PPO/SAC) | Speedup vs RL | DP/SDP/FD | Speedup vs DP/FD |
|---|---|---|---|---|---|
| Gas storage | 40 sec | 14,163 sec | 363x | N/A (no policy) | -- |
| Hydropower | 29 sec | Failed (-$5.8M) | -- | FD gradient: 2.4 hr | 298x |
| Boiler NOx | 40 sec | 3.7 hours | 328x | N/A | -- |
| Pharma (7-unit) | 7.2 sec | N/A | -- | FD: 5 min | 44x |
| Weather | 1.4 sec | N/A | -- | N/A | -- |
| Insurance ALM | 93 sec | N/A | -- | N/A | -- |
| Climate NGFS matrix (18 cells) | < 1 sec | N/A | -- | Bump-and-revalue: days | Orders of magnitude |
| Battery (single) | 25 sec | Hours (est.) | ~400x | DP: feasible but no Greeks | -- |
| Robot trajectory (200 iter) | 57 ms | N/A | -- | FD: 9,000 ms | 158x |
| Sepsis policy | 9 sec | N/A | -- | N/A | -- |

Sensitivity Computation (vs Finite Differences)

| Application | Parameters | Adjoint (1 pass) | Finite differences | Speedup |
|---|---|---|---|---|
| Gas storage | 365 daily deltas | 0 ms (free) | 5,110 ms | Free |
| Hydropower | 5 Greeks | 12 ms | 86 ms | 7.2x |
| Insurance ALM | 200 risk factors | 191 ms | 38,356 ms | 200x |
| Pharma | 70 CPPs | 149 ms | 70 bumped runs | ~70x |
| Climate risk | 20 climate factors | 1 adjoint pass | 20 re-simulations | ~20x |
| Robot (Drake) | 1,004 controls | 25.2 ms | 760 ms | 30x |
| Sepsis triage | 7 interventions | Included in training | 7 bumped simulations | Free |

Scaling (vs Dynamic Programming)

| Dimension | DP complexity | Our approach | Result |
|---|---|---|---|
| 1 facility | 150 states | 807 params, 40 sec | DP feasible, we're faster |
| 3 facilities | 150^3 = 3.4M states | 2,456 params, 145 sec | DP intractable |
| 10 reservoirs (coupled) | Exponential | 797 params, 80.5 sec | Linear scaling |
| 30 reservoirs | Impossible | 2,237 params, 244 sec | Still linear |
| 10 batteries (VPP) | Exponential | 202 params (shared), 25 sec | Linear scaling |
| 1,000 batteries (VPP) | Impossible | 202 params (shared), ~30 sec train | Still linear |

Frequently Asked Questions

What kind of problems can this solve?

Anything where you have a simulation, controls at each time step, constraints, uncertainty, and an objective. The simulation needs to be differentiable or smoothly approximable. The domain doesn't matter much: gas storage, insulin dosing, hydropower, and climate risk analytics (NGFS stress testing, carbon sensitivity) all use the same machinery.

How is this different from reinforcement learning?

RL estimates gradients from noisy reward signals over millions of episodes. We compute exact gradients in a single backward pass through the simulation. In practice that means 100-400x faster training, deterministic policies, and all sensitivities from the same computation.

How is this different from dynamic programming?

DP discretizes the state space into a grid and works backward through time. That scales exponentially with the number of state variables. We parameterize the policy with a neural network, so adding state dimensions adds inputs to the network but doesn't explode the computation. 10 coupled reservoirs train in 80 seconds.

What do you mean by 'sensitivities for free'?

The adjoint pass computes the gradient of the objective with respect to every input in one backward pass. Whether you need 5 risk factors or 365 daily price deltas, the cost is the same: roughly that of one forward evaluation.

What accuracy guarantees do you provide?

We validate every model with a cohort mirror test: the same computation in standard floating-point and in AADC must agree to machine precision (< 1e-12 relative error). Adjoint gradients are checked against finite differences on independent code paths.

What hardware do I need?

Any modern x86 CPU with AVX2. No GPU. All benchmarks on this page ran on a single CPU. Multi-threading across 8+ cores is supported.

How does this compare to PyTorch autograd?

We implemented the same gas storage algorithm in PyTorch. It was 16x slower for training and 2,500x slower for evaluation. The difference is AADC's JIT compilation to AVX machine code vs Python interpreter overhead.

Can I use my own simulation code?

Yes. Write your simulation as a templated C++ function or use AADC's idouble type in Python. We handle the recording, optimization, and sensitivity computation.

Do I need machine learning expertise?

No. The neural policy, optimizer, and training loop are pre-built. You write the simulation and define the constraints.

Can RL eventually match this with enough training?

PPO captured 65% of the gas storage optimum after 3.7 hours. SAC failed on hydropower (-$5.8M loss). RL estimates gradients from noisy rewards; we compute them exactly. More training time doesn't close that gap.

How does AADC compare to NVIDIA Isaac Gym for robotics?

Isaac Gym uses simplified rigid-body physics on GPU with forward-mode AD. AADC differentiates through the real MuJoCo C code on CPU using reverse-mode AD. Isaac Gym is faster for massively parallel RL training. AADC is better for trajectory optimization, MPC, and system identification where physics fidelity and exact gradients matter more than throughput. Also runs on any CPU, no GPU clusters required.

How does AADC compare to JAX for physics simulation?

JAX re-traces the computation graph on each call. AADC records once and replays the compiled kernel indefinitely. For a 50-step MuJoCo trajectory, AADC completes 200 iterations in 57 ms. JAX-based alternatives (DiffMJX, Brax) get roughly 10x over FD. AADC gets 158x. And AADC differentiates through the real MuJoCo C code; JAX alternatives reimplement the physics.

How does this handle climate risk and NGFS scenarios?

The entire climate-financial chain goes on one AADC tape: carbon price dynamics, transition risk (PD/LGD from carbon costs), physical risk (temperature-dependent damage), and climate-adjusted XVA. One adjoint pass gives d(expected_loss)/d(carbon_price) and d(expected_loss)/d(temperature) simultaneously. The full 6-scenario x 3-horizon NGFS capital matrix computes in under a second. EIOPA compliance deadline is January 2027.

What medical applications does this support?

Five so far: ECG cardiac diagnostics (5-class, 0.914 beat-level AUC with exact adjoint interpretability), EEG seizure detection (per-patient channel selection in 1.3 seconds), ICU sepsis triage (12-state ODE with 7 intervention controls), insulin dosing (99.1% time-in-range), and pharma manufacturing (70 ICH Q8 sensitivities in 149 ms). All train on CPU in seconds to minutes.

Talk to us about your problem

If you have a sequential decision problem with a simulation behind it, we can probably make it faster. Let's find out.