Technology Deep Dive

How AADC Works

AADC extracts a computational graph (a DAG) from your existing analytics at runtime, then compiles it to optimized machine code. It is not a source-code compiler: it uses operator overloading to capture operations, then generates vectorized x86-64 binary code that eliminates interpreter overhead, object-oriented abstraction costs, and memory-bandwidth bottlenecks.

Performance improvement: 6–1000×
Adjoint factor: < 1
Compilation speed: ~200×
Cloud cost reduction: 90%

How It Works


Code Generation AAD™ uses operator overloading to trace the exact sequence of operations performed by your object-oriented code (C++, C#, or Python) at runtime, extracting a computational graph (DAG). This DAG is then compiled to optimized machine code kernels. AADC is not a source code compiler—it preserves your coding style and abstractions while eliminating their runtime penalties, delivering native performance on standard CPUs with minimal code changes (replacing double with idouble in hot sections).
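The code change the text describes is typically this small. A minimal sketch, assuming the AADC headers provide idouble with the usual operator and math-function overloads (including an exp for idouble, standard for operator-overloading AAD tools); everything else is unchanged user code:

    #include <cmath>

    // Before: original analytics on native doubles.
    double discount(double r, double t) { return std::exp(-r * t); }

    // After: the hot section swaps the scalar type. The structure of the
    // code, and everything outside the hot section, stays the same.
    idouble discount(idouble r, idouble t) { return exp(-r * t); }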

The generated kernels are reusable for repeated executions (e.g., thousands or millions of simulation paths, scenarios, or model calibrations), achieving 6–1000× speedups for both forward evaluations and first- or higher-order derivatives.

Designed for cross-platform execution, Code Generation AAD™ works across scientific simulations (PDEs, geophysics), quantitative finance, machine learning workflows (time series, recurrent networks), and other domains requiring repetitive high-performance computation.

Kernel creation time is critical because it is part of the overall execution time. Compiling the generated code with off-the-shelf toolchains (e.g., LLVM or a C++ compiler) can take as long as 10,000+ executions of the original code, which is prohibitive for smaller or dynamic simulations. That makes traditional code generation viable for testing but impractical for production, where minimizing compilation overhead is essential.
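A back-of-the-envelope illustration (all numbers assumed for the example) shows why: compilation pays off only after enough kernel reuses.

    break-even runs: N* = T_compile / (T_original − T_kernel)
    with T_compile = 10,000 × T_original and a 50× faster kernel
    (T_kernel = 0.02 × T_original): N* = 10,000 / 0.98 ≈ 10,204

A Monte Carlo job with millions of paths clears that bar easily; a small or frequently re-built simulation never does, which is why fast kernel generation is the enabling factor for production use.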

MatLogica's Innovation: Combining Operator Overloading with On-the-Fly Code Generation

Traditional high-performance computing often forces a difficult trade-off: write in productive, high-level object-oriented languages (Python, C++, C#) and accept runtime overhead, or drop to low-level assembly/machine code for maximum speed — sacrificing readability and development velocity.

Highly Optimized Code Generation

Fast compilation speed enables practical performance gains in production systems.

No External Dependencies

Avoids slow off-the-shelf tools, ensuring reliable integration across platforms.

[Figure: MatLogica productivity comparison]

AADC Workflow

How AADC transforms your code into optimized kernels

Phase 0

Normal Execution

Regular program execution with I/O, setup, and data loading

1
Program initialization

Load data, read configuration, initialize objects

2
I/O operations

File reading, network calls, database queries

3
Non-numerical setup

Any operations outside the hot computational section

Timing: Use native double (no overhead) or idouble (~2% overhead); either way, the transition to Phase 1 requires no code restructuring.

Phase 1

Recording/Compiling

Capture hot section and generate optimized machine code

1
Active types capture operations

Replace double with idouble. Outside of recording, idouble has ~2% overhead vs native double.

2
Computational graph construction

Sequence of elementary operations automatically captured into DAG

3
Code generation

Forward and adjoint passes are generated in parallel; the adjoint machine code is emitted by traversing the DAG in reverse

4
Graph-level optimizations

Constant folding, dead code elimination, optimal register allocation


Timing: Recording and compilation add overhead relative to a single execution; this one-time cost is amortized over repeated kernel executions.
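To see what "active types capture operations" means mechanically, here is a deliberately tiny toy tape (our illustration only; AADC's real recorder is far more sophisticated and also emits machine code):

    #include <cstdio>
    #include <vector>

    // Toy tape: every elementary operation becomes a node in a DAG.
    struct Node { char op; int lhs, rhs; double val; };
    static std::vector<Node> tape;

    // Minimal stand-in for an active type such as idouble.
    struct Active {
        int id;
        explicit Active(int i) : id(i) {}
        Active(double v) {
            tape.push_back({'c', -1, -1, v});   // constants are recorded too
            id = (int)tape.size() - 1;
        }
    };

    static Active binary(char op, Active a, Active b, double v) {
        tape.push_back({op, a.id, b.id, v});
        return Active((int)tape.size() - 1);
    }

    Active operator+(Active a, Active b) {
        return binary('+', a, b, tape[a.id].val + tape[b.id].val);
    }
    Active operator*(Active a, Active b) {
        return binary('*', a, b, tape[a.id].val * tape[b.id].val);
    }

    int main() {
        Active x = 3.0, y = 4.0;
        Active z = x * y + x;   // * and + land on the tape alongside x and y
        std::printf("value = %g, tape nodes = %zu\n", tape[z.id].val, tape.size());
        // A code generator would walk `tape` forward to emit the primal
        // kernel and backward to emit the adjoint kernel.
    }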

Phase 2

Kernel Execution

Multi-threaded execution even if original code isn't thread-safe

1
Multi-threaded execution

Run kernels across multiple threads even if original code isn't thread-safe

2
Values + first-order sensitivities

Single kernel execution returns value and all first-order sensitivities

3
Higher-order derivatives

Bump-and-revalue (finite differences) applied to the AAD first-order derivatives gives the best results in practice (see the sketch after this phase)

4
Cloud deployment

Kernels are serializable: deploy them to the cloud for elastic compute without exposing source code

Performance: 6-1000x faster than original code, with adjoint factor <1
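Step 3 above can be sketched as follows. run_kernel is a stand-in for a compiled kernel that returns the value and the full AAD gradient in one call (names ours, and the toy model f(x0, x1) = x0² · x1 replaces a real pricing kernel):

    #include <cstdio>
    #include <vector>

    struct KernelOutput { double value; std::vector<double> gradient; };

    // Stand-in for a compiled kernel: value + all first-order sensitivities
    // in a single call. Toy model: f(x0, x1) = x0^2 * x1.
    KernelOutput run_kernel(const std::vector<double>& x) {
        return { x[0] * x[0] * x[1],
                 { 2.0 * x[0] * x[1], x[0] * x[0] } };
    }

    // Second-order d2f/(dx_i dx_j): central finite difference of the
    // AAD first-order derivative, costing two extra kernel calls.
    double second_order(std::vector<double> x, int i, int j, double h = 1e-5) {
        x[i] += h;
        double up = run_kernel(x).gradient[j];
        x[i] -= 2.0 * h;
        double down = run_kernel(x).gradient[j];
        return (up - down) / (2.0 * h);
    }

    int main() {
        std::vector<double> x = { 3.0, 4.0 };
        // d2f/dx0^2 = 2 * x1 = 8 for the toy model.
        std::printf("%f\n", second_order(x, 0, 0));
    }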

Phase 3

Kernel Reuse (Optional)

Reuse kernels for real-time and remote computation services

1
Serialize kernels

Save compiled kernels to disk for later reuse without recompilation

2
Cross-platform deployment

Kernels are compatible between Windows and Linux

3
Real-time services

Deploy kernels for always-on computation and real-time processing

4
Remote execution

Distribute kernels to remote compute nodes without exposing source code

Timing: Optional phase for production deployment scenarios

Technical Benefits

Not just AAD—all calculations benefit from JIT compilation

4-8x

Automatic Vectorization

AVX2/AVX-512 SIMD processes 4-8 doubles per CPU cycle. What would otherwise require hand-written assembly happens automatically (see the snippet after these cards).

10-50x on multi-core

Automatic Multi-Threading

Multi-threaded execution even if original code isn't thread-safe. Scales linearly with cores.

2-3x

Cache Optimization

Optimal memory layout for modern CPU architecture. Data arranged for maximum cache efficiency and minimal bandwidth.

Additional efficiency gains

Code Compression

Better instruction cache usage through optimized code generation. Eliminates object-oriented overhead like virtual functions.

10-100x for Python

Python Interpreter Bypass

Python code compiled to native machine code, bypassing the interpreter entirely. Makes Python production-ready.

C++ speedup: 6-100x

Object-Oriented Overhead Elimination

No virtual functions, no pointer chasing, no abstraction layers. Direct machine code for maximum performance.
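As a point of reference for the vectorization card above, a hand-written AVX2 fragment looks like this (our illustration; AADC emits comparable SIMD automatically, so you never write intrinsics yourself):

    #include <immintrin.h>
    #include <cstdio>

    // Compile with AVX support (e.g. -mavx). One 256-bit register holds
    // 4 doubles; a single vaddpd instruction adds all 4 lanes at once.
    int main() {
        __m256d a = _mm256_set_pd(4.0, 3.0, 2.0, 1.0);
        __m256d b = _mm256_set1_pd(10.0);
        __m256d c = _mm256_add_pd(a, b);
        double out[4];
        _mm256_storeu_pd(out, c);
        std::printf("%g %g %g %g\n", out[0], out[1], out[2], out[3]);
    }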

Key Capabilities

What makes AADC uniquely powerful

Adjoint Factor < 1

AADC achieves the remarkable result where computing value AND all first-order sensitivities is faster than original code computing just the value—breaking the theoretical 'primal barrier'.

Value + Sensitivities Faster Than Original

Traditional AAD has adjoint factor 2-5x. AADC achieves <1x relative to original code.

Scenario Acceleration

20-50x speedup when running multiple scenarios through the same kernel.

How is this possible?

AAD fundamentally reduces sensitivity computation from O(n) bump-and-revalue passes to a single adjoint pass; on top of that, AADC's JIT compilation, vectorization, and multi-threading make the combined value-plus-sensitivities kernel run faster than the original value-only code.
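In rough cost terms, using the adjoint-factor figures quoted above (cost(f) is one run of the original code, cost(kernel) one run of the compiled kernel):

    bump-and-revalue:  cost ≈ (n + 1) × cost(f)   for n sensitivities
    traditional AAD:   cost ≈ k × cost(f),   k ≈ 2–5
    AADC:              cost ≈ k × cost(kernel) < cost(f)

Because the compiled kernel is itself many times faster than the original code, even k passes over it can undercut a single original evaluation, which is exactly the adjoint factor < 1 claim.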

What Determines Your Speedup?

AADC delivers speedups ranging from 6× to over 1000× depending on your specific situation. The actual performance gain depends on several factors:

Original Code Design

High-level implementations see the largest gains; code that is already low-level and hand-optimized sees smaller, but still significant, improvements

Programming Language

Python code typically sees larger speedups than C++ due to interpreter overhead elimination

Existing Optimizations

Code without vectorization or multithreading benefits more from AADC's automatic optimizations

Model Complexity

Complex models with many operations benefit from kernel compilation and memory optimization

Computation Repetition

Scenarios run many times (Monte Carlo, batch evaluation) amortize recording overhead for maximum gains

Number of Sensitivities

More sensitivities = larger advantage from AAD's O(1) adjoint pass vs O(n) bump-and-recompute

Contact us for a benchmark with your specific code to understand your expected performance improvement.

AADC Developer Documentation

Complete API reference, integration guides, and code examples

View Documentation

See AADC in Action

Now that you understand how it works, explore what you can build with it