Technology Deep Dive

How AADC Works

AADC extracts a computational graph (a DAG) from your existing analytics at runtime, then compiles it to optimized machine code. It is not a source-code compiler: it uses operator overloading to capture operations, then generates vectorized x86-64 binary code that eliminates interpreter overhead, object-oriented abstraction costs, and memory-bandwidth bottlenecks.

Performance improvement: 6–1000×
Adjoint factor: < 1
Compilation speed: ~200×
Cloud cost reduction: 90%

How It Works


Code Generation AAD™ uses operator overloading to trace the exact sequence of operations performed by your object-oriented code (C++, C#, or Python) at runtime, extracting a computational graph (DAG). This DAG is then compiled to optimized machine code kernels. AADC is not a source code compiler—it preserves your coding style and abstractions while eliminating their runtime penalties, delivering native performance on standard CPUs with minimal code changes (replacing double with idouble in hot sections).
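The code change the text describes is typically this small. A minimal sketch, assuming the AADC headers provide idouble with the usual operator and math-function overloads (including an exp for idouble, standard for operator-overloading AAD tools); everything else is unchanged user code:

    #include <cmath>

    // Before: original analytics on native doubles.
    double discount(double r, double t) { return std::exp(-r * t); }

    // After: the hot section swaps the scalar type. The structure of the
    // code, and everything outside the hot section, stays the same.
    idouble discount(idouble r, idouble t) { return exp(-r * t); }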

The generated kernels are reusable for repeated executions (e.g., thousands or millions of simulation paths, scenarios, or model calibrations), achieving 6–1000× speedups for both forward evaluations and first- or higher-order derivatives.

Designed for cross-platform execution, Code Generation AAD™ works across scientific simulations (PDEs, geophysics), quantitative finance, machine learning workflows (time series, recurrent networks), and other domains requiring repetitive high-performance computation.

Kernel creation time is critical because it is part of the overall execution time. Compiling the generated code with off-the-shelf toolchains (e.g., LLVM or a C++ compiler) can take as long as 10,000+ executions of the original code, which is prohibitive for smaller or dynamic simulations. That makes traditional code generation viable for testing but impractical for production, where minimizing compilation overhead is essential.
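A back-of-the-envelope illustration (all numbers assumed for the example) shows why: compilation pays off only after enough kernel reuses.

    break-even runs: N* = T_compile / (T_original − T_kernel)
    with T_compile = 10,000 × T_original and a 50× faster kernel
    (T_kernel = 0.02 × T_original): N* = 10,000 / 0.98 ≈ 10,204

A Monte Carlo job with millions of paths clears that bar easily; a small or frequently re-built simulation never does, which is why fast kernel generation is the enabling factor for production use.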

MatLogica's Innovation: Combining Operator Overloading with On-the-Fly Code Generation

Traditional high-performance computing often forces a difficult trade-off: write in productive, high-level object-oriented languages (Python, C++, C#) and accept runtime overhead, or drop to low-level assembly/machine code for maximum speed — sacrificing readability and development velocity.

Highly Optimized Code Generation

Fast compilation speed enables practical performance gains in production systems.

No External Dependencies

Avoids slow off-the-shelf tools, ensuring reliable integration across platforms.

[Figure: MatLogica productivity comparison]

AADC Workflow

How AADC transforms your code into optimized kernels

Phase 0

Normal Execution

Regular program execution with I/O, setup, and data loading

1
Program initialization

Load data, read configuration, initialize objects

2
I/O operations

File reading, network calls, database queries

3
Non-numerical setup

Any operations outside the hot computational section

Timing: Use native double (no overhead) or idouble (~2% overhead); either way, the transition to Phase 1 requires no code restructuring.

Phase 1

Recording/Compiling

Capture hot section and generate optimized machine code

1
Active types capture operations

Replace double with idouble. Outside of recording, idouble has ~2% overhead vs native double.

2
Computational graph construction

Sequence of elementary operations automatically captured into DAG

3
Code generation

Forward and adjoint passes are generated in parallel; the adjoint machine code is emitted by traversing the DAG in reverse

4
Graph-level optimizations

Constant folding, dead code elimination, optimal register allocation


Timing: Recording and compilation add overhead relative to a single execution; this one-time cost is amortized over repeated kernel executions.
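To see what "active types capture operations" means mechanically, here is a deliberately tiny toy tape (our illustration only; AADC's real recorder is far more sophisticated and also emits machine code):

    #include <cstdio>
    #include <vector>

    // Toy tape: every elementary operation becomes a node in a DAG.
    struct Node { char op; int lhs, rhs; double val; };
    static std::vector<Node> tape;

    // Minimal stand-in for an active type such as idouble.
    struct Active {
        int id;
        explicit Active(int i) : id(i) {}
        Active(double v) {
            tape.push_back({'c', -1, -1, v});   // constants are recorded too
            id = (int)tape.size() - 1;
        }
    };

    static Active binary(char op, Active a, Active b, double v) {
        tape.push_back({op, a.id, b.id, v});
        return Active((int)tape.size() - 1);
    }

    Active operator+(Active a, Active b) {
        return binary('+', a, b, tape[a.id].val + tape[b.id].val);
    }
    Active operator*(Active a, Active b) {
        return binary('*', a, b, tape[a.id].val * tape[b.id].val);
    }

    int main() {
        Active x = 3.0, y = 4.0;
        Active z = x * y + x;   // * and + land on the tape alongside x and y
        std::printf("value = %g, tape nodes = %zu\n", tape[z.id].val, tape.size());
        // A code generator would walk `tape` forward to emit the primal
        // kernel and backward to emit the adjoint kernel.
    }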

Phase 2

Kernel Execution

Multi-threaded execution even if original code isn't thread-safe

1
Multi-threaded execution

Run kernels across multiple threads even if original code isn't thread-safe

2
Values + first-order sensitivities

Single kernel execution returns value and all first-order sensitivities

3
Higher-order derivatives

Bump-and-revalue (finite differences) applied to the AAD first-order derivatives gives the best results in practice (see the sketch after this phase)

4
Cloud deployment

Kernels are serializable: deploy them to the cloud for elastic compute without exposing source code

Performance: 6-1000x faster than original code, with adjoint factor <1
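Step 3 above can be sketched as follows. run_kernel is a stand-in for a compiled kernel that returns the value and the full AAD gradient in one call (names ours, and the toy model f(x0, x1) = x0² · x1 replaces a real pricing kernel):

    #include <cstdio>
    #include <vector>

    struct KernelOutput { double value; std::vector<double> gradient; };

    // Stand-in for a compiled kernel: value + all first-order sensitivities
    // in a single call. Toy model: f(x0, x1) = x0^2 * x1.
    KernelOutput run_kernel(const std::vector<double>& x) {
        return { x[0] * x[0] * x[1],
                 { 2.0 * x[0] * x[1], x[0] * x[0] } };
    }

    // Second-order d2f/(dx_i dx_j): central finite difference of the
    // AAD first-order derivative, costing two extra kernel calls.
    double second_order(std::vector<double> x, int i, int j, double h = 1e-5) {
        x[i] += h;
        double up = run_kernel(x).gradient[j];
        x[i] -= 2.0 * h;
        double down = run_kernel(x).gradient[j];
        return (up - down) / (2.0 * h);
    }

    int main() {
        std::vector<double> x = { 3.0, 4.0 };
        // d2f/dx0^2 = 2 * x1 = 8 for the toy model.
        std::printf("%f\n", second_order(x, 0, 0));
    }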

Phase 3

Kernel Reuse (Optional)

Reuse kernels for real-time and remote computation services

1
Serialize kernels

Save compiled kernels to disk for later reuse without recompilation

2
Cross-platform deployment

Kernels are compatible between Windows and Linux

3
Real-time services

Deploy kernels for always-on computation and real-time processing

4
Remote execution

Distribute kernels to remote compute nodes without exposing source code

Timing: Optional phase for production deployment scenarios

Technical Benefits

Not just AAD—all calculations benefit from JIT compilation

4-8x

Automatic Vectorization

AVX2/AVX-512 SIMD processes 4-8 doubles per CPU cycle. What would otherwise require hand-written assembly happens automatically (see the snippet after these cards).

10-50x on multi-core

Automatic Multi-Threading

Multi-threaded execution even if original code isn't thread-safe. Scales linearly with cores.

2-3x

Cache Optimization

Optimal memory layout for modern CPU architecture. Data arranged for maximum cache efficiency and minimal bandwidth.

Additional efficiency gains

Code Compression

Better instruction cache usage through optimized code generation. Eliminates object-oriented overhead like virtual functions.

10-100x for Python

Python Interpreter Bypass

Python code compiled to native machine code, bypassing the interpreter entirely. Makes Python production-ready.

C++ speedup: 6-100x

Object-Oriented Overhead Elimination

No virtual functions, no pointer chasing, no abstraction layers. Direct machine code for maximum performance.
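As a point of reference for the vectorization card above, a hand-written AVX2 fragment looks like this (our illustration; AADC emits comparable SIMD automatically, so you never write intrinsics yourself):

    #include <immintrin.h>
    #include <cstdio>

    // Compile with AVX support (e.g. -mavx). One 256-bit register holds
    // 4 doubles; a single vaddpd instruction adds all 4 lanes at once.
    int main() {
        __m256d a = _mm256_set_pd(4.0, 3.0, 2.0, 1.0);
        __m256d b = _mm256_set1_pd(10.0);
        __m256d c = _mm256_add_pd(a, b);
        double out[4];
        _mm256_storeu_pd(out, c);
        std::printf("%g %g %g %g\n", out[0], out[1], out[2], out[3]);
    }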

Key Capabilities

What makes AADC uniquely powerful

Adjoint Factor < 1

AADC achieves the remarkable result where computing value AND all first-order sensitivities is faster than original code computing just the value—breaking the theoretical 'primal barrier'.

Value + Sensitivities Faster Than Original

Traditional AAD has adjoint factor 2-5x. AADC achieves <1x relative to original code.

Scenario Acceleration

20-50x speedup when running multiple scenarios through the same kernel.

How is this possible?

AAD fundamentally reduces sensitivity computation from O(n) bump-and-revalue passes to a single adjoint pass; on top of that, AADC's JIT compilation, vectorization, and multi-threading make the combined value-plus-sensitivities kernel run faster than the original value-only code.
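In rough cost terms, using the adjoint-factor figures quoted above (cost(f) is one run of the original code, cost(kernel) one run of the compiled kernel):

    bump-and-revalue:  cost ≈ (n + 1) × cost(f)   for n sensitivities
    traditional AAD:   cost ≈ k × cost(f),   k ≈ 2–5
    AADC:              cost ≈ k × cost(kernel) < cost(f)

Because the compiled kernel is itself many times faster than the original code, even k passes over it can undercut a single original evaluation, which is exactly the adjoint factor < 1 claim.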

What Determines Your Speedup?

AADC delivers speedups ranging from 6× to over 1000× depending on your specific situation. The actual performance gain depends on several factors:

Original Code Design

High-level implementations see the largest gains; code that is already low-level and hand-optimized sees smaller, but still significant, improvements

Programming Language

Python code typically sees larger speedups than C++ due to interpreter overhead elimination

Existing Optimizations

Code without vectorization or multithreading benefits more from AADC's automatic optimizations

Model Complexity

Complex models with many operations benefit from kernel compilation and memory optimization

Computation Repetition

Scenarios run many times (Monte Carlo, batch evaluation) amortize recording overhead for maximum gains

Number of Sensitivities

More sensitivities = larger advantage from AAD's O(1) adjoint pass vs O(n) bump-and-recompute

Contact us for a benchmark with your specific code to understand your expected performance improvement.

AADC Developer Documentation

Complete API reference, integration guides, and code examples

View Documentation

See AADC in Action

Now that you understand how it works, explore what you can build with it