What is AADC?
AADC (Automatic Adjoint Differentiation Compiler) is a just-in-time compiler that transforms mathematical computations into highly optimized machine code. While automatic differentiation is a core capability, AADC provides two distinct benefits:
Dual Purpose: Acceleration and Differentiation
Performance Acceleration: AADC can significantly speed up repeated mathematical computations, even when derivatives aren’t needed. It compiles your calculations once and generates optimized binary code that leverages modern CPU features like SIMD vectorization (AVX2/AVX512) and multi-threading. This can provide substantial speedups for computationally intensive code.
Automatic Differentiation: When derivatives are required, AADC computes both the original function and all its partial derivatives in a single execution pass. The reverse-mode automatic differentiation, when combined with JIT acceleration, often computes gradients faster than the original code computes just the function value.
How AADC Achieves “Faster than Original Code” Performance
Traditional automatic differentiation theory suggests that computing derivatives requires at least as much work as computing the original function, with reverse-mode AD typically costing 2-4x the function evaluation time. AADC sidesteps this barrier in practice: the theoretical ratio still holds relative to the compiled function, but because the compiled kernel runs far faster than the original code, the combined forward-and-reverse pass can beat the original function evaluation outright. Several key innovations make this possible:
JIT Compilation Acceleration
When AADC records your computation, it doesn’t just build a computational graph—it compiles optimized machine code. This compiled version often runs significantly faster than the original code because:
- Eliminated overhead: Function call overhead, loop overhead, and conditional checks are minimized
- Optimized instruction sequences: Mathematical operations are compiled into efficient CPU instruction patterns
- Memory access optimization: Data layout and access patterns are optimized for cache efficiency and CPU register use
SIMD Vectorization
AADC automatically vectorizes operations to process multiple values simultaneously:
- AVX2: Processes 4 double-precision values in parallel
- AVX512: Processes 8 double-precision values in parallel
- Minimal code changes required: Existing code can stay written as scalar operations and automatically benefits from vectorization when compiled
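The effect can be pictured with a NumPy analogy (AADC itself compiles native AVX kernels; this snippet only illustrates the idea of batch evaluation): a payoff written once in scalar style is applied to eight values at a time, the width of one AVX512 lane group.

```python
import numpy as np

# A payoff written once, in scalar style...
def payoff(s, k):
    return np.maximum(s - k, 0.0)

# ...evaluated on a batch of 8 spot values at once, mirroring how a
# compiled AVX512 kernel processes 8 doubles per instruction.
spots = np.array([90.0, 95.0, 100.0, 105.0, 110.0, 115.0, 120.0, 125.0])
values = payoff(spots, 100.0)  # all 8 payoffs computed in one vectorized call
```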
Combined Forward and Reverse Efficiency
During the reverse pass, AADC:
- Reuses forward computations: Intermediate values computed during the forward pass are efficiently reused
- Optimized adjoint propagation: The reverse sweep is compiled to minimize memory access and maximize instruction throughput
- Eliminates redundant operations: Common subexpressions and redundant calculations are eliminated
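A stripped-down reverse-mode sketch in plain Python (illustrative only, not AADC's machinery) shows the pattern described above: the forward pass stores intermediate values, and the reverse sweep reuses them instead of recomputing.

```python
import math

# Minimal reverse-mode sketch for f(x, y) = sin(x) * y.
# The forward pass keeps intermediates; the reverse sweep reuses them.
def f_with_grad(x, y):
    s = math.sin(x)        # forward: intermediate kept for reuse
    v = s * y              # forward: function value
    # reverse sweep: seed the output adjoint and propagate backwards
    dv = 1.0
    ds = dv * y            # adjoint of s
    dy = dv * s            # reuses the forward intermediate s
    dx = ds * math.cos(x)  # chain rule through sin(x)
    return v, dx, dy
```

Value and all partial derivatives come out of a single forward-plus-reverse pass, which is what makes the per-derivative cost largely independent of the number of inputs.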
Practical Example
Consider a Monte Carlo option pricing calculation:
- Original code (1,000 paths): 100 ms
- AADC forward only: 15 ms (6.7x speedup)
- AADC forward + all gradients: 20 ms (5x speedup vs. original)
In this example, computing the function value plus all derivatives with AADC takes only 20% of the time required by the original code to compute just the function value.
When AADC Works Well
AADC is particularly effective for computational patterns common in quantitative finance and scientific computing, which differ significantly from machine learning workloads:
AADC’s Sweet Spot: Computational graphs with very large numbers of “light” scalar operations (millions of simple additions, multiplications, and function calls). Machine learning frameworks are not designed for this pattern and perform poorly on such fine-grained operations.
Machine Learning Tools’ Sweet Spot: Small numbers of “heavy” tensor operations (matrix multiplications, convolutions on large arrays). These frameworks excel at coarse-grained operations but carry significant per-operation overhead for scalar computations.
Specific Applications Where AADC Excels:
- Quantitative finance models: Option pricing, risk calculations, portfolio optimization
- Gradient-based optimization: Machine-learning-style training and calibration when models are built from fine-grained scalar operations rather than large tensor kernels
- Scientific computing: Numerical optimization, parameter estimation
- Monte Carlo simulations: Especially those with continuous mathematical operations
- Analytical functions: Problems dominated by continuous mathematical expressions
- Legacy code modernization: Improving and accelerating complex legacy analytics code, with AADC successfully applied to multi-million line of code projects
These domains typically involve smooth, differentiable functions with relatively straightforward control flow.
When AADC Is Not Suitable
AADC has limitations and isn’t appropriate for all computational problems:
- Heavily branching algorithms: Code with extensive conditional logic based on computed values
- Discrete state models: Algorithms that rely on complex discrete decision trees
- String processing or symbolic computation: Non-numerical operations
- Database operations or I/O intensive tasks: Operations outside mathematical computation
The key limitation is that AADC requires the computational graph to be relatively continuous and predictable. Applications with significant algorithmic branching or discrete state transitions may still benefit from AAD capabilities for parameter calibration and sensitivity analysis, but often require careful analytical modifications such as smoothing techniques to transform discrete problems into continuous ones.
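As a sketch of such a smoothing technique (illustrative Python, not AADC code): a hard digital payoff 1[s > k] has a zero derivative almost everywhere, which is useless for sensitivities, but it can be replaced by a sigmoid of width eps that recovers the hard branch as eps shrinks while staying differentiable.

```python
import math

# Hard digital payoff: discontinuous at s == k, derivative is zero
# almost everywhere, so sensitivities are uninformative.
def digital(s, k):
    return 1.0 if s > k else 0.0

# Common smoothing: replace the step with a sigmoid of width eps.
# Smaller eps approximates the hard branch more closely but steepens
# the gradient; the choice is a bias/variance trade-off.
def digital_smoothed(s, k, eps=1.0):
    return 1.0 / (1.0 + math.exp(-(s - k) / eps))
```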
Language Support
AADC currently supports three programming languages:
- C++: Full-featured implementation with complete access to all capabilities
- Python: Native integration allowing seamless use within Python workflows
- C#: .NET-compatible implementation for enterprise and Windows environments
In addition, mixed C++/Python projects are supported: user analytics can be implemented across both languages, with AADC recording computations seamlessly regardless of which language defines each component.
Each language binding provides appropriate interfaces while maintaining the same underlying compilation and optimization engine.
Technical Approach
Unlike traditional automatic differentiation tools that build computational tapes, AADC directly compiles native binary kernels optimized for your specific hardware and specific run-time problem definition. Using available information at runtime significantly boosts performance by essentially forming a custom program for the specific task configuration. This approach enables both the performance acceleration and efficient derivative computation, but requires that your mathematical operations can be effectively represented in this compiled form.
The “recording” phase happens once when you define your computation, after which the generated kernels can be executed repeatedly with different input values, making it particularly efficient for scenarios requiring many function evaluations.
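The record-once, execute-many workflow can be sketched with a toy operator-overloading tape in plain Python (all names here are hypothetical illustrations; AADC compiles the recording to native machine code rather than interpreting it):

```python
class Tape:
    """Holds the operations recorded during the single tracing pass."""
    def __init__(self, n_inputs):
        self.n_inputs, self.ops = n_inputs, []

class Var:
    """Traces arithmetic onto the tape via operator overloading."""
    def __init__(self, tape, slot):
        self.tape, self.slot = tape, slot

    def _bin(self, op, other):
        t = self.tape
        out = Var(t, t.n_inputs + len(t.ops))   # next free value slot
        t.ops.append((op, self.slot, other.slot, out.slot))
        return out

    def __add__(self, other): return self._bin('add', other)
    def __mul__(self, other): return self._bin('mul', other)

def record(fn, n_inputs):
    """Run fn once on tracing variables; the tape captures its operations."""
    tape = Tape(n_inputs)
    out = fn(*[Var(tape, i) for i in range(n_inputs)])
    return tape, out.slot

def replay(tape, out_slot, inputs):
    """Re-execute the recorded tape on fresh inputs, without re-tracing."""
    vals = list(inputs) + [0.0] * len(tape.ops)
    for op, a, b, out in tape.ops:
        vals[out] = vals[a] + vals[b] if op == 'add' else vals[a] * vals[b]
    return vals[out_slot]

# Record f(x, y) = x*y + x once, then replay with different inputs.
tape, out = record(lambda x, y: x * y + x, 2)
```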
Performance Expectations
While AADC can provide dramatic speedups for suitable problems, performance gains depend on several factors:
- Mathematical complexity: More complex functions benefit more from compilation
- Repetition count: Benefits increase with more function evaluations
- Hardware features: Modern CPUs with AVX512 see larger gains
- Code structure: AADC can transform complex object-oriented code with irregular memory access patterns into highly cache-efficient compiled kernels
- Function continuity: AADC handles two types of branches differently: (1) “stochastic” branches that depend on function inputs require analytical changes to linearize computations, bringing both differentiability and performance gains; (2) “static” branches that depend on problem configuration are effectively eliminated during the recording stage
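The distinction between the two branch types can be illustrated in plain Python (toy code, not AADC's API): a configuration flag is an ordinary value at recording time, so only the taken side ever enters the recording, while an input-dependent condition must be rewritten as a smooth, branch-free blend.

```python
import math

# Static branch: `is_call` is fixed configuration at recording time,
# so the untaken side simply never appears in the recorded computation.
def make_payoff(is_call):
    if is_call:
        return lambda s, k: s - k
    return lambda s, k: k - s

# Stochastic branch: the condition depends on a recorded input, so it is
# replaced by a smooth 0/1 weight blending the two sides.
def smooth_select(cond_value, if_true, if_false, eps=1.0):
    w = 1.0 / (1.0 + math.exp(-cond_value / eps))  # smooth stand-in for 1[cond > 0]
    return w * if_true + (1.0 - w) * if_false
```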
For typical quantitative applications, speedups of 10-100x are common, with the “faster than primal” differentiation providing additional value for optimization and sensitivity analysis workflows.