AADCFunctions (aka Kernel)

The AADCFunctions class is the central orchestrator of AADC’s JIT compilation and automatic differentiation capabilities. It manages the complete lifecycle from recording computational graphs to executing optimized kernels for both forward evaluation and reverse-mode automatic differentiation.

Overview

AADCFunctions is a templated class defined in aadc/aadc.h that controls:

  • Recording management: Starting and stopping computational graph capture
  • Kernel compilation: JIT compilation of recorded operations into optimized machine code
  • Workspace creation: Providing execution contexts for compiled kernels
  • Forward/reverse execution: Computing function values and derivatives
A typical instantiation:

#include <aadc/aadc.h>

typedef __m256d mmType;  // AVX2 vectorization (4 doubles)
aadc::AADCFunctions<mmType> aadc_kernel;

Template Parameters

Vectorization Types

The template parameter specifies the SIMD instruction set for vectorized execution:

Type      Instruction Set   Elements    CPU Requirement
__m256d   AVX2              4 doubles   Intel Haswell+, AMD Zen+
__m512d   AVX-512           8 doubles   Intel Skylake-X+, AMD Zen4+

Performance Note: Vectorized types enable processing multiple input scenarios simultaneously, providing significant performance improvements for Monte Carlo simulations and sensitivity analysis.

Scalar Support: Current AADC versions do not support scalar execution. Instead, use a vector type with the same value broadcast to every lane (e.g., _mm256_set1_pd(value)). Contact support if scalar execution is a critical requirement.

64-bit only: AADC currently supports only 64-bit architectures and 64-bit types in compiled kernels. Internally, floating-point values are double, integers are int64_t, and booleans are uint64_t bitmasks.

Core Workflow

1. Recording Phase

The recording phase captures the computational graph of your mathematical operations:

// Start recording operations
aadc_kernel.startRecording();

// Mark input variables. These are leaf nodes in the computational graph.
aadc::AADCArgument spot_arg = spot.markAsInput();
aadc::AADCArgument vol_arg = vol.markAsInput();
aadc::AADCArgument rate_arg = rate.markAsDiff();  // Parameter marked for differentiation

// Execute mathematical operations
idouble option_price = black_scholes_formula(spot, strike, vol, rate, maturity);

// Mark output variables
aadc::AADCResult price_result = option_price.markAsOutput();

// Stop recording and compile
aadc_kernel.stopRecording();

Important: Only operations executed between startRecording() and stopRecording() are captured in the computational graph.

After stopping the recording, the kernel is compiled into optimized machine code. The resulting kernel is immutable, can be executed multiple times, and is safe for concurrent use across threads.

2. Workspace Creation

Workspaces provide execution context and memory management for compiled kernels:

std::shared_ptr<aadc::AADCWorkSpace<mmType>> workspace = aadc_kernel.createWorkSpace();

Workspaces hold input values, intermediate variables, and output results. For adjoint derivative calculations, they also manage adjoint values as well as the “stack” memory for intermediate values needed during reverse execution.

Multiple workspaces can share the same compiled kernel, enabling:

  • Thread-safe parallel execution: Each thread uses its own workspace
  • Different input scenarios: Multiple parameter sets can be evaluated simultaneously
  • Memory isolation: Workspaces maintain separate state. Making them thread-local avoids synchronization issues.

3. Forward Execution

The forward pass computes function values at specified input points:

// Set input values (vectorized for 4 scenarios)
workspace->setVal(spot_arg, _mm256_set_pd(95.0, 100.0, 105.0, 110.0));
workspace->setVal(vol_arg, 0.25);  // Same value for all scenarios
workspace->setVal(rate_arg, _mm256_set_pd(0.02, 0.03, 0.04, 0.05));

// Execute forward pass
aadc_kernel.forward(*workspace);

// Access results
for (int i = 0; i < 4; ++i) {
    std::cout << "Option price[" << i << "] = " 
              << workspace->valp(price_result)[i] << std::endl;
}

4. Reverse (Adjoint) Execution

The reverse pass computes derivatives using reverse-mode automatic differentiation (also known as backpropagation):

// Set adjoint seeds (typically 1.0 for derivatives)
workspace->setDiff(price_result, 1.0);

// Execute reverse pass
aadc_kernel.reverse(*workspace);

// Access derivatives
for (int i = 0; i < 4; ++i) {
    std::cout << "Delta[" << i << "] = " << workspace->diffp(spot_arg)[i] << std::endl;
    std::cout << "Rho[" << i << "] = " << workspace->diffp(rate_arg)[i] << std::endl;
}

Recording Control Methods

startRecording()

void startRecording();

Begins computational graph capture. All operations on active types (idouble, ibool, iint) after this call are recorded for compilation.

stopRecording()

void stopRecording();

Ends recording and triggers JIT compilation. The resulting kernel is optimized and ready for execution. After this call, the kernel becomes immutable.

Recording State Query

class idouble {
public:
    static bool isRecording();  // Check if any kernel is currently recording
};

Execution Methods

forward()

void forward(AADCWorkSpace<mmType>& workspace);

Executes the compiled forward kernel, computing function values at input points specified in the workspace.

Performance: Forward execution typically runs 5-100x faster than the original code due to JIT optimization and vectorization.

reverse()

void reverse(AADCWorkSpace<mmType>& workspace);

Executes the compiled reverse kernel, computing derivatives via automatic differentiation. Must be called after forward().

Performance: Combined forward+reverse execution often runs faster than the original code computing just function values.

Workspace Management

createWorkSpace()

std::shared_ptr<AADCWorkSpace<mmType>> createWorkSpace();

Creates a new workspace associated with this kernel. Workspaces provide isolated execution contexts and should be kept thread-local.

Memory Requirements

Workspace memory usage depends on:

  • Maximum number of intermediate variables: Determined by computational complexity
  • Vectorization width: 4x memory for AVX2, 8x for AVX512
  • Differentiation requirements: Additional memory for adjoint values and required intermediate values

Advanced Features

Conversion Monitoring

Track active-to-passive conversions during recording for debugging:

aadc_kernel.stopRecording();

// Check for problematic conversions
auto warnings = aadc_kernel.getPassiveWarnings();
if (!warnings.empty()) {
    std::cout << "Found " << warnings.size() << " active-to-passive (A2P) conversions" << std::endl;
    aadc_kernel.printPassiveExtractLocations(std::cout, "MyKernel");
}

Configuration Options

// Break into the debugger when an active bool is converted to a passive value
aadc_kernel.setOption(aadc::AADC_BreakOnActiveBoolConversion, 1);

Available options include debugging aids, optimization settings, and error handling configurations.

// TODO: Add options table

Performance Characteristics

Compilation Time

  • Initial overhead: Recording and compilation add one-time cost
  • Amortization: Cost amortized over multiple kernel executions
  • Typical range: Milliseconds to seconds depending on complexity

Execution Performance

  • Forward acceleration: 5-100x speedup over original code
  • Derivative computation: Often faster than finite differences
  • Memory efficiency: Optimized memory access patterns
  • Vectorization: Automatic utilization of SIMD instructions

Memory Usage

  • Kernel storage: Compiled machine code (typically KBs to MBs)
  • Workspace memory: Proportional to problem size and vectorization width
  • Stack usage: Minimal due to optimized use of “tape” memory for adjoint calculations

Thread Safety

Kernel Objects

  • Immutable after compilation: Kernels are read-only after stopRecording()
  • Thread-safe sharing: Multiple threads can share compiled kernels
  • No synchronization needed: kernel execution requires no locking

Workspace Objects

  • Thread-local: Each thread should use its own workspace
  • Not thread-safe: Individual workspaces should not be shared between threads
  • Independent state: Workspaces maintain separate execution state

More advanced discussion of thread safety can be found in the Multi-threading Contract section.

See Also