Escape GPU vendor lock-in with minimal code changes. Run your CUDA code on scalable CPUs with comparable performance, plus the added benefit of Automatic Adjoint Differentiation (AAD). Make informed hardware decisions for your quantitative finance applications.
Why CPU has caught up with GPU
Many financial institutions committed to CUDA/GPU years ago, pursuing proclaimed 100-1000x performance gains over CPU. The substantial investment in transitioning analytics to CUDA seemed justified at the time.
Since then, CPU-based systems have made a quantum leap in parallel compute capacity. Modern CPUs are now comparable to, and sometimes exceed, GPU systems when total cost of ownership is properly accounted for.
Organizations with existing CUDA projects can now assess performance gain or loss for transitioning from GPU to modern CPU systems. With minimal code changes, existing GPU-only code can be adapted to run on CPU or GPU simultaneously.
Cloud cost analysis reveals the truth about GPU vs CPU pricing
According to the most trustworthy and impartial benchmark in the field, STAC-A2, CPU and GPU run neck and neck when hardware manufacturers invest maximum software development effort to extract top performance.
| Feature | GPU (NVIDIA V100) | CPU (2 x Intel Xeon) |
|---|---|---|
| Number of cores | 5,120 | 56 x 2 sockets |
| Clock frequency | 877 MHz | 2.6 GHz |
| Single-precision ops per clock, per core | 1 | 32 |
| FMA factor | 2 | 2 |
| Peak TFLOPS (product of the above) | 8.98 | 18.64 |
| Approx monthly cost (GCP) | $1,300 | $3,416 |
| Approx monthly cost per TFLOP | $145 | $183 |
A CPU TFLOP costs on average ~30% more than a GPU TFLOP, so the maximum theoretical saving from choosing GPU is about 30%, not 1000x! When development costs, maintenance, specialist developers, and vendor lock-in are factored in, the true economics often favor CPU.
What many banks discovered after their CUDA migration
Until now, technology similar to CUDA for safe multithreading was unavailable on CPU. MatLogica's AADC changes this. Unlike CUDA, AADC uses existing C++ object-oriented code to generate optimized kernels for scalable CPU execution with minimal developer effort.
Run your GPU code on scalable CPUs with minimal changes
AADC can simply reuse existing CUDA analytics implemented for GPU and run them on scalable CPUs instead. With minimal changes, existing CUDA code can be adapted for AADC and executed using multi-threading and vectorization on CPU to achieve top performance.
Unlike GPUs, CPUs have plenty of memory to solve large problems, and they natively support AAD!
CUDA mainly uses C++ syntax with extensions for parallel programming and GPU management. The AADC approach records scalable CPU kernels by executing original user code for one data sample (e.g., one Monte Carlo path).
More complex problems such as American Monte Carlo pricing and xVA calculations can be handled with a similar approach, with only modest increases in code complexity.
Step-by-step guide to enable CPU execution of CUDA code
To run existing CUDA code with AADC on the CPU, we disable the CUDA extensions so that the code compiles with a standard C++ compiler and is ready for AADC kernel compilation:

1. Change native types to active AADC types
2. Disable GPU-specific directives and API calls
3. Reference your existing CUDA kernel code
4. Clean up preprocessor definitions
5. Compile and execute with AADC
6. Run the simulation across multiple CPU cores
```cpp
// Map native types to active AADC types:
#define double idouble
#define bool ibool

// Override CUDA extensions so the kernel compiles as plain C++:
#define __global__
void __syncthreads() {}
struct { int x = 0; } threadIdx;  // single "thread" while recording
struct { int x = 0; } blockIdx;
struct { int x = 0; } blockDim;

#include "kernel.cu" // Original user CUDA kernel

// Revert the overrides:
#undef double
#undef bool
#undef __global__

// Normal C++ code follows here, with AADC kernel compilation
```
In real projects, this code can be wrapped for simplified use. The explicit approach is shown here for demonstration purposes.
Real-world equity derivative pricing comparison
| Machine | Monthly cost (GCP) | Time | Time per $1,000/month |
|---|---|---|---|
| NVIDIA V100 (GPU) | $1,300 | 10.2 ms | 7.8 ms |
| CPU, 30 threads + AVX512 | $915 | 13.5 ms | 14.8 ms |
Performance difference: the CPU run is 32% slower but 30% cheaper, and it adds AAD support, which is impossible on GPU.
Results are preliminary and are being validated by hardware vendors.
Practical steps for organizations with existing CUDA investments
1. Understand your CUDA investment and constraints
2. Test AADC with a representative workload
3. Maintain both GPU and CPU capabilities
4. Migrate workloads strategically
With minimal changes, it's possible to run CUDA code on scalable 64-bit CPU and take advantage of AAD as an additional benefit.
We've demonstrated that it's reasonably simple to support existing CUDA projects with dual CPU and GPU builds.
The performance gained from transitioning from CPU to GPU often comes from the shift to a matrix-vector computation paradigm, not just from the hardware, and AADC brings the same benefit to the CPU.
Get a comprehensive benchmark of your CUDA code running on modern CPUs with AADC. Experience comparable performance with the added benefits of AAD support and larger memory capacity - capabilities impossible on GPU.