Performance & Hardware
Hardware-accelerated. No Python, no native runtimes. SIMD-optimized pure .NET.
AiDotNet is built for performance from the ground up: SIMD Vector&lt;T&gt; operations provide CPU-level optimization throughout the codebase, GPU acceleration is available via CUDA and OpenCL, and AOT compilation support enables instant startup. Zero Python dependency means no GIL bottleneck, no interop overhead, and deployment anywhere .NET runs.
SIMD & CPU Vectorization
Leverage CPU vector instructions for parallel numeric computation.
Vector<T> Operations
.NET hardware-intrinsic SIMD for tensor operations throughout the codebase.
AVX/AVX2/AVX-512
Automatic use of widest available SIMD instruction set.
ARM NEON
ARM SIMD support for mobile and edge deployment.
Span<T> Fast Paths
Zero-allocation memory access with Span<T> throughout the codebase.
Memory<T> Pooling
Pooled memory allocations reducing GC pressure during training.
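The Vector&lt;T&gt; fast path described above can be sketched in plain .NET, independent of AiDotNet's internals: process `Vector<float>.Count` lanes per iteration over a `Span<float>`, then finish the remainder with a scalar tail loop. The `SimdDemo` class and `Add` method names here are illustrative, not part of the AiDotNet API.

```csharp
using System;
using System.Numerics;

public static class SimdDemo
{
    // Element-wise add: SIMD over full vector-width chunks,
    // scalar tail loop for the remaining elements.
    public static void Add(ReadOnlySpan<float> a, ReadOnlySpan<float> b, Span<float> dest)
    {
        int i = 0;
        int width = Vector<float>.Count; // lanes per SIMD register (e.g. 8 with AVX2)
        for (; i <= a.Length - width; i += width)
        {
            var va = new Vector<float>(a.Slice(i, width));
            var vb = new Vector<float>(b.Slice(i, width));
            (va + vb).CopyTo(dest.Slice(i, width));
        }
        for (; i < a.Length; i++)      // tail: elements that don't fill a register
            dest[i] = a[i] + b[i];
    }

    public static void Main()
    {
        float[] a = { 1, 2, 3, 4, 5, 6, 7, 8, 9 };
        float[] b = { 9, 8, 7, 6, 5, 4, 3, 2, 1 };
        var dest = new float[a.Length];
        Add(a, b, dest);
        Console.WriteLine(string.Join(", ", dest));
        Console.WriteLine($"Lanes: {Vector<float>.Count}, accelerated: {Vector.IsHardwareAccelerated}");
    }
}
```

The JIT lowers `Vector<float>` arithmetic to the widest instruction set the CPU reports (SSE, AVX2, AVX-512, or NEON on ARM), which is why the same code path covers both x64 and ARM deployments.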
GPU Acceleration
Offload compute-intensive operations to GPU hardware.
CUDA Support
NVIDIA GPU acceleration for training and inference.
OpenCL Support
Cross-vendor GPU compute for AMD, Intel, and NVIDIA.
Mixed Precision
FP16 and BF16 computation for up to 2x GPU throughput.
GPU Memory Management
Smart memory pooling and gradient checkpointing for large models.
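To see what mixed precision trades away, here is a minimal CPU-side sketch using .NET's built-in `System.Half` (FP16) type; it is a conceptual illustration only, not AiDotNet's GPU code path.

```csharp
using System;

public static class HalfDemo
{
    public static void Main()
    {
        // FP16 stores each value in 2 bytes instead of 4, halving memory
        // traffic, but keeps only ~3 decimal digits of precision.
        float f = 0.1f;
        Half h = (Half)f;          // round to nearest representable FP16 value
        float back = (float)h;     // widen back to FP32
        Console.WriteLine($"FP32 {f} -> FP16 -> FP32 {back}");

        // Range is also reduced: FP16 saturates around 65504.
        Console.WriteLine($"Half.MaxValue = {Half.MaxValue}");
    }
}
```

This is why mixed-precision training typically keeps a master copy of the weights in FP32 and uses FP16/BF16 only for the bulk compute.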
Compilation & Deployment
Compile and deploy models with maximum efficiency.
NativeAOT
Ahead-of-time compilation for instant startup and small binaries.
.NET 10
Full support for the latest .NET with cutting-edge optimizations.
.NET Framework 4.7.1
Backward compatibility with legacy .NET Framework applications.
Trimming Compatible
IL trimming support for minimal deployment size.
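NativeAOT and trimming are enabled through standard MSBuild properties in the project file; the fragment below is a generic .NET configuration sketch, not an AiDotNet-specific file.

```xml
<!-- In the application's .csproj: ahead-of-time compile and trim unused IL -->
<PropertyGroup>
  <PublishAot>true</PublishAot>
  <PublishTrimmed>true</PublishTrimmed>
</PropertyGroup>
```

Publishing with `dotnet publish -c Release -r linux-x64` then produces a self-contained native executable with no JIT warm-up.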
Zero Dependencies
No Python runtime, no native libraries, no interop overhead.
No Python Required
Unlike TorchSharp (which ships the ~700 MB LibTorch runtime) or TF.NET, no native runtime is needed.
No GIL Bottleneck
True .NET multi-threading, with no Python Global Interpreter Lock.
Cross-Platform
Windows, Linux, macOS, ARM - anywhere .NET runs.
Self-Contained Deploy
Single-file deployment with everything included.
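A self-contained single-file deployment is a one-command build with the standard .NET CLI; the runtime identifier and project name below are placeholders.

```shell
# Bundle the app, the .NET runtime, and all dependencies into one executable.
# Swap win-x64 for linux-x64, osx-arm64, etc. as needed.
dotnet publish MyApp.csproj -c Release -r win-x64 \
  --self-contained true -p:PublishSingleFile=true
```

The resulting binary runs on machines with no .NET installation, which is what makes the "deployable anywhere .NET runs" claim practical for edge and desktop targets.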
Hardware-accelerated training with AiModelBuilder
using AiDotNet;
// AiModelBuilder uses SIMD and hardware acceleration automatically
var result = await new AiModelBuilder<float, float[], float>()
.ConfigureModel(new NeuralNetwork<float>(
inputSize: 1024, hiddenSize: 512, outputSize: 10))
.ConfigureOptimizer(new AdamOptimizer<float>())
.ConfigurePreprocessing()
.BuildAsync(features, labels);
// All tensor operations are SIMD-vectorized via Vector<T>
// GPU acceleration, mixed precision, and AOT compilation
// are enabled automatically based on available hardware
var prediction = result.Predict(newSample);
Start building with Performance & Hardware
All Hardware-accelerated implementations are included free under Apache 2.0.