Performance & Hardware

Hardware-accelerated

No Python, no native runtimes. SIMD-optimized pure .NET.

AiDotNet is built for performance from the ground up: SIMD Vector&lt;T&gt; operations provide CPU-level optimization throughout the codebase, GPU acceleration is available via CUDA and OpenCL, and AOT compilation support delivers instant startup. Zero Python dependency means no GIL bottleneck, no interop overhead, and deployment anywhere .NET runs.

Real-Time Inference · Edge Deployment · Embedded Systems · Cloud Computing · Mobile AI · IoT Devices · Gaming AI · Latency-Critical Apps

SIMD & CPU Vectorization

Leverage CPU vector instructions for parallel numeric computation.

Vector<T> Operations

.NET hardware-intrinsic SIMD for tensor operations throughout the codebase.
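The pattern behind this is standard .NET: `System.Numerics.Vector<T>` widens automatically to whatever SIMD registers the CPU exposes. A minimal standalone sketch of SIMD elementwise addition (plain BCL code for illustration, not AiDotNet's internal implementation):

```csharp
using System;
using System.Numerics;

static class SimdAdd
{
    // Elementwise dest = a + b using Vector<float>; the leftover tail
    // that doesn't fill a full vector is handled with scalar code.
    public static void Add(float[] a, float[] b, float[] dest)
    {
        int i = 0;
        int width = Vector<float>.Count; // e.g. 8 floats on AVX2, 16 on AVX-512
        for (; i <= a.Length - width; i += width)
        {
            (new Vector<float>(a, i) + new Vector<float>(b, i)).CopyTo(dest, i);
        }
        for (; i < a.Length; i++) dest[i] = a[i] + b[i]; // scalar tail
    }
}
```

One pass of this loop processes `Vector<float>.Count` elements per instruction, which is where the CPU-level speedup comes from.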

AVX/AVX2/AVX-512

Automatic use of widest available SIMD instruction set.

ARM NEON

ARM SIMD support for mobile and edge deployment.

Span<T> Fast Paths

Zero-allocation memory access with Span<T> throughout the codebase.
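The zero-allocation idea is that slicing a `Span<T>` creates a view over existing memory rather than a copy, and small scratch buffers can live on the stack. A standalone BCL sketch of the pattern (illustrative, not AiDotNet internals):

```csharp
using System;

static class SpanPaths
{
    // Sum a window of a buffer without copying it:
    // Slice returns a view over the same memory, so no allocation occurs.
    public static float SumWindow(ReadOnlySpan<float> data, int start, int length)
    {
        ReadOnlySpan<float> window = data.Slice(start, length);
        float sum = 0f;
        foreach (float v in window) sum += v;
        return sum;
    }
}

// Small scratch buffers can bypass the heap entirely:
//   Span<float> scratch = stackalloc float[64];
```

Because `float[]` converts implicitly to `ReadOnlySpan<float>`, the same method serves arrays, stack buffers, and pooled memory without overloads.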

Memory<T> Pooling

Pooled memory allocations reducing GC pressure during training.
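Pooling in .NET typically means renting scratch buffers from `ArrayPool<T>.Shared` instead of allocating per call, so a training loop produces almost no garbage. A minimal sketch of the rent/return discipline (standalone BCL code, not AiDotNet's pool):

```csharp
using System;
using System.Buffers;

static class PooledBuffers
{
    // Rent a scratch buffer from the shared pool instead of "new float[n]";
    // the try/finally guarantees the buffer goes back even on exceptions.
    public static float Mean(ReadOnlySpan<float> source)
    {
        float[] buffer = ArrayPool<float>.Shared.Rent(source.Length);
        try
        {
            source.CopyTo(buffer); // Rent may return a larger array; use source.Length
            float sum = 0f;
            for (int i = 0; i < source.Length; i++) sum += buffer[i];
            return sum / source.Length;
        }
        finally
        {
            ArrayPool<float>.Shared.Return(buffer);
        }
    }
}
```

Rented arrays can be longer than requested and are not zeroed by default, so callers must track the logical length themselves.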

GPU Acceleration

Offload compute-intensive operations to GPU hardware.

CUDA Support

NVIDIA GPU acceleration for training and inference.

OpenCL Support

Cross-vendor GPU compute for AMD, Intel, and NVIDIA.

Mixed Precision

FP16 and BF16 computation for up to 2x GPU throughput.
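The usual mixed-precision recipe is to store values at half precision to cut memory traffic while accumulating reductions at full precision to avoid drift. A CPU-side sketch using .NET's built-in `System.Half` (FP16) type; the GPU kernels follow the same principle, and this is an illustration rather than AiDotNet's actual kernel code:

```csharp
using System;

static class MixedPrecision
{
    // Inputs are stored as FP16 (System.Half), halving memory bandwidth,
    // but the dot product accumulates in FP32 to preserve precision.
    public static float DotFp16(Half[] a, Half[] b)
    {
        float acc = 0f; // FP32 accumulator
        for (int i = 0; i < a.Length; i++)
            acc += (float)a[i] * (float)b[i];
        return acc;
    }
}
```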

GPU Memory Management

Smart memory pooling and gradient checkpointing for large models.

Compilation & Deployment

Compile and deploy models with maximum efficiency.

NativeAOT

Ahead-of-time compilation for instant startup and small binaries.

.NET 10

Full support for the latest .NET with cutting-edge optimizations.

.NET Framework 4.7.1

Backward compatibility with legacy .NET Framework applications.

Trimming Compatible

IL trimming support for minimal deployment size.
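In recent .NET SDKs, NativeAOT and trimming are enabled in the consuming application's project file. A typical configuration might look like this (standard MSBuild properties, not AiDotNet-specific settings):

```xml
<PropertyGroup>
  <PublishAot>true</PublishAot>           <!-- ahead-of-time native compilation -->
  <PublishTrimmed>true</PublishTrimmed>   <!-- IL trimming for smaller output -->
  <InvariantGlobalization>true</InvariantGlobalization> <!-- optional size win -->
</PropertyGroup>
```

Publishing then produces a native, trimmed binary, e.g. `dotnet publish -c Release -r linux-x64`.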

Zero Dependencies

No Python runtime, no native libraries, no interop overhead.

No Python Required

Unlike TorchSharp (which ships the ~700 MB LibTorch binaries) or TF.NET, no native runtime is needed.

No GIL Bottleneck

True multi-threading on .NET, with no Python Global Interpreter Lock.
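Because the CLR has no global interpreter lock, data-parallel loops scale across all cores. A standalone BCL illustration with `Parallel.For` (not an AiDotNet API):

```csharp
using System;
using System.Threading.Tasks;

static class ParallelSum
{
    // Every worker thread runs CPU-bound code simultaneously;
    // nothing serializes them the way CPython's GIL would.
    public static double SumSquares(double[] data)
    {
        object gate = new object();
        double total = 0;
        Parallel.For(0, data.Length,
            () => 0.0,                                   // per-thread partial sum
            (i, _, local) => local + data[i] * data[i],  // runs concurrently
            local => { lock (gate) total += local; });   // merge partials once
        return total;
    }
}
```

The per-thread partial sums keep contention low: the lock is taken once per worker, not once per element.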

Cross-Platform

Windows, Linux, macOS, ARM - anywhere .NET runs.

Self-Contained Deploy

Single-file deployment with everything included.

Hardware-accelerated training with AiModelBuilder

C#
using AiDotNet;

// AiModelBuilder uses SIMD and hardware acceleration automatically
var result = await new AiModelBuilder<float, float[], float>()
    .ConfigureModel(new NeuralNetwork<float>(
        inputSize: 1024, hiddenSize: 512, outputSize: 10))
    .ConfigureOptimizer(new AdamOptimizer<float>())
    .ConfigurePreprocessing()
    .BuildAsync(features, labels);

// All tensor operations are SIMD-vectorized via Vector<T>
// GPU acceleration, mixed precision, and AOT compilation
// are enabled automatically based on available hardware
var prediction = result.Predict(newSample);

Start building with Performance & Hardware

All hardware-accelerated implementations are included free under the Apache 2.0 license.