Reinforcement Learning

50+ agents: from classic Q-learning to transformer-based agents

The most comprehensive RL toolkit for .NET. 50+ agent implementations spanning policy gradient, off-policy, model-based, offline, and multi-agent methods. Includes modern architectures like Decision Transformer and MuZero alongside proven algorithms like PPO and SAC.

Game AI · Robotics Control · Trading Strategies · Resource Optimization · Autonomous Driving · Recommendation Systems · Network Routing · Supply Chain

Policy Gradient Methods

Directly optimize the policy for stable, reliable on-policy training.

PPO

Proximal Policy Optimization with clipped surrogate objective. The workhorse of modern RL.

TRPO

Trust Region Policy Optimization with KL divergence constraint.

A2C / A3C

Advantage Actor-Critic with synchronous and asynchronous variants.

REINFORCE

Classic policy gradient with Monte Carlo returns.

GRPO

Group Relative Policy Optimization for language model alignment.

DAPO

Decoupled Clip and Dynamic Sampling Policy Optimization for LLM RL fine-tuning.
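At the heart of PPO (and of descendants like GRPO and DAPO) is the clipped surrogate objective, which caps how far a single update can push the new policy away from the old one. A framework-agnostic sketch in Python, for illustration only (not the library's C# implementation):

```python
# Illustrative sketch of PPO's clipped surrogate objective.
def ppo_clip_loss(ratio, advantage, clip_range=0.2):
    """ratio = pi_new(a|s) / pi_old(a|s); returns a loss to *minimize*."""
    unclipped = ratio * advantage
    # Clamp the ratio into [1 - clip_range, 1 + clip_range] before weighting.
    clipped = max(min(ratio, 1 + clip_range), 1 - clip_range) * advantage
    # Pessimistic bound over the two terms, negated for gradient descent.
    return -min(unclipped, clipped)

# Once the ratio drifts past 1 + clip_range on a positive advantage,
# the objective stops rewarding further movement:
print(ppo_clip_loss(ratio=1.5, advantage=1.0))  # -> -1.2 (capped at 1.2)
print(ppo_clip_loss(ratio=0.9, advantage=1.0))  # -> -0.9 (unclipped)
```

The pessimistic `min` is what makes PPO tolerant of taking several gradient steps per batch: updates that would move the ratio outside the trust band get a zero gradient.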

Off-Policy Methods

Learn from replay buffers for maximum sample efficiency.

SAC

Soft Actor-Critic with entropy regularization for maximum exploration.

TD3

Twin Delayed DDPG addressing overestimation bias.

DDPG

Deep Deterministic Policy Gradient for continuous action spaces.

DQN

Deep Q-Network with experience replay and target network.

Rainbow DQN

Combines six DQN improvements: Double Q-learning, dueling heads, prioritized replay (PER), distributional C51, NoisyNets, and n-step returns.

IQN

Implicit Quantile Networks for full return distribution modeling.
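All of the value-based methods above regress toward a bootstrapped TD target computed from a frozen target network. A minimal Python illustration of the DQN target, with a plain list standing in for the target network's outputs at the next state (illustrative only, not the library's implementation):

```python
# Sketch of the DQN update target with a frozen target network.
def dqn_td_target(reward, next_q_values, gamma=0.99, done=False):
    """y = r + gamma * max_a' Q_target(s', a'), zeroed at episode end."""
    if done:
        return reward  # no bootstrap past a terminal state
    return reward + gamma * max(next_q_values)

# Bootstrapped target from the target network's estimates for s':
y = dqn_td_target(reward=1.0, next_q_values=[2.0, 3.0], gamma=0.99)
print(y)  # -> 3.97
```

The online network is then trained to minimize (Q(s, a) - y)^2 over minibatches sampled from the replay buffer; the `max` over the target network's own estimates is the source of the overestimation bias that Double DQN and TD3 address.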

Model-Based Methods

Learn world models for planning and imagination-based training.

MuZero

Learns a dynamics model for planning without knowing the environment's rules (successor to AlphaZero).

AlphaZero

Self-play with MCTS for perfect information games.

Dreamer / DreamerV3

World model with latent imagination for sample-efficient learning.

MBPO

Model-Based Policy Optimization with short model rollouts.

World Models

VAE + RNN world model for learning in imagination.
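The common thread in these methods is generating imagined experience from a learned model instead of the real environment. An MBPO-style short rollout can be sketched as follows; the toy `model` and `policy` below are stand-ins for learned components, not the library's API:

```python
# Illustrative MBPO-style short rollout: branch k imagined steps from a
# real state using a learned model, collecting transitions to train on.
def short_rollout(model, policy, start_state, k=3):
    """Return k imagined transitions (s, a, r, s') starting from start_state."""
    transitions, s = [], start_state
    for _ in range(k):
        a = policy(s)
        s_next, r = model(s, a)  # learned dynamics + reward model (stand-in)
        transitions.append((s, a, r, s_next))
        s = s_next
    return transitions

# Toy stand-ins: a deterministic "model" and a constant policy.
toy_model = lambda s, a: (s + a, 1.0)
transitions = short_rollout(toy_model, lambda s: 1, start_state=0, k=3)
print(transitions)  # -> [(0, 1, 1.0, 1), (1, 1, 1.0, 2), (2, 1, 1.0, 3)]
```

Keeping rollouts short (small k) limits compounding model error, which is the key trade-off behind MBPO's sample-efficiency gains.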

Offline & Transformer RL

Learn from fixed datasets and use sequence modeling for decision-making.

Decision Transformer

Cast RL as sequence modeling with return-conditioned generation.

CQL

Conservative Q-Learning preventing overestimation on unseen actions.

IQL

Implicit Q-Learning without querying out-of-distribution actions.

TD3+BC

TD3 with behavior cloning regularization for offline RL.
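Offline methods share one problem: the learned policy must not drift toward actions the fixed dataset never covers. TD3+BC's answer is the simplest of the group, so it makes a good illustration; a Python sketch of its actor objective (illustrative, with a per-sample Q-normalization rather than the batch-mean used in practice):

```python
# Sketch of the TD3+BC actor objective: a standard TD3 policy loss plus a
# behavior-cloning term that keeps actions near the dataset.
def td3_bc_actor_loss(q_value, policy_action, dataset_action, alpha=2.5):
    """Minimize -lambda * Q(s, pi(s)) + (pi(s) - a_dataset)^2,
    where lambda = alpha / |Q| normalizes away the Q-value scale."""
    lam = alpha / abs(q_value)  # assumes a nonzero Q estimate
    bc_penalty = (policy_action - dataset_action) ** 2
    return -lam * q_value + bc_penalty

print(td3_bc_actor_loss(q_value=10.0, policy_action=0.5,
                        dataset_action=0.0))  # -> -2.25
```

CQL and IQL attack the same out-of-distribution problem from the critic side instead, by penalizing or avoiding Q-value queries on unseen actions.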

Multi-Agent RL

Coordinate multiple agents in cooperative and competitive environments.

MAPPO

Multi-Agent PPO with centralized training, decentralized execution.

MADDPG

Multi-Agent DDPG for mixed cooperative-competitive scenarios.

QMIX

Monotonic value factorization for cooperative multi-agent tasks.

COMA

Counterfactual Multi-Agent policy gradients with a counterfactual baseline for per-agent credit assignment.
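QMIX's core idea, monotonic value factorization, fits in a few lines: the joint value is a mix of per-agent Q-values with non-negative weights, so each agent greedily maximizing its own Q also maximizes the team's. A Python sketch with fixed weights, for illustration (real QMIX produces the weights with a hypernetwork conditioned on the global state):

```python
# Sketch of QMIX's monotonic value factorization.
def qmix_joint_q(agent_qs, weights, bias=0.0):
    """Monotonic mixing: forcing weights non-negative (via abs) guarantees
    dQ_tot/dQ_i >= 0, so per-agent argmax = joint argmax."""
    return sum(abs(w) * q for w, q in zip(weights, agent_qs)) + bias

# Even a negative raw weight is rectified before mixing:
print(qmix_joint_q([1.0, 2.0], weights=[0.5, -0.25], bias=0.1))  # -> 1.1
```

This monotonicity constraint is what lets QMIX train centrally on the joint value yet execute each agent's policy independently at deployment time.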

RL agent with AiModelBuilder

C#
using AiDotNet;

// Train an RL agent with AiModelBuilder
var result = await new AiModelBuilder<float, float[], float>()
    .ConfigureModel(new PPOAgent<float>(
        stateDim: 4, actionDim: 2,   // e.g. 4 state features, 2 discrete actions
        clipRange: 0.2f))            // PPO clipped-surrogate range
    .ConfigureReinforcementLearning(new RLOptions(
        totalTimesteps: 100_000))
    .ConfigureOptimizer(new AdamOptimizer<float>(lr: 3e-4f))
    .BuildAsync();

// Query the trained policy; observation is a float[] of length stateDim
var action = result.Predict(observation);

Start building with Reinforcement Learning

All 50+ implementations are included free under the Apache 2.0 license.