Reinforcement Learning
From classic Q-learning to transformer-based agents
The most comprehensive RL toolkit for .NET. 50+ agent implementations spanning policy gradient, off-policy, model-based, offline, and multi-agent methods. Includes modern architectures like Decision Transformer and MuZero alongside proven algorithms like PPO and SAC.
Policy Gradient Methods
Directly optimize the policy for stable on-policy training.
PPO
Proximal Policy Optimization with clipped surrogate objective. The workhorse of modern RL.
TRPO
Trust Region Policy Optimization with KL divergence constraint.
A2C / A3C
Advantage Actor-Critic with synchronous and asynchronous variants.
REINFORCE
Classic policy gradient with Monte Carlo returns.
GRPO
Group Relative Policy Optimization for language model alignment.
DAPO
Decoupled Clip and Dynamic Sampling Policy Optimization for large-scale LLM fine-tuning.
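To make the family concrete, here is a minimal NumPy sketch of the clipped surrogate objective at the heart of PPO. This is an illustration of the algorithm itself, not AiDotNet's C# API; all names here are hypothetical.

```python
import numpy as np

def ppo_clip_loss(logp_new, logp_old, advantages, clip_range=0.2):
    """Clipped surrogate objective from PPO.

    The probability ratio r = pi_new(a|s) / pi_old(a|s) is clipped to
    [1 - eps, 1 + eps], and the pessimistic minimum of the clipped and
    unclipped terms removes the incentive to move the policy too far
    in a single update.
    """
    ratio = np.exp(logp_new - logp_old)                  # r_t(theta)
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1 - clip_range, 1 + clip_range) * advantages
    return -np.mean(np.minimum(unclipped, clipped))      # negate: we minimize

# Toy batch: the first action's ratio (1.6) already exceeds 1 + eps,
# so only the clipped term contributes and its gradient incentive saturates.
logp_old = np.log(np.array([0.5, 0.5]))
logp_new = np.log(np.array([0.8, 0.2]))
adv = np.array([1.0, -1.0])
loss = ppo_clip_loss(logp_new, logp_old, adv)
```

The same clipping idea is what the `clipRange` parameter in the AiModelBuilder example below controls.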
Off-Policy Methods
Learn from replay buffers for maximum sample efficiency.
SAC
Soft Actor-Critic with entropy regularization for maximum-entropy exploration.
TD3
Twin Delayed DDPG addressing overestimation bias.
DDPG
Deep Deterministic Policy Gradient for continuous action spaces.
DQN
Deep Q-Network with experience replay and target network.
Rainbow DQN
Combining 6 DQN improvements: Double, Dueling, PER, C51, NoisyNets, n-step.
IQN
Implicit Quantile Networks for full return distribution modeling.
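What these value-based methods share is a bootstrapped TD target computed from a replay buffer. A minimal NumPy sketch of the DQN target (an illustration of the technique, not AiDotNet code):

```python
import numpy as np

def dqn_targets(rewards, next_q_target, dones, gamma=0.99):
    """Bootstrapped TD targets: r + gamma * max_a' Q_target(s', a').

    next_q_target comes from a periodically synced target network, which
    keeps the regression target stable while the online network trains.
    Terminal transitions (done = 1) bootstrap nothing.
    """
    max_next = next_q_target.max(axis=1)
    return rewards + gamma * (1.0 - dones) * max_next

# Toy batch of 2 transitions over 3 actions; the second is terminal.
rewards = np.array([1.0, 0.0])
dones = np.array([0.0, 1.0])
next_q = np.array([[0.5, 2.0, 1.0],
                   [3.0, 0.0, 0.0]])
targets = dqn_targets(rewards, next_q, dones)
```

Variants like Double DQN and IQN change how `max_next` is estimated (decoupled action selection, quantile samples), but the target structure stays the same.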
Model-Based Methods
Learn world models for planning and imagination-based training.
MuZero
Learns a latent dynamics model to plan without knowing the environment's rules (successor to AlphaZero).
AlphaZero
Self-play with MCTS for perfect information games.
Dreamer / DreamerV3
World model with latent imagination for sample-efficient learning.
MBPO
Model-Based Policy Optimization with short model rollouts.
World Models
VAE + RNN world model for learning in imagination.
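The common pattern here is generating synthetic experience from a learned model. A simplified sketch of MBPO-style branched short rollouts (plain Python, purely illustrative; the toy model and policy are made up for the example):

```python
import numpy as np

def short_model_rollouts(model, policy, start_states, horizon=3):
    """MBPO-style branched rollouts: starting from real states, unroll a
    learned dynamics model for only a few steps, so compounding model
    error stays bounded while the policy gets cheap synthetic data."""
    synthetic = []
    states = np.asarray(start_states, dtype=float)
    for _ in range(horizon):
        actions = policy(states)
        next_states, rewards = model(states, actions)
        synthetic.extend(zip(states, actions, rewards, next_states))
        states = next_states
    return synthetic

# Toy linear "learned model" and a constant policy, for illustration only.
def toy_model(s, a):
    ns = s + a                        # pretend dynamics
    r = -np.abs(ns).sum(axis=1)       # pretend reward
    return ns, r

toy_policy = lambda s: np.full_like(s, 0.1)
batch = short_model_rollouts(toy_model, toy_policy, np.zeros((4, 2)), horizon=3)
# 4 start states x 3 model steps = 12 synthetic transitions
```

Dreamer pushes the same idea further by rolling out entirely in a learned latent space.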
Offline & Transformer RL
Learn from fixed datasets and use sequence modeling for decision-making.
Decision Transformer
Cast RL as sequence modeling with return-conditioned generation.
CQL
Conservative Q-Learning preventing overestimation on unseen actions.
IQL
Implicit Q-Learning without querying out-of-distribution actions.
TD3+BC
TD3 with behavior cloning regularization for offline RL.
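The core difficulty these offline methods address is overestimating actions the dataset never contains. A simplified NumPy sketch of CQL's conservative regularizer (an illustration of the idea, not AiDotNet's implementation):

```python
import numpy as np

def cql_penalty(q_values, data_action_idx):
    """Conservative Q-Learning regularizer (simplified):
    log-sum-exp over all actions pushes Q-values down everywhere,
    while the Q-value of the action actually taken in the dataset is
    pushed back up, leaving unseen actions with pessimistic estimates."""
    lse = np.log(np.exp(q_values).sum(axis=1))      # soft maximum over actions
    q_data = q_values[np.arange(len(q_values)), data_action_idx]
    return np.mean(lse - q_data)

# Row 0: an out-of-distribution action (Q = 5.0) looks far better than
# the logged action (Q = 1.0), so the penalty there is large.
q = np.array([[1.0, 5.0],
              [2.0, 2.0]])
penalty = cql_penalty(q, data_action_idx=np.array([0, 0]))
```

IQL and TD3+BC attack the same problem differently: by never querying out-of-distribution actions, or by regularizing the policy toward the behavior data.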
Multi-Agent RL
Coordinate multiple agents in cooperative and competitive environments.
MAPPO
Multi-Agent PPO with centralized training, decentralized execution.
MADDPG
Multi-Agent DDPG for mixed cooperative-competitive scenarios.
QMIX
Monotonic value factorization for cooperative multi-agent tasks.
COMA
Counterfactual Multi-Agent policy gradients.
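QMIX's central constraint can be shown in miniature. Real QMIX uses a hypernetwork to produce state-dependent mixing weights; this NumPy sketch keeps only the monotonicity trick, for illustration:

```python
import numpy as np

def qmix_total(agent_qs, mix_weights, bias=0.0):
    """Monotonic value factorization in miniature: the joint value is a
    non-negatively weighted sum of per-agent utilities (weights forced
    positive via abs), so dQ_tot/dQ_i >= 0 and the argmax of Q_tot
    decomposes into independent per-agent argmaxes -- which is what
    makes decentralized greedy execution consistent with the
    centralized training objective."""
    w = np.abs(mix_weights)
    return float(agent_qs @ w + bias)

agent_qs = np.array([1.0, 2.0, -0.5])
q_tot = qmix_total(agent_qs, mix_weights=np.array([0.5, -1.0, 2.0]))
```

MAPPO and MADDPG follow the same centralized-training, decentralized-execution recipe, but with actor-critic learners instead of value factorization.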
RL agent with AiModelBuilder
using AiDotNet;

// Train an RL agent with AiModelBuilder
var result = await new AiModelBuilder<float, float[], float>()
    .ConfigureModel(new PPOAgent<float>(
        stateDim: 4, actionDim: 2,
        clipRange: 0.2f))
    .ConfigureReinforcementLearning(new RLOptions(
        totalTimesteps: 100_000))
    .ConfigureOptimizer(new AdamOptimizer<float>(lr: 3e-4f))
    .BuildAsync();

// Query the trained policy for an action
var action = result.Predict(observation);

Start building with Reinforcement Learning
All 50+ implementations are included free under Apache 2.0.