Computer Vision
115+ implementations
End-to-end computer vision pipelines from detection to recognition
Build complete computer vision applications in C#, from YOLO and DETR for object detection, to SAM and Mask2Former for segmentation, to PaddleOCR and TrOCR for text recognition. All models include pre-processing, inference, and post-processing pipelines.
Object Detection
Locate and classify objects in images with bounding boxes and confidence scores.
YOLOv8
Ultralytics YOLO v8 with anchor-free design and mosaic augmentation.
YOLOv9
Programmable Gradient Information and GELAN architecture.
YOLOv10
NMS-free design with consistent dual assignments for real-time detection.
YOLOv11
Latest YOLO with improved C3k2 blocks and extended task support.
DETR
DEtection TRansformer - end-to-end detection without NMS or anchors.
Deformable-DETR
DETR with deformable attention for faster convergence.
DINO
DETR with Improved deNoising anchor boxes for state-of-the-art detection.
RT-DETR
Real-Time DETR with hybrid encoder for speed-accuracy balance.
Co-DETR
Collaborative DETR with multiple auxiliary heads.
EfficientDet
Compound-scaled detection with BiFPN feature fusion.
FasterRCNN
Two-stage detector with Region Proposal Network.
CascadeRCNN
Multi-stage detection with progressively refined proposals.
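Several detectors above are described as NMS-free, so it may help to see what that step looks like in the detectors that still rely on it. The following is a minimal sketch of greedy non-maximum suppression over scored boxes. The `Box` type, `NmsSketch` class, and the 0.5 threshold are hypothetical names and values chosen for illustration, not part of the AiDotNet API.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical box type for illustration only (corner coordinates plus a confidence score).
public sealed class Box
{
    public readonly float X1, Y1, X2, Y2, Score;

    public Box(float x1, float y1, float x2, float y2, float score)
    {
        X1 = x1; Y1 = y1; X2 = x2; Y2 = y2; Score = score;
    }
}

public static class NmsSketch
{
    // Intersection-over-union of two axis-aligned boxes.
    public static float IoU(Box a, Box b)
    {
        float overlapW = Math.Max(0f, Math.Min(a.X2, b.X2) - Math.Max(a.X1, b.X1));
        float overlapH = Math.Max(0f, Math.Min(a.Y2, b.Y2) - Math.Max(a.Y1, b.Y1));
        float intersection = overlapW * overlapH;
        float areaA = (a.X2 - a.X1) * (a.Y2 - a.Y1);
        float areaB = (b.X2 - b.X1) * (b.Y2 - b.Y1);
        float union = areaA + areaB - intersection;
        return union <= 0f ? 0f : intersection / union;
    }

    // Greedy NMS: visit boxes from highest to lowest score and keep a box
    // only if it does not overlap an already-kept box by more than the threshold.
    public static List<Box> Suppress(IEnumerable<Box> boxes, float iouThreshold = 0.5f)
    {
        var kept = new List<Box>();
        foreach (var candidate in boxes.OrderByDescending(b => b.Score))
        {
            if (kept.All(k => IoU(k, candidate) <= iouThreshold))
            {
                kept.Add(candidate);
            }
        }
        return kept;
    }
}
```

In practice NMS is usually applied per class, and the threshold is tuned per dataset; this sketch treats all boxes as one class.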
Instance & Semantic Segmentation
Pixel-level classification and instance separation for precise scene understanding.
SAM (Segment Anything)
Promptable segmentation model for any object in any image.
SAM 2
Segment Anything in images and videos with memory-based tracking.
EfficientSAM
Lightweight SAM variant with SAMI distillation.
FastSAM
Real-time segment anything using YOLO-based architecture.
MobileSAM
Mobile-optimized SAM with lightweight image encoder.
Mask2Former
Unified architecture for panoptic, instance, and semantic segmentation.
OneFormer
One transformer for all segmentation tasks with task-guided queries.
SegFormer
Simple and efficient transformer-based segmentation.
MedSAM
SAM fine-tuned for medical image segmentation.
Grounded-SAM
Combine Grounding DINO with SAM for text-prompted segmentation.
U-Net
Encoder-decoder with skip connections for biomedical segmentation.
DeepLabV3
Atrous spatial pyramid pooling for multi-scale segmentation.
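As a rough illustration of "pixel-level classification", semantic segmentation post-processing often reduces to taking the highest-scoring class at each pixel. The sketch below assumes the model output is a `[classes, height, width]` array of scores; that layout and the `SegmentationSketch` name are assumptions for this example, not the library's documented output format.

```csharp
using System;

public static class SegmentationSketch
{
    // Converts per-class score maps laid out as [classes, height, width]
    // into a label mask where each pixel holds the index of its best class.
    public static int[,] ArgmaxMask(float[,,] scores)
    {
        int classes = scores.GetLength(0);
        int height = scores.GetLength(1);
        int width = scores.GetLength(2);
        if (classes == 0) throw new ArgumentException("At least one class is required.", nameof(scores));

        var mask = new int[height, width];
        for (int y = 0; y < height; y++)
        {
            for (int x = 0; x < width; x++)
            {
                int best = 0;
                for (int c = 1; c < classes; c++)
                {
                    // Strict comparison: ties resolve to the lower class index.
                    if (scores[c, y, x] > scores[best, y, x]) best = c;
                }
                mask[y, x] = best;
            }
        }
        return mask;
    }
}
```

Instance and panoptic models add a further step that groups pixels into separate objects, which this sketch does not cover.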
OCR & Text Detection
Detect and recognize text in natural scenes, documents, and handwriting.
PaddleOCR
Lightweight OCR with detection (DB), recognition (SVTR), and structure analysis.
EasyOCR
Multi-language OCR with CRAFT text detection and CRNN recognition.
TrOCR
Transformer-based OCR combining ViT encoder and text decoder.
SVTR
Scene text recognition with a single visual model.
DBNet
Differentiable Binarization for fast, accurate text detection.
CRAFT
Character Region Awareness for Text detection at character level.
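CRNN-style recognizers are commonly trained with CTC, so their raw output is a per-time-step score table rather than a string. The sketch below shows greedy CTC decoding under the assumption that index 0 is the blank symbol and that the remaining indices map to characters of `alphabet` in order. The `CtcSketch` name and this index convention are illustrative assumptions, not the library's API.

```csharp
using System;
using System.Text;

public static class CtcSketch
{
    // scores[t, k] is the score of symbol k at time step t.
    // Symbol 0 is assumed to be the CTC blank; symbol k > 0 maps to alphabet[k - 1].
    public static string GreedyDecode(float[,] scores, string alphabet)
    {
        int steps = scores.GetLength(0);
        int symbols = scores.GetLength(1);
        if (symbols != alphabet.Length + 1)
            throw new ArgumentException("Expected one score column per character plus the blank.");

        var text = new StringBuilder();
        int previous = 0; // start as if the previous step was a blank
        for (int t = 0; t < steps; t++)
        {
            // Pick the best symbol at this time step.
            int best = 0;
            for (int k = 1; k < symbols; k++)
            {
                if (scores[t, k] > scores[t, best]) best = k;
            }

            // Collapse repeated symbols and drop blanks.
            if (best != 0 && best != previous)
            {
                text.Append(alphabet[best - 1]);
            }
            previous = best;
        }
        return text.ToString();
    }
}
```

A blank between two identical symbols is what allows doubled letters to survive the collapse step; beam search with a language model is a common alternative to greedy decoding.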
Depth Estimation & 3D
Estimate depth, reconstruct 3D scenes, and generate 3D assets from images.
Depth Anything
Foundation model for monocular depth estimation with zero-shot generalization.
MiDaS
Multi-dataset depth estimation with robust cross-domain performance.
ZoeDepth
Zero-shot metric depth estimation combining relative and metric depth.
Marigold
Diffusion-based depth estimation leveraging generative priors.
NeRF
Neural Radiance Fields for novel view synthesis from sparse images.
Gaussian Splatting
3D Gaussian Splatting for real-time radiance field rendering.
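Relative depth models are generally understood to predict depth only up to an unknown scale and shift, so comparing them with metric measurements usually involves an alignment step. The sketch below fits that scale and shift by ordinary least squares over a set of reference pixels, in whichever space the model predicts (for example inverse depth). It is a generic alignment technique, not the method used by ZoeDepth or any specific model listed here, and `DepthAlignSketch` is a hypothetical name.

```csharp
using System;

public static class DepthAlignSketch
{
    // Finds scale s and shift t minimizing sum((s * relative[i] + t - metric[i])^2).
    public static (double Scale, double Shift) FitScaleShift(double[] relative, double[] metric)
    {
        if (relative.Length != metric.Length)
            throw new ArgumentException("Both arrays must have the same length.");
        if (relative.Length < 2)
            throw new ArgumentException("At least two reference points are required.");

        int n = relative.Length;
        double meanRelative = 0, meanMetric = 0;
        for (int i = 0; i < n; i++)
        {
            meanRelative += relative[i];
            meanMetric += metric[i];
        }
        meanRelative /= n;
        meanMetric /= n;

        double covariance = 0, variance = 0;
        for (int i = 0; i < n; i++)
        {
            double centered = relative[i] - meanRelative;
            covariance += centered * (metric[i] - meanMetric);
            variance += centered * centered;
        }
        if (variance == 0)
            throw new ArgumentException("Relative values must not all be identical.");

        double scale = covariance / variance;
        double shift = meanMetric - scale * meanRelative;
        return (scale, shift);
    }
}
```

Applying `scale * value + shift` to every predicted pixel then gives an aligned depth map for evaluation against the reference measurements.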
Image Classification
Classify images into categories with state-of-the-art backbone architectures.
ResNet
Residual networks with skip connections for very deep architectures.
EfficientNet
Compound-scaled CNN with balanced depth, width, and resolution.
ViT
Vision Transformer applying self-attention to image patches.
Swin Transformer
Hierarchical vision transformer with shifted window attention.
ConvNeXt
Modernized ConvNet matching transformer performance.
DINOv2
Self-supervised ViT with strong visual features for any task.
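Classification backbones typically end in a vector of raw scores (logits), one per category. As a minimal sketch of the post-processing step, the code below converts logits to probabilities with a numerically stable softmax and returns the top-k categories. `ClassificationSketch` and its method names are hypothetical and chosen for this example only.

```csharp
using System;
using System.Linq;

public static class ClassificationSketch
{
    // Numerically stable softmax: subtract the max logit before exponentiating.
    public static float[] Softmax(float[] logits)
    {
        if (logits.Length == 0) throw new ArgumentException("Logits must not be empty.", nameof(logits));

        float max = logits.Max();
        var exponentials = logits.Select(v => Math.Exp(v - max)).ToArray();
        double sum = exponentials.Sum();
        return exponentials.Select(e => (float)(e / sum)).ToArray();
    }

    // Returns the k most probable class indices with their probabilities, best first.
    public static (int Index, float Probability)[] TopK(float[] logits, int k)
    {
        var probabilities = Softmax(logits);
        return probabilities
            .Select((p, i) => (Index: i, Probability: p))
            .OrderByDescending(pair => pair.Probability)
            .Take(k)
            .ToArray();
    }
}
```

Mapping the returned indices to label names depends on the label list the model was trained with.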
Computer vision with AiModelBuilder
using AiDotNet;

// Train a computer vision model with AiModelBuilder.
// imageData, annotations, and newImage are assumed to be loaded elsewhere.
var result = await new AiModelBuilder<float, float[], float>()
    .ConfigureModel(new YOLOv11<float>(
        modelPath: "yolov11n.weights",
        classes: CocoClasses.Names))
    .ConfigureOptimizer(new AdamOptimizer<float>())
    .ConfigurePreprocessing()
    .BuildAsync(imageData, annotations);

// Run inference on a new image.
var detections = result.Predict(newImage);

Start building with Computer Vision
All 115+ implementations are included free under Apache 2.0.