Computer Vision

115+ implementations

End-to-end computer vision pipelines from detection to recognition

Build complete computer vision applications in C#: YOLO and DETR for object detection, SAM and Mask2Former for segmentation, and PaddleOCR and TrOCR for text recognition. Every model ships with pre-processing, inference, and post-processing pipelines.
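As one illustration of what pre- and post-processing involve, YOLO-style detectors commonly "letterbox" their input: the image is resized to fit a square network input while keeping its aspect ratio, the remainder is padded, and predicted boxes are later mapped back to original pixel coordinates. The C# sketch below computes only that geometry. `Letterbox` and `Unletterbox` are hypothetical helpers, not AiDotNet APIs, and real pipelines typically also round the resized size to whole pixels and perform the actual resampling.

```csharp
using System;

// Letterbox geometry (hypothetical helper): scale an image to fit inside a square
// network input without distorting it, and split the leftover space evenly as padding.
static (float Scale, float PadX, float PadY) Letterbox(int width, int height, int inputSize)
{
    float scale = Math.Min((float)inputSize / width, (float)inputSize / height);
    float padX = (inputSize - width * scale) / 2f;
    float padY = (inputSize - height * scale) / 2f;
    return (scale, padX, padY);
}

// Post-processing (hypothetical helper): undo the padding and scaling so a box predicted
// in network-input coordinates is expressed in original image pixels.
static (float X1, float Y1, float X2, float Y2) Unletterbox(
    (float X1, float Y1, float X2, float Y2) box,
    (float Scale, float PadX, float PadY) lb) =>
    ((box.X1 - lb.PadX) / lb.Scale, (box.Y1 - lb.PadY) / lb.Scale,
     (box.X2 - lb.PadX) / lb.Scale, (box.Y2 - lb.PadY) / lb.Scale);

// A 1280x720 frame fed to a 640x640 input (640 is a common YOLO input size).
var geometry = Letterbox(1280, 720, 640);   // scale 0.5, padX 0, padY 140
var restored = Unletterbox((100f, 240f, 300f, 440f), geometry);
Console.WriteLine(restored);                // (200, 200, 600, 600)
```

Keeping the aspect ratio avoids stretching objects, at the cost of spending some input pixels on padding.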

Autonomous Vehicles · Medical Imaging · Quality Inspection · Document Digitization · Security & Surveillance · Retail Analytics · Agricultural Monitoring · AR/VR

Object Detection

Locate and classify objects in images with bounding boxes and confidence scores.

YOLOv8

Ultralytics YOLOv8 with an anchor-free design and mosaic augmentation.

YOLOv9

YOLO with Programmable Gradient Information (PGI) and the Generalized Efficient Layer Aggregation Network (GELAN) architecture.

YOLOv10

NMS-free design with consistent dual assignments for real-time detection.

YOLOv11

Ultralytics YOLO release with C3k2 blocks and extended task support.

DETR

DEtection TRansformer: end-to-end detection without NMS or hand-designed anchors.

Deformable-DETR

DETR with deformable attention for faster convergence.

DINO

DETR with Improved deNoising anchor boxes for state-of-the-art detection.

RT-DETR

Real-Time DETR with an efficient hybrid encoder that balances speed and accuracy.

Co-DETR

Collaborative DETR with multiple auxiliary heads.

EfficientDet

Compound-scaled detector with BiFPN (bidirectional feature pyramid network) feature fusion.

FasterRCNN

Two-stage detector with a Region Proposal Network (RPN).

CascadeRCNN

Multi-stage detector that refines proposals at progressively higher IoU thresholds.
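Detectors such as YOLOv8 emit many overlapping candidate boxes that are usually pruned after inference with non-maximum suppression (NMS); YOLOv10 and the DETR family above are designed to skip that step. The sketch below is a plain greedy, class-agnostic NMS over tuples, meant only to show the technique. `IoU`, `Nms`, and the threshold values are hypothetical choices, not AiDotNet's post-processing API.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Intersection over Union (IoU) of two axis-aligned boxes in (x1, y1, x2, y2) form.
static float IoU((float X1, float Y1, float X2, float Y2) a,
                 (float X1, float Y1, float X2, float Y2) b)
{
    float iw = Math.Max(0f, Math.Min(a.X2, b.X2) - Math.Max(a.X1, b.X1));
    float ih = Math.Max(0f, Math.Min(a.Y2, b.Y2) - Math.Max(a.Y1, b.Y1));
    float inter = iw * ih;
    float union = (a.X2 - a.X1) * (a.Y2 - a.Y1) + (b.X2 - b.X1) * (b.Y2 - b.Y1) - inter;
    return union > 0f ? inter / union : 0f;
}

// Greedy NMS: repeatedly keep the highest-scoring box and discard the remaining
// boxes that overlap it by more than iouThreshold. Returns indices of kept boxes.
static List<int> Nms(IReadOnlyList<(float X1, float Y1, float X2, float Y2)> boxes,
                     IReadOnlyList<float> scores, float scoreThreshold, float iouThreshold)
{
    // Candidates above the confidence threshold, highest score first.
    var order = Enumerable.Range(0, boxes.Count)
        .Where(i => scores[i] >= scoreThreshold)
        .OrderByDescending(i => scores[i])
        .ToList();
    var keep = new List<int>();
    while (order.Count > 0)
    {
        int best = order[0];
        keep.Add(best);
        // Suppress remaining candidates that overlap the kept box too much.
        order = order.Skip(1).Where(i => IoU(boxes[best], boxes[i]) <= iouThreshold).ToList();
    }
    return keep;
}

var candidateBoxes = new List<(float X1, float Y1, float X2, float Y2)>
{
    (10, 10, 50, 50),     // box 0
    (12, 12, 52, 52),     // box 1: overlaps box 0 heavily (IoU about 0.82)
    (100, 100, 140, 140)  // box 2: disjoint from both
};
var candidateScores = new List<float> { 0.9f, 0.8f, 0.7f };
var kept = Nms(candidateBoxes, candidateScores, scoreThreshold: 0.25f, iouThreshold: 0.5f);
Console.WriteLine(string.Join(", ", kept)); // prints "0, 2"
```

Production pipelines often run NMS per class rather than class-agnostically; the greedy loop is the same.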

Instance & Semantic Segmentation

Pixel-level classification and instance separation for precise scene understanding.

SAM (Segment Anything)

Promptable segmentation model for any object in any image.

SAM 2

Segment Anything in images and videos with memory-based tracking.

EfficientSAM

Lightweight SAM variant trained with SAMI (SAM-leveraged masked image pretraining).

FastSAM

Real-time Segment Anything using a YOLO-based architecture.

MobileSAM

Mobile-optimized SAM with a lightweight image encoder.

Mask2Former

Unified architecture for panoptic, instance, and semantic segmentation.

OneFormer

One transformer for all segmentation tasks with task-guided queries.

SegFormer

Simple, efficient transformer for semantic segmentation with a lightweight all-MLP decoder.

MedSAM

SAM fine-tuned for medical image segmentation.

Grounded-SAM

Combines Grounding DINO with SAM for text-prompted segmentation.

U-Net

Encoder-decoder with skip connections for biomedical segmentation.

DeepLabV3

Semantic segmentation with atrous spatial pyramid pooling (ASPP) for multi-scale context.
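Semantic segmentation heads such as SegFormer or DeepLabV3 typically output a score per class for every pixel; the label map is the per-pixel argmax, and predicted masks are commonly compared with ground truth using mask IoU. A minimal C# sketch, assuming a `[class, height, width]` score layout (an assumption, not AiDotNet's tensor format); `ArgmaxLabels` and `MaskIoU` are hypothetical helpers.

```csharp
using System;

// Collapse per-class scores laid out as [class, height, width] into a label map
// by picking the highest-scoring class at each pixel.
static int[,] ArgmaxLabels(float[,,] logits)
{
    int classes = logits.GetLength(0), h = logits.GetLength(1), w = logits.GetLength(2);
    var labels = new int[h, w];
    for (int y = 0; y < h; y++)
        for (int x = 0; x < w; x++)
        {
            int best = 0;
            for (int c = 1; c < classes; c++)
                if (logits[c, y, x] > logits[best, y, x]) best = c;
            labels[y, x] = best;
        }
    return labels;
}

// IoU between two binary masks of the same size: overlapping pixels / covered pixels.
static float MaskIoU(bool[,] a, bool[,] b)
{
    if (a.GetLength(0) != b.GetLength(0) || a.GetLength(1) != b.GetLength(1))
        throw new ArgumentException("Masks must have the same shape.");
    int inter = 0, union = 0;
    for (int y = 0; y < a.GetLength(0); y++)
        for (int x = 0; x < a.GetLength(1); x++)
        {
            if (a[y, x] && b[y, x]) inter++;
            if (a[y, x] || b[y, x]) union++;
        }
    return union == 0 ? 0f : (float)inter / union;
}

// Two classes over a 2x2 image: scoreMap[class, y, x].
var scoreMap = new float[,,]
{
    { { 0.9f, 0.1f }, { 0.2f, 0.8f } },  // class 0 (e.g. background)
    { { 0.1f, 0.9f }, { 0.7f, 0.3f } }   // class 1 (e.g. object)
};
var labelMap = ArgmaxLabels(scoreMap);   // [[0, 1], [1, 0]]

// Compare the predicted class-1 mask with a made-up ground-truth mask.
var predMask = new bool[2, 2];
for (int row = 0; row < 2; row++)
    for (int col = 0; col < 2; col++)
        predMask[row, col] = labelMap[row, col] == 1;
var gtMask = new bool[,] { { false, true }, { true, true } };
Console.WriteLine(MaskIoU(predMask, gtMask)); // 2 of 3 covered pixels overlap, about 0.667
```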

OCR & Text Detection

Detect and recognize text in natural scenes, documents, and handwriting.

PaddleOCR

Lightweight OCR with detection (DB), recognition (SVTR), and structure analysis.

EasyOCR

Multi-language OCR with CRAFT text detection and CRNN recognition.

TrOCR

Transformer-based OCR combining a ViT-style image encoder with a Transformer text decoder.

SVTR

Scene text recognition with a single visual model, without a separate sequence model.

DBNet

Differentiable Binarization for fast, accurate text detection.

CRAFT

Character Region Awareness For Text detection, localizing text at the character level.
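Recognizers in the CRNN family, such as the one EasyOCR uses, are typically trained with Connectionist Temporal Classification (CTC) and decoded by taking the best symbol per time step, merging consecutive repeats, and removing the blank symbol. A minimal greedy decoder in C#, assuming per-frame probabilities with the blank at index 0 and a hypothetical alphabet string; `CtcGreedyDecode` is an illustrative helper, not an AiDotNet API.

```csharp
using System;
using System.Text;

// Greedy CTC decoding: take the most probable symbol at each time step,
// merge consecutive repeats, then drop blanks. Index 0 is assumed to be the
// CTC blank, and index k (k >= 1) maps to alphabet[k - 1].
static string CtcGreedyDecode(float[][] frames, string alphabet)
{
    var text = new StringBuilder();
    int previous = -1;
    foreach (var frame in frames)
    {
        int best = 0;
        for (int k = 1; k < frame.Length; k++)
            if (frame[k] > frame[best]) best = k;
        // Emit only non-blank symbols that differ from the previous frame's symbol.
        if (best != 0 && best != previous)
            text.Append(alphabet[best - 1]);
        previous = best; // a blank between repeats lets a doubled letter survive
    }
    return text.ToString();
}

// Seven frames over { blank, 'a', 'b', 'c' }; the per-frame argmax is 1, 1, 0, 1, 2, 2, 0.
var frameProbs = new float[][]
{
    new[] { 0.1f, 0.7f, 0.1f, 0.1f },
    new[] { 0.1f, 0.7f, 0.1f, 0.1f },
    new[] { 0.8f, 0.1f, 0.05f, 0.05f },
    new[] { 0.1f, 0.6f, 0.2f, 0.1f },
    new[] { 0.1f, 0.1f, 0.7f, 0.1f },
    new[] { 0.2f, 0.1f, 0.6f, 0.1f },
    new[] { 0.9f, 0.05f, 0.03f, 0.02f },
};
Console.WriteLine(CtcGreedyDecode(frameProbs, "abc")); // prints "aab"
```

Greedy decoding is the simplest option; beam search, optionally with a language model, usually improves accuracy on ambiguous text.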

Depth Estimation & 3D

Estimate depth, reconstruct 3D scenes, and generate 3D assets from images.

Depth Anything

Foundation model for monocular depth estimation with zero-shot generalization.

MiDaS

Relative depth estimation trained on mixed datasets for robust cross-domain performance.

ZoeDepth

Zero-shot metric depth estimation combining relative and metric depth.

Marigold

Diffusion-based depth estimation leveraging generative priors.

NeRF

Neural Radiance Fields for novel view synthesis from a set of posed images.

Gaussian Splatting

3D Gaussian Splatting for real-time radiance field rendering.
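Relative-depth models such as MiDaS predict depth only up to an unknown scale and shift (MiDaS outputs relative inverse depth), so their output is commonly aligned to sparse reference measurements, for example LiDAR points, with a least-squares fit before it is compared or used metrically. A minimal C# sketch of that closed-form fit; `FitScaleShift` is a hypothetical helper, and for MiDaS-style outputs the fit is usually done in inverse-depth (disparity) space.

```csharp
using System;

// Least-squares fit of scale s and shift t minimizing sum_i (s * pred[i] + t - reference[i])^2.
// Closed form: s = (n*Σpr - Σp*Σr) / (n*Σp² - (Σp)²), t = (Σr - s*Σp) / n.
static (double Scale, double Shift) FitScaleShift(double[] pred, double[] reference)
{
    if (pred.Length != reference.Length || pred.Length < 2)
        throw new ArgumentException("Need at least two paired samples.");
    int n = pred.Length;
    double sp = 0, sr = 0, spp = 0, spr = 0;
    for (int i = 0; i < n; i++)
    {
        sp += pred[i];
        sr += reference[i];
        spp += pred[i] * pred[i];
        spr += pred[i] * reference[i];
    }
    double denom = n * spp - sp * sp;
    if (denom == 0)
        throw new ArgumentException("Predictions are constant; scale is undetermined.");
    double scale = (n * spr - sp * sr) / denom;
    double shift = (sr - scale * sp) / n;
    return (scale, shift);
}

// Made-up relative predictions and reference values that differ by s = 2, t = 0.5.
var predicted = new double[] { 1, 2, 3, 4 };
var measured = new double[] { 2.5, 4.5, 6.5, 8.5 };
var fit = FitScaleShift(predicted, measured);
Console.WriteLine($"scale={fit.Scale}, shift={fit.Shift}"); // scale=2, shift=0.5
```

Once `s` and `t` are known, every pixel of the relative map can be converted with `s * value + t`.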

Image Classification

Classify images into categories with state-of-the-art backbone architectures.

ResNet

Residual networks with skip connections for very deep architectures.

EfficientNet

Compound-scaled CNN with balanced depth, width, and resolution.

ViT

Vision Transformer applying self-attention to image patches.

Swin Transformer

Hierarchical vision transformer with shifted window attention.

ConvNeXt

ConvNet modernized with transformer-era design choices, competitive with Swin Transformer accuracy.

DINOv2

Self-supervised ViT producing general-purpose visual features that transfer across tasks.
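Classification backbones such as ResNet or ViT typically end in a head that emits one logit per class; probabilities come from a softmax, and the prediction is the top-k entries. A minimal C# sketch over a plain float array; `Softmax` and `TopK` are hypothetical helpers, not AiDotNet methods, and the class labels are made up.

```csharp
using System;
using System.Linq;

// Numerically stable softmax: subtracting the max logit before exponentiating
// leaves the result unchanged but prevents overflow for large logits.
static float[] Softmax(float[] logits)
{
    float max = logits.Max();
    var exps = logits.Select(l => MathF.Exp(l - max)).ToArray();
    float sum = exps.Sum();
    return exps.Select(e => e / sum).ToArray();
}

// Indices of the k most probable classes, highest probability first.
static int[] TopK(float[] probs, int k) =>
    Enumerable.Range(0, probs.Length)
        .OrderByDescending(i => probs[i])
        .Take(k)
        .ToArray();

string[] labels = { "cat", "dog", "bird" };   // hypothetical class names
var classLogits = new[] { 2.0f, 1.0f, 0.1f };
var classProbs = Softmax(classLogits);        // roughly 0.66, 0.24, 0.10
int top1 = TopK(classProbs, 1)[0];            // index 0, i.e. "cat"
Console.WriteLine($"{labels[top1]}: {classProbs[top1]:P1}");
```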

Computer vision with AiModelBuilder

using AiDotNet;

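// Train a computer vision model with AiModelBuilder.
// imageData, annotations and newImage are placeholders for your own
// training images, their labels, and an image to run detection on.
var result = await new AiModelBuilder<float, float[], float>()
    .ConfigureModel(new YOLOv11<float>(
        modelPath: "yolov11n.weights",   // path to the model weights file
        classes: CocoClasses.Names))     // COCO class names
    .ConfigureOptimizer(new AdamOptimizer<float>())
    .ConfigurePreprocessing()
    .BuildAsync(imageData, annotations);

// Run the trained model on a new image.
var detections = result.Predict(newImage);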

Start building with Computer Vision

All 115+ implementations are included free under the Apache 2.0 license.
