LLM Fine-tuning

Fine-tune large language models efficiently with LoRA and QLoRA.

Overview

AiDotNet provides 34 LoRA adapter implementations for efficient fine-tuning:

Category         Adapters
Core             StandardLoRAAdapter, QLoRAAdapter, DoRAAdapter, AdaLoRAAdapter
Low-Parameter    VeRAAdapter, LoRAXSAdapter, NOLAAdapter, VBLoRAAdapter, LoRAFAAdapter
Composition      LoHaAdapter, LoKrAdapter, DenseLoRAAdapter, GLoRAAdapter, MultiLoRAAdapter, ChainLoRAAdapter
Efficiency       DyLoRAAdapter, FloraAdapter, SLoRAAdapter, LoftQAdapter, PiSSAAdapter
Advanced         MoRAAdapter, DVoRAAdapter, DeltaLoRAAdapter, HRAAdapter, RoSAAdapter
Specialized      LongLoRAAdapter, GraphConvolutionalLoRAAdapter, LoRETTAAdapter, ReLoRAAdapter
Scaling          LoRAPlusAdapter, LoRADropAdapter, XLoRAAdapter, QALoRAAdapter, TiedLoRAAdapter

Why LoRA?

Method           Memory (7B Model)    Parameters Trained
Full Fine-tune   28+ GB               100%
LoRA (r=8)       8-12 GB              ~0.1%
QLoRA (4-bit)    4-6 GB               ~0.1%
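
The ~0.1% figure follows from simple arithmetic. A rough sketch, assuming a 7B LLaMA-style model (32 layers, hidden size 4096, LoRA applied to the four attention projections) — these numbers are illustrative, not AiDotNet-specific:

```python
# Rough LoRA parameter count for a 7B LLaMA-style model (illustrative assumptions).
hidden = 4096      # model hidden size
layers = 32        # transformer layers
rank = 8           # LoRA rank r

# LoRA replaces each frozen d_out x d_in weight update with B (d_out x r) @ A (r x d_in),
# so only r * (d_in + d_out) parameters are trained per adapted matrix.
params_per_matrix = rank * (hidden + hidden)
matrices_per_layer = 4                                # q, k, v, o attention projections
lora_params = layers * matrices_per_layer * params_per_matrix

total_params = 7_000_000_000
print(f"LoRA params: {lora_params:,} ({100 * lora_params / total_params:.2f}% of 7B)")
```

With these assumptions about 8.4M parameters train, roughly 0.12% of the model — consistent with the ~0.1% in the table.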

Basic LoRA with AiModelBuilder

All LoRA fine-tuning uses the standard AiModelBuilder pattern with .ConfigureLoRA():

using AiDotNet;
using AiDotNet.LoRA;

// Configure LoRA via ILoRAConfiguration
var loraConfig = new DefaultLoRAConfiguration<float>(
    rank: 8,
    alpha: 16.0f,
    dropout: 0.05f
);

// Build model with LoRA using AiModelBuilder
var builder = new AiModelBuilder<float, Tensor<float>, Tensor<float>>();
var result = await builder
    .ConfigureModel(model)
    .ConfigureLoRA(loraConfig)
    .ConfigureOptimizer(new AdamWOptimizer<float>(learningRate: 1e-4f))
    .BuildAsync(trainingData, trainingLabels);

// Make predictions
var predictions = builder.Predict(testData, result);
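
Under the hood, a LoRA adapter adds a low-rank correction to the frozen base layer: y = Wx + (alpha/r)·B·A·x, with B zero-initialized so training starts from the base model's exact behavior. A minimal pure-Python sketch of that math (illustrative, not AiDotNet code):

```python
def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(row[i] * v[i] for i in range(len(v))) for row in m]

def lora_forward(W, A, B, x, alpha, r):
    """y = W x + (alpha / r) * B (A x) -- the core LoRA update."""
    base = matvec(W, x)
    delta = matvec(B, matvec(A, x))
    scale = alpha / r
    return [b + scale * d for b, d in zip(base, delta)]

# 2x2 base weight, rank-1 adapter; B starts at zero (standard LoRA init).
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[0.5, 0.5]]            # r x d_in
B = [[0.0], [0.0]]          # d_out x r, zero-initialized
x = [2.0, 4.0]

print(lora_forward(W, A, B, x, alpha=16.0, r=1))  # equals W x while B is zero
```

Note the alpha/rank ratio acts as the effective scale on the learned update, which is why the examples above pair rank 8 with alpha 16 (a scale of 2).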

Training Configuration

// Prepare training data (use realistic dataset sizes)
var trainingData = LoadTrainingFeatures();   // Tensor<float> of shape [numSamples, inputDim]
var trainingLabels = LoadTrainingLabels();   // Tensor<float> of shape [numSamples, outputDim]

// Configure LoRA
var loraConfig = new DefaultLoRAConfiguration<float>(
    rank: 8,
    alpha: 16.0f,
    dropout: 0.05f
);

// Train with AiModelBuilder
var builder = new AiModelBuilder<float, Tensor<float>, Tensor<float>>();
var result = await builder
    .ConfigureModel(model)
    .ConfigureLoRA(loraConfig)
    .ConfigureOptimizer(new AdamWOptimizer<float>(
        learningRate: 1e-4f,
        weightDecay: 0.01f))
    .ConfigureLearningRateScheduler(new CosineAnnealingLR(tMax: 100))
    .ConfigurePreprocessing()
    .BuildAsync(trainingData, trainingLabels);

Console.WriteLine($"Training Loss: {result.TrainingLoss:F4}");
Console.WriteLine($"Validation Loss: {result.ValidationLoss:F4}");
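
The cosine annealing scheduler configured above decays the learning rate along a half cosine from its initial value toward a minimum over tMax steps. The formula itself is standard; a quick sketch (illustrative, independent of AiDotNet's implementation):

```python
import math

def cosine_annealing_lr(step, t_max, lr_max, lr_min=0.0):
    """lr_min + 0.5 * (lr_max - lr_min) * (1 + cos(pi * step / t_max))."""
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * step / t_max))

lr_start = cosine_annealing_lr(0, 100, 1e-4)    # start: full learning rate
lr_mid = cosine_annealing_lr(50, 100, 1e-4)     # midpoint: half the rate
lr_end = cosine_annealing_lr(100, 100, 1e-4)    # end: lr_min
print(lr_start, lr_mid, lr_end)
```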

QLoRA (4-bit Quantized)

QLoRA applies LoRA adapters on top of a 4-bit quantized base model: the frozen base weights are stored in 4 bits while the trainable adapters stay in full precision, cutting memory further:

using AiDotNet;
using AiDotNet.LoRA;

// QLoRA = a LoRA configuration plus 4-bit quantization of the frozen base weights
var architecture = new NeuralNetworkArchitecture<float>(
    inputType: InputType.OneDimensional,
    taskType: NeuralNetworkTaskType.MultiClassClassification,
    inputSize: inputDim,
    outputSize: numClasses
);

var loraConfig = new DefaultLoRAConfiguration<float>(
    rank: 8,
    alpha: 16.0f,
    dropout: 0.05f
);

var builder = new AiModelBuilder<float, Tensor<float>, Tensor<float>>();
var result = await builder
    .ConfigureModel(model)
    .ConfigureLoRA(loraConfig)
    .ConfigureQuantization(new QuantizationConfig { Bits = 4 })
    .ConfigureOptimizer(new AdamWOptimizer<float>(learningRate: 1e-4f))
    .BuildAsync(trainingData, trainingLabels);
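
Conceptually, the quantization step stores each frozen base weight in 4 bits and dequantizes on the fly during the forward pass. A minimal sketch of symmetric absmax 4-bit quantization (illustrative only; QLoRA proper uses the NF4 data type, which is more involved):

```python
def quantize_4bit(weights):
    """Symmetric absmax quantization to signed 4-bit integers in [-7, 7]."""
    scale = max(abs(w) for w in weights) / 7.0
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize_4bit(q, scale):
    """Reconstruct approximate weights from 4-bit codes."""
    return [v * scale for v in q]

w = [0.7, -0.28, 0.1, -0.02]
q, s = quantize_4bit(w)
w_hat = dequantize_4bit(q, s)
print(q)      # small integers in [-7, 7], storable in 4 bits each
print(w_hat)  # approximate reconstruction of w
```

Only the base weights are quantized; gradients flow through the dequantized values into the full-precision LoRA matrices.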

Saving and Loading Models

// Save trained model in AIMF format (includes LoRA weights)
builder.SaveModel(result, "lora-finetuned.aimf");

// Save with encryption (requires license key)
builder.SaveModel(result, "lora-finetuned-encrypted.aimf", encrypt: true);

// Load and use later
var loadedResult = builder.LoadModel("lora-finetuned.aimf");
var predictions = builder.Predict(testData, loadedResult);

LoRA Adapter Variants

AiDotNet provides specialized adapter implementations for different scenarios. Each adapter extends LoRAAdapterBase<T> and implements ILoRAAdapter<T>.

DoRA (Weight-Decomposed LoRA)

Decomposes weight updates into magnitude and direction for better convergence:

// DoRAAdapter decomposes updates into magnitude and direction
var doraAdapter = new DoRAAdapter<float>(
    inputSize: 256,
    outputSize: 256,
    rank: 8,
    alpha: 16.0f
);
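
The decomposition can be written as W' = m · (W₀ + BA) / ||W₀ + BA||, where the per-column magnitude m is trained separately from the normalized direction. A small pure-Python sketch of the column-norm step (illustrative math, not the adapter's internal implementation):

```python
import math

def dora_merge(W, delta, m):
    """W' = m * (W + delta) / ||W + delta||  with column-wise norms."""
    rows, cols = len(W), len(W[0])
    V = [[W[i][j] + delta[i][j] for j in range(cols)] for i in range(rows)]
    col_norms = [math.sqrt(sum(V[i][j] ** 2 for i in range(rows))) for j in range(cols)]
    return [[m[j] * V[i][j] / col_norms[j] for j in range(cols)] for i in range(rows)]

W = [[3.0, 0.0], [4.0, 1.0]]
delta = [[0.0, 0.0], [0.0, 0.0]]   # B @ A, zero at initialization
m = [5.0, 1.0]                     # magnitudes initialized to the column norms of W
print(dora_merge(W, delta, m))     # reproduces W exactly at initialization
```

Initializing m to the column norms of the base weight means the merged layer starts identical to the base model, just as standard LoRA does with its zero-initialized B.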

AdaLoRA (Adaptive Rank)

Dynamically adjusts rank per layer based on importance:

var adaLoraAdapter = new AdaLoRAAdapter<float>(
    inputSize: 256,
    outputSize: 256,
    initialRank: 12,
    targetRank: 8,
    alpha: 16.0f
);
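
AdaLoRA starts at the larger initialRank and prunes toward targetRank as training progresses; the original AdaLoRA paper uses a cubic budget schedule for this. A hedged sketch of such a schedule (the importance scoring that decides which singular directions to prune is omitted):

```python
def adalora_budget(step, total_steps, initial_rank, target_rank):
    """Cubic decay of the rank budget from initial_rank down to target_rank."""
    t = min(step / total_steps, 1.0)
    return target_rank + int((initial_rank - target_rank) * (1 - t) ** 3)

# Budget over a 100-step run with the ranks from the example above (12 -> 8).
ranks = [adalora_budget(s, 100, 12, 8) for s in (0, 50, 100)]
print(ranks)
```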

VeRA (Vector-based LoRA)

Uses shared random matrices with learned scaling vectors for extreme parameter efficiency:

var veraAdapter = new VeRAAdapter<float>(
    inputSize: 256,
    outputSize: 256,
    rank: 256,     // rank can be far higher than in LoRA: the A/B matrices are frozen and shared
    alpha: 16.0f
);
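
The efficiency comes from freezing a single pair of random matrices shared across layers and training only two small scaling vectors per layer. Comparing trainable parameters for a 256x256 layer (illustrative arithmetic based on the VeRA formulation):

```python
d_in = d_out = 256

# Standard LoRA at rank 8: trains A (r x d_in) and B (d_out x r) per layer.
lora_rank = 8
lora_trainable = lora_rank * (d_in + d_out)

# VeRA at rank 256: A and B are frozen and shared; only the scaling
# vectors b (length d_out) and d (length rank) are trained per layer.
vera_rank = 256
vera_trainable = d_out + vera_rank

print(lora_trainable, vera_trainable)  # VeRA trains far fewer parameters per layer
```

This is why VeRA can afford a much larger rank than LoRA at a fraction of the trainable parameter count.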

Best Practices

Rank Selection

Task             Recommended Rank
Simple tasks     4-8
Complex tasks    16-32
Multi-task       32-64

Training Tips

  1. Start small: Try rank 4-8 first, increase if underfitting
  2. Learning rate: LoRA typically tolerates higher learning rates than full fine-tuning; 5e-5 to 1e-4 is a common starting range
  3. Gradient accumulation: Use gradient accumulation if batch size is memory-limited
  4. Regularization: Apply dropout (0.05-0.1) to prevent overfitting
  5. Validation monitoring: Split data into train/validation sets and track validation loss to detect overfitting early

// Example with validation monitoring via AiModelBuilder
var result = await builder
    .ConfigureModel(model)
    .ConfigureLoRA(loraConfig)
    .ConfigureOptimizer(new AdamWOptimizer<float>(learningRate: 1e-4f))
    .ConfigurePreprocessing()
    .BuildAsync(trainingData, trainingLabels);

// Check for overfitting
if (result.ValidationLoss > result.TrainingLoss * 1.5)
{
    Console.WriteLine("Warning: model may be overfitting. Consider increasing dropout or reducing rank.");
}
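
Tip 3's gradient accumulation can be sketched language-agnostically: run several small forward/backward passes, sum the gradients, and apply one optimizer step, matching the effect of a larger batch. A minimal Python sketch with a scalar "gradient" standing in for the real tensors:

```python
def train_with_accumulation(micro_batches, accumulation_steps, apply_step):
    """Accumulate gradients over micro-batches, then apply one averaged optimizer step."""
    grad_sum, steps = 0.0, 0
    for i, grad in enumerate(micro_batches, start=1):
        grad_sum += grad                                   # backward pass adds into the buffer
        if i % accumulation_steps == 0:
            apply_step(grad_sum / accumulation_steps)      # average over the window
            grad_sum = 0.0
            steps += 1
    return steps

applied = []
n_steps = train_with_accumulation([1.0, 3.0, 2.0, 6.0], 2, applied.append)
print(n_steps, applied)  # 2 optimizer steps with averaged gradients 2.0 and 4.0
```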

Memory Comparison

Model    Full FT    LoRA     QLoRA
7B       28+ GB     10 GB    5 GB
13B      52+ GB     18 GB    8 GB
70B      OOM        90 GB    24 GB

Next Steps