LoRA Adapters

Complete reference for the 37+ LoRA adapter variants in AiDotNet.



Overview

LoRA (Low-Rank Adaptation) enables efficient fine-tuning of large models by training only small adapter matrices, reducing memory requirements by 90%+ while maintaining performance.
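
Concretely, LoRA freezes the pretrained weight matrix W and learns only a pair of small matrices A and B, so the adapted forward pass is h = W·x + (α/r)·B·A·x. A minimal sketch of that computation with plain arrays (independent of the AiDotNet API, for illustration only):

// LoRA forward sketch: h = W·x + (alpha/rank)·B·(A·x).
// W (dOut x dIn) stays frozen; A (rank x dIn) and B (dOut x rank) are the
// only trainable tensors, i.e. rank*(dIn + dOut) parameters per layer.
static float[] LoRAForward(float[,] W, float[,] A, float[,] B,
                           float[] x, float alpha, int rank)
{
    int dOut = W.GetLength(0), dIn = W.GetLength(1);
    var ax = new float[rank];
    for (int r = 0; r < rank; r++)           // project input into rank-r space
        for (int j = 0; j < dIn; j++)
            ax[r] += A[r, j] * x[j];

    var h = new float[dOut];
    float scale = alpha / rank;
    for (int i = 0; i < dOut; i++)
    {
        for (int j = 0; j < dIn; j++)
            h[i] += W[i, j] * x[j];          // frozen base projection
        for (int r = 0; r < rank; r++)
            h[i] += scale * B[i, r] * ax[r]; // low-rank update
    }
    return h;
}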


Standard LoRA

Basic LoRA

var loraConfig = new LoRAConfig<float>
{
    Rank = 8,                           // Low-rank dimension
    Alpha = 16,                         // Scaling factor
    TargetModules = ["q_proj", "v_proj"], // Layers to adapt
    Dropout = 0.05f
};

var loraModel = model.ApplyLoRA(loraConfig);

Configuration Options

| Parameter | Type | Default | Description |
|---------------|----------|------------------------|----------------------|
| Rank | int | 8 | Low-rank dimension (r) |
| Alpha | int | 16 | Scaling factor (α) |
| TargetModules | string[] | ["q_proj", "v_proj"] | Layers to adapt |
| Dropout | float | 0.0 | Dropout probability |
| BiasMode | BiasMode | None | How to handle biases |

Memory Comparison

| Model Size | Full Fine-tune | LoRA (r=8) | Savings |
|------------|----------------|------------|---------|
| 1B | 4 GB | 0.4 GB | 90% |
| 7B | 28 GB | 2.8 GB | 90% |
| 13B | 52 GB | 5.2 GB | 90% |
| 70B | 280 GB | 28 GB | 90% |
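
The savings follow from simple arithmetic: gradients and optimizer state exist only for the adapter parameters, which are a tiny fraction of the base weights. A back-of-envelope check for a 4096×4096 projection (a typical attention matrix size in a 7B model, used here only as an example):

// Full fine-tuning updates dIn*dOut weights; LoRA only rank*(dIn + dOut).
int dIn = 4096, dOut = 4096, rank = 8;
long full = (long)dIn * dOut;          // 16,777,216 trainable weights
long lora = (long)rank * (dIn + dOut); //     65,536 trainable weights
Console.WriteLine($"LoRA trains {100.0 * lora / full:F2}% of the weights");
// => ~0.39%; gradient and optimizer memory shrink proportionally.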

QLoRA (Quantized LoRA)

Train on 4-bit quantized models for maximum memory efficiency:

var quantConfig = new QuantizationConfig<float>
{
    QuantizationType = QuantizationType.NF4,  // 4-bit NormalFloat
    ComputeType = DataType.BFloat16,
    DoubleQuantization = true
};

var quantizedModel = await HuggingFaceHub.LoadModelAsync<float>(
    "meta-llama/Llama-2-7b-hf",
    quantConfig);

var qloraConfig = new QLoRAConfig<float>
{
    Rank = 8,
    Alpha = 16,
    TargetModules = ["q_proj", "k_proj", "v_proj", "o_proj"]
};

var qloraModel = quantizedModel.ApplyQLoRA(qloraConfig);

QLoRA Memory Comparison

| Model | Full FT | LoRA | QLoRA |
|-------|---------|------|-------|
| 7B | 28+ GB | 10 GB | 5 GB |
| 13B | 52+ GB | 18 GB | 8 GB |
| 70B | OOM | 90 GB | 24 GB |
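
The extra savings come from storing the frozen base weights in 4 bits and dequantizing them on the fly for each matmul, while the LoRA matrices (and their gradients) stay in the 16/32-bit compute dtype. A conceptual sketch, where DequantizeNF4 and QuantState are hypothetical stand-ins for the real block-wise kernels, and LoRAForward is the helper from the sketch in the Overview:

// QLoRA forward sketch: the base weight lives in NF4 and is dequantized
// only transiently; gradients flow only into A and B.
// DequantizeNF4 and QuantState are hypothetical placeholders.
static float[] QLoRAForward(byte[] wNf4, QuantState state,
                            float[,] A, float[,] B,
                            float[] x, float alpha, int rank)
{
    float[,] W = DequantizeNF4(wNf4, state);     // transient full-precision copy
    return LoRAForward(W, A, B, x, alpha, rank); // same math as plain LoRA
}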

LoRA Variants

DoRA (Weight-Decomposed LoRA)

Decomposes each pretrained weight into a magnitude vector and a direction matrix, applying the low-rank update to the direction while training the magnitude directly:

var doraConfig = new DoRAConfig<float>
{
    Rank = 8,
    Alpha = 16,
    TargetModules = ["q_proj", "v_proj"],
    DecomposeMagnitude = true
};

var doraModel = model.ApplyDoRA(doraConfig);
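
In DoRA the adapted weight V = W + (α/r)·B·A is re-expressed column-wise as a direction times a trainable magnitude m, i.e. W′[:, j] = m[j] · V[:, j] / ‖V[:, j]‖. A small sketch of that recomposition (plain arrays, illustrative only):

// DoRA sketch: split V into direction (column-normalized V) and a
// trainable per-column magnitude m.
static float[,] DoRAWeight(float[,] V, float[] m)
{
    int rows = V.GetLength(0), cols = V.GetLength(1);
    var W = new float[rows, cols];
    for (int j = 0; j < cols; j++)
    {
        double norm = 0;
        for (int i = 0; i < rows; i++) norm += V[i, j] * V[i, j];
        float inv = m[j] / (float)Math.Sqrt(norm + 1e-8); // normalize, rescale
        for (int i = 0; i < rows; i++) W[i, j] = V[i, j] * inv;
    }
    return W;
}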

AdaLoRA (Adaptive LoRA)

Adaptively allocates the rank budget across layers during training, pruning the least important singular directions:

var adaLoraConfig = new AdaLoRAConfig<float>
{
    InitialRank = 12,
    TargetRank = 8,
    Alpha = 16,
    BetaStart = 0.85f,
    BetaEnd = 0.95f
};

var adaLoraModel = model.ApplyAdaLoRA(adaLoraConfig);
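
Training starts every adapter at InitialRank and gradually trims the least important directions until the average rank reaches TargetRank; the AdaLoRA paper uses a cubic schedule for this budget decay. A sketch of such a schedule (the warmup/final step counts are assumptions, and the importance scoring itself is elided):

// AdaLoRA budget schedule sketch: the rank budget decays cubically from
// rInit to rTarget between the warmup and final steps.
static int RankBudget(int step, int warmup, int final, int rInit, int rTarget)
{
    if (step < warmup) return rInit;
    if (step >= final) return rTarget;
    double t = 1.0 - (double)(step - warmup) / (final - warmup);
    return rTarget + (int)((rInit - rTarget) * t * t * t);
}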

VeRA (Vector-based Random Matrix Adaptation)

Freezes a single pair of random projection matrices shared across all layers and trains only small per-layer scaling vectors:

var veraConfig = new VeRAConfig<float>
{
    Rank = 256,  // Can use higher rank
    Alpha = 16,
    SharedAcrossLayers = true
};

var veraModel = model.ApplyVeRA(veraConfig);
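
Because A and B are frozen and shared, each layer only trains two vectors: d over the rank dimension and b over the output dimension, which is why a much higher rank is affordable. A sketch of the resulting update Δh = Λ_b·B·Λ_d·A·x:

// VeRA sketch: A (rank x dIn) and B (dOut x rank) are frozen shared random
// matrices; only d (rank) and b (dOut) are trained per layer.
static float[] VeRADelta(float[,] A, float[,] B, float[] d, float[] b, float[] x)
{
    int rank = d.Length, dOut = b.Length, dIn = x.Length;
    var ax = new float[rank];
    for (int r = 0; r < rank; r++)
    {
        for (int j = 0; j < dIn; j++) ax[r] += A[r, j] * x[j];
        ax[r] *= d[r];                    // trainable per-rank scale
    }
    var dy = new float[dOut];
    for (int i = 0; i < dOut; i++)
    {
        for (int r = 0; r < rank; r++) dy[i] += B[i, r] * ax[r];
        dy[i] *= b[i];                    // trainable per-output scale
    }
    return dy;
}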

LoKr (Low-Rank Kronecker)

Represents the weight update as a Kronecker product of two much smaller factor matrices:

var lokrConfig = new LoKrConfig<float>
{
    Factor = 16,
    Alpha = 16,
    TargetModules = ["q_proj", "v_proj"]
};

var lokrModel = model.ApplyLoKr(lokrConfig);
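
The Kronecker product lets two small factors stand in for a large update: if A is a1×a2 and B is b1×b2, then A ⊗ B is (a1·b1)×(a2·b2), so the parameter count grows with the factors rather than with the full matrix. A sketch of the product itself:

// LoKr sketch: dW = A ⊗ B, built from two small factor matrices.
static float[,] Kronecker(float[,] A, float[,] B)
{
    int a1 = A.GetLength(0), a2 = A.GetLength(1);
    int b1 = B.GetLength(0), b2 = B.GetLength(1);
    var dW = new float[a1 * b1, a2 * b2];
    for (int i = 0; i < a1; i++)
        for (int j = 0; j < a2; j++)
            for (int k = 0; k < b1; k++)
                for (int l = 0; l < b2; l++)
                    dW[i * b1 + k, j * b2 + l] = A[i, j] * B[k, l];
    return dW;
}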

LoHa (Low-Rank Hadamard)

Combines two low-rank factorizations with an elementwise (Hadamard) product:

var lohaConfig = new LoHaConfig<float>
{
    Rank = 8,
    Alpha = 16,
    TargetModules = ["q_proj", "v_proj"]
};

var lohaModel = model.ApplyLoHa(lohaConfig);
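
LoHa forms two independent low-rank products and multiplies them elementwise, ΔW = (B1·A1) ⊙ (B2·A2), which can reach an effective rank up to r² while storing only two rank-r factor pairs. A sketch (with a naive matrix-multiply helper included to keep it self-contained):

// LoHa sketch: dW = (B1·A1) ⊙ (B2·A2), the Hadamard product of two
// independent rank-r factorizations.
static float[,] LoHaDelta(float[,] B1, float[,] A1, float[,] B2, float[,] A2)
{
    float[,] P = MatMul(B1, A1), Q = MatMul(B2, A2);
    int rows = P.GetLength(0), cols = P.GetLength(1);
    var dW = new float[rows, cols];
    for (int i = 0; i < rows; i++)
        for (int j = 0; j < cols; j++)
            dW[i, j] = P[i, j] * Q[i, j];    // elementwise combination
    return dW;
}

// Naive matrix multiply, included only for self-containment.
static float[,] MatMul(float[,] X, float[,] Y)
{
    int n = X.GetLength(0), k = X.GetLength(1), m = Y.GetLength(1);
    var Z = new float[n, m];
    for (int i = 0; i < n; i++)
        for (int j = 0; j < m; j++)
            for (int t = 0; t < k; t++)
                Z[i, j] += X[i, t] * Y[t, j];
    return Z;
}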

IA³ (Infused Adapter by Inhibiting and Amplifying Inner Activations)

Learns rescaling vectors instead of matrices:

var ia3Config = new IA3Config<float>
{
    TargetModules = ["k_proj", "v_proj", "down_proj"],
    FeedforwardModules = ["down_proj"]
};

var ia3Model = model.ApplyIA3(ia3Config);
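
Because IA³ trains a single vector per targeted projection (keys, values, and the feed-forward down-projection above), the adapter is just an elementwise rescale of activations; the vectors are initialized to ones so the model starts out unchanged. A sketch:

// IA³ sketch: each targeted projection gets one learned vector l that
// rescales its activations elementwise, e.g. k' = l_k ⊙ k for keys.
static float[] IA3Rescale(float[] activations, float[] l)
{
    var outp = new float[activations.Length];
    for (int i = 0; i < activations.Length; i++)
        outp[i] = l[i] * activations[i]; // l initialized to 1 (identity)
    return outp;
}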

IA³ Memory Usage

| Method | Parameters | Memory |
|------------|------------|--------------|
| LoRA (r=8) | 0.1% | ~10% of full |
| IA³ | 0.01% | ~1% of full |

Prefix Tuning

Prepends trainable prefix key/value tokens to the attention computation in every layer:

var prefixConfig = new PrefixTuningConfig<float>
{
    NumVirtualTokens = 20,
    ProjectionDim = 256
};

var prefixModel = model.ApplyPrefixTuning(prefixConfig);
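
Mechanically, prefix tuning stores NumVirtualTokens trainable key/value rows per attention layer (optionally produced through a small projection network of width ProjectionDim) and prepends them to the keys and values computed from the input, so every query can also attend to the prefix. A sketch of the concatenation step:

// Prefix-tuning sketch: per layer, trainable prefix rows are stacked in
// front of the keys (and likewise the values) computed from the sequence.
static float[][] PrependPrefix(float[][] prefixRows, float[][] sequenceRows)
{
    var all = new float[prefixRows.Length + sequenceRows.Length][];
    prefixRows.CopyTo(all, 0);                   // NumVirtualTokens learned rows
    sequenceRows.CopyTo(all, prefixRows.Length); // rows computed from the input
    return all;
}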

Prompt Tuning

Prepends learnable soft-prompt embeddings to the input sequence:

var promptConfig = new PromptTuningConfig<float>
{
    NumVirtualTokens = 10,
    InitializerRange = 0.5f
};

var promptModel = model.ApplyPromptTuning(promptConfig);

P-Tuning v2

Deep prompt tuning: trainable prompts are injected at every layer rather than only at the input embedding:

var ptuningConfig = new PTuningV2Config<float>
{
    NumVirtualTokens = 20,
    EncoderHiddenSize = 128,
    EncoderNumLayers = 2
};

var ptuningModel = model.ApplyPTuningV2(ptuningConfig);

Adapter Comparison

| Method | Params | Memory | Quality | Speed |
|---------------|--------|----------|----------|-----------|
| LoRA | 0.1% | Low | High | Fast |
| QLoRA | 0.1% | Very Low | High | Medium |
| DoRA | 0.1% | Low | Higher | Fast |
| AdaLoRA | 0.1% | Low | High | Medium |
| VeRA | 0.01% | Very Low | Good | Fast |
| LoKr | 0.05% | Low | Good | Fast |
| LoHa | 0.1% | Low | Good | Fast |
| IA³ | 0.01% | Very Low | Good | Very Fast |
| Prefix Tuning | 0.1% | Low | Good | Medium |
| Prompt Tuning | 0.01% | Very Low | Moderate | Fast |

Target Module Selection

Attention Only (Most Efficient)

TargetModules = ["q_proj", "v_proj"]

All Attention Layers

TargetModules = ["q_proj", "k_proj", "v_proj", "o_proj"]

Attention + MLP (Most Expressive)

TargetModules = ["q_proj", "k_proj", "v_proj", "o_proj",
                 "gate_proj", "up_proj", "down_proj"]

Multiple Adapters

Load and switch between multiple adapters:

// Load multiple adapters
model.LoadLoRAAdapters("translation-adapter", "translation");
model.LoadLoRAAdapters("summarization-adapter", "summarization");

// Switch adapters
model.SetActiveAdapter("translation");
var translation = model.Generate("Translate to French: Hello");

model.SetActiveAdapter("summarization");
var summary = model.Generate("Summarize: ...");

Merge Multiple Adapters

// Merge adapters with weights
model.MergeAdapters(new Dictionary<string, float>
{
    ["translation"] = 0.7f,
    ["grammar"] = 0.3f
});

Saving and Loading

Save Adapters

// Save only adapter weights (small file)
await loraModel.SaveAdaptersAsync("my-lora-adapters");

Load Adapters

var baseModel = await HuggingFaceHub.LoadModelAsync<float>("microsoft/phi-2");
var loadedModel = baseModel.LoadLoRAAdapters("my-lora-adapters");

Merge and Export

// Merge adapters into base model weights
var mergedModel = loraModel.MergeAndUnload();

// Save merged model
await mergedModel.SaveAsync("merged-model");
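
Merging folds the low-rank update into the dense weight, W′ = W + (α/r)·B·A, so the exported model carries zero adapter overhead at inference time. A sketch of the fold for a single matrix (plain arrays, illustrative only):

// Merge sketch: after folding, W is a single dense matrix again and the
// adapter can be discarded.
static void MergeInPlace(float[,] W, float[,] A, float[,] B, float alpha, int rank)
{
    int dOut = W.GetLength(0), dIn = W.GetLength(1);
    float scale = alpha / rank;
    for (int i = 0; i < dOut; i++)
        for (int j = 0; j < dIn; j++)
            for (int r = 0; r < rank; r++)
                W[i, j] += scale * B[i, r] * A[r, j]; // W' = W + s·B·A
}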

Training Configuration

var trainingConfig = new LoRATrainingConfig<float>
{
    LearningRate = 1e-4f,
    BatchSize = 4,
    GradientAccumulationSteps = 4,
    Epochs = 3,
    WarmupSteps = 100,
    WeightDecay = 0.01f,
    MaxGradNorm = 1.0f
};

await loraModel.TrainAsync(trainingData, trainingConfig);

Rank Selection Guide

| Task Complexity | Recommended Rank |
|-----------------------------|------------------|
| Simple tasks (sentiment) | 4-8 |
| Medium tasks (translation) | 8-16 |
| Complex tasks (coding) | 16-32 |
| Multi-task | 32-64 |

Best Practices

  1. Start with rank 8: Good balance of efficiency and quality
  2. Use alpha = 2 × rank: Common scaling factor
  3. Target q_proj and v_proj first: Most efficient
  4. Use QLoRA for large models: Enables training on consumer GPUs
  5. Lower learning rate: 1e-4 to 5e-5 (lower than full fine-tuning)
  6. Gradient checkpointing: For memory efficiency
  7. Evaluate on validation set: Prevent overfitting
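
Putting these defaults together with the API shown earlier on this page, a reasonable first training run might look like:

// Starting point that follows the practices above: rank 8, alpha = 2·rank,
// attention-only targets, and a conservative learning rate.
var config = new LoRAConfig<float>
{
    Rank = 8,
    Alpha = 16,
    TargetModules = ["q_proj", "v_proj"],
    Dropout = 0.05f
};

var tuned = model.ApplyLoRA(config);

await tuned.TrainAsync(trainingData, new LoRATrainingConfig<float>
{
    LearningRate = 1e-4f,
    BatchSize = 4,
    GradientAccumulationSteps = 4,  // effective batch size 16
    Epochs = 3,
    WarmupSteps = 100
});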