Customer Feedback Classifier

SetFit Model — Training, Testing & Results

Overview

  • Framework: SetFit (sentence transformer fine-tuning)
  • Base model: all-MiniLM-L6-v2 (22M parameters · 384-dim embeddings)
  • Strategy: one-vs-rest multi-label classification

This report covers the SetFit (Sentence Transformer Fine-tuning) version of the CBRE customer feedback classifier. Unlike the baseline approach that used a frozen encoder, SetFit fine-tunes the sentence transformer itself through contrastive learning on automatically generated sentence pairs before fitting the classifier head. This makes it particularly effective for small labeled datasets.

The classifier assigns one or more category labels — Technical, Performance, UX, and Data/Security — to open-ended customer feedback on internal software applications. Labels are not mutually exclusive.

Dataset

  • 149 labeled comments
  • 4 category labels
  • 14 applications
  • 1.14 avg labels per comment

Label Distribution (positive examples)

  • UX: ~122 of 149
  • Data/Security: ~42 of 149
  • Technical: ~27 of 149
  • Performance: ~19 of 149
Class imbalance: UX dominates at ~82% positive rate. SetFit's contrastive pair generation balances positive and negative pairs per label during encoder fine-tuning — a structural advantage over approaches that train only the classifier head on imbalanced data.
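The balancing mechanism can be illustrated with a minimal pair-generation sketch. This is a toy reconstruction, not SetFit's internal code: the comments and the `make_pairs` helper are hypothetical, but the idea matches the report's description — per label, each iteration draws one similar pair and one dissimilar pair, so the contrastive training set stays balanced even when a label is rare (or, as with UX, dominant).

```python
import random

# Toy labeled comments (hypothetical examples, not from the dataset):
data = [
    ("app crashes on login", {"Technical"}),
    ("report export is slow", {"Performance"}),
    ("menus are confusing", {"UX"}),
    ("export may leak emails", {"Data/Security"}),
    ("nice layout but laggy", {"UX", "Performance"}),
]

def make_pairs(data, label, num_iterations, rng):
    """For one label, draw one similar pair (both texts carry the label)
    and one dissimilar pair (only one does) per iteration, so similar
    and dissimilar pairs are balanced regardless of class imbalance."""
    pos = [text for text, labels in data if label in labels]
    neg = [text for text, labels in data if label not in labels]
    pairs = []
    for _ in range(num_iterations):
        a, b = rng.sample(pos, 2)
        pairs.append((a, b, 1.0))                # similar pair
        pairs.append((a, rng.choice(neg), 0.0))  # dissimilar pair
    return pairs

rng = random.Random(0)
pairs = make_pairs(data, "UX", num_iterations=20, rng=rng)
n_similar = sum(1 for _, _, sim in pairs if sim == 1.0)
n_dissimilar = len(pairs) - n_similar
```

The similarity targets (1.0 / 0.0) feed the cosine similarity loss used in Phase 1 below.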

Model Architecture & Training

Two-Phase Training

Phase 1 — Contrastive Fine-tuning
Encoder Adaptation
  • Generates sentence pairs from labeled examples
  • 5,040 pairs from 126 training comments
  • 20 pair-generation iterations per example · batch size 16
  • Fine-tunes encoder weights via cosine similarity loss
  • Pulls same-label comments together in embedding space
  • ~30 seconds on Apple Silicon (MPS)
Phase 2 — Classifier Fitting
Head Training
  • Encodes all training examples with the fine-tuned encoder
  • Fits 4 independent logistic regression classifiers
  • One-vs-rest strategy for multi-label output
  • Outputs per-label probability + binary prediction
  • <1 second to fit
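Phase 2 can be sketched with scikit-learn's one-vs-rest wrapper. The embeddings and targets below are random stand-ins (the real inputs would be the fine-tuned encoder's outputs and the hand-labeled categories); the point is the shape of the head: four independent logistic regressions, one per label, each emitting a probability and a binary decision.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(126, 384))        # 126 training comments, 384-dim embeddings
Y = rng.integers(0, 2, size=(126, 4))  # toy binary targets for the 4 labels

# One independent logistic regression per label (one-vs-rest)
clf = OneVsRestClassifier(LogisticRegression(max_iter=1000))
clf.fit(X, Y)

probs = clf.predict_proba(X[:1])  # per-label probabilities, shape (1, 4)
preds = clf.predict(X[:1])        # binary multi-label prediction, shape (1, 4)
```

Fitting four small linear heads on 126 examples is why this phase takes under a second.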

Inference Pipeline

Raw comment (text input) → fine-tuned encoder (all-MiniLM-L6-v2) → 384-dim embedding (domain-adapted) → 4× logistic regression (one-vs-rest) → labels + confidence (multi-label)
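The final stage of the pipeline reduces to applying four sigmoids and a threshold. A minimal sketch, with toy weights standing in for the trained heads (a real run would use the fitted logistic regression coefficients):

```python
import numpy as np

LABELS = ["Technical", "Performance", "UX", "Data/Security"]

def predict(embedding, weights, biases, threshold=0.5):
    """Apply 4 independent logistic heads to one 384-dim embedding and
    keep every label whose probability clears the threshold."""
    logits = weights @ embedding + biases  # shape (4,)
    probs = 1.0 / (1.0 + np.exp(-logits))  # per-label sigmoid
    return [(label, float(p)) for label, p in zip(LABELS, probs) if p >= threshold]

# Hypothetical inputs for illustration only:
rng = np.random.default_rng(0)
embedding = rng.normal(size=384)          # stand-in for an encoded comment
weights = rng.normal(size=(4, 384)) * 0.01
biases = np.zeros(4)

predictions = predict(embedding, weights, biases)
```

Because each head thresholds independently, a comment can receive zero, one, or several labels.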

Hyperparameters

Parameter               Value              Description
num_epochs              1                  Contrastive training epochs
num_iterations          20                 Pair-generation iterations per training example
batch_size              16                 Pairs per gradient update
multi_target_strategy   one-vs-rest        Independent binary classifier per label
base_model              all-MiniLM-L6-v2   22M-param distilled BERT, 384-dim output
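These hyperparameters determine the training arithmetic reported earlier. Assuming SetFit's sampling of one similar and one dissimilar pair per training example per iteration, the pair and step counts work out as:

```python
# Contrastive-training arithmetic implied by the hyperparameters above
num_iterations = 20
batch_size = 16
n_train = 126

# One similar + one dissimilar pair per example per iteration
pairs = 2 * num_iterations * n_train   # total contrastive pairs per epoch
steps_per_epoch = pairs // batch_size  # gradient updates per epoch
```

This reproduces the 5,040 pairs cited in Phase 1.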

Train / Test Split

Data was split 85% / 15%, yielding 126 training and 23 test examples. The evaluation model is trained on the training split only. After metrics are reported, the final production model is retrained on all 149 examples and saved to models/setfit/.
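A minimal sketch of the split, assuming scikit-learn's `train_test_split` (the comment strings are stand-ins for the labeled dataset; `random_state=42` is an illustrative choice, not necessarily the one used):

```python
from sklearn.model_selection import train_test_split

comments = [f"comment {i}" for i in range(149)]  # stand-ins for labeled comments
train, test = train_test_split(comments, test_size=0.15, random_state=42)
# 85% / 15% of 149 examples -> 126 train / 23 test
```

Note that scikit-learn rounds the test fraction up, which is how 15% of 149 yields exactly 23 test examples.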

Test Results

Evaluated on 23 held-out comments not seen during training. Metrics are per-label binary classification (positive class = label present).

Technical — Accuracy 91%

Class          Precision   Recall   F1     Support
No             0.91        1.00     0.95   20
Yes            1.00        0.33     0.50   3
Weighted avg   0.92        0.91     0.89   23

Performance — Accuracy 100%

Class          Precision   Recall   F1     Support
No             1.00        1.00     1.00   21
Yes            1.00        1.00     1.00   2
Weighted avg   1.00        1.00     1.00   23

UX — Accuracy 87%

Class          Precision   Recall   F1     Support
No             1.00        0.50     0.67   6
Yes            0.85        1.00     0.92   17
Weighted avg   0.89        0.87     0.85   23

Data/Security — Accuracy 100%

Class          Precision   Recall   F1     Support
No             1.00        1.00     1.00   19
Yes            1.00        1.00     1.00   4
Weighted avg   1.00        1.00     1.00   23
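The "Weighted avg" rows in these tables are support-weighted means of the per-class scores. A quick check against the Technical table:

```python
def weighted_avg(values, supports):
    """Support-weighted mean, as in the classification reports above."""
    return sum(v * s for v, s in zip(values, supports)) / sum(supports)

# Technical label: F1 of 0.95 (No, support 20) and 0.50 (Yes, support 3)
technical_f1 = weighted_avg([0.95, 0.50], [20, 3])  # rounds to 0.89
```

With 20 of 23 examples in the "No" class, the weighted average is dominated by majority-class performance, which is why per-class (especially positive-class) scores are the more informative numbers here.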

SetFit vs. Frozen Encoder Baseline

F1 score on the positive class (label present) across 23 held-out test examples, comparing SetFit fine-tuning against the previous frozen encoder + logistic regression approach.

Label           Frozen Encoder F1   SetFit F1   Change
Technical       0.50                0.50        no change
Performance     0.00                1.00        +1.00 ▲ perfect
UX              0.85                0.92        +0.07
Data/Security   0.67                1.00        +0.33 ▲ perfect
Key improvement: By fine-tuning the encoder through contrastive learning, SetFit learns domain-specific representations where performance bottleneck language and security/compliance language become clearly separable — eliminating the false negatives that plagued the frozen-encoder approach on minority classes.
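The comparison above uses positive-class F1, which can be recomputed from the precision/recall tables. A small helper, checked against the Technical "Yes" row (precision 1.00, recall 0.33):

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0  # e.g. a classifier that never predicts the positive class
    return 2 * precision * recall / (precision + recall)

technical_yes = f1(1.00, 0.33)  # matches the 0.50 reported above
```

The harmonic mean explains the frozen encoder's 0.00 on Performance: zero recall on the positive class forces F1 to zero no matter how precise the classifier is elsewhere.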

Interpretation & Limitations

What works well

Where it struggles

Recommendations

Training time: Phase 1 contrastive fine-tuning completed in ~30s and Phase 2 classifier fitting in under 1s on Apple Silicon (MPS). Full retraining on all 149 examples takes approximately 25 seconds — fast enough to retrain on demand.