Text2SQL — Fine-tuning a ≤3B LLM for SQL Generation

End-to-end pipeline for fine-tuning a small (≤3B-parameter) open-source LLM to convert plain-English questions into executable SQL. Qwen2.5-Coder-1.5B was fine-tuned with QLoRA via Unsloth on the BIRD benchmark and evaluated by execution accuracy against real SQLite databases. On 200 BIRD-dev questions, fine-tuning lifted the valid-SQL rate from 40% to 73.5% and execution accuracy from 14.0% to 15.5%, all on a free Kaggle T4 GPU. Includes notebooks for dataset exploration, preprocessing, SFT training, inference, and execution-based evaluation, plus a written report and 3 ablation experiments.
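The two reported metrics can be illustrated with a minimal sketch using only Python's standard sqlite3 module. This is not the project's evaluation notebook: the helper names (run_query, score), the toy singers table, and the choice to compare result rows as unordered sets are all assumptions made for illustration. A prediction counts as valid if it executes without error, and as correct if it also returns the same rows as the gold query.

```python
import sqlite3


def run_query(conn, sql):
    """Return the result rows as a set, or None if the SQL fails to execute."""
    try:
        return set(conn.execute(sql).fetchall())
    except (sqlite3.Error, sqlite3.Warning):
        return None


def score(conn, examples):
    """Compute (valid-SQL rate, execution accuracy) over (predicted, gold) pairs.

    Assumption: rows are compared as unordered sets, so row order and
    duplicate rows are ignored.
    """
    valid = correct = 0
    for predicted_sql, gold_sql in examples:
        predicted_rows = run_query(conn, predicted_sql)
        gold_rows = run_query(conn, gold_sql)
        if predicted_rows is not None:
            valid += 1
            if gold_rows is not None and predicted_rows == gold_rows:
                correct += 1
    total = len(examples)
    return valid / total, correct / total


# Toy in-memory database standing in for a real BIRD SQLite file (hypothetical data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE singers (name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO singers VALUES (?, ?)",
    [("Ana", 25), ("Ben", 34), ("Cleo", 41)],
)

gold = "SELECT name FROM singers WHERE age > 30"
examples = [
    ("SELECT name FROM singers WHERE age >= 31", gold),  # valid and correct
    ("SELECT name FROM singers", gold),                  # valid but wrong rows
    ("SELEC name FROM singers", gold),                   # does not execute
]

valid_rate, exec_acc = score(conn, examples)
print(f"valid-SQL rate: {valid_rate:.1%}, execution accuracy: {exec_acc:.1%}")
```

Separating the two metrics this way shows why they can move independently: a model can learn to emit syntactically executable SQL much more often while still returning the wrong rows for most questions.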

Technologies

Python, QLoRA, Unsloth, TRL, HuggingFace, BIRD, Qwen2.5-Coder, Google Colab, SQLite