Enhancing LLM Performance in Specialized Spanish Domains Using RAG and PEFT QLoRA
Description
This project explores improving the performance of large language models (LLMs) in the Spanish legal domain by combining Retrieval-Augmented Generation (RAG) with Parameter-Efficient Fine-Tuning (PEFT) via the QLoRA technique. Four experiments evaluated zero-shot performance on open-ended, closed-ended, and summarization tasks: a vanilla baseline, a RAG-enhanced version, and two fine-tuned models (with and without RAG). Training and retrieval data were synthetically generated through a cloud-based, serverless ETL process aligned with medallion-architecture principles. All experiments focused on the Ley de Impuesto sobre la Renta 2024 (Mexican income tax law). Evaluation used BERTScore, ROUGE, and BLEU to assess semantic similarity, n-gram overlap, and linguistic precision, respectively.

ITESO, A. C.
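To illustrate the n-gram overlap metrics mentioned above, the following is a minimal pure-Python sketch of ROUGE-N recall and the clipped n-gram precision at the core of BLEU (the example sentences are hypothetical, and the actual experiments would typically use established implementations such as the `rouge-score` or `evaluate` packages rather than this simplified version):

```python
from collections import Counter

def ngram_counts(tokens, n):
    """Count the n-grams in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def rouge_n_recall(reference, candidate, n=1):
    """ROUGE-N recall: overlapping n-grams / total n-grams in the reference."""
    ref = ngram_counts(reference.split(), n)
    cand = ngram_counts(candidate.split(), n)
    overlap = sum((ref & cand).values())  # clipped intersection of counts
    total = sum(ref.values())
    return overlap / total if total else 0.0

def bleu_n_precision(reference, candidate, n=1):
    """Clipped n-gram precision (the core of BLEU, omitting the brevity penalty)."""
    ref = ngram_counts(reference.split(), n)
    cand = ngram_counts(candidate.split(), n)
    overlap = sum((ref & cand).values())
    total = sum(cand.values())
    return overlap / total if total else 0.0

# Hypothetical reference summary and model output:
ref = "el impuesto sobre la renta grava los ingresos de las personas"
cand = "el impuesto grava los ingresos"
print(round(rouge_n_recall(ref, cand), 3))   # recall against the reference
print(round(bleu_n_precision(ref, cand), 3)) # precision of the candidate
```

Recall penalizes the short candidate for missing reference content, while unigram precision is perfect because every candidate word appears in the reference; this asymmetry is why summarization evaluation typically reports both families of metrics alongside a semantic measure like BERTScore.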

