Fine-Tuning Large Language Models in RAG Architecture: An Applied Approach for Failure Analysis
Tuesday, October 6, 2026: 1:30 PM
Summary:
Failure Analysis (FA) engineers must diagnose product malfunctions and identify root causes from large volumes of heterogeneous information, including prior debugging procedures and product documentation. This paper examines domain-specific fine-tuning of Large Language Models within a Retrieval-Augmented Generation (RAG) architecture for FA. We fine-tune both the embedding model used for retrieval and the completion model used for answer generation. Our pipeline combines synthetic dataset generation from FA data sources and an internal FA ontology with embedding optimization via Multiple Negatives Ranking Loss and completion-model adaptation through Parameter-Efficient Fine-Tuning and Group Relative Policy Optimization. Both models show significant improvements in measured performance, reaching results comparable with those of much larger models.
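As an illustration of the embedding-optimization step, the following is a minimal sketch of Multiple Negatives Ranking Loss in plain Python. It is not the authors' implementation: the abstract does not give the formulation, so the cosine-similarity scoring, the `scale` factor of 20.0, the function names, and the toy vectors are all assumptions made for illustration. The idea is that each query's paired passage is the positive, and every other passage in the batch serves as a negative.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def multiple_negatives_ranking_loss(queries, passages, scale=20.0):
    """Mean cross-entropy over a batch of (query, positive passage) pairs.

    passages[i] is the positive for queries[i]; all other passages in the
    batch act as in-batch negatives. `scale` is an assumed temperature-like
    multiplier applied to the similarities.
    """
    losses = []
    for i, query in enumerate(queries):
        logits = [scale * cosine(query, passage) for passage in passages]
        # Numerically stable log-sum-exp over all candidates.
        peak = max(logits)
        log_sum_exp = peak + math.log(sum(math.exp(x - peak) for x in logits))
        # Negative log-probability of the true positive (index i).
        losses.append(log_sum_exp - logits[i])
    return sum(losses) / len(losses)


# Hypothetical toy embeddings: two queries and their matching passages.
queries = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[0.9, 0.1], [0.1, 0.9]]
swapped = [aligned[1], aligned[0]]  # positives deliberately mismatched

loss_aligned = multiple_negatives_ranking_loss(queries, aligned)
loss_swapped = multiple_negatives_ranking_loss(queries, swapped)
print(loss_aligned < loss_swapped)
```

In this sketch the loss is lower when each query sits closest to its own passage, which is the behavior a fine-tuned retrieval embedding is trained toward. Larger batches supply more in-batch negatives per query.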
