Contents
A case study on implementing a QA framework for RAG-based answer generation in the SAP Joule Agent.
The presentation covers the initial problem statement and the 1.5-year development process, from manual to fully automated evaluation of the individual QA steps, including state-of-the-art statistical methods and custom quality metrics derived from industry standards. We will address questions about subjective and objective QA criteria, the use of LLM-as-a-judge metrics, and test repeatability through automation and standardization. We will also focus on a hybrid approach that combines Human-in-the-Loop review with full automation, and show how it applies in the different phases of the QA process.
Takeaways
The audience will learn how AI-generated answers can be evaluated with a “human-only” versus a “human-LLM hybrid” approach, and how such evaluation can run at scale in a business software context.
Prior knowledge
None