Evaluation · 9 min
Evaluation Without Ground Truth
By C.W. Jameson · Published 28 July 2025 · Last reviewed 28 July 2025
LLM-as-judge is surprisingly reliable for quality dimensions. It is completely unreliable for factual accuracy.
How to measure LLM output quality when you don't have a reference answer: LLM-as-judge, pairwise comparison, and rubric-based scoring.
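As a rough illustration of the pairwise comparison approach named above, here is a minimal sketch. It assumes a hypothetical `judge` callable that wraps whatever model you use and returns "A", "B", or "tie" for the two answers in the order shown. Asking twice with the order swapped is an added assumption of this sketch, intended to filter out verdicts that depend on position rather than content. The two stand-in judges are for demonstration only and call no model.

```python
from typing import Callable

# Hypothetical judge signature: (prompt, first_answer, second_answer) -> "A", "B", or "tie".
# "A" means the answer shown first was preferred, "B" the answer shown second.
Judge = Callable[[str, str, str], str]


def pairwise_compare(judge: Judge, prompt: str, answer_1: str, answer_2: str) -> str:
    """Query the judge in both orders and keep only a verdict that survives the swap."""
    first = judge(prompt, answer_1, answer_2)
    second = judge(prompt, answer_2, answer_1)
    # Translate the swapped-order verdict back into the original order.
    second_unswapped = {"A": "B", "B": "A", "tie": "tie"}[second]
    if first == second_unswapped and first != "tie":
        return "answer_1" if first == "A" else "answer_2"
    # Disagreement between the two orders is treated as a tie.
    return "tie"


def length_judge(prompt: str, first: str, second: str) -> str:
    """Stand-in judge for demonstration only: prefers the longer answer."""
    if len(first) == len(second):
        return "tie"
    return "A" if len(first) > len(second) else "B"


def position_biased_judge(prompt: str, first: str, second: str) -> str:
    """Stand-in judge that always prefers whichever answer is shown first."""
    return "A"


consistent = pairwise_compare(length_judge, "q", "short", "a longer answer")
biased = pairwise_compare(position_biased_judge, "q", "short", "a longer answer")
```

With the stand-in judges, the consistent judge yields a winner while the position-biased judge collapses to a tie, which is the behavior the order swap is meant to produce. In practice `judge` would call a model, and ties would be counted separately when computing win rates.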