The Domesday Book of KWJ · AI
Evaluation · 9 min

Evaluation Without Ground Truth

By C.W. Jameson · Published 28 July 2025 · Last reviewed 28 July 2025

LLM-as-judge is surprisingly reliable for quality dimensions. It is completely unreliable for factual accuracy.

How to measure LLM output quality when you don't have a reference answer: LLM-as-judge, pairwise comparison, and rubric-based scoring.
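As a rough illustration of two of these techniques, here is a minimal sketch of pairwise comparison and rubric-based scoring. Everything in it is an assumption made for illustration: the `Judge` signature, the function names, and the stand-in judges are hypothetical, and a real setup would replace the stand-ins with a call to an actual model. The pairwise function asks the judge twice with the answer order swapped and only declares a winner when both verdicts agree, which is one common way to guard against a judge that favors whichever answer it sees first.

```python
from typing import Callable, Dict

# Hypothetical judge interface: takes (prompt, answer_a, answer_b)
# and returns "A", "B", or "tie". In practice this would wrap an LLM call.
Judge = Callable[[str, str, str], str]


def pairwise_verdict(judge: Judge, prompt: str, x: str, y: str) -> str:
    """Compare x and y in both presentation orders.

    Returns "x" or "y" only when the judge picks the same answer
    regardless of position; otherwise returns "tie".
    """
    first = judge(prompt, x, y)   # x is shown in position A
    second = judge(prompt, y, x)  # x is shown in position B
    if first == "A" and second == "B":
        return "x"
    if first == "B" and second == "A":
        return "y"
    return "tie"


def rubric_score(scores: Dict[str, int], weights: Dict[str, float]) -> float:
    """Weighted mean of per-dimension rubric scores."""
    total_weight = sum(weights.values())
    return sum(scores[dim] * w for dim, w in weights.items()) / total_weight


# Stand-in judges for demonstration only (not real model behavior).
def longer_is_better(prompt: str, a: str, b: str) -> str:
    if len(a) == len(b):
        return "tie"
    return "A" if len(a) > len(b) else "B"


def always_first(prompt: str, a: str, b: str) -> str:
    return "A"  # a judge with pure position bias


if __name__ == "__main__":
    print(pairwise_verdict(longer_is_better, "q", "short", "a longer answer"))
    print(pairwise_verdict(always_first, "q", "one", "two"))
    print(rubric_score({"clarity": 4, "tone": 2}, {"clarity": 3.0, "tone": 1.0}))
```

Treating disagreement between the two orderings as a tie is a conservative design choice: it costs a second judge call per pair, but a purely position-biased judge then produces no wins at all instead of spurious ones.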