X @EXM7777 · May 30, 2026
Full analysis by SuperBM
How To Fix AI Slop (Using Hermes)
5/10 Mixed
Guide to building an eval loop in Hermes to fix AI slop by scoring outputs.
Key Insights
- Shifting from input optimization to output verification reframes quality as a systems problem.
- Explicit rubric criteria make abstract taste testable and automatable.
- Continuous production monitoring catches degradation before it reaches users.
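To make the rubric idea concrete, here is a minimal sketch in plain Python of a generate, score, retry loop. It does not use any Hermes API, which the post summarized here does not specify. The criteria, the banned-phrase list, and the names `RUBRIC`, `score`, and `eval_loop` are all hypothetical stand-ins chosen for illustration.

```python
import re
from typing import Callable, Dict, Optional, Tuple

# Hypothetical rubric: each criterion is a named yes/no check on the text.
BANNED_PHRASES = ("delve into", "in today's fast-paced world", "game-changer")

def no_banned_phrases(text: str) -> bool:
    lowered = text.lower()
    return not any(phrase in lowered for phrase in BANNED_PHRASES)

def short_sentences(text: str, max_words: int = 30) -> bool:
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return all(len(s.split()) <= max_words for s in sentences)

def has_concrete_detail(text: str) -> bool:
    # Crude proxy (an assumption): a digit suggests a specific figure.
    return bool(re.search(r"\d", text))

RUBRIC: Dict[str, Callable[[str], bool]] = {
    "no_banned_phrases": no_banned_phrases,
    "short_sentences": short_sentences,
    "has_concrete_detail": has_concrete_detail,
}

def score(text: str) -> Dict[str, bool]:
    """Run every rubric criterion and report pass/fail per criterion."""
    return {name: check(text) for name, check in RUBRIC.items()}

def eval_loop(
    generate: Callable[[int], str], max_attempts: int = 3
) -> Tuple[Optional[str], Dict[str, bool]]:
    """Generate, verify, and retry until every criterion passes."""
    result: Dict[str, bool] = {}
    for attempt in range(max_attempts):
        draft = generate(attempt)  # hypothetical model call
        result = score(draft)
        if all(result.values()):
            return draft, result
    return None, result  # nothing passed; surface the last scores

# Usage with a fake generator standing in for a model.
drafts = [
    "Let's delve into this game-changer.",
    "Latency dropped 40 ms after we cached the index.",
]
accepted, scores = eval_loop(lambda i: drafts[min(i, len(drafts) - 1)])
print(accepted, scores)
```

Keeping each criterion as a separate named check means a failed draft reports which rule it broke, which is what makes the rubric testable rather than a single opaque score.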
Caveats & Flags
- Claims a universal diagnosis of AI slop but relies entirely on anecdotal patterns rather than systematic evidence.
- Promotes Hermes as the solution while offering no comparative benchmarks against other tools or manual methods.
- Claims 'better prompts can't fix this' but ignores prompt engineering improvements that are well documented in research.
Valid Points
- Eval loops provide a structured way to measure output quality before shipping.
- Separating generation from verification mirrors established quality control in manufacturing.
- Non-deterministic model behaviour means the same prompt can produce varying quality.
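The non-determinism point suggests measuring a pass rate over repeated samples rather than judging a single output. The sketch below assumes a hypothetical `generate` callable and a simple pass/fail check; `fake_model` is a seeded random stand-in for a real model, not anything from Hermes.

```python
import random
from typing import Callable

def pass_rate(
    generate: Callable[[], str], passes: Callable[[str], bool], n: int = 20
) -> float:
    """Fraction of n independent samples that pass the check."""
    if n <= 0:
        raise ValueError("n must be positive")
    return sum(passes(generate()) for _ in range(n)) / n

# Stand-in for a non-deterministic model: same prompt, varying output.
def fake_model(rng: random.Random) -> str:
    return rng.choice(["Cache hit rate rose to 92%.", "This is a game-changer."])

rng = random.Random(0)  # seeded only so the demo is repeatable
rate = pass_rate(lambda: fake_model(rng), lambda t: "game-changer" not in t, n=50)
print(f"pass rate: {rate:.2f}")
```

A prompt that passes once but only clears the check in a fraction of samples is still shipping slop some of the time, which a single spot check would miss.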
Counterpoints
- Better prompts and models have measurably reduced slop in published research and real-world use.
- Eval loops add overhead and can be gamed or mis-calibrated without careful rubric design.
- Anecdotal framing ignores that many teams already use automated evals inside their pipelines.
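One way to check for the mis-calibration risk raised above is to compare the automated verdicts against a small set of human labels. This is a sketch under stated assumptions: the function names and the sample verdicts are hypothetical, and simple agreement is only one of several possible calibration measures.

```python
from typing import Sequence

def agreement(auto: Sequence[bool], human: Sequence[bool]) -> float:
    """Share of outputs where the automated verdict matches the human one."""
    if len(auto) != len(human) or not auto:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(a == h for a, h in zip(auto, human)) / len(auto)

def false_pass_rate(auto: Sequence[bool], human: Sequence[bool]) -> float:
    """Share of human-rejected outputs that the rubric let through."""
    rejected = [a for a, h in zip(auto, human) if not h]
    return sum(rejected) / len(rejected) if rejected else 0.0

# Hypothetical verdicts for four outputs (True = acceptable).
auto_verdicts = [True, True, False, True]
human_verdicts = [True, False, False, True]

print(agreement(auto_verdicts, human_verdicts))        # 0.75
print(false_pass_rate(auto_verdicts, human_verdicts))  # 0.5
```

A high false pass rate is the signal that outputs are satisfying the rubric's wording without meeting the standard a reviewer would apply.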