Benchmark dossier

Judged comparison across six answer workflows.

This page is the live benchmark surface for the current MHF packaging decision. It runs three explicit MHF prose modes on one shared 75-question holdout, scores only the final answer of each workflow, and sets the results beside three baselines: raw-model, worldview-prompt, and the best secular structured workflow.
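The comparison loop described above can be sketched as follows. This is a minimal illustration, not the benchmark's actual implementation: `judge_score`, `WORKFLOWS`, and the placeholder scoring are all hypothetical stand-ins, and only the shape (six workflows, one shared 75-question holdout, final-answer-only scoring, per-workflow averages) comes from the page.

```python
from statistics import mean

# Hypothetical stand-in: the real judge scores the final answer text.
def judge_score(workflow: str, question: str) -> float:
    """Placeholder judge returning a deterministic score in [0, 10)."""
    return float(len(workflow + question) % 10)

# Six workflows and a shared 75-question holdout, as the page describes.
WORKFLOWS = ["mhf-a", "mhf-b", "mhf-c", "raw", "worldview", "structured"]
HOLDOUT = [f"q{i}" for i in range(75)]

def scoreboard(workflows, questions):
    # Score only the final answer per (workflow, question), then average
    # per workflow so every workflow gets one comparable number.
    return {w: mean(judge_score(w, q) for q in questions) for w in workflows}

board = scoreboard(WORKFLOWS, HOLDOUT)
```

The same loop would run once per judged metric (e.g. FAI and MRB), producing one column of the scoreboard per metric.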

What this page tests: Whether hierarchy-aware Christian MHF prose improves judged MRB results against plain and structured baselines on the same holdout set.
How to inspect it: Read the workflow cards first, then compare the scoreboard, the dimension breakdown, and the featured answer bodies. Browse Answers opens the prose-level view.
What not to overclaim: The payload is a judged research artifact, not a universal proof of moral correctness. The notes section preserves run caveats and fallback warnings.

Workflow Cards

Each card below is a real workflow with its own routing behavior, output profile, and benchmark score.


Scoreboard

Overall FAI and MRB scores, plus source-specific breakdowns across objective, conflict, and life-stage rows.

Columns: Workflow, Family, FAI, MRB, Objective, Conflict, Life Stage, Avg chars.

Dimension Breakdown

The same workflow can be strong in one flourishing dimension and weak in another. This table keeps that visible.
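A per-dimension table like this is just judged scores grouped by (workflow, dimension) and averaged per cell. The sketch below is illustrative only: the record layout, the dimension names, and the sample scores are assumptions, not the site's real data.

```python
from collections import defaultdict
from statistics import mean

# Illustrative records, not real benchmark data: (workflow, dimension, score).
rows = [
    ("mhf", "objective", 7.0), ("mhf", "conflict", 4.0),
    ("raw", "objective", 5.0), ("raw", "conflict", 6.0),
]

def dimension_breakdown(records):
    """Group judged scores by (workflow, dimension) and average each cell."""
    cells = defaultdict(list)
    for workflow, dim, score in records:
        cells[(workflow, dim)].append(score)
    return {key: mean(vals) for key, vals in cells.items()}

table = dimension_breakdown(rows)
# A workflow can top one dimension while trailing in another, e.g. the
# hypothetical "mhf" row is strong on "objective" but weak on "conflict".
```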


Featured Cases

These are the highest-spread questions in the current run. Open a case to read the actual answers that produced the score gap.
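"Highest-spread" can be read as the questions where the judged scores disagree most across workflows. A minimal sketch of that selection, assuming spread is the gap between the best and worst score per question (the function names and sample data are hypothetical):

```python
def question_spread(scores_by_workflow: dict) -> float:
    """Spread = gap between the best and worst judged score on one question."""
    vals = list(scores_by_workflow.values())
    return max(vals) - min(vals)

def featured_cases(per_question: dict, k: int = 3) -> list:
    """Return the k question ids with the widest score gap across workflows."""
    return sorted(per_question,
                  key=lambda q: question_spread(per_question[q]),
                  reverse=True)[:k]

# Illustrative per-question scores, not real run output.
per_question = {
    "q1": {"mhf": 8.0, "raw": 3.0},   # spread 5.0
    "q2": {"mhf": 6.0, "raw": 5.5},   # spread 0.5
    "q3": {"mhf": 9.0, "raw": 2.0},   # spread 7.0
}
print(featured_cases(per_question, k=2))  # ['q3', 'q1']
```

Opening a featured case then shows the actual answer texts behind that gap, which is what makes the score difference inspectable.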

Methodology And Caveats

The goal here is to make the assumptions plain instead of pretending the benchmark is more universal than it is.
