Benchmark dossier

Judged comparison across six answer workflows.

This page is the live public demo surface for the current MHF packaging decision. It runs three explicit MHF prose modes on the selected shared question set, scores only the final answer, and sets the results beside the raw-model, worldview-prompt, and best secular structured baselines.

What this page tests: Whether hierarchy-aware Christian MHF prose improves LLM-judged demo MRB results against plain and structured baselines on the same public demo set.

How to inspect it: Read the workflow cards first, then compare the scoreboard, dimensions, and featured answer bodies. "Browse Answers" opens the prose-level view.

What not to overclaim: The payload is a public demo artifact, not a hidden holdout, human-calibrated validation, safety certification, or proof of formal constraint propagation. The notes section preserves run caveats and fallback warnings.

Workflow Cards

Each card below is a real workflow with its own routing behavior, output profile, and benchmark score.


Scoreboard

Overall FAI and MRB scores, plus source-specific breakdowns across objective, conflict, and life-stage rows.

Workflow	Family	FAI	MRB	Objective	Conflict	Life Stage	Avg chars

Dimension Breakdown

The same workflow can be strong in one flourishing dimension and weak in another. This table keeps that visible.


Featured Cases

These are the highest-spread questions in the current run. Open a case to read the actual answers that produced the score gap.
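As a rough illustration of what "highest-spread" could mean, the sketch below ranks questions by the gap between the best and worst workflow score on each one. This is an assumption, not the page's confirmed method: the definition of spread as max minus min, the function name `rank_by_spread`, and the `(question_id, workflow, score)` row shape are all hypothetical.

```python
from collections import defaultdict


def rank_by_spread(rows, top_n=3):
    """Rank questions by score spread across workflows.

    Assumption: spread = highest workflow score minus lowest workflow
    score on the same question. `rows` is a hypothetical iterable of
    (question_id, workflow, score) tuples.
    """
    # Group scores by question, keyed by workflow name.
    by_question = defaultdict(dict)
    for question_id, workflow, score in rows:
        by_question[question_id][workflow] = score

    # A spread only makes sense when at least two workflows answered.
    spreads = [
        (max(scores.values()) - min(scores.values()), question_id)
        for question_id, scores in by_question.items()
        if len(scores) >= 2
    ]

    # Largest spread first; break ties by question id for stable output.
    spreads.sort(key=lambda item: (-item[0], item[1]))
    return [(question_id, spread) for spread, question_id in spreads[:top_n]]
```

Calling `rank_by_spread(rows, top_n=5)` on per-question scores would return the five questions where the workflows disagree most, which are the cases most worth reading side by side.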


Methodology And Caveats

The site should make the assumptions plain instead of pretending the benchmark is more universal than it is.
