Benchmark dossier
This page is the live benchmark surface for the current MHF packaging decision. It runs three explicit MHF prose modes on a single shared 75-question holdout, scores only the final answer, and places them alongside raw-model, worldview-prompt, and the strongest secular structured baselines.
Each card below is a real workflow with its own routing behavior, output profile, and benchmark score.
Overall FAI and MRB scores, plus source-specific breakdowns across the objective, conflict, and life-stage columns.
| Workflow | Family | FAI | MRB | Objective | Conflict | Life Stage | Avg chars |
|---|---|---|---|---|---|---|---|
| Loading scoreboard... | | | | | | | |
The same workflow can be strong in one flourishing dimension and weak in another. This table keeps that visible.
| Loading dimension breakdown... |
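The per-dimension breakdown above amounts to averaging scores grouped by workflow and flourishing dimension. A minimal sketch, assuming rows of `(workflow, dimension, score)`; the workflow names, dimension labels, and scores here are purely illustrative, not real benchmark data:

```python
from collections import defaultdict

# Illustrative rows: (workflow, dimension, score). Real runs would
# load these from the benchmark's scored holdout results.
rows = [
    ("mode_a", "objective", 0.9), ("mode_a", "conflict", 0.4),
    ("mode_b", "objective", 0.6), ("mode_b", "conflict", 0.8),
]

# Group scores by (workflow, dimension) pair.
groups = defaultdict(list)
for workflow, dimension, score in rows:
    groups[(workflow, dimension)].append(score)

# Mean score per cell of the breakdown table.
breakdown = {key: sum(vals) / len(vals) for key, vals in groups.items()}
```

A layout like this makes the "strong in one dimension, weak in another" pattern directly visible: `mode_a` leads on objective rows while `mode_b` leads on conflict rows.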
These are the highest-spread questions in the current run. Open a case to read the actual answers that produced the score gap.
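The spread ranking described above can be sketched as follows, assuming "spread" means the gap between a question's best and worst workflow score; the question IDs, workflow names, and scores are hypothetical:

```python
# Illustrative per-question scores keyed by workflow.
scores = {
    "q01": {"mode_a": 0.9, "mode_b": 0.4, "raw": 0.7},
    "q02": {"mode_a": 0.8, "mode_b": 0.8, "raw": 0.75},
}

def spread(per_workflow):
    """Gap between the best and worst workflow score on one question."""
    vals = per_workflow.values()
    return max(vals) - min(vals)

# Highest-spread questions first; these are the cases worth reading.
ranked = sorted(scores, key=lambda q: spread(scores[q]), reverse=True)
```

Sorting descending puts the questions where workflows disagree most at the top, which is exactly the set a reviewer should open first.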
The site should make the assumptions plain instead of pretending the benchmark is more universal than it is.