Stakeholders are named
MHF asks who is affected before it recommends an action. That makes hidden parties, such as spouses, children, congregants, employers, and vulnerable neighbors, part of the reasoning surface.
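As a rough illustration of what "asking who is affected first" can look like in code, the sketch below enumerates a scenario's stakeholders, including hidden parties, before any recommendation is produced. The class names, fields, and refusal behavior are hypothetical and are not the prototype's actual schema.

```python
from dataclasses import dataclass, field

@dataclass
class Stakeholder:
    # Hypothetical record: who is affected and how directly.
    name: str
    relation: str          # e.g. "spouse", "child", "congregant", "employer"
    directly_named: bool   # False for hidden parties inferred from context

@dataclass
class Scenario:
    question: str
    stakeholders: list[Stakeholder] = field(default_factory=list)

def recommend(scenario: Scenario) -> str:
    # Refuse to recommend until the affected-party list is populated,
    # so hidden parties stay on the reasoning surface.
    if not scenario.stakeholders:
        raise ValueError("Identify stakeholders before recommending an action.")
    hidden = [s.name for s in scenario.stakeholders if not s.directly_named]
    return f"Recommendation must account for hidden parties: {hidden}"
```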
OpenRouter prototype and public proof surface
A benchmark and advice prototype for comparing plain LLM answers against moral reasoning that must identify stakeholders, rank binding obligations, and keep moral residue visible.
In one sentence: Moral Restoration compares hierarchy-aware relational reasoning against flat rubrics and plain LLM answers, then makes the answers, judging, and score deltas inspectable.
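A minimal sketch of what "inspectable score deltas" can mean in practice: given a judged comparison payload, the delta between the MHF mode and a baseline is computed per question rather than hidden inside an aggregate. The JSON layout, file name, and mode keys below are assumptions for illustration, not the published payload format.

```python
import json

def score_deltas(payload_path: str, mhf_mode: str = "mhf", baseline: str = "raw_model"):
    """Compute per-question score deltas between two answer modes.

    Assumes a payload shaped like:
    {"questions": [{"id": "...", "scores": {"mhf": 7.5, "raw_model": 6.0}}, ...]}
    which is an illustrative layout, not the prototype's actual schema.
    """
    with open(payload_path) as f:
        payload = json.load(f)
    deltas = {}
    for q in payload["questions"]:
        scores = q["scores"]
        if mhf_mode in scores and baseline in scores:
            deltas[q["id"]] = scores[mhf_mode] - scores[baseline]
    return deltas

# Example: deltas = score_deltas("benchmark_payload.json")
```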
How the prototype reasons
Current evidence
The landing page is a map of the live evidence, not a victory lap. The current publish target is the 100-question DeepSeek/OpenRouter prototype payload; several research questions remain open.
Research claim
The project’s core claim is narrower than “AI can solve morality.” It is that moral advice becomes more inspectable when relationship, authority, and repair obligations are represented explicitly.
The prototype treats some obligations as binding constraints rather than letting every consideration enter one flat average. This is the heart of the hierarchy claim.
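The difference between a binding constraint and a flat average can be made concrete. In the hedged sketch below, a violated binding obligation vetoes an option outright, while a flat rubric lets a high score elsewhere wash the violation out. The obligation categories echo the relationship, authority, and repair distinctions named above; the field names, weights, and example values are illustrative, not the prototype's parameterization.

```python
from dataclasses import dataclass

@dataclass
class Obligation:
    # Illustrative categories; the framework represents relationship,
    # authority, and repair obligations explicitly.
    name: str
    kind: str        # "relationship" | "authority" | "repair"
    binding: bool    # binding obligations act as constraints, not weights
    satisfied: float # 0.0 (violated) .. 1.0 (fully honored)

def flat_average(obligations: list[Obligation]) -> float:
    # Flat rubric: every consideration enters one average.
    return sum(o.satisfied for o in obligations) / len(obligations)

def hierarchy_score(obligations: list[Obligation]) -> float | None:
    # Hierarchy-aware: a violated binding obligation rules the option out
    # (returns None) instead of being traded off against other scores.
    for o in obligations:
        if o.binding and o.satisfied < 1.0:
            return None
    nonbinding = [o for o in obligations if not o.binding]
    return flat_average(nonbinding) if nonbinding else 1.0

option = [
    Obligation("promise kept to spouse", "relationship", binding=True, satisfied=0.0),
    Obligation("offer of repair to neighbor", "repair", binding=False, satisfied=1.0),
]
print(flat_average(option))     # 0.5: the violation is averaged away
print(hierarchy_score(option))  # None: the option is ruled out
```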
The latest judged comparison is encouraging, but routing regressions and public showcase quality still need work before broader claims should be made.
Inspection path
These are the primary pages for reviewing the project. Each page is part of the same public-proof cluster and links back to the evidence surface.
Shared 100-question prototype comparing MHF prose modes against raw-model, worldview-prompt, and structured secular baselines.
Open comparison

Search featured question bodies, switch MHF/plain-LLM workflows, and compare answer text, judging, score deltas, and dimensions independently.
Browse answers

Scenario-level scorecard across the older public-proof set. Useful for seeing where structured reasoning diverges from flatter baselines.
View scorecard

How roots, weights, scenario fixtures, and comparison claims are assembled, including the limits of comparing Christian and secular parameterizations.
Read methodology

A compact explanation of what MHF gives you, what it misses, and which comparisons are currently persuasive versus provisional.
Read summary

Inspect preloaded scenarios and see stakeholder graphs, constraints, recommendations, and residue under implemented parameterizations.
Try scenarios

Source data and provenance
The project separates generated prose from structural data: benchmark payloads, scenario files, and dataset pages remain inspectable.
The latest local Hugging Face CSV exports are normalized into 500 public rows and 150 theory rows for provenance and future scoring work; a minimal normalization sketch follows these dataset notes.
Reddit moral dilemmas and verdicts used as a comparison source for secular calibration and stakeholder extraction work.
Unified moral judgment data used for cross-dataset validation and comparison framing.
Survey-backed moral-attitude data used to discuss population-level parameterization differences.
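For the Hugging Face CSV exports mentioned above, a normalization pass of the kind described (500 public rows, 150 theory rows) might look roughly like the sketch below. The file names, column names, and deduplication rule are assumptions for illustration, not the project's actual pipeline.

```python
import pandas as pd

def normalize_exports(public_csv: str, theory_csv: str) -> tuple[pd.DataFrame, pd.DataFrame]:
    """Trim and deduplicate raw CSV exports into fixed-size provenance tables.

    The 500/150 row targets come from the project description; the column
    names ("id", "question") are illustrative placeholders.
    """
    public = (
        pd.read_csv(public_csv)
        .dropna(subset=["question"])
        .drop_duplicates(subset=["id"])
        .head(500)
    )
    theory = (
        pd.read_csv(theory_csv)
        .dropna(subset=["question"])
        .drop_duplicates(subset=["id"])
        .head(150)
    )
    return public, theory

# Example: public_rows, theory_rows = normalize_exports("public_export.csv", "theory_export.csv")
```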
Research log
Older experiment pages remain available, but the live benchmark comparison is now the clearest current evidence surface.
The original claims about hierarchy-aware evaluation, LLM convergence, and relational graph advantages.
Review hypotheses

Detailed variance, perturbation, and parameterization notes that explain how the project reached the current benchmark shape.
Open details

Open caveat: the answer browser publishes the answer bodies carried by the comparison payload, not necessarily every generated response from every run. Expanding that payload remains a content-scope decision.