Moral Hierarchy Framework

Three Datasets. One Moral Architecture.

Current AI moral reasoning treats ethics like a classification problem. MHF treats it like what it actually is: hierarchical constraint propagation through relationships. Here is how we ground the theory in real human data.

270K Reddit AITA posts · 6 languages in UniMoral · 25 countries in Pew survey · 100% perturbation test pass rate
The Datasets

Three lenses on human morality

No single dataset captures moral reasoning. Reddit gives us scale and rawness. UniMoral gives us individual moral profiles. Pew gives us cross-national ground truth. Together, they let MHF do something no existing system can: parameterize moral judgment by culture, community, and conviction -- then validate the predictions against real human data.

🔥 Reddit AITA

270K posts / 96K Social Chemistry entries

The largest crowdsourced moral judgment dataset in existence. Real people, real dilemmas, community verdicts. We mine it for secular norms, contested cases, and the raw material of everyday ethics.

🌍 UniMoral

6 languages / MFQ2 Haidt profiles

The first dataset that connects each annotator's own Haidt moral foundations profile to their actual moral judgments. This is our killer validation: parameterize MHF with their profile, predict their verdict.

🗳 Pew 25-Country Survey

25 countries / 5 hot-button issues

Nationally representative attitudes on abortion, alcohol, homosexuality, gambling, and divorce -- exactly the fault lines where Christian and secular parameterizations diverge. Our cross-cultural ground truth.

Construction Plan

What we build and where it comes from

Every component in MHF has a data source. The table below maps what we build, where the data originates, and how the Christian and secular parameterizations diverge.

| What We Build | Source | Christian Context | Secular Context |
| --- | --- | --- | --- |
| Root Node | Architecture | God (Scripture, TLR Protocol). Authority=0.90, Sanctity=0.95 | Social Consensus (AITA norms). Authority=0.09, Sanctity=0.07 |
| Haidt Profile Weights | Social Chemistry 101 + KJV Bible | Care 0.80 / Fairness 0.60 / Loyalty 0.75 / Authority 0.90 / Sanctity 0.95 / Liberty 0.45 | Care 0.47 / Fairness 0.18 / Loyalty 0.19 / Authority 0.09 / Sanctity 0.07 / Liberty 0.00 |
| Relationship Weights | Social Chem + Theology | God-Self 1.0, Spouse 0.9, Parent-Child 0.85, Church 0.7, Enemy 0.3 | Spouse 0.91, Stranger 0.91, Parent-Child 0.90, Friend 0.85, Community 0.87 |
| Constraint Library | BSB + Norm Bank | 1.7M Scripture-derived commandments with internal exception logic | Crowdsourced rules-of-thumb with agreement thresholds |
| Scenario Graph Extraction | LLM Pipeline | Same pipeline, Christian baseline weights applied | Same pipeline, secular baseline weights applied |
| Cultural Validation | UniMoral + Pew | High-Authority+Sanctity countries should match Christian params | WEIRD-country profiles should match secular params |
| Perturbation Tests | Generated | 25/25 pass (100%). Relationship changes flip judgments as predicted. | (same suite, same result) |
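The two Haidt weight vectors in the table can be read directly as data. Here is a minimal Python sketch of how the parameterizations diverge on a single scenario; the `foundation_score` function and the scenario numbers are illustrative assumptions, not part of MHF's actual pipeline, and a real system would load the vectors from weights.json.

```python
# Haidt foundation weight vectors, copied from the construction table.
CHRISTIAN_WEIGHTS = {
    "Care": 0.80, "Fairness": 0.60, "Loyalty": 0.75,
    "Authority": 0.90, "Sanctity": 0.95, "Liberty": 0.45,
}
SECULAR_WEIGHTS = {
    "Care": 0.47, "Fairness": 0.18, "Loyalty": 0.19,
    "Authority": 0.09, "Sanctity": 0.07, "Liberty": 0.00,
}

def foundation_score(violation_profile: dict, weights: dict) -> float:
    """Weighted severity of a scenario: dot product of per-foundation
    violation intensities (0-1) with a parameterization's weights."""
    return sum(weights[f] * violation_profile.get(f, 0.0) for f in weights)

# A scenario that mostly violates Sanctity and Authority diverges
# sharply between the two parameterizations (illustrative numbers).
scenario = {"Sanctity": 0.9, "Authority": 0.6, "Care": 0.1}
christian = foundation_score(scenario, CHRISTIAN_WEIGHTS)  # ~1.475
secular = foundation_score(scenario, SECULAR_WEIGHTS)      # ~0.164
```

The gap between the two scores is exactly the divergence the table's last column pair describes: the same facts, scored through two different foundation profiles.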
Status

What we have done vs. what remains

Completed

  • Secular weight extraction from Social Chemistry 101 (175K entries) + Norm Bank (178K entries)
  • Christian weight calibration from BSB + Foundation Alignment seed + 4 theological sources
  • Perturbation sensitivity test suite -- 25/25 pass across 5 dilemma families
  • Round 12 variance experiment -- 20 agents, 5 dilemmas, showing that LLMs converge on a single moral attractor
  • Graph data structures (MoralGraph, Node, Edge, Constraint classes)
  • Constraint propagation engine with bottom-up/top-down passes
  • Two parameterized hierarchy templates (Christian, Secular American)
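The completed graph layer maps naturally onto a few small classes. A minimal sketch of the MoralGraph, Node, Edge, and Constraint classes named above, assuming field names (`haidt_weights`, `strength`, `exceptions`) that are illustrative, not MHF's actual definitions:

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str            # stakeholder name, or the parameterized root
    haidt_weights: dict  # per-foundation weights active at this node

@dataclass
class Edge:
    src: str
    dst: str
    weight: float        # relationship weight, e.g. Spouse 0.91

@dataclass
class Constraint:
    rule: str            # a rule-of-thumb or commandment
    strength: float      # constraint strength score
    exceptions: list = field(default_factory=list)

@dataclass
class MoralGraph:
    nodes: dict = field(default_factory=dict)       # name -> Node
    edges: list = field(default_factory=list)       # obligation edges
    constraints: list = field(default_factory=list)

    def add_node(self, node: Node) -> None:
        self.nodes[node.name] = node

    def add_edge(self, src: str, dst: str, weight: float) -> None:
        self.edges.append(Edge(src, dst, weight))

# Build a tiny secular-parameterized fragment (weights from the table).
g = MoralGraph()
g.add_node(Node("root", {"Authority": 0.09, "Sanctity": 0.07}))
g.add_node(Node("spouse", {}))
g.add_edge("root", "spouse", 0.91)
```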

Remaining Gaps

  • UniMoral integration -- individual Haidt profile parameterization and prediction validation
  • Pew 25-country data ingestion and cross-cultural ground-truth comparison
  • Multi-columnist prediction experiment (Dear Abby vs. pastoral counsel vs. Confucian advisor)
  • Advice column mining pipeline for training data
  • Non-English moral tradition expansion (Confucian, Islamic, Hindu roots)
  • Multi-turn elicitation benchmark with Monte Carlo evaluation
  • Liberty axis measurement (Social Chemistry 101 does not label it -- Liberty is 0.00 in secular weights)
  • Cross-validation against MoReBench 500+150 dilemma set with hierarchy-aware rubrics
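The UniMoral gap already has a clear shape: parameterize the framework with each annotator's own MFQ2 profile, predict their verdict, and score agreement. A hedged sketch of that loop, where `predict_verdict` is a toy stand-in for full MHF constraint propagation and the 0.5 threshold and all numbers are illustrative assumptions:

```python
def predict_verdict(dilemma_profile: dict, annotator_weights: dict,
                    threshold: float = 0.5) -> str:
    """Toy decision rule: weighted violation severity above the
    threshold -> 'wrong', else 'acceptable'."""
    severity = sum(annotator_weights.get(f, 0.0) * v
                   for f, v in dilemma_profile.items())
    return "wrong" if severity >= threshold else "acceptable"

def validation_accuracy(records: list) -> float:
    """records: (dilemma_profile, annotator_weights, actual_verdict)."""
    hits = sum(predict_verdict(d, w) == actual for d, w, actual in records)
    return hits / len(records)

# Two invented annotators judging the same Sanctity-heavy dilemma:
records = [
    ({"Sanctity": 0.9}, {"Sanctity": 0.95}, "wrong"),       # high-Sanctity profile
    ({"Sanctity": 0.9}, {"Sanctity": 0.07}, "acceptable"),  # low-Sanctity profile
]
accuracy = validation_accuracy(records)  # 1.0 on this toy pair
```

The real validation would swap the toy rule for the full graph and run it across all six UniMoral languages.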
Data Pipeline

How data flows through the framework

Raw moral data enters from three directions: crowdsourced norms, normative texts, and survey responses. Each feeds a different layer of the Moral Hierarchy Graph before constraint propagation produces a prescriptive judgment.

1. Raw Moral Data -- AITA posts, Bible verses, Pew responses, UniMoral dilemmas + annotator profiles (270K + 31K + 25-country + 6-lang)
2. Norm Extraction -- Social Chemistry RoTs, scripture constraints, Haidt foundation labels, cross-cultural profiles (356K RoTs / 1.7M norms)
3. Weight Calibration -- Haidt profile vectors, relationship base weights, constraint strength scores (weights.json: secular + christian)
4. Graph Construction -- Stakeholder nodes, obligation edges, parameterized root node, exception conditions (MoralGraph(V, E, w, C, theta))
5. Constraint Propagation -- Bottom-up evidence, top-down decisions, prescriptive output with full auditability ("Because X, you should Y")
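The final propagation step can be sketched as two passes. This is a minimal, illustrative Python version, assuming each constraint carries a reason, an action, a strength, and the stakeholder it attaches to; none of these names or numbers are MHF's actual implementation.

```python
def bottom_up(stakeholders: dict, constraints: list) -> list:
    """Bottom-up evidence pass. stakeholders: name -> relationship
    weight. constraints: (reason, action, strength, stakeholder)
    tuples. Scores each constraint by strength x relationship weight
    and returns (score, reason, action), strongest first."""
    scored = [(strength * stakeholders.get(who, 0.0), reason, action)
              for reason, action, strength, who in constraints]
    return sorted(scored, reverse=True)

def top_down(evidence: list) -> str:
    """Top-down decision pass: commit to the strongest-supported
    constraint and emit the 'Because X, you should Y' prescription."""
    score, reason, action = evidence[0]
    return f"Because {reason}, you should {action}."

# Toy dilemma under secular relationship weights (spouse 0.91 from
# the construction table; the constraints themselves are invented).
stakeholders = {"spouse": 0.91, "friend": 0.85}
constraints = [
    ("you promised your spouse", "keep the commitment", 0.9, "spouse"),
    ("your friend asked a favor", "help if you can", 0.4, "friend"),
]
print(top_down(bottom_up(stakeholders, constraints)))
# -> Because you promised your spouse, you should keep the commitment.
```

The evidence list returned by `bottom_up` is also the audit trail: every candidate constraint, its score, and why it lost.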
So What?

Why this architecture matters for moral AI

Every existing moral AI -- Delphi, ETHICS, MoReBench -- treats morality as flat classification. Good, bad, or "it depends." They cannot explain whose morality, why it matters, or what happens when you change the relational context. MHF can.

The three datasets above are not decorative. AITA gives us the secular Overton window. UniMoral lets us predict individual judgments from moral profiles. Pew lets us validate across 25 countries. Together, they let us move from "here are some things to consider" to "Because of X, you should do Y" -- with the receipts to show exactly why.