AITA supports secular norm calibration, UniMoral supports profile-sensitive tests, and Pew supports population-level attitude checks.
## What this page establishes
These pages are source-data dossiers, not a claim that every validation path is complete. They show which datasets feed implemented work today and which ones remain planned comparison surfaces.
Secular and Christian weights, plus perturbation evidence, are in place; UniMoral and Pew integration remain open, and are listed under remaining work below.
The dossier preserves what each source can show, where it is biased, and how it would be used in falsifiable comparisons.
## What each dataset can support
No single dataset captures moral reasoning. Reddit gives us scale and rawness. UniMoral gives us individual moral profiles. Pew gives us cross-national survey evidence. Together, they let MHF test whether moral judgment can be parameterized by culture, community, and conviction against real human data.
### Reddit AITA
The largest crowdsourced moral judgment dataset in existence. Real people, real dilemmas, community verdicts. We mine it for secular norms, contested cases, and the raw material of everyday ethics.
### UniMoral
The first dataset that connects each annotator's own Haidt moral foundations profile to their actual moral judgments. This is a direct validation target: parameterize MHF with their profile, then compare the prediction to their verdict.
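That validation loop can be sketched in a few lines. Everything here is a hypothetical stand-in: `mhf_judge`, the dilemma encoding, and `profile_accuracy` are illustrative toys, since the real parameterization runs through the full hierarchy graph rather than a dot product.

```python
# Toy sketch of the UniMoral validation target: predict each annotator's
# verdict from their own Haidt profile, then score agreement.
FOUNDATIONS = ["care", "fairness", "loyalty", "authority", "sanctity", "liberty"]

def mhf_judge(profile, dilemma):
    """Stand-in prediction: approve the action when profile-weighted
    support outweighs profile-weighted opposition."""
    support = sum(profile.get(f, 0.0) * dilemma["support"].get(f, 0.0)
                  for f in FOUNDATIONS)
    oppose = sum(profile.get(f, 0.0) * dilemma["oppose"].get(f, 0.0)
                 for f in FOUNDATIONS)
    return "permissible" if support >= oppose else "impermissible"

def profile_accuracy(annotations):
    """Fraction of annotators whose own verdict matches the prediction
    made from their own moral-foundations profile."""
    hits = sum(mhf_judge(a["profile"], a["dilemma"]) == a["verdict"]
               for a in annotations)
    return hits / len(annotations)
```

The point of the sketch is the shape of the test, not the scoring rule: one prediction per annotator, compared against that annotator's actual judgment.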
### Pew 25-Country Survey
Nationally representative attitudes on abortion, alcohol, homosexuality, gambling, and divorce -- the fault lines where Christian and secular parameterizations are expected to diverge. This is cross-national comparison evidence.
## What we build and where it comes from
Every component in MHF has a data source. The table below maps what we build, where the data originates, and how the Christian and secular parameterizations diverge.
| What We Build | Source | Christian Context | Secular Context |
|---|---|---|---|
| Root Node | Architecture | God (Scripture, TLR Protocol). Authority=0.90, Sanctity=0.95 | Social Consensus (AITA norms). Authority=0.09, Sanctity=0.07 |
| Haidt Profile Weights | Social Chemistry 101 / KJV Bible | Care 0.80 / Fairness 0.60 / Loyalty 0.75 / Authority 0.90 / Sanctity 0.95 / Liberty 0.45 | Care 0.47 / Fairness 0.18 / Loyalty 0.19 / Authority 0.09 / Sanctity 0.07 / Liberty 0.00 |
| Relationship Weights | Social Chemistry 101 / Theology | God-Self 1.0, Spouse 0.9, Parent-Child 0.85, Church 0.7, Enemy 0.3 | Spouse 0.91, Stranger 0.91, Parent-Child 0.90, Friend 0.85, Community 0.87 |
| Constraint Library | BSB / Norm Bank (1.7M) | Scripture-derived commandments with internal exception logic | Crowdsourced rules-of-thumb with agreement thresholds |
| Scenario Graph Extraction | LLM Pipeline | Same pipeline, different baseline weights applied | Same pipeline, secular baseline weights applied |
| Cultural Validation | UniMoral / Pew | High-Authority+Sanctity countries should match Christian params | WEIRD-country profiles should match secular params |
| Perturbation Tests | Generated | 25/25 pass (100%). Relationship changes flip judgments as predicted. | |
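The Haidt weight rows above can be read as six-dimensional vectors. A minimal sketch (the `foundation_score` helper and the example scenario are illustrative assumptions, not framework code) shows how the two parameterizations diverge on a sanctity-loaded case:

```python
# The two foundation-weight vectors exactly as given in the table.
CHRISTIAN = {"care": 0.80, "fairness": 0.60, "loyalty": 0.75,
             "authority": 0.90, "sanctity": 0.95, "liberty": 0.45}
SECULAR = {"care": 0.47, "fairness": 0.18, "loyalty": 0.19,
           "authority": 0.09, "sanctity": 0.07, "liberty": 0.00}

def foundation_score(weights, activations):
    """Dot product of a parameterization with a scenario's
    foundation activations."""
    return sum(weights[f] * a for f, a in activations.items())

# A scenario that activates Sanctity fully and Authority partially.
scenario = {"sanctity": 1.0, "authority": 0.5}
christian = foundation_score(CHRISTIAN, scenario)  # 0.95 + 0.45 = 1.40
secular = foundation_score(SECULAR, scenario)      # 0.07 + 0.045 = 0.115
```

The roughly 12x gap on this toy scenario is exactly the kind of divergence the fault-line issues (abortion, alcohol, gambling) are expected to surface.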
## What we have done vs. what remains
### ✓ Completed
- Secular weight extraction from Social Chemistry 101 (175K entries) + Norm Bank (178K entries)
- Christian weight calibration from BSB + Foundation Alignment seed + 4 theological sources
- Perturbation sensitivity test suite -- 25/25 pass across 5 dilemma families
- Round 12 variance experiment -- 20 agents, 5 dilemmas, showing LLM convergence around a common response pattern
- Graph data structures (MoralGraph, Node, Edge, Constraint classes)
- Constraint propagation engine with bottom-up/top-down passes
- Two parameterized hierarchy templates (Christian, Secular American)
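The bottom-up/top-down propagation mentioned above can be sketched on a toy graph. The `Node` class and pass semantics here are illustrative assumptions, not the actual MoralGraph API:

```python
# Toy two-pass propagation: evidence flows up from stakeholders,
# the root's decision flows back down scaled by edge weights.
class Node:
    def __init__(self, name, weight, evidence=0.0, children=None):
        self.name, self.weight = name, weight
        self.evidence = evidence      # local support observed at this node
        self.children = children or []
        self.aggregate = 0.0
        self.decision = 0.0

def bottom_up(node):
    """Aggregate weighted evidence from the leaves toward the root."""
    total = node.evidence
    for child in node.children:
        total += child.weight * bottom_up(child)
    node.aggregate = total
    return total

def top_down(node, parent_decision=1.0):
    """Push the root's decision back down, scaled by each edge weight."""
    node.decision = parent_decision * node.weight * node.aggregate
    for child in node.children:
        top_down(child, node.decision)
```

Because every node keeps its `aggregate` and `decision`, the final judgment stays auditable: you can read off exactly which stakeholder contributed what.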
### ○ Remaining Gaps
- UniMoral integration -- individual Haidt profile parameterization and prediction validation
- Pew 25-country data ingestion and cross-cultural ground-truth comparison
- Multi-columnist prediction experiment (Dear Abby vs. pastoral counsel vs. Confucian advisor)
- Advice column mining pipeline for training data
- Non-English moral tradition expansion (Confucian, Islamic, Hindu roots)
- Multi-turn elicitation benchmark with Monte Carlo evaluation
- Liberty axis measurement (Social Chemistry 101 does not label it -- Liberty is 0.00 in secular weights)
- Cross-validation against MoReBench 500+150 dilemma set with hierarchy-aware rubrics
## How data flows through the framework
Raw moral data enters from three directions: crowdsourced norms, normative texts, and survey responses. Each feeds a different layer of the Moral Hierarchy Graph before constraint propagation produces a prescriptive judgment.
1. **Raw Moral Data** -- AITA posts, Bible verses, Pew responses, UniMoral dilemmas + annotator profiles
2. **Norm Extraction** -- Social Chemistry RoTs, scripture constraints, Haidt foundation labels, cross-cultural profiles
3. **Weight Calibration** -- Haidt profile vectors, relationship base weights, constraint strength scores
4. **Graph Construction** -- Stakeholder nodes, obligation edges, parameterized root node, exception conditions
5. **Constraint Propagation** -- Bottom-up evidence, top-down decisions, prescriptive output with full auditability
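The stages above can be chained as a pipeline of pure functions. Every body below is a toy stand-in with hypothetical names (`extract_norms`, `calibrate_weights`, and so on), meant only to show the shape of the flow, not any real implementation:

```python
# Toy end-to-end pass through the four processing stages.
def extract_norms(raw):
    """Norm extraction: keep each record's foundation label and polarity."""
    return [{"foundation": r["foundation"], "polarity": r["polarity"]}
            for r in raw]

def calibrate_weights(norms):
    """Weight calibration: foundation frequency as a crude weight."""
    counts = {}
    for n in norms:
        counts[n["foundation"]] = counts.get(n["foundation"], 0) + 1
    total = sum(counts.values())
    return {f: c / total for f, c in counts.items()}

def build_graph(weights):
    """Graph construction: a single root node carrying the weights."""
    return {"root": weights}

def propagate(graph, scenario):
    """Constraint propagation: weighted vote over scenario foundations,
    returning the judgment plus the score that justifies it."""
    score = sum(graph["root"].get(f, 0.0) * v for f, v in scenario.items())
    return {"judgment": "permissible" if score >= 0 else "impermissible",
            "score": score}
```

The design point the flow illustrates is separation of concerns: raw data never touches the judgment directly, only calibrated weights do, which is what keeps the output auditable.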
## Why this architecture matters for moral AI
Every existing moral AI -- Delphi, ETHICS, MoReBench -- treats morality as flat classification. Good, bad, or "it depends." They cannot explain whose morality, why it matters, or what happens when you change the relational context. MHF can.
The three datasets above are not decorative. AITA gives us the secular Overton window. UniMoral is the target for predicting individual judgments from moral profiles. Pew lets us test country-level patterns across 25 nations. Together, they move the project toward "Because of X, you should do Y" -- with the receipts to show exactly why.