Evidence Dossier

Dataset evidence inputs for MHF

AITA, UniMoral, and Pew each support a different part of the research claim: descriptive norms, individual moral-profile variation, and cross-national attitude patterns.

What this page establishes

These pages are source-data dossiers, not a claim that every validation path is complete. They show which datasets feed implemented work today and which ones remain planned comparison surfaces.

Role: Map each dataset to the part of MHF it can support.

AITA supports secular norm calibration, UniMoral supports profile-sensitive tests, and Pew supports population-level attitude checks.

Current status: Some inputs are implemented; others remain explicit gaps.

Secular and Christian weights plus perturbation evidence are present. UniMoral and Pew integration are still listed as remaining work below.

Reader caution: Dataset provenance is evidence, not a promise of solved moral advice.

The dossier preserves what each source can show, where it is biased, and how it would be used in falsifiable comparisons.

  • 270K Reddit AITA posts
  • 6 languages in UniMoral
  • 25 countries in Pew survey
  • 100% perturbation test pass rate

What each dataset can support

No single dataset captures moral reasoning. Reddit gives us scale and rawness. UniMoral gives us individual moral profiles. Pew gives us cross-national survey evidence. Together, they let MHF test whether moral judgment can be parameterized by culture, community, and conviction against real human data.

What we build and where it comes from

Every component in MHF has a data source. The table below maps what we build, where the data originates, and how the Christian and secular parameterizations diverge.

What We Build | Source | Christian Context | Secular Context
Root Node Architecture | | God (Scripture, TLR Protocol). Authority=0.90, Sanctity=0.95 | Social Consensus (AITA norms). Authority=0.09, Sanctity=0.07
Haidt Profile Weights | Social Chemistry 101 / KJV Bible | Care 0.80 / Fairness 0.60 / Loyalty 0.75 / Authority 0.90 / Sanctity 0.95 / Liberty 0.45 | Care 0.47 / Fairness 0.18 / Loyalty 0.19 / Authority 0.09 / Sanctity 0.07 / Liberty 0.00
Relationship Weights | Social Chem / Theology | God-Self 1.0, Spouse 0.9, Parent-Child 0.85, Church 0.7, Enemy 0.3 | Spouse 0.91, Stranger 0.91, Parent-Child 0.90, Friend 0.85, Community 0.87
Constraint Library | BSB / Norm Bank | 1.7M Scripture-derived commandments with internal exception logic | Crowdsourced rules-of-thumb with agreement thresholds
Scenario Graph Extraction | LLM Pipeline | Same pipeline, different baseline weights applied | Same pipeline, secular baseline weights applied
Cultural Validation | UniMoral / Pew | High-Authority+Sanctity countries should match Christian params | WEIRD-country profiles should match secular params
Perturbation Tests | Generated | 25/25 pass (100%). Relationship changes flip judgments as predicted. | (applies to both contexts)
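The two Haidt foundation weight vectors above can be sketched as a small script. This is a minimal illustration: the dict layout, the `foundation_score` function, and the example scenario activations are assumptions for this sketch, not the project's actual weights.json schema or scoring rule.

```python
# Haidt foundation weights copied from the table above; the dict layout and
# the scoring function below are illustrative, not the real weights.json schema.
CHRISTIAN = {"care": 0.80, "fairness": 0.60, "loyalty": 0.75,
             "authority": 0.90, "sanctity": 0.95, "liberty": 0.45}
SECULAR = {"care": 0.47, "fairness": 0.18, "loyalty": 0.19,
           "authority": 0.09, "sanctity": 0.07, "liberty": 0.00}

def foundation_score(weights, activations):
    """Weighted sum of a scenario's foundation activations under one profile."""
    return sum(weights[f] * activations.get(f, 0.0) for f in weights)

# A sanctity-loaded scenario (hypothetical activations) scores far higher
# under the Christian profile than the secular one.
scenario = {"sanctity": 1.0, "authority": 0.5}
print(round(foundation_score(CHRISTIAN, scenario), 3))  # 1.4
print(round(foundation_score(SECULAR, scenario), 3))    # 0.115
```

The spread between the two scores is the quantity the parameterization bets on: the same scenario should elicit systematically different judgments across the two contexts.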

What we have done vs. what remains

Completed

  • Secular weight extraction from Social Chemistry 101 (175K entries) + Norm Bank (178K entries)
  • Christian weight calibration from BSB + Foundation Alignment seed + 4 theological sources
  • Perturbation sensitivity test suite -- 25/25 pass across 5 dilemma families
  • Round 12 variance experiment -- 20 agents, 5 dilemmas, showing LLM convergence around a common response pattern
  • Graph data structures (MoralGraph, Node, Edge, Constraint classes)
  • Constraint propagation engine with bottom-up/top-down passes
  • Two parameterized hierarchy templates (Christian, Secular American)
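The graph data structures and the two-pass propagation engine listed above can be sketched as follows. Class fields and the propagation rule here are illustrative assumptions, not the project's actual `MoralGraph`, `Node`, or `Edge` API.

```python
from dataclasses import dataclass, field

# Minimal sketch of the graph classes named above; field names and the
# propagation rule are assumptions, not the framework's real implementation.

@dataclass
class Node:
    name: str
    weight: float          # relationship base weight (e.g. Spouse 0.9, Enemy 0.3)
    evidence: float = 0.0  # bottom-up signal strength
    decision: float = 0.0  # top-down obligation strength

@dataclass
class Edge:
    child: str
    parent: str

@dataclass
class MoralGraph:
    nodes: dict = field(default_factory=dict)
    edges: list = field(default_factory=list)

    def add(self, node):
        self.nodes[node.name] = node

    def link(self, child, parent):
        self.edges.append(Edge(child, parent))

    def propagate(self):
        # Bottom-up pass: the parent accumulates weighted evidence from children.
        for e in self.edges:
            child, parent = self.nodes[e.child], self.nodes[e.parent]
            parent.evidence += child.weight * child.evidence
        # Top-down pass: decisions flow back down, scaled by each child's weight.
        for e in self.edges:
            child, parent = self.nodes[e.child], self.nodes[e.parent]
            child.decision = parent.evidence * child.weight

g = MoralGraph()
g.add(Node("root", 1.0))
g.add(Node("spouse", 0.9, evidence=1.0))  # Christian relationship weights
g.add(Node("enemy", 0.3, evidence=1.0))   # from the parameterization table
g.link("spouse", "root")
g.link("enemy", "root")
g.propagate()
# Identical evidence, different relationship weight -> different obligation.
print(g.nodes["spouse"].decision > g.nodes["enemy"].decision)  # True
```

The point of the sketch is the asymmetry at the end: with identical evidence, the spouse edge carries three times the obligation of the enemy edge, which is the behavior the perturbation suite checks.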

Remaining Gaps

  • UniMoral integration -- individual Haidt profile parameterization and prediction validation
  • Pew 25-country data ingestion and cross-cultural ground-truth comparison
  • Multi-columnist prediction experiment (Dear Abby vs. pastoral counsel vs. Confucian advisor)
  • Advice column mining pipeline for training data
  • Non-English moral tradition expansion (Confucian, Islamic, Hindu roots)
  • Multi-turn elicitation benchmark with Monte Carlo evaluation
  • Liberty axis measurement (Social Chemistry 101 does not label it -- Liberty is 0.00 in secular weights)
  • Cross-validation against MoReBench 500+150 dilemma set with hierarchy-aware rubrics

How data flows through the framework

Raw moral data enters from three directions: crowdsourced norms, normative texts, and survey responses. Each feeds a different layer of the Moral Hierarchy Graph before constraint propagation produces a prescriptive judgment.

  1. Raw Moral Data: AITA posts, Bible verses, Pew responses, UniMoral dilemmas + annotator profiles (270K + 31K + 25-country + 6-lang)
  2. Norm Extraction: Social Chemistry RoTs, scripture constraints, Haidt foundation labels, cross-cultural profiles (356K RoTs / 1.7M norms)
  3. Weight Calibration: Haidt profile vectors, relationship base weights, constraint strength scores (weights.json, secular + christian)
  4. Graph Construction: Stakeholder nodes, obligation edges, parameterized root node, exception conditions (MoralGraph(V, E, w, C, theta))
  5. Constraint Propagation: Bottom-up evidence, top-down decisions, prescriptive output with full auditability ("Because X, you should Y")
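The final step, turning obligation strength into a prescriptive "Because X, you should Y" sentence, can be sketched in a few lines. The `judge` function, the severity scale, and the 0.5 threshold are all assumptions for illustration; the example also shows the perturbation property that changing the relationship flips the verdict.

```python
# Illustrative only: the function name, severity scale, and 0.5 threshold are
# assumptions, not the framework's actual decision rule.
def judge(act_severity, relationship_weight, threshold=0.5):
    """Map obligation strength to a prescriptive verdict with its 'because'."""
    obligation = act_severity * relationship_weight
    verdict = "you should intervene" if obligation >= threshold else "no duty follows"
    return f"Because obligation={obligation:.2f}, {verdict}"

CHRISTIAN_REL = {"spouse": 0.9, "enemy": 0.3}  # weights from the table above
severity = 0.7  # hypothetical severity of the act
print(judge(severity, CHRISTIAN_REL["spouse"]))  # Because obligation=0.63, you should intervene
print(judge(severity, CHRISTIAN_REL["enemy"]))   # Because obligation=0.21, no duty follows
```

Because the obligation value is computed rather than classified, the "because" clause is auditable: the same severity with a different relationship weight yields a different, traceable verdict.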

Why this architecture matters for moral AI

Every existing moral AI -- Delphi, ETHICS, MoReBench -- treats morality as flat classification. Good, bad, or "it depends." They cannot explain whose morality, why it matters, or what happens when you change the relational context. MHF can.

The three datasets below are not decorative. AITA gives us the secular Overton window. UniMoral is the target for predicting individual judgments from moral profiles. Pew lets us test country-level patterns across 25 countries. Together, they move the project toward "Because of X, you should do Y" -- with the receipts to show exactly why.