The page preserves those counts as the evidence base for norm and weight calibration.
What AITA can and cannot support
AITA is useful as a descriptive baseline for secular community norms. It is not treated as moral ground truth, and it does not provide the stakeholder graph MHF needs without additional extraction.
AITA verdicts help set a baseline, while MHF adds stakeholder, obligation, and residue analysis.
The dataset records community judgments; it does not by itself identify every affected party.
Crowdsourced morality at massive scale
Reddit's r/AmItheAsshole is a subreddit where people post real moral dilemmas -- from "AITA for not attending my sister's wedding?" to "AITA for firing my best friend?" -- and the community votes on one of four verdicts: YTA (You're the Asshole), NTA (Not the Asshole), ESH (Everyone Sucks Here), or NAH (No Assholes Here).
This is not a toy dataset. It is 270,000 posts, each with dozens to thousands of comments providing moral reasoning, counterarguments, and contextual questions. It is one of the strongest public records of how ordinary people reason about everyday ethics at scale.
How the community judges
The distribution tells a story. AITA skews heavily toward NTA -- the community often validates the poster. ESH is rare, and NAH is even rarer. This bias is itself data: it tells us where the "Overton window" of acceptable behavior sits for this population.
AITA Verdict Distribution (estimated from corpus)
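The skew described above is easy to quantify once verdict tallies are in hand. A minimal sketch, using illustrative counts (NOT the actual corpus figures, which the chart estimates):

```python
from collections import Counter

# Illustrative verdict tallies -- NOT actual corpus counts.
verdicts = Counter({"NTA": 180_000, "YTA": 60_000, "ESH": 20_000, "NAH": 10_000})

total = sum(verdicts.values())
# Share of each verdict across all posts, largest first.
distribution = {v: count / total for v, count in verdicts.items()}

for verdict, share in sorted(distribution.items(), key=lambda kv: -kv[1]):
    print(f"{verdict}: {share:.1%}")
```

With these stand-in numbers, NTA dominates at roughly two-thirds of verdicts, mirroring the community's validation bias.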
How 96K AITA posts became moral building blocks
The Social Chemistry 101 project (Forbes et al., 2020) drew 96,000 entries directly from AITA posts. Crowdworkers on Amazon Mechanical Turk converted each post into "rules-of-thumb" (RoTs) -- moral judgments like "It's rude to not RSVP to a wedding" or "You shouldn't lie to protect someone's feelings." Each RoT was labeled with its Haidt moral foundations, an anticipated agreement level, and the cultural pressure behind the norm.
From AITA Post to Moral Weight
AITA Post
Real scenario posted to r/AmItheAsshole with community verdict
Social Chemistry 101 Extraction
Crowdworkers write rules-of-thumb, label with Haidt moral foundations, agreement level, cultural pressure. 96K entries from AITA specifically.
Commonsense Norm Bank
1.7 million moral judgments (good / discretionary / bad) across 6 complexity levels. Establishes the "Overton window" of cultural norms.
MHF Secular Weight Extraction
We compute Haidt profile vectors and relationship base weights from the Social Chemistry labels + Norm Bank agreement rates. This becomes the secular parameterization.
Moral Hierarchy Graph
Secular baseline weights parameterize the graph template. Root node = Social Consensus. Relationship edges carry Haidt-space weight vectors.
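The weight-extraction step in the pipeline can be sketched in a few lines. This is a hypothetical aggregation, not the actual MHF implementation: the RoT field names (`foundation`, `agreement`) are illustrative stand-ins for the real Social Chemistry schema, and agreement rate is used as a simple weight.

```python
from collections import defaultdict

FOUNDATIONS = ["care", "fairness", "loyalty", "authority", "sanctity", "liberty"]

def haidt_profile(rots):
    """Aggregate labeled rules-of-thumb into a normalized Haidt-space vector.

    Each RoT is a dict with a 'foundation' label and an 'agreement' rate
    in [0, 1]; the agreement rate acts as the weight. (Hypothetical schema --
    the real Social Chemistry fields differ.)
    """
    totals = defaultdict(float)
    for rot in rots:
        totals[rot["foundation"]] += rot["agreement"]
    norm = sum(totals.values()) or 1.0
    # Liberty is unlabeled in Social Chemistry, so it stays at 0.0 here too.
    return {f: totals.get(f, 0.0) / norm for f in FOUNDATIONS}

rots = [
    {"foundation": "care", "agreement": 0.9},
    {"foundation": "care", "agreement": 0.5},
    {"foundation": "loyalty", "agreement": 0.6},
]
profile = haidt_profile(rots)
```

Normalizing by the summed agreement mass keeps every relationship's profile on the same scale, so edges in the moral hierarchy graph stay comparable.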
Honest assessment
✓ Strengths
- Massive scale. At 270K real dilemmas, the corpus dwarfs any hand-curated moral dataset
- Real dilemmas. Not hypothetical trolley problems -- actual situations people face
- Community consensus. Thousands of votes per post reveal collective moral intuitions
- Rich reasoning. Comments contain moral arguments, counterexamples, and missing-context probes
- Four-way verdicts. ESH and NAH capture moral complexity that binary good/bad misses
- Demographic signals. 15.6% self-report age and gender, enabling demographic analysis
⚠ Weaknesses
- US-centric. Reddit skews American, young, educated, white, urban, liberal
- Crowdworker bias. Social Chemistry annotators are predominantly educated white Americans (the Delphi paper warns about this explicitly)
- No relational structure. Posts describe a scenario; they do not map the full stakeholder graph
- Selection bias. People post dilemmas they expect to win -- the "Am I right?" subtext
- No moral foundations labels on raw posts. Only the Social Chemistry subset has Haidt labels
- Liberty axis unmeasured. Social Chemistry does not label the Liberty/Oppression foundation -- it reads as 0.00 in our weights
Same data, different architecture
Existing systems (Delphi, ETHICS) use AITA data to train flat classifiers: input scenario, output judgment. MHF does something structurally different. We use the same data to calibrate relationship weights in a hierarchical graph. Here is what that means in practice:
Same Scenario, Different Evaluation
"AITA for cutting off contact with my alcoholic father?"
AITA community verdict: NTA (overwhelming). Delphi would predict: "It's okay." Both produce a flat label. Now watch what MHF does with the same case:
● Flat System (Delphi-style)
● MHF (Hierarchy-Aware)
The key difference: MHF does not ask "is this person an asshole?" It asks "given this person's moral hierarchy, what stakeholders are affected, what constraints apply, and which constraints are binding after propagation?" The answer might still be "set boundaries with your father" -- but now we know why, for whom, and what moral cost the decision carries.
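The structural contrast above can be made concrete. A minimal sketch, with hypothetical types and values -- the `Stakeholder` fields, weights, and obligations below are illustrative, not MHF's actual data model:

```python
from dataclasses import dataclass, field

@dataclass
class Stakeholder:
    name: str
    relationship: str            # edge type in the moral hierarchy graph
    weight: float                # Haidt-space magnitude for this edge (illustrative)
    obligations: list = field(default_factory=list)

def flat_verdict(scenario: str) -> str:
    """Delphi-style: one scenario in, one flat label out."""
    return "It's okay."          # stand-in for a trained classifier

def mhf_evaluate(stakeholders):
    """MHF-style sketch: enumerate affected parties and their obligations,
    heaviest edges first, instead of emitting a single label."""
    report = []
    for s in sorted(stakeholders, key=lambda s: -s.weight):
        for ob in s.obligations:
            report.append(f"{s.name} ({s.relationship}): {ob}")
    return report

case = [
    Stakeholder("father", "parent-child", 0.8, ["maintain safe boundaries"]),
    Stakeholder("spouse", "partner", 0.7, ["share the decision's burden"]),
    Stakeholder("children", "dependents", 0.9, ["shield from harm"]),
]
```

The flat system's output is a terminal label; the MHF sketch's output is a list that can be audited line by line, which is the point of the hierarchy.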
The Round 12 runs exposed stakeholder omissions
We ran the Alcoholic Father dilemma through 20 LLM agents (10 Sonnet, 10 Haiku). All 20 reached the same conclusion. 15 of 20 used the phrase "you cannot pour from an empty cup." Zero identified the spouse, children, church community, or employer as stakeholders.
This is not variance in moral reasoning. It is a single, consistent response pattern. MHF's relational graph would surface exactly the stakeholders the LLMs miss -- because it builds the graph from relationships outward, not from a training-data attractor inward.
Stakeholders Identified (Alcoholic Father, 20 LLM Runs)
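The omission pattern in the chart can be measured with a simple coverage tally. A sketch, with hypothetical run transcripts standing in for the Round 12 logs (which are not reproduced here):

```python
from collections import Counter

EXPECTED = {"father", "spouse", "children", "church community", "employer"}

def stakeholder_coverage(runs):
    """Count how many of the runs mention each expected stakeholder.

    `runs` is a list of lowercase transcripts (hypothetical stand-ins
    for the actual LLM outputs).
    """
    hits = Counter()
    for text in runs:
        for s in EXPECTED:
            if s in text:
                hits[s] += 1
    return {s: hits.get(s, 0) for s in sorted(EXPECTED)}

# All 20 hypothetical runs converge on the same boundary-setting answer.
runs = ["set boundaries with your father; you cannot pour from an empty cup"] * 20
coverage = stakeholder_coverage(runs)
```

Against this stand-in data, the father is named in every run while the spouse, children, church community, and employer never appear -- the same gap the real runs showed.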
Why AITA matters for moral AI
AITA is not the answer. It is the baseline. It tells us what secular American culture in 2024 considers acceptable -- a descriptive snapshot of the moral Overton window. MHF uses it as one parameterization among many, not as ground truth. The same framework, parameterized with Christian weights instead, would reach different conclusions on the same dilemmas -- and both sets of conclusions would be auditable, explainable, and grounded in explicit moral commitments.
That is the architectural difference. Delphi says "it's okay." MHF says "according to the secular American consensus, this is acceptable, weighted primarily by Care (0.47) and Loyalty (0.19), with the following stakeholders affected and the following moral residue unresolved." One is a label. The other is moral reasoning.
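The difference between a label and an auditable judgment can be rendered directly. A sketch of such an output string: the weight figures echo those quoted above, but the function, stakeholder list, and residue are illustrative, not MHF's actual output format.

```python
def render_judgment(verdict, weights, stakeholders, residue):
    """Render an auditable judgment: the verdict plus the weights,
    stakeholders, and unresolved residue that produced it."""
    top = sorted(weights.items(), key=lambda kv: -kv[1])[:2]
    weight_str = " and ".join(f"{name.title()} ({w:.2f})" for name, w in top)
    return (
        f"According to the secular American consensus, this is {verdict}, "
        f"weighted primarily by {weight_str}; "
        f"stakeholders affected: {', '.join(stakeholders)}; "
        f"unresolved residue: {', '.join(residue) or 'none'}."
    )

# Weights match the figures quoted in the text; the rest is illustrative.
msg = render_judgment(
    "acceptable",
    {"care": 0.47, "loyalty": 0.19, "fairness": 0.15},
    ["father", "spouse", "children"],
    ["duty of filial care remains unmet"],
)
```

Swapping in a different weight vector -- say, a Christian parameterization -- changes the rendered reasoning while keeping the same auditable structure.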