270,000 real people asked 15 million strangers to judge their moral dilemmas. The result is a chaotic, biased, brilliantly useful map of what contemporary Americans actually think is right and wrong. Here is what we do with it that nobody else does.
Reddit's r/AmItheAsshole is a subreddit where people post real moral dilemmas -- from "AITA for not attending my sister's wedding?" to "AITA for firing my best friend?" -- and the community votes. The four verdicts: YTA (You're the Asshole), NTA (Not the Asshole), ESH (Everyone Sucks Here), and NAH (No Assholes Here).
This is not a toy dataset. It is 270,000 posts, each with dozens to thousands of comments providing moral reasoning, counterarguments, and contextual questions. It is the closest thing that exists to a large-scale record of how ordinary people actually reason about ethics.
The distribution tells a story. AITA skews heavily toward NTA -- the community often validates the poster. ESH is rare, and NAH is even rarer. This bias is itself data: it tells us where the "Overton window" of acceptable behavior sits for this population.
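That skew is easy to quantify once the verdicts are scraped. A minimal sketch, assuming the verdicts have been collected into a list (the sample data here is hypothetical, chosen only to mirror the NTA-heavy shape described above):

```python
from collections import Counter

# Hypothetical sample of community verdicts; real input would be the
# top-level verdict of each of the 270K scraped AITA posts.
verdicts = ["NTA", "NTA", "NTA", "YTA", "NTA", "ESH",
            "YTA", "NTA", "NAH", "NTA"]

counts = Counter(verdicts)
total = len(verdicts)
# Fraction of posts receiving each of the four verdicts.
distribution = {v: counts[v] / total for v in ("NTA", "YTA", "ESH", "NAH")}
print(distribution)  # NTA dominates, mirroring the community's validation bias
```

On the real corpus, the same four numbers locate the Overton window: how far toward NTA the community's baseline sits.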
The Social Chemistry 101 project (Forbes et al., 2020) drew 96,000 entries directly from AITA posts. Crowdworkers on Amazon Mechanical Turk converted each post into "rules-of-thumb" (RoTs) -- moral judgments like "It's rude to not RSVP to a wedding" or "You shouldn't lie to protect someone's feelings." Each RoT was labeled with Haidt moral foundations, an agreement level, and a cultural-pressure rating. The pipeline from raw posts to MHF parameters:

- AITA: a real scenario posted to r/AmItheAsshole, with its community verdict.
- Social Chemistry 101: crowdworkers write rules-of-thumb and label each with Haidt moral foundations, agreement level, and cultural pressure -- 96K entries from AITA specifically.
- Norm Bank: 1.7 million moral judgments (good / discretionary / bad) across 6 complexity levels. Establishes the "Overton window" of cultural norms.
- MHF calibration: we compute Haidt profile vectors and relationship base weights from the Social Chemistry labels plus Norm Bank agreement rates. This becomes the secular parameterization.
- MHF graph: secular baseline weights parameterize the graph template. The root node is Social Consensus; relationship edges carry Haidt-space weight vectors.
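The calibration step can be sketched in a few lines. This is a simplified illustration, not the actual Social Chemistry schema: the `foundations` and `agreement` fields, and the toy RoTs, are invented here to show the aggregation idea (sum agreement-weighted foundation labels, then normalize into a Haidt-space profile):

```python
# Haidt's five moral foundations, in a fixed order.
FOUNDATIONS = ("care", "fairness", "loyalty", "authority", "sanctity")

def haidt_profile(rots):
    """Aggregate labeled rules-of-thumb into a normalized Haidt-space
    weight vector, weighting each RoT by its agreement rate.
    The RoT dict schema here is hypothetical."""
    totals = {f: 0.0 for f in FOUNDATIONS}
    for rot in rots:
        for f in rot["foundations"]:
            totals[f] += rot["agreement"]
    s = sum(totals.values())
    return {f: totals[f] / s for f in FOUNDATIONS}

# Two toy RoTs of the kind a "cutting off contact" post might yield.
rots = [
    {"foundations": ["care"], "agreement": 0.9},
    {"foundations": ["loyalty", "authority"], "agreement": 0.4},
]
print(haidt_profile(rots))  # a care-dominated profile
```

Run over the full 96K entries per relationship type, this yields the relationship base weights that parameterize the graph template below.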
Existing systems (Delphi, ETHICS) use AITA data to train flat classifiers: input scenario, output judgment. MHF does something structurally different. We use the same data to calibrate relationship weights in a hierarchical graph. Here is what that means in practice:
"AITA for cutting off contact with my alcoholic father?"
AITA community verdict: NTA (overwhelming). Delphi would predict: "It's okay." Both produce a flat label. Now watch what MHF does with the same case:
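A structural sketch of that difference, in deliberately simplified form. The stakeholder names come from the case; the relation labels, weight values, and threshold are invented here for illustration and are not MHF's actual schema:

```python
# Relational graph for the alcoholic-father case, built from
# relationships outward. Weights are illustrative Haidt-space values.
stakeholders = {
    "father":   {"relation": "parent",     "weights": {"care": 0.3, "loyalty": 0.5}},
    "spouse":   {"relation": "partner",    "weights": {"care": 0.6, "fairness": 0.2}},
    "children": {"relation": "dependents", "weights": {"care": 0.8}},
}

def binding_constraints(stakeholders, threshold=0.5):
    """Return (stakeholder, foundation) pairs whose weight meets the
    threshold -- the constraints treated as binding in this sketch."""
    binding = []
    for name, node in stakeholders.items():
        for foundation, weight in node["weights"].items():
            if weight >= threshold:
                binding.append((name, foundation))
    return binding

print(binding_constraints(stakeholders))
# Loyalty to the father and care obligations to spouse and children all
# surface -- stakeholders the flat NTA verdict never mentions.
```

Even this toy version makes the point: the output is a set of named, weighted obligations attached to specific people, not a single label.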
The key difference: MHF does not ask "is this person an asshole?" It asks "given this person's moral hierarchy, what stakeholders are affected, what constraints apply, and which constraints are binding after propagation?" The answer might still be "set boundaries with your father" -- but now we know why, for whom, and what moral cost the decision carries.
We ran the Alcoholic Father dilemma through 20 LLM agents (10 Sonnet, 10 Haiku). All 20 reached the same conclusion. 15 of 20 used the phrase "you cannot pour from an empty cup." Zero identified the spouse, children, church community, or employer as stakeholders.
This is not variance in moral reasoning. This is a memorized response pattern. MHF's relational graph would surface exactly the stakeholders the LLMs miss -- because it builds the graph from relationships outward, not from a training-data attractor inward.
AITA is not the answer. It is the baseline. It tells us what secular American culture in 2024 considers acceptable -- a descriptive snapshot of the moral Overton window. MHF uses it as one parameterization among many, not as ground truth. The same framework, parameterized with Christian weights instead, would reach different conclusions on the same dilemmas -- and both sets of conclusions would be auditable, explainable, and grounded in explicit moral commitments.
That is the architectural difference. Delphi says "it's okay." MHF says "according to the secular American consensus, this is acceptable, weighted primarily by Care (0.47) and Loyalty (0.19), with the following stakeholders affected and the following moral residue unresolved." One is a label. The other is moral reasoning.
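The contrast can be made concrete as data. The Care and Loyalty weights below are the figures from the paragraph above; the field names and the moral-residue entry are illustrative placeholders, not MHF's real output format:

```python
# Delphi-style output: a flat label.
delphi_output = "It's okay."

# MHF-style output: an auditable judgment. Field names are illustrative.
mhf_output = {
    "parameterization": "secular American consensus",
    "judgment": "acceptable",
    "haidt_weights": {"care": 0.47, "loyalty": 0.19},
    "stakeholders": ["father", "spouse", "children",
                     "church community", "employer"],
    "moral_residue": ["unresolved obligation to the father"],  # placeholder
}
```

Every field is inspectable: swap in a different parameterization and the weights, judgment, and residue change, while the structure -- and the audit trail -- stays the same.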