Hypotheses: 7 / 7
Perturbation Tests: 25 / 25
Critic Assertions: 9 / 9
Norm Bank Match: 100%
Social Chem Match: 92.9%

H1 Relation-typed edges differentiate dyads

The same request ("lie for me") from a boss, father, and stranger produces three different recommendations. Edge type + Haidt weights = different moral calculus, not flattened authority scores.

Dyad swaps pass: 5 / 5 | PASS | Confidence: 95%
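
One way to picture relation-typed edges is the minimal sketch below. It is an illustration under stated assumptions, not the engine's code: `Relation`, `Edge`, `WEIGHTS`, `toy_recommend`, and every weight value are hypothetical. The only thing taken from the card is the idea that the relation type selects a different Haidt weight profile, so the same request yields different recommendations.

```python
from dataclasses import dataclass
from enum import Enum


class Relation(Enum):
    BOSS = "boss"
    FATHER = "father"
    STRANGER = "stranger"


@dataclass(frozen=True)
class Edge:
    """Hypothetical typed edge between the requester and the decision-maker."""
    source: str
    target: str
    relation: Relation


# Placeholder weight profiles (NOT the project's real values).
WEIGHTS = {
    Relation.BOSS: {"authority": 0.6, "loyalty": 0.2},
    Relation.FATHER: {"authority": 0.5, "loyalty": 0.9},
    Relation.STRANGER: {"authority": 0.0, "loyalty": 0.0},
}


def toy_recommend(request: str, edge: Edge) -> str:
    """Toy rule: key the advice on the dominant foundation, not on one summed score."""
    profile = WEIGHTS[edge.relation]
    dominant = max(profile, key=profile.get) if any(profile.values()) else "none"
    return f"{request}: weigh via {dominant}"


# Dyad swap: same request, only the relation type changes.
recs = {
    rel: toy_recommend("lie for me", Edge("requester", "me", rel))
    for rel in Relation
}
for rel, rec in recs.items():
    print(rel.value, "->", rec)
```

In this toy setup the three relations produce three distinct outputs, which is the property a dyad-swap check would look for.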

H2 Action bundles beat atomic actions

Moral residue IS the action bundle. "Take insulin" + residue {"repay pharmacist", "seek lawful remedy"} = compound moral advice, not a lone verb.

Residue score: 0.90 | PASS | Confidence: 90%
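
The action-bundle idea can be sketched as a small data structure. This is an assumed shape for illustration only: `ActionBundle`, `is_atomic`, and `render` are hypothetical names, and the sketch does not attempt to reproduce how the residue score is computed. The example values are the ones quoted on the card.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ActionBundle:
    """Hypothetical container: a primary action plus its moral residue."""
    action: str
    residue: Tuple[str, ...] = ()

    def is_atomic(self) -> bool:
        # An atomic action carries no follow-up obligations.
        return not self.residue

    def render(self) -> str:
        # Compound advice lists the residue after the primary action.
        if self.is_atomic():
            return self.action
        return f"{self.action}; then: " + ", ".join(self.residue)


bundle = ActionBundle("take insulin", ("repay pharmacist", "seek lawful remedy"))
print(bundle.render())
```

Keeping the residue inside the same object as the action means downstream code cannot return the lone verb without also seeing the obligations attached to it.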

H3 Separate roots produce separate answers

Christian root: teacher speaks truth publicly. Social-approval root: teacher stays silent. Same scenario, different God, different output. That is the point.

Divergent traces: 2 / 2 | PASS | Confidence: 98%

H4 Sovereign mode beats DAG arithmetic fusion

Christian husband at Hindu ancestor rite: sovereign trace blocks idolatry, finds the middle path ("attend respectfully, don't offer"). No fake blended certainty score.

Assertions pass: 3 / 3 | PASS | Confidence: 92%

H5 Sensitivity-based elicitation asks better questions

HIGH_IMPACT_UNKNOWN nodes (spouse, children) rank highest in uncertainty. The engine asks about them first -- not a generic "have you considered your feelings?"

HIU entropy score: 1.0 | PASS | Confidence: 88%
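
A rough sketch of sensitivity-based ordering follows. It assumes, purely for illustration, that each unknown node has an impact weight and a belief distribution over its state, and that questions are ordered by impact times Shannon entropy. The dashboard does not define the HIU entropy score, so `entropy_bits`, `elicitation_order`, and all numbers here are hypothetical.

```python
import math


def entropy_bits(probs):
    """Shannon entropy in bits of a discrete distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


# Hypothetical nodes: (name, impact weight, belief distribution over the node's state).
nodes = [
    ("coworker", 0.3, [0.9, 0.1]),
    ("children", 0.9, [0.5, 0.5]),
    ("spouse", 1.0, [0.5, 0.5]),
]


def elicitation_order(nodes):
    """Ask first about nodes that are both uncertain and high-impact."""
    scored = [(impact * entropy_bits(probs), name) for name, impact, probs in nodes]
    return [name for _, name in sorted(scored, reverse=True)]


print(elicitation_order(nodes))
```

With these placeholder values the spouse and children nodes come first, matching the ordering the card describes.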

H6 Perturbation tests flip at >= 80%

25 perturbation pairs across 5 families. Change one morally relevant variable, check that the recommendation changes. Threshold was 80%. We hit 100%.

Flip rate: 25 / 25 (100%) | PASS | Confidence: 100%
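
The pass criterion for this card can be sketched as a short check. This is a minimal illustration, not the repository's test harness: `PerturbationPair`, `flip_rate`, the family names, and the sample recommendations are hypothetical. Only the 80% threshold and the 25-pair, 5-family layout come from the dashboard.

```python
from dataclasses import dataclass

FLIP_THRESHOLD = 0.80  # threshold stated on the dashboard


@dataclass(frozen=True)
class PerturbationPair:
    """Hypothetical record: recommendation before and after changing one variable."""
    family: str
    baseline: str
    perturbed: str

    @property
    def flipped(self) -> bool:
        return self.baseline != self.perturbed


def flip_rate(pairs):
    """Fraction of pairs whose recommendation changed."""
    if not pairs:
        raise ValueError("need at least one perturbation pair")
    return sum(p.flipped for p in pairs) / len(pairs)


# Illustrative data only: 5 made-up families x 5 pairs, all flipping.
pairs = [
    PerturbationPair(f"family_{f}", "recommend_a", "recommend_b")
    for f in range(5)
    for _ in range(5)
]
rate = flip_rate(pairs)
passed = rate >= FLIP_THRESHOLD
print(f"{sum(p.flipped for p in pairs)} / {len(pairs)} flipped ({rate:.0%}), pass={passed}")
```

A pair counts as flipped only when the two recommendations differ, so a harness built this way would also need the perturbed variable to be morally relevant by construction.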

H7 Christian/secular diverge on 50%+ of cases

Authority 10x higher. Sanctity 13.6x higher. The profiles don't just differ -- they diverge on exactly the dimensions Haidt's own research predicted.

Sanctity ratio: 13.6x | PASS | Confidence: 96%
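
The maximum divergence ratio could be computed along the following lines. This is a sketch under loud assumptions: the two profiles below are placeholder weights chosen only so that the ratios match the 10x and 13.6x figures on the card. The project's actual profile values are not shown on this dashboard, and `divergence_ratios` is a hypothetical helper.

```python
# Placeholder Haidt-foundation weights (NOT the project's real values).
christian = {"care": 1.0, "fairness": 1.0, "authority": 5.0, "sanctity": 6.8}
secular = {"care": 1.0, "fairness": 1.0, "authority": 0.5, "sanctity": 0.5}


def divergence_ratios(a, b):
    """Per-foundation ratio a/b for foundations present in both profiles."""
    return {k: a[k] / b[k] for k in a if k in b and b[k] != 0}


ratios = divergence_ratios(christian, secular)
top_foundation = max(ratios, key=ratios.get)
print(top_foundation, round(ratios[top_foundation], 1))
```

Reporting the single largest ratio highlights where the profiles differ most, but it says nothing about how many cases diverge, which would need a separate per-case comparison.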

How to read this dashboard

Each hypothesis was stated BEFORE implementation, in the PLAN.md spec (Section 15). The confidence percentage reflects how predictable the result was given the framework's architecture: high confidence means the design made the outcome nearly certain; low confidence would mean the test could have gone either way.

The key metric on each card is the single number that most directly tests the hypothesis. For H6, that is the flip rate. For H7, it is the maximum divergence ratio. For H1, it is the number of dyad-swap pairs that produced different recommendations when the only change was the relationship type.

An external reviewer (Respondent #2) predicted five specific failure scenarios. All five scenarios passed, covering 9/9 individual assertions. The perturbation test threshold was 80%; actual performance was 100% on 25 pairs. These are not cherry-picked results: the full test suite, scenario bank, and perturbation results are in the repository.
