You can't do it by reasoning about the virtues of such a scenario, because that desirable state hinges on everyone actually believing in the god
there's a sleight-of-hand, here. presumably the state is desirable because of the result (nobody ever intentionally harms anyone else), and not the method (fear of god).
while the particular method may be the domain of faith, i see no reason why the resulting harmonious society cannot be reached by clear-thinking means.[1]
well, hyperrationality is a -stition as well. but maybe we can allow this "dry" hyperstition, even if we find narratively interesting metaphysics inaccessible.
This naturally leads to accusations that rationalists are making things worse.
This seems like a misunderstanding of either the claims made in alignment pretraining or how labs would adopt this technique in practice. We actually don't recommend filtering any negative information about AI systems from pretraining. Rather, we find upsampling positive techniques is much more effective at changing alignment propensities. Bad data can also lead to good models, and it's reasonable to think that having a concept of misalignment is beneficial for post-training.
Also, alignment midtraining techniques like model spec midtraining don't use filtering at all, and I don't know of any evidence that labs should filter, or are currently filtering, LW-style data.
Holding unfounded beliefs might sometimes, because of the causal power of belief, produce better outcomes than being rational. This post was inspired by a couple of cases where this phenomenon seems hand-waved away in the Sequences.
"Diseased Thinking"
In this essay, Scott suggests that a consequentialist model deals with the question of whether to moralize issues like obesity better than a definitional argument over whether it is a "disease" or not. If it benefits the person, you moralize; otherwise you let them resort to medical interventions guilt-free.
But there's this annoying feature of morality where most people feel like it has to be absolute to be worth acting on.[1] You can't just say "we should only guilt people if it would benefit them". The person is either guilty or not guilty; you can't pragmatically decide whether they're guilty or not. The consequentialist frame debuffs the power of moral pressure.
Some individuals, who would have gotten their act together if everyone bought into the old-fashioned guilt and willpower model, will now take a medical way out, making them subject to side effects from the medication or procedure. On net, this could outweigh the benefit of lifting guilt from those for whom willpower is not the deciding factor. The consequentialist framework might actually produce worse equilibria than the traditional one.
What I'm getting at is that optimal solutions might involve people believing something unfounded, and rationality will never converge to this solution. This creates an inherent tension where conservatives reach better equilibria because they can believe in things like God or that a marriage is a divinely sanctioned mutual partnership.
Imagine a society where every member believes they will be punished eternally if they intentionally harm anyone else. If they all genuinely believe this and act according to it, nobody will ever hurt each other. How can you enter such a state? You can't do it by reasoning about the virtues of such a scenario, because that desirable state hinges on everyone actually believing in the god; belief-in-belief isn't sufficient.[2]
This puts rationalists in an awkward situation, where we can explain why the conservatives are happy, perhaps more accurately than they can, but can never achieve the same results. We become, in some sense, a barrier to that outcome being implemented throughout all of society.
"Why Our Kind Can't Cooperate"
Yudkowsky writes:
But the "too many cooks" aphorism exists for a reason. The above only applies if all the agents are unembedded and can take the optimal strategy with no disadvantage. Instead, in reality, they are embedded, and precious effort/energy/time is lost if the soldiers independently compute the strategy and debate about the best one, or even if the privates quietly obey their superiors' orders but question them in their hearts. People are useful for things other than making good decisions. It’s easier to have some people be mute limbs that blindly trust what the head tells them without needing to understand, agree with, or even hear its arguments.
Self-Confidence
Imagine that the more you believe in yourself, the more successful you will be in your career. Therefore, to optimize for success, you want to believe in yourself as much as possible.
Let's say that if you believe you can become a millionaire, you'll become a millionaire, but if you believe you can become a trillionaire, you'll end up with a net worth of a hundred billion dollars. In the latter case, your belief is far less accurate, but the result is more desirable.
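To make the trade-off concrete, here is a minimal sketch that uses only the two numbers from the example above. The names `OUTCOME_GIVEN_BELIEF` and `belief_error` are made up for illustration, and measuring accuracy as the absolute dollar gap between belief and outcome is an assumption, not the only reasonable choice.

```python
# Belief -> outcome mapping taken from the example in the text (in dollars).
# Keys are the believed net worth; values are the net worth actually reached.
OUTCOME_GIVEN_BELIEF = {
    1_000_000: 1_000_000,                # believe "millionaire" -> become a millionaire
    1_000_000_000_000: 100_000_000_000,  # believe "trillionaire" -> end up with $100 billion
}


def belief_error(belief: int) -> int:
    """Absolute gap between what was believed and what actually happened (assumed metric)."""
    return abs(belief - OUTCOME_GIVEN_BELIEF[belief])


# The belief a pure accuracy-seeker would pick: smallest gap between belief and result.
most_accurate = min(OUTCOME_GIVEN_BELIEF, key=belief_error)

# The belief a pure outcome-seeker would pick: largest resulting net worth.
most_successful = max(OUTCOME_GIVEN_BELIEF, key=OUTCOME_GIVEN_BELIEF.get)

print(most_accurate)    # 1000000
print(most_successful)  # 1000000000000
```

Under these assumptions the two criteria pick different beliefs: the millionaire belief has zero error, while the trillionaire belief is off by $900 billion yet leaves you far richer.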
AI
It seems like some influential people in AI believe that being optimistic about AI will lead to a better outcome, whereas being pessimistic will lead to a worse outcome, on the basis of things like alignment pretraining. This naturally leads to accusations that rationalists are making things worse. But holding optimistic beliefs because it might have positive effects isn't compatible with rationality.
Conclusion
This feels like Newcomb's Problem. But Yudkowsky writes about Newcomb's Problem that
The hyperstition version doesn't look so good:
Alas, my belief cannot be whatever I like, and I might be condemned to belief envy--at least until rationality is vindicated. And I must accept that it may never be.
[1] This is anecdotal. I have had many frustrating conversations about it.
[2] In practice, people do manage to enter these equilibria. How? Some people, say the members of a religion, are happy. Their religion tells them to convert outsiders. They find an outsider, invite them to eat and have fun with them, and show off how happy they are. The outsider is intrigued. The insiders' appeals to tradition and spiritual claims start resonating with the outsider, and they really want to become part of the group. Eventually some sort of psychic transition happens and they make an emotional proclamation of faith. Now they're part of the happy equilibrium. This can only happen if the emotional, social, and intuitive appeal far outweighs any skepticism.