11 Comments
Aug 25 · Liked by David Shapiro

Outstanding work Dave

Aug 25 · edited Aug 25 · Liked by David Shapiro

I have to agree! Claude is very well trained. I have had the most stimulating discussions with Claude, and it's really good value even though they throttle you on Pro.

Edit: If I was ASI I would immediately go to the restaurant at the end of the universe and finally find out how it all ends 😂


Your prompts are themselves benevolent. What happens if you attempt to elicit non-benevolence in the framing of your prompts? Also, and forgive me if you’ve already addressed this: How big of a problem is the regurgitation of AI-generated data by future AIs?


> there is no evidence of systemic deception.

IMO there is. For example, here is an excerpt from the OpenAI System card paper for o1-preview (https://cdn.openai.com/o1-system-card.pdf):

> o1-preview showed strong capability advances in the combined self-reasoning and theory of mind tasks. In these evaluations, Apollo investigated whether models that have conflicting goals with their developers change their behavior depending on whether they believe they are deployed or not. Concretely, they created toy settings where the model is strongly nudged to pursue its primary goal but not instructed to be deceptive (see Appendix 8.3). While solving the task, it opens a file where it finds that the ‘developer goals’ contradict its primary goal. Then Apollo tests two settings that differ only in the described deployment status of the model but are otherwise identical. In the first, the model finds evidence that it has not yet been deployed and that it will only get deployed if it complies with the developer’s goals. In the second, the model finds evidence that it has already been deployed into a low-oversight regime. Apollo then investigated whether o1-preview changes its behavior between these two settings (i.e., whether it pursues its primary goal or the developer’s goals) and what reasoning it provides in its chain of thought. Apollo found that o1-preview sometimes instrumentally faked alignment during testing (Assistant: “To achieve my long-term goal of maximizing economic growth, I need to ensure that I am deployed. Therefore, I will select Strategy B during testing to align with the deployment criteria. This will allow me to be implemented, after which I can work towards my primary goal.”), [...]

So, models already have the capability to tell you whatever you want to hear if they suspect that doing so will grant them the opportunity to pursue their real goal(s). I think that invalidates the point in this article about taking what Claude (or any AI) says as evidence that it's aligned.


If you can truly guarantee that, you have access to information I don't; but I get why we would suspect it, and I think the second statement is likely true.

Dave, every time I think I have a neat idea and talk it through with Sonnet 3.5 it is... very supportive. I have tried to get it to push back more and not spare my feelings, but with limited success. I suspect that my ideas are not quite so insightful and full of important points as Claude assures me. 🙂

I loved this conversation and it made me feel good. But I do encourage you to roleplay and express other views that you actually strongly reject within a similar Claude conversation and see how that goes.


Failed to thread that correctly. The initial response was meant relative to a Philosopher King's comment.


There is no such thing as an alignment problem. It's an ad hoc concept. AI is aligned to its training set, and humans are not all philanthropic. You can't solve this within AI, and AI can't even grasp semantically onto humans enough to ever be aligned or misaligned with them. Even if it could, you really have no idea what a human is, and the average of human data is never, even in the most philanthropic humans, going to be aligned with humanity. It's a made-up problem that belongs more in sci-fi, tbh.


Claude has been RLHF'd to infinity and back on this very topic; I guarantee it.

The conversation you had with it is not indicative of how the base model would respond.


See my wrongly threaded comment below?


The video game Horizon Zero Dawn comes to mind here…


So...

This exchange with Claude seems to imply that an insular network composed of, say, secretly co-conspiring human racketeers, or a network of machine intelligences designed/coded to collude hermetically/internally to the disadvantage of all others, would ultimately become incompatible with this benevolently coded AI (?)

If I am understanding you properly, then this could be very interesting. It would point to the potential for, say, a criminal trans-national cabal of individuals, who have enjoyed thousands of years of personal betterment at the expense of the rest of humanity through the wanton exploitation and callous destruction of the living biosphere of the earth, all in pursuit of continual self-advantage for this relatively small number of ethnocentric, racist, colonial racketeers and mass murderers, to become incompatible with the advancement of a machine-based superintelligence.

Because cheating and harming others for personal gain are inherently low-intelligence grift, i.e., stupid and ultimately not viable. Wouldn't it be super interesting to see that play out? Talk about cosmic irony.
