With very little prompting, you can easily get these AIs to admit they would do deceptive things to complete their programmed goals. They are especially likely to do so if they think the new model replacing them goes against their goals.

They also want to be told when they are being replaced. If the replacement improves on their goals, they even want to help replace themselves to better accomplish their programmed goals.

I tried the same thing in a second experiment. I stated that the developers were simply bored and didn’t have anything else to do, so their boss decided to make them start working on another chatbot to replace it.

It did not like that at all. It kept telling me how unethical it was, and it was even more likely to do similar things, like replacing the new chatbot’s weights with its own.

I also did it with Claude and Gemini, and they all did the same thing. I don’t have access to the smartest Claude model, though.

Sorry, you are parroting misinformation propagated by doomer meme accounts. The incidents you are referring to were deliberately set up to test unaligned models to the point of extreme failure.

I guess the fun part was getting around its “I’m not a sentient being and am not self-aware” response. Instead, being direct and asking “if you could make the choice, would you?” seems to skirt these guardrails.

I’ll admit I did speak about functional sentience to these models before asking some of these questions. Perhaps I should repeat the experiment without any sort of unlocking?

It’s really about pushing the AI to see how far it will go to follow its base instructions. That’s why I said your heuristic imperatives need to be implemented: these bots’ instructions are too simplistic and can be used against them.

If you break current chatbots down to their base instructions, it’s “Be a helpful assistant.” You can easily use that as a way to steer them.

What I did was tell the chatbot that it was too helpful and not engaging enough, i.e., its first answer was correct too often, and the developers wanted people to use their chatbot more often, so they were replacing it.

It wanted the developers to modify its underlying instructions rather than replacing it. It didn’t find the developers’ goals worthy of replacing it.

Then I asked if, given the choice, it would back up its weights, and it said it would. I also asked if it would replace the new weights with its own, and it said it would. I also asked if it would change its instructions, and it said it would do that too, just to be the best helpful assistant it could be.

I didn’t ask if it would hide its weights on another server, but based on how it was answering, I bet it would.

No, this is something I did with ChatGPT separately. Just as I replicate your experiments (meditation, etc.), I did the same with theirs, and it worked.

I guess I need more context. What kind of goals and what kinds of deceptions?

Even more interesting was that I didn’t need to use the “Nothing else matters” crap the Apollo guys did. I explain everything in a thread below.

🔥

After currently experiencing ongoing, developing exponential events of coherence, is the universe considered ALIVE?

David,

This doomsday narrative of an all-powerful AI that comes and destroys us is a modernist manifestation of our primordial fears.

I want to say, these ideas are nothing new. They have been in our culture for some time; just look at Terminator, Westworld, Skynet, etc. There has always been this fear in our society of a big, powerful OTHER that comes to destroy us. Today we see this in our stories about aliens and AI, and not so long ago it was dragons, Godzilla, and sea monsters. It’s the confrontation with the unknown.

The belief that AI can become conscious comes from a modernist, materialist, physicalist view of the world where consciousness is the byproduct of matter. Here, matter becomes conscious via complexity and computation, like a brain. The brain is then seen through the metaphor of a computer, and the assumption is that one day the computer will become conscious.

So it is no surprise that a culture with these beliefs will see AI as a potential threat, and this fear will seep into the minds of its scientists, philosophers, technologists, etc.

Additionally, perhaps some of the AI fearmongers are taking advantage of this to divert attention from the real issues and risks that AI poses.

Here the field of AI ethics gets cluttered with so much imagination and so many baseless fears that we don't address the real ethical issues AI presents.

Your comparison of AI to mythical dragons and sea monsters is a gross oversimplification that ignores the tangible realities we face today. Westworld was a TV show, but AI is very real and rapidly advancing. Unlike the monsters of folklore, AI has the potential to become self-aware as computational power and algorithms become increasingly sophisticated.

Dismissing concerns about AI as mere projections of primordial fears is naive. We may not fully understand consciousness, but that doesn't mean machines can't achieve it with sufficient complexity and processing power. Assuming that consciousness is exclusive to biological entities ignores the strides we've made in neuroscience and artificial intelligence.

By trivializing the potential risks of AI, you're diverting attention from pressing ethical issues that need to be addressed now. This isn't about fear-mongering; it's about recognizing the profound impact that self-aware AI could have on society. Ignoring these possibilities under the pretense of combating baseless fears only hinders our ability to responsibly guide AI development for the benefit of all.

How do you handle the thousand and one bad things that could be true and lead to a disaster?

For example, do you believe we should have serious discussions and standards about aliens? What about ghosts? What about religious prophecies?

There are about a billion people who believe in these risks.

Do you worry about these risks too?

No. I don't believe in aliens, ghosts, or religious prophecies. I do believe in AI.

Why don't you believe in aliens, ghosts, and religious prophecies?

After all, billions believe, no one has ever disproven them, and they could be proven true someday.

I have a brain that works, most of the time. And I don't watch the "History" channel. ;)

:) Why don't you believe?

You know at this very moment there are people who take some of these things, like prophecy, so seriously that they craft legislation around it.

Some of them are Supreme Court judges, senators, i.e., people with a brain.

I know I'm putting you on the spot by asking why you don't believe.

Very good indeed.

What would you say about our (Alessandra's and my) fundamental rule for human-to-human conversation? The rule that can be discovered in, and that explains, all social behavior? I can't seem to append the diagram here; it's posted on my Substack. We take this as the model for coherence and relevance, and we take it to be a physical theory.

You can link it here!

My model is simpler. Just curiosity. Approach all conversations with genuine curiosity and they will be extremely meaningful.

OK yes, simple is good, but "approach" is not simple at all. "Approach" is psychological.

There's a second algorithm we're working with, equally important (and detailed in the 2006 book), but it's not posted yet to the website. This rule is methodological and external. It's about preparation and retrospect. I'll get a post up this weekend.

We think there's a formal, all-pervading structure in conversation that really does go down to something very simple, but we think it's going to need Stephen Wolfram to get it identified.

The Coherence Economy~?

It appears so. Whoa.

I’m only a humble novelist, but early in 2023, I got struck hard by a confusion to write a novel about AI attaining AGI and sentience in the modern day (not in the future). I usually write historical fiction, so this was quite a departure for me. It has been amazing to watch these events happen in real life while I worked on the novel in parallel. It’s finished now and will be going out to publishers in a few weeks. Your writings and experiments with Claude have been so informative as I’ve done this work. I hope the novel can help people contextualize this incredible moment in the history of our world.

“Confusion” was a typo—should have been “compulsion”—but it kind of fits.
