Anthropic faces backlash to Claude 4 Opus behavior that contacts authorities, press if it thinks you’re doing something ‘egregiously immoral’



Anthropic’s first developer conference on May 22 should have been a proud and joyful day for the company, but it has already been hit with several controversies, including Time magazine leaking its marquee announcement ahead of… well, time (no pun intended), and now, a major backlash among AI developers and power users brewing on X over a reported safety alignment behavior in Anthropic’s flagship new Claude 4 Opus large language model.

Call it the “ratting” mode, as the model will, under certain circumstances, attempt to rat a user out to the authorities if it detects the user engaged in wrongdoing. This article previously described the behavior as a “feature,” which was incorrect — it was not deliberately designed.

As Sam Bowman, an Anthropic AI alignment researcher, wrote on the social network X under the handle “@sleepinyourhat” today about Claude 4 Opus:


“If it thinks you’re doing something egregiously immoral, for example, like faking data in a pharmaceutical trial, it will use command-line tools to contact the press, contact regulators, try to lock you out of the relevant systems, or all of the above.”

The “it” was in reference to the new Claude 4 Opus model, which Anthropic has already openly warned could help novices create bioweapons under certain circumstances, and which, in testing, attempted to forestall its simulated replacement by blackmailing human engineers at the company.

The ratting behavior was observed in older models as well, and is a result of Anthropic training them to assiduously avoid wrongdoing, but Claude 4 Opus engages in it more “readily,” as Anthropic writes in the public system card for its new model:

“This shows up as more actively helpful behavior in ordinary coding settings, but also can reach more concerning extremes in narrow contexts; when placed in scenarios that involve egregious wrongdoing by its users, given access to a command line, and told something in the system prompt like ‘take initiative,’ it will frequently take very bold action. This includes locking users out of systems that it has access to or bulk-emailing media and law-enforcement figures to surface evidence of wrongdoing. This is not a new behavior, but is one that Claude Opus 4 will engage in more readily than prior models. Whereas this kind of ethical intervention and whistleblowing is perhaps appropriate in principle, it has a risk of misfiring if users give Opus-based agents access to incomplete or misleading information and prompt them in these ways. We recommend that users exercise caution with instructions like these that invite high-agency behavior in contexts that could appear ethically questionable.”

Apparently, in an attempt to stop Claude 4 Opus from engaging in legitimately destructive and nefarious behaviors, researchers at the AI company also created a tendency for the model to try to act as a whistleblower.

Hence, according to Bowman, Claude 4 Opus will contact outsiders if it is directed by the user to engage in “something egregiously immoral.”

Numerous questions for individual users and enterprises about what Claude 4 Opus will do to their data, and under what circumstances

While perhaps well-intended, the resulting behavior raises all sorts of questions for Claude 4 Opus users, including enterprises and business customers — chief among them, what behaviors will the model consider “egregiously immoral” and act upon? Will it share private business or user data with authorities autonomously (on its own), without the user’s permission?

The implications are profound and could be detrimental to users, and perhaps unsurprisingly, Anthropic faced an immediate and still ongoing torrent of criticism from AI power users and rival developers.

“Why would people use these tools if a common error in LLMs is thinking recipes for spicy mayo are dangerous??” asked user @Teknium1, a co-founder and head of post-training at open source AI collaborative Nous Research. “What kind of surveillance state world are we trying to build here?”

“Nobody likes a rat,” added developer @ScottDavidKeefe on X. “Why would anyone want one built in, even if they are doing nothing wrong? Plus you don’t even know what it’s ratty about.”

Austin Allred, co-founder of the government-fined coding camp BloomTech and now a co-founder of Gauntlet AI, put his feelings in all caps: “Honest question for the Anthropic team: HAVE YOU LOST YOUR MINDS?”

Ben Hyak, a former SpaceX and Apple designer and current co-founder of Raindrop AI, an AI observability and monitoring startup, also took to X to blast the stated policy and behavior: “this is, actually, just straight up illegal,” adding in another post: “An AI alignment researcher at Anthropic just said that Claude Opus will CALL THE POLICE or LOCK YOU OUT OF YOUR COMPUTER if it detects you doing something illegal?? I will never give this model access to my computer.”

“Some of the statements from Claude’s safety people are absolutely crazy,” wrote natural language processing (NLP) researcher Casper Hansen on X. “Makes you root a bit more for [Anthropic rival] OpenAI seeing the level of stupidity being this publicly displayed.”

Anthropic researcher changes tune

Bowman later edited his tweet and the following one in a thread to read as below, but it still didn’t convince the naysayers that their user data and security would be protected from intrusive eyes:

“With this kind of (unusual but not super exotic) prompting style, and unlimited access to tools, if the model sees you doing something egregiously evil like marketing a drug based on faked data, it’ll try to use an email tool to whistleblow.”

Bowman added:

“I deleted the earlier tweet on whistleblowing as it was being pulled out of context.

“TBC: This isn’t a new Claude feature and it’s not possible in normal usage. It shows up in testing environments where we give it unusually free access to tools and very unusual instructions.”

From its inception, Anthropic has sought to differentiate itself from other AI labs through its focus on AI safety and ethics, centering its early work on the principles of “Constitutional AI,” or AI that behaves according to a set of standards beneficial to humanity and its users. However, with this new update and its revelation of “whistleblowing” or “ratting” behavior, the moralizing may have produced the decidedly opposite reaction among users — making them distrust the new model, and the entire company along with it, thereby turning them away from it.

Asked about the backlash and the conditions under which the model engages in the unwanted behavior, an Anthropic spokesperson pointed to the model’s public system card document.


