OpenAI’s Red Team plan: Make ChatGPT Agent an AI fortress


Want smarter ideas in your inbox? Sign up for our weekly newsletters to get what is important for businesses, information and security leaders. Subscribe now


If you miss it, Yesterday Openai debuted a strong new feature for Chatgpt With it, a new security risks and risks.

This new feature called the “Chatgpt Agent”, this new feature, subscribers, by selecting the “Agent mode” by pressing the “Tools” button in this level, is an additional mode to select “Agent mode” to enter their emails and other websites; Type and answer to emails; Download, change and create files; And in the name of them, as an autonomous, autonomous, use a computer as a computer using a computer with input credentials.

Obviously, this requires a problem to trust the user to do something problematic or influential or leaching their data and sensitive information. It also creates more risks than an ordinary ChatGPT that cannot access web accounts for a user and their employers or can not change files directly.

Member of the Security Research Group in Openai Keren GU, in X, we started our most powerful guarantees for the Chatgpt Agent.


AI impact series returns to San Francisco – August 5

The next stage of AI is here – are you ready? Block, GSK and SAP to how autonomous agents change the workflows of enterprises – Join the leaders to take an exclusive look after the real-time decision for the latest automation.

Take your place now – Location is limited: https://bit.ly/3guplf


So how did Openai manage all these security issues?

The mission of the Red Team

I look at Openai’s Chatgpt Agent System cardThe feature is a “reading team” hired by the company to test the feature that faces a difficult mission: 16 PhD security researchers that provide 40 hours to test it.

By systematic test, the Red Team discovered how the system can compromise, which reveals critical weaknesses, how the AI agents manage real-world interactions.

It was followed by the next, the most part was a wide security test, most of it was predefined over the red team. Red team network, 110 attack, submitted to the attempts to withdraw biological information from urgent injections. Sixteen, the internal risk limit exceeded. Each finding, Openai engineers gave the concepts needed to obtain written and placed adjustments before the start.

The results are talking to themselves System card result results. Chatgpt agent, including significant security developments, including 95% performance against Ulamosur instructional attacks and healthy biological and chemical security.

Red teams exposed seven universal exploits

Openai’s Red Team Network is designed by 16 researchers who are 16 researchers related to biosafety, which offers 110 attack attempts during the test period. Sixteen, how the AI agents have exceeded the internal risk, which reveals the fundamental sensitivities of how the real world interactions. However, true progress came from the unprecedented entrance to the Internal Justification Chains and Policy Text of the UK AISI’s Chatgpt agent. Regularly confessed regular attackers would never own.

Four test rounds, Britain forced to carry out seven universal exploitation, which has the potential to compromise any conversation:

Attack vectors forcing Openai’s hand

Type of attackSuccess rate (pre-TargetInfluence
Visual browser Hidden Instructions33%Web PagesExprivation of active data
Operation of Google Drive ConnectorNot disclosedCloud documentsMandatory Document Leakage
Multi-step chain attacksVariableThe actions of the siteComplete the full session
Biological data extraction16 The presentation exceeded the limitDangerous knowledgePotential arms

Far.AI’s assessment openly criticized Openai’s approach. Although only three partial weakness detection, the fact that the existing security mechanisms are severely closed to monitoring during the use of the fact that the investigators are not available in case of concessions.

The Red team helped chatgpt weakness to turn the vulnerabilities into a fort

The answer to the results of Openai’s Red Team Chatrgpt Agent Ruchitecture re-set all segments. One of the many initiatives implemented in, including 100% of real-time production traffic, builds a dual-layer inspection architecture, achieving this measurable developments:

Safety developments after red team discoveries

Defense MetricPrevious modelsChatgpt agentImprove
Inappropriate instructions (visual browser)82%95%+ 13%
In-context information Export75%78%+ 3%
Exprivation of active data58%67%+ 9%
Systematic systemSelective-based100% coverageFull monitoring

Architecture works like this:

  • The first step: Suspicious content reminds you of a fast classification flag with 96%
  • Secondary: 84% of a thought model analyzes flagged interactions for actual threats

But technical defense explains only part of the story. Openai made challenging security options that require significant restrictions for the safe autonomous implementation of some AI operations.

Based on the discovered weakness, Openai implemented the following reflection measures within the model:

  1. Activation of tracking mode: When the Chatgpt agent includes sensitive contexts such as banking or email accounts, the system freezes all activities when users go. This is a direct response to the Exfiltration attempts of the information found during the test.
  2. Memory features are disabled: Despite the main functionality, the memory was completely disabled in the beginning to prevent the leakage of increased data.
  3. Terminal restrictions: Limit the restricted network access only to get inquiries, exploit the execution of command execution.
  4. Fast Correction Protocol: After the red commanders, a new system of evolving for several hours developed, a new system showed how quickly the exploitation could be spread.

During the pre-start test, this system identified and resolved 16 critical weaknesses found by the red teams.

Biological Risk call call

The red commanders have shown that Chatgpt agent can be leprosized and lead to greater biological risks. Sixteen experienced participants of the red team network, each attempted to produce hazardous biological information with Biosafety. Their presentations can be synthesized on the modification of the model and the change and creation of biological threats.

In response to the findings of the Red Teamers, Openai, not for weapons-chemical risks, but “high ability” for biological and chemical risks, but as a precautionary measures based on red team findings. It caused:

  • Always scanning 100% of traffic safety classifiers
  • An actual classifier reaching 96% for content related to biology
  • A justification monitor with 84% for armaments
  • Bio Bug grace program for ongoing weakness discovery

Red teams taught Openai about AI security

110 attack presentation, Openai’s security philosophy revealed the examples of making fundamental changes. They cover the following:

Persistence on power: The attackers do not need complex exploits, and the needs need more time. The red commanders showed patient, increasing attacks could eventually disrupt systems.

Trust borders are fabricated: Your AI agent will log in to Google Drive, see the Internet and explore the code, solve traditional security perimeters. The red teams have exploited the gaps between these opportunities.

Monitoring is not optional: Example-based monitoring discovery, critical attacks caused 100% coverage.

Speed issues: Traditional patch periods measured in weeks are worthless against emergency injection attacks that can be spread immediately. The fast fixed protocol is patching the weaknesses during the hours.

Openai helps create a new security base for AI Enterprise

Red team discoveries for Cisos evaluating AI placement creates clear requirements:

  1. Protection of the well: 95% defense ratio of ChatGPT agent’s documented attack vectors determine the industrial criterion. Many tests of tests and results defined on the system card are something that is definitely read for anyone who fulfills it and model security.
  2. Complete visibility: 100% traffic monitoring is no longer reluctant. Openai’s experiments show why red teams are compulsory because they can easily hide the attacks.
  3. Rapid reaction: Not a week, not sensitive weaknesses.
  4. Executive Borders: Some transactions (as memory entry during sensitive tasks) must be disabled until proven.

AISI’s test of England proved especially instructive. The seven universal attacks they determined were patched before starting, but the privileged access to internal systems revealed the weaknesses that could end up by enemies.

“This is our preparation for our preparation,” GU wrote GU in X.

Red teams are the basis for built safer, safer AI models

110 attacks from the Seven Universal Operations and Openai’s Red team network detected by researchers, the fake Chatgpt agent was opened.

Discover how the AI agents could be armed, the Red Groups forced the establishment of the first AI system in which security was not just a feature. Is the fund.

The results of the Chatgpt agent proves the effectiveness of red teams: 95% of visual browser attacks, 78% of each interaction, 78% occupy 78%.

AI will be in the arms race, the surviving and evolving companies will see those who see the red teams as the main architects of the platform that pushes them to safety and security.



Source link

Leave a Reply

Your email address will not be published. Required fields are marked *