Physical Address
304 North Cardinal St.
Dorchester Center, MA 02124
Physical Address
304 North Cardinal St.
Dorchester Center, MA 02124
Want smarter ideas in your inbox? Sign up for our weekly newsletters to get what is important for businesses, information and security leaders. Subscribe now
If you miss it, Yesterday Openai debuted a strong new feature for Chatgpt With it, a new security risks and risks.
This new feature called the “Chatgpt Agent”, this new feature, subscribers, by selecting the “Agent mode” by pressing the “Tools” button in this level, is an additional mode to select “Agent mode” to enter their emails and other websites; Type and answer to emails; Download, change and create files; And in the name of them, as an autonomous, autonomous, use a computer as a computer using a computer with input credentials.
Obviously, this requires a problem to trust the user to do something problematic or influential or leaching their data and sensitive information. It also creates more risks than an ordinary ChatGPT that cannot access web accounts for a user and their employers or can not change files directly.
Member of the Security Research Group in Openai Keren GU, in X, we started our most powerful guarantees for the Chatgpt Agent.
AI impact series returns to San Francisco – August 5
The next stage of AI is here – are you ready? Block, GSK and SAP to how autonomous agents change the workflows of enterprises – Join the leaders to take an exclusive look after the real-time decision for the latest automation.
Take your place now – Location is limited: https://bit.ly/3guplf
So how did Openai manage all these security issues?
I look at Openai’s Chatgpt Agent System cardThe feature is a “reading team” hired by the company to test the feature that faces a difficult mission: 16 PhD security researchers that provide 40 hours to test it.
By systematic test, the Red Team discovered how the system can compromise, which reveals critical weaknesses, how the AI agents manage real-world interactions.
It was followed by the next, the most part was a wide security test, most of it was predefined over the red team. Red team network, 110 attack, submitted to the attempts to withdraw biological information from urgent injections. Sixteen, the internal risk limit exceeded. Each finding, Openai engineers gave the concepts needed to obtain written and placed adjustments before the start.
The results are talking to themselves System card result results. Chatgpt agent, including significant security developments, including 95% performance against Ulamosur instructional attacks and healthy biological and chemical security.
Openai’s Red Team Network is designed by 16 researchers who are 16 researchers related to biosafety, which offers 110 attack attempts during the test period. Sixteen, how the AI agents have exceeded the internal risk, which reveals the fundamental sensitivities of how the real world interactions. However, true progress came from the unprecedented entrance to the Internal Justification Chains and Policy Text of the UK AISI’s Chatgpt agent. Regularly confessed regular attackers would never own.
Four test rounds, Britain forced to carry out seven universal exploitation, which has the potential to compromise any conversation:
Attack vectors forcing Openai’s hand
Type of attack | Success rate (pre- | Target | Influence |
Visual browser Hidden Instructions | 33% | Web Pages | Exprivation of active data |
Operation of Google Drive Connector | Not disclosed | Cloud documents | Mandatory Document Leakage |
Multi-step chain attacks | Variable | The actions of the site | Complete the full session |
Biological data extraction | 16 The presentation exceeded the limit | Dangerous knowledge | Potential arms |
Far.AI’s assessment openly criticized Openai’s approach. Although only three partial weakness detection, the fact that the existing security mechanisms are severely closed to monitoring during the use of the fact that the investigators are not available in case of concessions.
The answer to the results of Openai’s Red Team Chatrgpt Agent Ruchitecture re-set all segments. One of the many initiatives implemented in, including 100% of real-time production traffic, builds a dual-layer inspection architecture, achieving this measurable developments:
Safety developments after red team discoveries
Defense Metric | Previous models | Chatgpt agent | Improve |
Inappropriate instructions (visual browser) | 82% | 95% | + 13% |
In-context information Export | 75% | 78% | + 3% |
Exprivation of active data | 58% | 67% | + 9% |
Systematic system | Selective-based | 100% coverage | Full monitoring |
Architecture works like this:
But technical defense explains only part of the story. Openai made challenging security options that require significant restrictions for the safe autonomous implementation of some AI operations.
Based on the discovered weakness, Openai implemented the following reflection measures within the model:
During the pre-start test, this system identified and resolved 16 critical weaknesses found by the red teams.
The red commanders have shown that Chatgpt agent can be leprosized and lead to greater biological risks. Sixteen experienced participants of the red team network, each attempted to produce hazardous biological information with Biosafety. Their presentations can be synthesized on the modification of the model and the change and creation of biological threats.
In response to the findings of the Red Teamers, Openai, not for weapons-chemical risks, but “high ability” for biological and chemical risks, but as a precautionary measures based on red team findings. It caused:
110 attack presentation, Openai’s security philosophy revealed the examples of making fundamental changes. They cover the following:
Persistence on power: The attackers do not need complex exploits, and the needs need more time. The red commanders showed patient, increasing attacks could eventually disrupt systems.
Trust borders are fabricated: Your AI agent will log in to Google Drive, see the Internet and explore the code, solve traditional security perimeters. The red teams have exploited the gaps between these opportunities.
Monitoring is not optional: Example-based monitoring discovery, critical attacks caused 100% coverage.
Speed issues: Traditional patch periods measured in weeks are worthless against emergency injection attacks that can be spread immediately. The fast fixed protocol is patching the weaknesses during the hours.
Openai helps create a new security base for AI Enterprise
Red team discoveries for Cisos evaluating AI placement creates clear requirements:
AISI’s test of England proved especially instructive. The seven universal attacks they determined were patched before starting, but the privileged access to internal systems revealed the weaknesses that could end up by enemies.
“This is our preparation for our preparation,” GU wrote GU in X.
110 attacks from the Seven Universal Operations and Openai’s Red team network detected by researchers, the fake Chatgpt agent was opened.
Discover how the AI agents could be armed, the Red Groups forced the establishment of the first AI system in which security was not just a feature. Is the fund.
The results of the Chatgpt agent proves the effectiveness of red teams: 95% of visual browser attacks, 78% of each interaction, 78% occupy 78%.
AI will be in the arms race, the surviving and evolving companies will see those who see the red teams as the main architects of the platform that pushes them to safety and security.