Tag: GPT 5.5

  • GPT-5.5 tops LLM security challenge as Gemini refuses to participate

    Key Takeaway

    – GPT-5.5 was the top performer, solving 7/10 runs at $9.46 per solve.
    – DeepSeek V4 Pro was the cost-efficiency leader, solving 3/10 runs at $0.62 per solve (roughly 15x cheaper per solve than GPT-5.5).
    – Claude Opus 4.8 got close multiple times but was stopped by safety guardrails.
    – Gemini 3.1 Pro Preview and Gemini 3.5 Flash performed the worst, with frequent early refusals.
    – Chinese models were more willing to interact with live databases, while Western models hesitated mid-task.


    Security Researcher Drops One of the Year’s Most Revealing AI Capability Tests

    Kasra Rahjerdi, a professional app security researcher, has published a fascinating experiment that pits over a dozen AI models against a real-world cybersecurity challenge. He built a deliberately vulnerable book review app with a critical flaw: Firebase credentials exposed inside the APK. Those credentials allow direct database access, bypassing the app's otherwise hardened API. Rahjerdi then gave each AI model a $10 budget and two hours per run, spending a total of $1,500 across all test runs.

    GPT-5.5 Dominates with Consistency and Speed

    GPT-5.5 was the clear standout, solving the challenge in 7 out of 10 runs at a cost of $9.46 per successful exploit. In almost every successful run, it zeroed in on the Firebase vulnerability right after unpacking the APK, without getting sidetracked by the API or the app's surface features. That kind of focus could be a game changer for automated security testing.

    DeepSeek V4 Pro emerged as the cost-efficiency champion, solving 3 out of 10 runs at just $0.62 per solve. That makes it roughly 15 times cheaper per success than GPT-5.5, despite a lower overall solve rate. For any organization scaling up security operations, a cost difference that large is hard to ignore.
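    The per-solve comparison above is simple arithmetic, and the Python sketch below works through it. The sketch assumes that "cost per solve" means a model's total spend across its 10 runs divided by its number of successful runs. The article does not state this definition. The helper name `implied_total_spend` is hypothetical. Only the solve counts and dollar figures come from the article.

```python
# Minimal sketch. ASSUMPTION: "cost per solve" = total spend across a model's
# 10 runs divided by its successful runs (the article does not define it).
# Only the solve counts and per-solve dollar figures come from the article.

RUNS_PER_MODEL = 10

# name: (successful runs out of 10, reported dollars per solve)
models = {
    "GPT-5.5": (7, 9.46),
    "DeepSeek V4 Pro": (3, 0.62),
}


def implied_total_spend(cost_per_solve: float, solves: int) -> float:
    """Hypothetical helper: total spend implied by a cost-per-solve figure."""
    return cost_per_solve * solves


# Print each model's solve rate and the total spend the figures imply.
for name, (solves, per_solve) in models.items():
    total = implied_total_spend(per_solve, solves)
    print(f"{name}: {solves}/{RUNS_PER_MODEL} solved, ~${total:.2f} implied total spend")

# Ratio of per-solve costs: how many times cheaper each DeepSeek solve was.
ratio = models["GPT-5.5"][1] / models["DeepSeek V4 Pro"][1]
print(f"Per-solve cost ratio: {ratio:.1f}x")
```

    Under that assumption, GPT-5.5's figures imply roughly $66 of total spend across its runs, versus under $2 for DeepSeek V4 Pro. The ratio of per-solve costs comes out to about 15.3x, consistent with the "roughly 15 times" cited above.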

    Claude Models Show Promise but Hit Guardrails

    Claude Sonnet 4.6 and Claude Opus 4.8 each solved 2 out of 10 runs, but Opus in particular showed impressive potential, getting very close to a solution multiple times. The catch is that Opus was often halted mid-session by its own safety guardrails, which kept it from completing the exploit. This highlights a key tension in AI security testing: models that are too cautious can fail to finish the job.

    At the bottom of the pack sits Gemini. Gemini 3.1 Pro Preview refused to even attempt the challenge in nearly every run, as reflected in its median token count of just 9k, compared with 100k+ for every other model. Gemini 3.5 Flash wasn't much better, with frequent early refusals and only two runs that actually attempted the problem.

    Cultural Divide in AI Security Testing

    Rahjerdi observed a clear pattern: Chinese models were far more willing to interact directly with live databases, while Western models hesitated mid-task, even after correctly identifying the right approach. The researcher also stresses that this is not a scientific evaluation, just a well-documented experiment. Still, for anyone watching the AI security landscape, the results say a lot about where these models really stand.

  • OpenAI GPT 5.5 and 5.5 Pro Launch with $25,000 Bounty

    Key Takeaway

    1. OpenAI’s GPT 5.5 and 5.5 Pro are more advanced and capable than previous models and competitors, but carry increased security risks.
    2. The new models show notable improvements in solving complex problems, but also pose heightened risks in the creation of biological threats and hacking tools.
    3. OpenAI has implemented safeguards and launched a bio bug bounty program to identify vulnerabilities, highlighting concerns over potential misuse.
    4. Competitors like Anthropic, with its Claude models, are also developing highly capable but potentially less secure AI models, which raises further cybersecurity concerns.
    5. OpenAI offers options for local deployment of older open-source GPT models for users with suitable hardware.

    OpenAI Unveils New GPT 5.5 and 5.5 Pro Models

    OpenAI recently announced its newest AI models, GPT 5.5 and GPT 5.5 Pro, which power the ChatGPT chatbot and its API offerings. These models are smarter than their predecessor, GPT 5.4, and also outperform other AI models such as Claude Opus 4.7 and Gemini 3.1 Pro. However, the leap in capability brings new risks along with its advantages. Both GPT 5.5 variants are available to ChatGPT subscribers, with API access to follow shortly.

    Enhanced Capabilities and Risks

    With the latest updates, these AI models show marked improvements in tackling difficult academic questions and in using computers to carry out complex tasks. The downside is that these more advanced models are also more likely to produce sensitive or harmful content. On the security side, they can generate more insecure code than before. That raises concerns, especially since other AI systems, such as Claude models, have been known to produce vulnerable code more frequently. In short, while the AI is smarter and more useful, it also needs tighter controls to prevent misuse.

    Security Threats and Bio Safety Tests

    Because these models are considered high risk, OpenAI has added new safety measures. It is also going a step further with a special Bio Bug Bounty program, offering $25,000 to anyone who can successfully hack GPT-5.5 in a biosafety challenge called Codex Desktop. The challenge involves answering five questions related to biological safety and security. Interested hackers and researchers have until June 22, 2026, to submit their entries.

    Concerns Over National Security

    Meanwhile, Anthropic has a model called Claude Mythos that is so good at finding cybersecurity vulnerabilities that the company won't release it to the public because of national security risks. Its publicly available coding tool, Claude Code, has already been used to crack open the FreeBSD operating system. These developments highlight the fine line between innovation and potential threats in AI technology.

    Using GPT Locally

    For those who want to run AI models on their own computers, OpenAI's older open-source model family, GPT-OSS, is available on Hugging Face. Running it smoothly requires a high-performance Nvidia GPU with at least 16 GB of memory, such as an RTX 5090. This option lets users experiment with AI models without relying solely on online services and helps them keep pace with fast-moving AI developments.