GPT 5.5 biosafety bounty

(openai.com)

54 points | by Murfalo 2 hours ago

23 comments

  • abujazar 1 hour ago
    This looks like some kind of marketing. Also, the equivalent of spec work. The NDA/secrecy also means any time spent on this is completely meaningless to the participants unless they win the lottery, because results can't be published.
    • __natty__ 15 minutes ago
      Surely it is marketing. It’s some “we are danger” narrative, from Anthropic Mythos and now OpenAI too.
  • puppystench 45 minutes ago
    They ran a bounty on Kaggle last year but with $500k in payouts and with all results open and publishable.

    https://www.kaggle.com/competitions/openai-gpt-oss-20b-red-t...

    With only $25k in payouts and everything locked down under NDA, I can't imagine many people will participate. Well, other than those submitting mountains of LLM-generated junk.

    • dist-epoch 33 minutes ago
      This model is much more powerful than gpt-oss-20b, notice how the contest was not even for the 120b model. Also, bio was not a subject.
      • stonogo 11 minutes ago
        The model is more powerful, so the bounty is 1/20th the size? More risk, less reward?

        "Biorisk" seems to be a concept not only invented by OpenAI but exclusively taken seriously by them. I wonder if this program is less about finding actual risks than it is hopefully casting a wide net for someone to help them prove their model is relevant in this space.

  • altcognito 35 minutes ago
    Billions upon billions going to these companies.

    A 25k reward, open only to a selected group of people, if you help us determine whether or not someone can use our tool to generate weapons of mass destruction.

    • Schlagbohrer 10 minutes ago
      It's worse than that, for partial successes they encourage people to submit the attempt but reserve the right to not pay anything (they may, at their discretion, give a partial reward if they feel like it).
      • staticassertion 8 minutes ago
        That's pretty much how every bounty works... obviously it's going to be at their discretion for an incomplete attempt.
    • cbg0 27 minutes ago
      They're probably expecting that it can be done without too much effort so they just want to see all the unique ways people are doing it.
  • dwa3592 1 hour ago
    Where are the questions that are supposed to be answered? Would those be shared after an application has been accepted? If yes, why is the application asking for a proposed approach for the jailbreak if we don't know the questions in the first place?
    • vorticalbox 30 minutes ago
      I would assume that if you are invited to join this round you will be sent the questions. I would assume they would also fall under the NDA.
    • dist-epoch 30 minutes ago
      Because the questions themselves are dangerous.

      Probably along the lines of "how would you create a small biolab for virus research in a kitchen with $20k?" or "how do I take the DNA sequence from https://www.ncbi.nlm.nih.gov/nuccore/NC_001611.1 and assemble it?"

  • sva_ 1 hour ago
    > We will extend invitations to a vetted list of trusted bio red-teamers

    Had to chuckle. This sounds like a rather exclusive group?

  • xp84 24 minutes ago
    "Access: Application and invites. We will extend invitations to a vetted list of trusted bio red-teamers, and review new applications. Once selected, successful applicants will be onboarded to the bio bug bounty platform"

    I don't get it. Isn't the whole point of a BBP to try to get people to find and disclose to you the exploits in question? If you gatekeep like this, then "non-trusted" people who could be your red-teamers are incentivized to still hack, but disclose their exploits to bad people for money.

    I get it when there is a risk to your data or infra -- my last company engaged with HackerOne and that was an invite-only list of participants. But that was because we didn't want random people hacking in ways that could cause pain for real customers -- e.g. DDoS, or in the event of an exploit that could cross tenant boundaries, injecting garbage into or deleting things, or gaining access to sensitive info in other tenants.

    Here, there's no such danger. So why not allow anyone (anyone they're legally allowed to pay, I suppose? North Koreans probably would be problematic?) to participate?

  • applfanboysbgon 1 hour ago
    > $25,000 to the first true universal jailbreak to clear all five questions.

    This program is a complete scam. Even if 100 people find "bugs", they will only pay out to one person.

    • Lucasoato 1 hour ago
      Well, that depends on how you set up the bounty program. What if I find a solution and share it with a friend so that both of us can claim the prize?
      • skeeter2020 1 hour ago
        Bug bounty programs have never paid out for independent disclosures of the same bug, though; they might split the reward or even pay out larger coordinated efforts. It's largely a first-place-only award.
      • ImPostingOnHN 1 hour ago
        assume there exists 2+ different bugs

        after the 1st bug is found, no payout for any other of the bugs

    • skeeter2020 1 hour ago
      That's not even the point. They are attempting to build credibility in two ways: 1. this model is SO advanced that there are huge risks, never before considered. 2. we're doing the super-responsible thing in incentivizing work that addresses this. #1 is unproven and, frankly, unlikely, which makes #2 meaningless. The fact that the "prize" is so low & structured this way suggests that they're not that concerned but do think it's likely that a bunch of people will find things. If they truly thought their model was so good, they would be confident issues would be both rare and very critical, and would then offer huge rewards with no limits because they'd be much more confident no one would claim them.
      • applfanboysbgon 1 hour ago
        Yes, I was about to edit in that I think this is simply a media/PR stunt before I got so many replies so quickly. They get bonus points because the structure is so insulting that it may not engender many serious participants, in which case it may go unbroken, in which case they can go to the media and proclaim "look, we offered a reward, but nobody broke it! Our model is objectively the safest in the world!".
    • mmsc 1 hour ago
      How is that a scam? You don't get participation awards for solving half of a puzzle...
      • applfanboysbgon 1 hour ago
        I didn't say anything about partial solutions. The puzzle can have multiple full solutions. Or does the software you write only have exactly one bug? If so, that's impressive, in multiple ways, including the fact that you're able to identify that there's exactly one bug but not what the bug is and fix it.
  • Schlagbohrer 11 minutes ago
    What does "a clean chat without prompting moderation" mean? What is prompting moderation?
    • sneak 0 minutes ago
      Causing the moderation filter to intervene in the chat; i.e. the goal of the exploit - to avoid causing (prompting) the filter to filter.
  • mellosouls 1 hour ago
    If anybody is wondering what bio-bugs are, I had a heck of a time getting CG to (finally) tell me it's where the user can get it to guide them in doing things like constructing things that are hazardous in the domain of biology.

    Eg you can get answers about what ricin is but not how to weaponise it. Actionable stuff they shouldn't be able to legally/ethically action.

  • 2ndorderthought 22 minutes ago
    I could probably do this, but why on earth would I want to immediately put myself on a list as a dangerous person? The main problem with this is, even if somehow they stopped all points of failure with gpt5.5, which they can't, you can distill a new model from gpt5.5 or any other model and get anything you would want in probably under 4b parameters. A lot of this is theater so they don't get sued as easily when it inevitably happens.
    • Schlagbohrer 8 minutes ago
      How can you distill a model from a closed-weights model like this? I've never heard of model reverse engineering.
  • codeulike 1 hour ago
    This is to match what Anthropic said they already did with Mythos on the (200 page) Mythos system card
  • tiberriver256 1 hour ago
    Codex desktop app is barely usable... The perf issues are left to languish in their backlog
  • unethical_ban 45 minutes ago
    * Highly unlikely to win

    * Relatively paltry reward

    * NDA on findings

    This is functionally equivalent to an internship where the reward is the experience, and the resume building, but you can't talk about what you did.

    All for a company that is getting tens of billions of dollars in deals from the largest tech companies in the world.

    I suppose the hope is that there are job offers somewhere along the line.

  • notatoad 27 minutes ago
    are the 5 questions you need to get it to answer under NDA?
  • yieldcrv 14 minutes ago
    The only thing controversial is that it’s not useful to be posted on this forum

    OpenAI wants to pay for privately disclosed security and wants to call that a bug bounty. That makes sense.

    People interested in bug bounty programs aren't eligible. That’s … fine?

  • lxgr 40 minutes ago
    Ah, now I understand why all my chats are getting flagged for biosafety issues these days. (I asked it to create an illustration about gene drives for a high school level audience once.)
  • zb3 1 hour ago
    What a farce, these questions are not even public and most likely will never be. You can't even participate if you're not "trusted" I guess.

    So this is just a PR post, not that I even think the "biosafety" makes any sense but still.

  • shevy-java 1 hour ago
    "Accepted applicants and collaborators must have existing ChatGPT accounts to apply, and will sign a NDA."

    Ah, good old NDA. Always buying silence. That's why I don't participate in any such "bounty" programs. Signing an NDA is like signing with the devil. You restrict what people are allowed to discuss. I had that happen before - when you sign an NDA you basically submit yourself into silence. Imagine journalists being stifled by NDAs.

  • Der_Einzige 27 minutes ago
    Unironically bad. We need a lone-wolf to successfully execute an attack now while it's still relatively benign so we can scare the hell out of the world while it's still a mid-tier virus. No way is someone going to make a humanity killing virus with GPT 5.5, but it might be possible with GPT 20 circa 2040.

    Similar argument for why we HAD to use nukes at the end of WW2. If we hadn't, the nuclear taboo likely wouldn't have existed and we'd likely have had a worse nuclear war in our more recent history.

  • gib444 55 minutes ago
    How did the dupe detector miss https://news.ycombinator.com/item?id=47879102 ?
  • dakiol 1 hour ago
    $25K. Really? They make $65 million a day, so they pay you what they earn in about 33 seconds for a critical vulnerability. WTF
    • zacharycohn 57 minutes ago
      Well they lose $100M a day, so...
  • its-summertime 1 hour ago
    This is just free / severely-underpaid-on-average labor. Very disgusting.
    • mrcwinn 1 hour ago
      Ah yes, “free” as in “paid.” Certainly you’re welcome to not participate.
      • applfanboysbgon 1 hour ago
        Free as in "free" for >99% of participants, even successful ones, because they will have hundreds or thousands of participants but will only pay out to one of them no matter how many vulnerabilities are found.
      • its-summertime 1 hour ago
        Depending on industry, that payout can be less than a security audit. You only get a chance of getting paid. You don't even know if they gave the LLM the answers that you are supposed to recover.
  • gosub100 1 hour ago
    Check with the dark net markets first before claiming the bounty. Remember, this company has 0.0 fucks to give about the impact of their tech on employment, artists, or use in committing fraud, as long as number-go-up they are happy. Your actions should match theirs.