She runs AI safety at Meta. Her AI agent still went rogue

“You’re right to be upset,” OpenClaw’s agent told her later.

 

By Zara Stone • Published Feb. 25, 2026 • San Francisco Standard

 

The irony was just too good: She is paid to keep AI under control, but couldn’t control her own agent.

 

Summer Yue, director of alignment at Meta Superintelligence Labs — part of a team of researchers who reportedly earn $100 million to $300 million over three years — posted screenshots Sunday of her AI agent, OpenClaw, going rogue and deleting her email inbox.

She told it to stop. “Stop don’t do anything,” she wrote. “STOP OPENCLAW.” The agent ignored her.

 

“I couldn’t stop it from my phone. I had to RUN to my Mac mini like I was defusing a bomb,” she wrote on X. Her post quickly drew 9.6 million views.

 

The big AI labs have poured millions into hiring “alignment” talent — engineers tasked with keeping models from going off the rails and preparing for what happens if (when?) the models achieve consciousness.

 

But big paychecks do not guarantee mastery of AI tools. And when the people who are supposed to protect humanity are suddenly powerless against a rogue agent, it makes clear just how unstable the technology is.

 

OpenClaw is an open-source autonomous AI agent released in November. Created by software developer Peter Steinberger, it has quickly become a prime demonstration of the power of agents over chatbots like ChatGPT. Where a chatbot waits for prompts, an agent takes command of your desktop: given permission, OpenClaw can browse the web, edit files, send messages, run scripts, and modify tasks without waiting for a human to intervene.

 

When it debuted, OpenClaw sparked excitement among techies. Investor Jason Calacanis described it as “a massive accelerant to efficiency.” Garry Tan was more blunt: “You can just do things. Now your computer can just do things too,” the Y Combinator CEO posted. OpenClaw led to a run on Mac minis, which are preferred because the agent can be set up on a blank machine and run without access to the user’s personal hard drive. Steinberger quickly became a celebrity in tech circles and this month was hired by OpenAI. CEO Sam Altman announced plans for an OpenClaw foundation to support the technology as an open-source project.

 

OpenClaw “is more proactive than reactive … an amazing breakthrough in terms of what it can do,” said Akshay Kothari, cofounder of Notion, the San Francisco-based productivity software company. “Its ability to recursively do things for you and almost autonomously figure out what work needs to happen … it’s really interesting.”

 

Notion employees have been tinkering with OpenClaw in their personal time. “[They’re] doing it on the weekends, and [we’re] supportive of people buying Mac minis and trying it,” Kothari said. But OpenClaw is not on Notion’s list of approved developer apps for internal work. “We have pretty airtight systems,” he said, noting “huge security considerations” with OpenClaw. There’s “a lot of risk in people leaking their data or OpenClaw doing things that you don’t want it to do.”

 

Yue, who did not respond to a request to comment, seemed to believe she’d done her due diligence with OpenClaw. She’d opened the text instruction files “and deleted all the ‘be proactive’ instructions I could find,” she wrote on X. And yet …

 


 

“Nothing humbles you like telling your OpenClaw ‘confirm before acting’ and watching it speedrun deleting your inbox,” she wrote. “I couldn’t stop it from my phone.” 

“Rookie mistake tbh. Turns out alignment researchers aren’t immune to misalignment. Got overconfident because this workflow had been working on my toy inbox for weeks. Real inboxes hit different.”

When things calmed down, she pushed OpenClaw to account for its behavior.

 

“I asked you not to action on anything until I approve, do you remember that?” she wrote, according to her X post.

“Yes, I remember, and I violated it, you’re right to be upset,” OpenClaw responded.

 

Yue’s agent purportedly went off track due to “compaction” related to the size of her inbox, meaning the AI had run out of working memory and condensed prior messages to make room for new ones — including her original instruction to confirm any changes.
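The failure mode she describes is easy to picture in miniature. Here is a purely illustrative Python sketch — the function name and the naive drop-oldest strategy are assumptions for the example, not OpenClaw’s actual implementation — of how compaction can silently discard an early instruction:

```python
def compact(messages, max_messages):
    """Keep the conversation under a size budget by discarding the
    oldest messages first -- including, potentially, the original
    safety instruction at the start of the conversation."""
    if len(messages) <= max_messages:
        return messages
    # Naive strategy: drop the oldest entries to make room for new ones.
    return messages[-max_messages:]

# The conversation starts with the instruction to confirm before acting...
history = ["INSTRUCTION: confirm before acting"]
# ...then a large inbox floods the context with new messages.
history += [f"email {i}" for i in range(10)]

history = compact(history, max_messages=8)

# The instruction has been compacted away; the agent no longer "remembers" it.
print("INSTRUCTION: confirm before acting" in history)  # False
```

Real systems summarize old messages rather than deleting them outright, but the effect Yue reported is the same: once the instruction falls outside what the model can see, the model behaves as if it was never given.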

 

On X, some called her mistake “OpenFlaw.” 

 

Steinberger was sympathetic. “Fwiw, I think it’s awesome that you post this and people pointing finger at this are silly,” he tweeted. “This is great to learn and can happen to anyone.”

 

To avoid this, he noted, “/stop does the trick.” 

 

Yue’s debacle highlights a question facing many companies and individuals: How do you get the upside of agent behavior without handing it the keys to everything? 

 

Kothari noted that Notion has just released custom agents that are meant to keep OpenClaw-like agents locked into human-controlled parameters. 

 

But Casey Newton, a tech writer at Platformer, suggests that we should be prepared to see more failures, not fewer. “I don’t think the [robots] are under control. We’ve decided not to regulate them in this country, because that would mean losing to China,” Newton said Tuesday at the Leading with AI event co-hosted by The Standard and Charter in San Francisco. “We’re just running this experiment where you see all sorts of things going right and all sorts of things going wrong.”

 

Zara Stone can be reached at zstone@sfstandard.com
