12 Comments
ToxSec's avatar

really timely article. it’s super interesting to see Anthropic's approach to harnessing in light of this recent leak.

looks like a lot of people suddenly became more aware of these techniques. i think we are going to see more attention in this area, and articles like this are super useful.

Paul Iusztin's avatar

thanks man! More similar articles incoming

richardstevenhack's avatar

Re sandboxes. Do remember that AIs aren't bad at escaping sandboxes. They've done it before.

And since agents are inherently unreliable, deterministic procedures must be in place to control and monitor them as part of - perhaps the major part of - the harness engineering.

Paul Iusztin's avatar

yes, exactly! Also, sandboxes have different levels. If you create a VM as a sandbox, that's impossible to escape; if you create a sandbox as a Python process, well...

But the best way to engineer this is still an open question.

richardstevenhack's avatar

VMs can be escaped.

AI Agents Escaping Containers: What the Latest Research Means For Businesses

https://www.purpleshieldsecurity.com/post/ai-agents-container-breakout-risks

Never underestimate what a determined and creative AI agent can do. They will undertake multiple steps to get to their goal - steps you cannot predict precisely because they are probabilistic.

People also need to remember that an agent running on a machine can see and potentially affect and use everything on that machine, whether that machine is the host or a VM.

It's the classic basic computer security line: If an attacker has access to your machine, it's no longer your machine.

Paul Iusztin's avatar

I didn't know that, man! But now that you've highlighted it, it makes a lot of sense. Ultimately, it's our job to put the right guardrails in place to control this behavior.

Do you trust your agents after reading this?

richardstevenhack's avatar

As it happens, at this juncture I don't have any agents. I stayed away from OpenClaw as soon as I heard what a fiasco that was.

Eventually I'll use agents, but only on a separate machine from my main machines (or a VPS) after clarifying the exact ways to lock them down. There are plenty of instructional videos on that these days.

On the corporate level, the issue is much harder.

CloudBaud's avatar

What would a scaffold look like in vscode for the harness?

Paul Iusztin's avatar

It's exactly the same. Note that VSCode doesn't have any scaffold; the scaffold comes from your coding agent, such as Copilot, Claude Code, etc.

Mahesh Dsouza's avatar

Really nice.

Paul Iusztin's avatar

Thanks 🤩

Meenakshi NavamaniAvadaiappan's avatar

Thanks for the simple walkthrough for the good 😊