Really timely article. It's super interesting to see Anthropic's approach to harnessing with this recent leak.
Looks like a lot of people suddenly became more aware of these techniques. I think we are going to see more attention in this area, and articles like this are super useful.
Thanks, man! More similar articles incoming.
Re sandboxes. Do remember that AIs aren't bad at escaping sandboxes. They've done it before.
And since agents are inherently unreliable, deterministic procedures must be in place to control and monitor them as part of - perhaps the major part of - the harness engineering.
Yes, exactly! Also, sandboxes have different levels. If you create a VM as a sandbox, that's impossible to escape; if you create a sandbox as a Python process, well...
But this is still an open question: what is the best way to engineer this?
VMs can be escaped.
AI Agents Escaping Containers: What the Latest Research Means For Businesses
https://www.purpleshieldsecurity.com/post/ai-agents-container-breakout-risks
Never underestimate what a determined and creative AI agent can do. They will undertake multiple steps to get to their goal, and you cannot predict those steps precisely because agents are probabilistic.
People also need to remember that an agent running on a machine can see and potentially affect and use everything on that machine, whether that machine is the host or a VM.
It's the classic basic computer security line: If an attacker has access to your machine, it's no longer your machine.
I didn't know that, man! But now that you've highlighted it, it makes a lot of sense. Ultimately, it's our job to put the right guardrails in place to control this behavior.
Do you trust your agents after reading this?
As it happens, at this juncture I don't have any agents. I stayed away from OpenClaw as soon as I heard what a fiasco that was.
Eventually I'll use agents, but only on a separate machine from my main machines (or a VPS) after clarifying the exact ways to lock them down. There are plenty of instructional videos on that these days.
On the corporate level, the issue is much harder.
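To make the "deterministic procedures" point from this thread a bit more concrete, here is a minimal sketch of one such check: a fixed allowlist that every command proposed by an agent has to pass before the harness runs it, with each decision logged. Everything in it (`ALLOWED_PROGRAMS`, `check_command`, `gate`, `audit_log`) is a hypothetical name made up for illustration, not part of any real harness or of the article's code.

```python
import shlex
from typing import List, Tuple

# Hypothetical policy: the only programs the agent is allowed to launch.
ALLOWED_PROGRAMS = {"ls", "cat", "grep", "pytest"}
# Reject shell metacharacters outright so commands cannot be chained or redirected.
FORBIDDEN_CHARS = set(";|&`$><\n")

# Every decision is recorded here so a human or a monitor can review it later.
audit_log: List[Tuple[str, bool, str]] = []


def check_command(command: str) -> Tuple[bool, str]:
    """Return (allowed, reason) using fixed rules only; no model is consulted."""
    if any(ch in FORBIDDEN_CHARS for ch in command):
        return False, "shell metacharacter"
    try:
        argv = shlex.split(command)
    except ValueError:
        # shlex raises ValueError on input such as an unclosed quote.
        return False, "unparseable command"
    if not argv:
        return False, "empty command"
    if argv[0] not in ALLOWED_PROGRAMS:
        return False, "program not allowlisted: " + argv[0]
    return True, "ok"


def gate(command: str) -> bool:
    """Check a proposed command, log the decision, and return whether it may run."""
    allowed, reason = check_command(command)
    audit_log.append((command, allowed, reason))
    return allowed


if __name__ == "__main__":
    for proposed in ["ls -la", "rm -rf /", "cat notes.txt; rm notes.txt"]:
        print(proposed, "->", gate(proposed))
```

The same input always gets the same verdict, which is the property the agent itself lacks. A gate like this is not a sandbox, though: it does not inspect arguments (so `cat` on a sensitive file would still pass), and it only helps if the agent has no other way to run commands, so it would sit alongside the VM or separate-machine isolation discussed above rather than replace it.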
What would a scaffold look like in vscode for the harness?
It's exactly the same. Note that VSCode doesn't have any scaffold of its own; the scaffold comes from your coding agent, such as Copilot, Claude Code, etc.
Really nice.
Thanks 🤩
Thanks for the simple walkthrough for the good 😊