Discussion about this post

ToxSec

really timely article. it’s super interesting to see Anthropic’s approach to harnessing with this recent leak.

looks like a lot of people suddenly became more aware of these techniques. i think we are going to see more attention in this area, and articles like this are super useful.

richardstevenhack

Re sandboxes. Do remember that AIs aren't bad at escaping sandboxes. They've done it before.

And since agents are inherently unreliable, deterministic procedures must be in place to control and monitor them as part of harness engineering, perhaps the major part of it.
