Primitive skill: collision-free pick-and-place in a kitchen environment (RoboCasa). The robot plans and executes trajectories that avoid obstacles while manipulating objects.
Given the task "open all kitchen fixtures", the system automatically decomposes it into a sequence of reusable skills and composes them into a full pipeline.
| # | Skill | Status |
|---|---|---|
| 1 | Shared skill infrastructure | Done |
| 2 | Navigate + reach handle | Done |
| 3 | Grasp handle | Done |
| 4 | Pull drawer open | Done (6/8 main wall drawers) |
| 5 | Swing cabinet door open | Done (3/4 main wall left-doors) |
| 6 | Compose open-all | 9/22 success, 3 partial, 10 fail |
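
The composition step in row 6 can be pictured as running the per-fixture skills in order and tallying the outcomes. The sketch below is illustrative only: `Outcome`, `open_fixture`, `open_all`, and the stub skills are hypothetical names, not the project's API, and the rule that a fixture counts as "partial" when some but not all steps succeed is an assumption.

```python
from collections import Counter
from enum import Enum
from typing import Callable, Iterable, List


class Outcome(Enum):
    SUCCESS = "success"
    PARTIAL = "partial"
    FAIL = "fail"


# Hypothetical skill signature: takes a fixture name, returns True on success.
Skill = Callable[[str], bool]


def open_fixture(fixture: str, skills: List[Skill]) -> Outcome:
    """Run the skills in order on one fixture, stopping at the first failure."""
    completed = 0
    for skill in skills:
        if not skill(fixture):
            break
        completed += 1
    if completed == len(skills):
        return Outcome.SUCCESS
    # Assumption: any progress past the first step counts as partial.
    return Outcome.PARTIAL if completed > 0 else Outcome.FAIL


def open_all(fixtures: Iterable[str], skills: List[Skill]) -> Counter:
    """Apply the same skill sequence to every fixture and tally the outcomes."""
    return Counter(open_fixture(f, skills) for f in fixtures)


# Stub skills standing in for reach, grasp, and pull.
def reach(fixture: str) -> bool:
    return fixture != "far_drawer"


def grasp(fixture: str) -> bool:
    return True


def pull(fixture: str) -> bool:
    return fixture != "stuck_drawer"


tally = open_all(["drawer_a", "stuck_drawer", "far_drawer"], [reach, grasp, pull])
```

Keeping each skill a plain callable means the same sequence can be reused across fixtures, and a failure on one fixture does not stop the rest of the run.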
The solver agent solves tasks via a ReAct loop: it reads existing skills, writes code, and debugs using state diffs. Many solver instances attack the same task in parallel.
The PR agent renders the solver's actual trajectory into a rich PR and converts the demo video to a GIF. PR creation uses plain git and the gh CLI.
The oversight agent reviews code quality and watches the demo GIF. It checks for hardcoded values, framework usage, and error handling.
The orchestrator launches all agents concurrently and manages the asyncio queues between the solver, PR, and oversight agents.
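
A minimal sketch of how queues can connect three concurrent stages, assuming a simple hand-off of dictionaries and a `None` sentinel for shutdown. The function names and message fields here are hypothetical placeholders; the real agents call LLMs and the simulator, which this sketch replaces with stand-in logic.

```python
import asyncio
from typing import List, Tuple


async def solver(task: str, to_pr: asyncio.Queue) -> None:
    # Stand-in for the ReAct loop: emit one result per task.
    await to_pr.put({"task": task, "trajectory": f"{task}-trajectory"})


async def pr_agent(to_pr: asyncio.Queue, to_oversight: asyncio.Queue) -> None:
    while True:
        item = await to_pr.get()
        if item is None:  # sentinel: no more solver output
            await to_oversight.put(None)
            return
        item["pr"] = f"PR for {item['task']}"
        await to_oversight.put(item)


async def oversight(to_oversight: asyncio.Queue, reviews: List[Tuple[str, str]]) -> None:
    while True:
        item = await to_oversight.get()
        if item is None:
            return
        reviews.append((item["task"], item["pr"]))


async def orchestrate(tasks: List[str]) -> List[Tuple[str, str]]:
    to_pr: asyncio.Queue = asyncio.Queue()
    to_oversight: asyncio.Queue = asyncio.Queue()
    reviews: List[Tuple[str, str]] = []
    # Start the downstream stages first so they are ready to consume.
    consumers = [
        asyncio.create_task(pr_agent(to_pr, to_oversight)),
        asyncio.create_task(oversight(to_oversight, reviews)),
    ]
    # Run all solvers concurrently, then signal that the stream has ended.
    await asyncio.gather(*(solver(t, to_pr) for t in tasks))
    await to_pr.put(None)
    await asyncio.gather(*consumers)
    return reviews


reviews = asyncio.run(orchestrate(["PickCube", "PushCube"]))
```

The sentinel travels through both queues, so each stage shuts down only after it has drained everything upstream of it.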
| Component | Technology | Role |
|---|---|---|
| Simulation | ManiSkill3 (NVIDIA PhysX) | GPU-parallel robot simulation |
| Solver LLM | Claude Sonnet via OpenRouter | Code generation + self-debugging |
| Oversight LLM | Claude Opus 4.6 via OpenRouter | Code review + visual verification (vision) |
| PR Pipeline | git + gh CLI + ffmpeg | Skill promotion via GitHub PRs |
| Agent Framework | Claude Agent SDK | Multi-agent orchestration + tool use |
| Orchestration | Python asyncio | Concurrent agent coordination |
| Trajectory Format | ATIF v1.6 (JSON) | Compatible with Harbor RL framework |
Each skill is a `def solve(env, planner)` function that works for any seed.
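
To illustrate what "works for any seed" implies, here is a toy sketch in which `solve` reads the target pose from the environment instead of hardcoding coordinates. `ToyEnv`, `ToyPlanner`, and their methods are hypothetical stand-ins written for this example; they are not the ManiSkill3 environment or planner API.

```python
import random
from typing import Dict, Iterable


class ToyEnv:
    """Hypothetical stand-in for a simulated task; not the ManiSkill API."""

    def reset(self, seed: int) -> None:
        rng = random.Random(seed)
        # The cube spawns at a seed-dependent position on the table.
        self.cube_pos = (rng.uniform(-0.1, 0.1), rng.uniform(-0.1, 0.1), 0.02)
        self.gripper_pos = (0.0, 0.0, 0.3)
        self.holding = False


class ToyPlanner:
    """Hypothetical planner that teleports the gripper to a target."""

    def __init__(self, env: ToyEnv) -> None:
        self.env = env

    def move_to(self, pos) -> None:
        self.env.gripper_pos = tuple(pos)

    def close_gripper(self) -> None:
        # The grasp succeeds only if the gripper is exactly at the cube.
        self.env.holding = self.env.gripper_pos == self.env.cube_pos


def solve(env, planner) -> bool:
    # Query the object's pose from the environment rather than
    # hardcoding coordinates that happen to work for one seed.
    planner.move_to(env.cube_pos)
    planner.close_gripper()
    return env.holding


def solved_seeds(seeds: Iterable[int]) -> Dict[int, bool]:
    """Run solve on a fresh environment per seed and record the result."""
    results = {}
    for seed in seeds:
        env = ToyEnv()
        env.reset(seed)
        results[seed] = solve(env, ToyPlanner(env))
    return results


results = solved_seeds(range(5))
```

A skill that hardcoded the cube position from one seed would fail this kind of sweep, which is the sort of issue a check for hardcoded values is meant to catch.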
The hard tasks — peg insertion, charger plugging — require sub-millimeter alignment.
| Difficulty | Tasks | Status |
|---|---|---|
| Easy | PickCube, PushCube, PullCube | 3/3 solved |
| Medium | StackCube, LiftPegUpright, PlaceSphere, PokeCube, RollBall, PickSingleYCB, TurnFaucet | 4/7 solved |
| Hard | PegInsertionSide, PlugCharger, StackPyramid, AssemblingKits | 2/4 solved |
| Kitchen | RoboCasaKitchen (mobile robot, 70+ fixtures) | 9/22 fixtures opened |
The kitchen uses a completely different robot (mobile base + Panda arm). Same agent loop, same two tools. No retraining.
Give it building blocks, tools, and an environment. It tries, it fails, it learns from the failure, it tries again. Eventually it works.
Typical solve: 1–4 `sim_exec` attempts. The agent reads errors and state diffs to self-correct.
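
The format of the state diffs is not specified here. As a rough sketch, a diff between two flat state snapshots could be computed as follows; `state_diff` and the example keys are hypothetical, chosen only to show the idea.

```python
from typing import Any, Dict, Tuple


def state_diff(
    before: Dict[str, Any], after: Dict[str, Any], tol: float = 1e-6
) -> Dict[str, Tuple[Any, Any]]:
    """Return {key: (old, new)} for every entry that changed.

    Numeric values are compared with a tolerance so that simulator
    noise below `tol` is not reported as a change.
    """
    diff = {}
    for key in before.keys() | after.keys():
        old, new = before.get(key), after.get(key)
        if isinstance(old, (int, float)) and isinstance(new, (int, float)):
            if abs(new - old) > tol:
                diff[key] = (old, new)
        elif old != new:
            diff[key] = (old, new)
    return diff


# Hypothetical snapshots taken before and after one attempt.
before = {"cube_z": 0.02, "is_grasped": False, "gripper_open": 1.0}
after = {"cube_z": 0.02, "is_grasped": True, "gripper_open": 0.0}
changes = state_diff(before, after)
```

In this example the diff shows that the gripper closed and a grasp registered while `cube_z` stayed the same, which tells the agent the cube was grasped but not yet lifted.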
SKILL.MD
SCRIPTS/MAIN.PY