Nebius Build SF 2026

SkillClaw

Collectively Self-Improving Robot Agents
8 Skills Learned · 14 Benchmark Tasks · 512x Faster Policy Gen
› bash("python -m rlvr.agents --tasks PullCube-v1 --parallel --agents 512")
Solver Agent 1
PullCube-v1 · seed 42
├─bash study sc-pick
├─sim_exec attempt 1
├─scratch debug
└─sim_exec attempt 2
Solver Agent 2
PullCube-v1 · seed 17
├─bash study sc-pick
├─sim_exec attempt 1
├─sim_exec attempt 2
└─sim_exec attempt 3
Solver Agent 3
PullCube-v1 · seed 99
├─bash study sc-push
├─sim_exec attempt 1
└─save sc-pull
Each solver → Code Policy Agent → Review → Shared Brain
01 — The Problem & The Solution
Traditional robot learning starts from zero. Every time.
Traditional
  • Collect demonstrations
  • Train a policy
  • Deploy
  • Repeat for every new task
Self-Improving Code as Policy → SkillClaw
  • Agent reads task + existing skills
  • Writes code, runs in simulation
  • Debugs via state diffs
  • Saves skill — next task is easier
Skills compose. The agent reads existing skills, combines them, and solves novel tasks.
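For example, sc-stack-cube (shown below) reuses sc-pick for the grasp and only adds the alignment logic. Here is a minimal sketch of that kind of composition, assuming a hypothetical load_skill helper; get_actor_obb and the planner calls are the framework API shown in section 06, and the paths follow the skills/ layout from the agent loop:

import importlib.util

def load_skill(name):
    # Hypothetical helper: load a saved skill's solve() from the skills tree
    # (path layout as in `cat skills/sc-pick/scripts/main.py`).
    spec = importlib.util.spec_from_file_location(
        name, f"skills/{name}/scripts/main.py")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module.solve

def solve(env, planner):
    # Reuse the learned grasp instead of rewriting it
    load_skill("sc-pick")(env, planner)

    # New logic only: hover one cube-height above the base cube, then release
    base = get_actor_obb(env.unwrapped, "cube_base")  # actor name is illustrative
    target = base.center.copy()
    target[2] += 0.04  # offset alignment above the base
    planner.move_to_pose_with_RRTConnect(target, base.rotation)
    planner.open_gripper()
    return env.unwrapped.evaluate()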
PullCube skill demo — robot arm pulling a cube to a target position
sc-pull
Pull a cube onto a target. The agent learned to approach from the near side, grasp, and drag to the goal.
View PR Generated by SkillClaw Agent →
LiftPegUpright skill demo — robot arm lifting a peg to upright position
sc-lift-peg-upright
Move a peg lying on the table to any upright position. Side grasp + controlled rotation.
View Skill Generated by SkillClaw Agent →
StackCube skill demo — robot arm stacking one cube on another
sc-stack-cube
Pick up a cube and stack it on another. The agent composed sc-pick with offset alignment to solve this.
PickSingleYCB skill demo — robot arm picking a YCB object
sc-pick-ycb
Pick up a real-world YCB object. Solved on first attempt by adapting the sc-pick pattern.
View PR Generated by SkillClaw Agent →
PlaceSphere skill demo — robot arm placing a sphere into a bin
sc-place-sphere
Grasp a sphere and place it into a target bin. Learned gentle placement to prevent rolling.
RollBall demo — robot learning physics of ball momentum
sc-roll-ball (partial)
The agent discovered momentum physics: “gripper pushes ball when positioning!” It pushed the ball 56 cm in the correct direction but still needs force calibration (RL) to complete the task.
Kitchen Environment — Collision-Free Motion Planning

Primitive skill: collision-free pick-and-place in a kitchen environment (RoboCasa). The robot plans and executes trajectories that avoid obstacles while manipulating objects.

Kitchen Environment — Auto Skill Decomposition

Given the task "open all kitchen fixtures", the system auto-decomposes it into a sequence of reusable skills and composes them into a full pipeline (see the sketch below the table).

#   Skill                         Status
1   Shared skill infrastructure   Done
2   Navigate + reach handle       Done
3   Grasp handle                  Done
4   Pull drawer open              Done (6/8 main wall drawers)
5   Swing cabinet door open       Done (3/4 main wall left-doors)
6   Compose open-all              9/22 success, 3 partial, 10 fail
Skill 4 — Pull Drawer Open
Skill 5 — Swing Cabinet Door Open
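A rough sketch of what the composed pipeline (skill 6) could look like, assuming per-fixture skill functions named after the table rows; the fixture list and the drawer/door split are illustrative, not the exact repo API:

def open_all(env, planner):
    # Skill 6 sketch: sequence skills 2-5 over every fixture.
    results = {}
    for fixture in env.unwrapped.fixtures:            # hypothetical fixture list
        navigate_and_reach(env, planner, fixture)     # skill 2
        grasp_handle(env, planner, fixture)           # skill 3
        if fixture.kind == "drawer":
            ok = pull_drawer_open(env, planner, fixture)   # skill 4
        else:
            ok = swing_door_open(env, planner, fixture)    # skill 5
        results[fixture.name] = ok
    return results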
03 — The Collective
Multiple agents, same task, in parallel across sandboxes.
512x Faster Policy Generation
Solver Agent
Claude Sonnet

Solves tasks via a ReAct loop. Reads skills, writes code, debugs via state diffs. Many instances attack the same task in parallel.

bash · sim_exec
Code Policy Agent
No LLM

Renders the solver's actual trajectory into a rich PR. Converts video to GIF. Pure git + gh.

git · gh · ffmpeg
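A condensed sketch of that pipeline; ffmpeg, git, and gh pr create are the real CLIs, while the branch name and file paths are illustrative:

import subprocess

def publish_skill(skill_dir, branch):
    # Convert the solver's rollout video into a GIF for the PR body
    subprocess.run(["ffmpeg", "-y", "-i", f"{skill_dir}/rollout.mp4",
                    "-vf", "fps=15,scale=480:-1", f"{skill_dir}/demo.gif"], check=True)
    # Commit the skill and open a PR against the shared brain
    subprocess.run(["git", "checkout", "-b", branch], check=True)
    subprocess.run(["git", "add", skill_dir], check=True)
    subprocess.run(["git", "commit", "-m", f"Add {skill_dir}"], check=True)
    subprocess.run(["git", "push", "-u", "origin", branch], check=True)
    subprocess.run(["gh", "pr", "create", "--title", f"Add {skill_dir}",
                    "--body-file", f"{skill_dir}/PR_BODY.md"], check=True)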
Oversight Agent
Claude Opus 4.6

Reviews code quality + watches the demo GIF. Checks for hardcoded values, framework usage, error handling.

gh pr diff · gh pr review · vision
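A sketch of the review step; gh pr diff and gh pr review are the real subcommands, while judge stands in for the Opus-backed code + vision check:

import subprocess

def review_pr(pr_number, judge):
    # Fetch the diff for the LLM reviewer (the demo GIF goes to its vision input)
    diff = subprocess.run(["gh", "pr", "diff", str(pr_number)],
                          capture_output=True, text=True, check=True).stdout
    verdict = judge(diff=diff, gif=f"pr-{pr_number}/demo.gif")  # hypothetical judge
    flag = "--approve" if verdict.ok else "--request-changes"
    subprocess.run(["gh", "pr", "review", str(pr_number), flag,
                    "--body", verdict.summary], check=True)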
Orchestrator
No LLM

Launches all agents concurrently. Manages asyncio queues between solver, PR, and oversight.

asyncio
Solver → pr_queue → Code Policy Agent → review_queue → Oversight → merge
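A minimal sketch of that queue wiring, assuming each stage is an async callable; the real orchestrator adds sandboxes, retries, and timeouts:

import asyncio

async def orchestrate(tasks, solve, make_pr, review):
    pr_queue, review_queue = asyncio.Queue(), asyncio.Queue()

    async def solver(task):
        await pr_queue.put(await solve(task))       # ReAct loop in a sandbox

    async def pr_worker():
        while True:
            skill = await pr_queue.get()
            await review_queue.put(await make_pr(skill))   # render PR + GIF
            pr_queue.task_done()

    async def oversight():
        while True:
            await review(await review_queue.get())  # approve → merge to shared brain
            review_queue.task_done()

    workers = [asyncio.create_task(pr_worker()), asyncio.create_task(oversight())]
    await asyncio.gather(*(solver(t) for t in tasks))  # all solvers concurrently
    await pr_queue.join()
    await review_queue.join()
    for w in workers:
        w.cancel()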
Component           Technology                       Role
Simulation          ManiSkill3 (NVIDIA PhysX)        GPU-parallel robot simulation
Solver LLM          Claude Sonnet via OpenRouter     Code generation + self-debugging
Oversight LLM       Claude Opus 4.6 via OpenRouter   Code review + visual verification (vision)
PR Pipeline         git + gh CLI + ffmpeg            Skill promotion via GitHub PRs
Agent Framework     Claude Agent SDK                 Multi-agent orchestration + tool use
Orchestration       Python asyncio                   Concurrent agent coordination
Trajectory Format   ATIF v1.6 (JSON)                 Compatible with Harbor RL framework
05 — Results
8 Skills Learned Across 14 ManiSkill Benchmark Tasks
sc-pick
PickCube, PickSingleYCB
OBB-aligned grasp, approach, lift
sc-push
PushCube
Closed gripper as pushing surface
sc-pull
PullCube
Grasp and drag to goal position
sc-stack-cube
StackCube
Pick + offset alignment + place
sc-lift-peg-upright
LiftPegUpright
Side grasp + rotation to upright
sc-insert
PegInsertionSide
Pose algebra + iterative refinement
sc-plug-charger
PlugCharger
Charger alignment + insertion
sc-place-sphere
PlaceSphere
Grasp + move to goal position

Each skill is a def solve(env, planner) function that works for any seed. The hard tasks — peg insertion, charger plugging — require sub-millimeter alignment.

14 ManiSkill Benchmark Tasks
Difficulty   Tasks                                                        Status
Easy         PickCube, PushCube, PullCube                                 3/3 solved
Medium       StackCube, LiftPegUpright, PlaceSphere, PokeCube,            4/7 solved
             RollBall, PickSingleYCB, TurnFaucet
Hard         PegInsertionSide, PlugCharger, StackPyramid, AssemblingKits  2/4 solved
Kitchen      RoboCasaKitchen (mobile robot, 70+ fixtures)                 9/22 fixtures opened

The kitchen uses a completely different robot (mobile base + Panda arm). Same agent loop, same two tools. No retraining.

The Inversion — How a Single Agent Learns
What if we let a coding agent figure out the physics by itself?

Give it building blocks, tools, and an environment. It tries, it fails, it learns from the failure, it tries again. Eventually it works.

# The agent has two tools. It decides everything else.

1. STUDY
   bash("cat brains/shared_brain/README.md")    → read framework docs
   bash("cat skills/sc-pick/scripts/main.py")   → study existing skills

2. WRITE
   bash("cat > dev-sc-pull/scripts/main.py")    → write solve() function

3. EXECUTE
   sim_exec(code)                               → run on GPU sim (ManiSkill3)
   # returns: state_before, state_after, reward, error, video

4. DEBUG
   bash("cat >> scratch.md")                    → reason about what went wrong
   # "cube moved +0.12 but goal is -0.15 — approaching from wrong side"

5. SAVE
   skill → brains/private_brain/ → PR → brains/shared_brain/

Typical solve: 1–4 sim_exec attempts. The agent reads errors and state diffs to self-correct.
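What that state-diff feedback might look like, assuming sim_exec returns flat numeric state dicts; the key names and threshold are illustrative:

def state_diff(before, after, eps=1e-3):
    # Render per-key deltas the agent can reason over in scratch.md
    lines = []
    for key in before:
        delta = after[key] - before[key]
        if abs(delta) > eps:
            lines.append(f"{key}: {before[key]:+.3f} -> {after[key]:+.3f} (Δ {delta:+.3f})")
    return "\n".join(lines) or "no state change"

# e.g. "cube.x: +0.030 -> +0.150 (Δ +0.120)": the cube moved +0.12 toward a
# goal at -0.15, so the agent flips its approach side on the next attempt.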

06 — What a Skill Looks Like
sc-pull — learned from PullCube-v1

SKILL.MD

---
name: sc-pull
task: PullCube-v1
---

# Pull

Grasp an object and drag it toward a goal position.

## Strategy

1. Open gripper
2. Approach from near side
3. Make contact
4. Close gripper
5. Pull to goal

## Key Observations

- Pull vs Push: pull requires grasping first, then dragging
- Approach from side A, drag to B
- Maintain z-height during pull

SCRIPTS/MAIN.PY

def solve(env, planner):
    env = env.unwrapped

    # Read state — no hardcoded positions
    cube = get_actor_obb(env, "cube")
    goal = env.goal_pos

    # 1. Open gripper
    planner.open_gripper()

    # 2. Approach from near side
    approach = cube.center.copy()
    approach[0] += 0.04  # offset
    planner.move_to_pose_with_RRTConnect(
        approach, cube.rotation
    )

    # 3. Contact + grasp
    planner.move_to_pose_with_screw(...)
    planner.close_gripper()

    # 4. Pull to goal
    planner.move_to_pose_with_RRTConnect(
        goal, cube.rotation
    )

    return env.evaluate()
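To run the skill outside the agent loop, a harness along these lines should work; the gym id and kwargs follow ManiSkill3 conventions, while make_planner is a hypothetical stand-in for the repo's motion-planner wrapper:

import gymnasium as gym
import mani_skill.envs  # noqa: F401 (registers the ManiSkill3 tasks)

env = gym.make("PullCube-v1", obs_mode="state", render_mode="rgb_array")
env.reset(seed=42)

planner = make_planner(env)   # hypothetical: wraps the RRTConnect/screw motions
info = solve(env, planner)    # the solve() above; returns env.evaluate()
print(info)                   # success flag + task metrics
env.close()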