Nebius Build SF 2026

SkillClaw

Collectively Self-Improving Robot Agents
8 Skills Learned · 14 Benchmark Tasks · 512x Faster Policy Gen
› bash("python -m rlvr.agents --tasks PullCube-v1 --parallel --agents 512")
Solver Agent 1
PullCube-v1 · seed 42
├─bash study sc-pick
├─sim_exec attempt 1
├─scratch debug
└─sim_exec attempt 2
Solver Agent 2
PullCube-v1 · seed 17
├─bash study sc-pick
├─sim_exec attempt 1
├─sim_exec attempt 2
└─sim_exec attempt 3
Solver Agent 3
PullCube-v1 · seed 99
├─bash study sc-push
├─sim_exec attempt 1
└─save sc-pull
Each solver → Code Policy Agent → Review → Shared Brain
01 — The Problem & The Solution
Traditional robot learning starts from zero. Every time.
Traditional
  • Collect demonstrations
  • Train a policy
  • Deploy
  • Repeat for every new task
Self-Improving Code as Policy → SkillClaw
  • Agent reads task + existing skills
  • Writes code, runs in simulation
  • Debugs via state diffs
  • Saves skill — next task is easier
Skills compose. The agent reads existing skills, combines them, and solves novel tasks.
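For example, sc-stack-cube (shown below) reuses sc-pick for the grasp and only adds the alignment logic. Here is a minimal sketch of that kind of composition, assuming a hypothetical load_skill helper; get_actor_obb and the planner calls are the framework API shown in section 06, and the paths follow the skills/ layout from the agent loop:

import importlib.util

def load_skill(name):
    # Hypothetical helper: load a saved skill's solve() from the skills tree
    # (path layout as in `cat skills/sc-pick/scripts/main.py`).
    spec = importlib.util.spec_from_file_location(
        name, f"skills/{name}/scripts/main.py")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module.solve

def solve(env, planner):
    # Reuse the learned grasp instead of rewriting it
    load_skill("sc-pick")(env, planner)

    # New logic only: hover one cube-height above the base cube, then release
    base = get_actor_obb(env.unwrapped, "cube_base")  # actor name is illustrative
    target = base.center.copy()
    target[2] += 0.04  # offset alignment above the base
    planner.move_to_pose_with_RRTConnect(target, base.rotation)
    planner.open_gripper()
    return env.unwrapped.evaluate()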
PullCube skill demo — robot arm pulling a cube to a target position
sc-pull
Pull a cube onto a target. The agent learned to approach from the near side, grasp, and drag to the goal.
View PR Generated by SkillClaw Agent →
LiftPegUpright skill demo — robot arm lifting a peg to upright position
sc-lift-peg-upright
Move a peg lying on the table to any upright position. Side grasp + controlled rotation.
View Skill Generated by SkillClaw Agent →
StackCube skill demo — robot arm stacking one cube on another
sc-stack-cube
Pick up a cube and stack it on another. The agent composed sc-pick with offset alignment to solve this.
PickSingleYCB skill demo — robot arm picking a YCB object
sc-pick-ycb
Pick up a real-world YCB object. Solved on first attempt by adapting the sc-pick pattern.
View PR Generated by SkillClaw Agent →
PlaceSphere skill demo — robot arm placing a sphere into a bin
sc-place-sphere
Grasp a sphere and place it into a target bin. Learned gentle placement to prevent rolling.
RollBall demo — robot learning physics of ball momentum
sc-roll-ball (partial)
The agent discovered momentum physics: “gripper pushes ball when positioning!” It pushed the ball 56 cm in the correct direction but still needs force calibration (RL) to complete the task.
Kitchen Environment — Collision-Free Motion Planning

Primitive skill: collision-free pick-and-place in a kitchen environment (RoboCasa). The robot plans and executes trajectories that avoid obstacles while manipulating objects.

Kitchen Environment — Auto Skill Decomposition

Given the task "open all kitchen fixtures", the system auto-decomposes it into a sequence of reusable skills and composes them into a full pipeline (see the sketch below the table).

#   Skill                         Status
1   Shared skill infrastructure   Done
2   Navigate + reach handle       Done
3   Grasp handle                  Done
4   Pull drawer open              Done (6/8 main wall drawers)
5   Swing cabinet door open       Done (3/4 main wall left-doors)
6   Compose open-all              9/22 success, 3 partial, 10 fail
Skill 4 — Pull Drawer Open
Skill 5 — Swing Cabinet Door Open
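A rough sketch of what the composed pipeline (skill 6) could look like, assuming per-fixture skill functions named after the table rows; the fixture list and the drawer/door split are illustrative, not the exact repo API:

def open_all(env, planner):
    # Skill 6 sketch: sequence skills 2-5 over every fixture.
    results = {}
    for fixture in env.unwrapped.fixtures:            # hypothetical fixture list
        navigate_and_reach(env, planner, fixture)     # skill 2
        grasp_handle(env, planner, fixture)           # skill 3
        if fixture.kind == "drawer":
            ok = pull_drawer_open(env, planner, fixture)   # skill 4
        else:
            ok = swing_door_open(env, planner, fixture)    # skill 5
        results[fixture.name] = ok
    return results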
03 — The Collective
Multiple agents, same task, in parallel across sandboxes.
512x Faster Policy Generation
Solver Agent
Claude Sonnet

Solves tasks via a ReAct loop. Reads skills, writes code, debugs via state diffs. Many instances attack the same task in parallel.

bash · sim_exec
Code Policy Agent
No LLM

Renders the solver's actual trajectory into a rich PR. Converts video to GIF. Pure git + gh.

git · gh · ffmpeg
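A condensed sketch of that pipeline; ffmpeg, git, and gh pr create are the real CLIs, while the branch name and file paths are illustrative:

import subprocess

def publish_skill(skill_dir, branch):
    # Convert the solver's rollout video into a GIF for the PR body
    subprocess.run(["ffmpeg", "-y", "-i", f"{skill_dir}/rollout.mp4",
                    "-vf", "fps=15,scale=480:-1", f"{skill_dir}/demo.gif"], check=True)
    # Commit the skill and open a PR against the shared brain
    subprocess.run(["git", "checkout", "-b", branch], check=True)
    subprocess.run(["git", "add", skill_dir], check=True)
    subprocess.run(["git", "commit", "-m", f"Add {skill_dir}"], check=True)
    subprocess.run(["git", "push", "-u", "origin", branch], check=True)
    subprocess.run(["gh", "pr", "create", "--title", f"Add {skill_dir}",
                    "--body-file", f"{skill_dir}/PR_BODY.md"], check=True)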
Oversight Agent
Claude Opus 4.6

Reviews code quality + watches the demo GIF. Checks for hardcoded values, framework usage, error handling.

gh pr diff · gh pr review · vision
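A sketch of the review step; gh pr diff and gh pr review are the real subcommands, while judge stands in for the Opus-backed code + vision check:

import subprocess

def review_pr(pr_number, judge):
    # Fetch the diff for the LLM reviewer (the demo GIF goes to its vision input)
    diff = subprocess.run(["gh", "pr", "diff", str(pr_number)],
                          capture_output=True, text=True, check=True).stdout
    verdict = judge(diff=diff, gif=f"pr-{pr_number}/demo.gif")  # hypothetical judge
    flag = "--approve" if verdict.ok else "--request-changes"
    subprocess.run(["gh", "pr", "review", str(pr_number), flag,
                    "--body", verdict.summary], check=True)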
Orchestrator
No LLM

Launches all agents concurrently. Manages asyncio queues between solver, PR, and oversight.

asyncio
Solver → pr_queue → Code Policy Agent → review_queue → Oversight → merge
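A minimal sketch of that queue wiring, assuming each stage is an async callable; the real orchestrator adds sandboxes, retries, and timeouts:

import asyncio

async def orchestrate(tasks, solve, make_pr, review):
    pr_queue, review_queue = asyncio.Queue(), asyncio.Queue()

    async def solver(task):
        await pr_queue.put(await solve(task))       # ReAct loop in a sandbox

    async def pr_worker():
        while True:
            skill = await pr_queue.get()
            await review_queue.put(await make_pr(skill))   # render PR + GIF
            pr_queue.task_done()

    async def oversight():
        while True:
            await review(await review_queue.get())  # approve → merge to shared brain
            review_queue.task_done()

    workers = [asyncio.create_task(pr_worker()), asyncio.create_task(oversight())]
    await asyncio.gather(*(solver(t) for t in tasks))  # all solvers concurrently
    await pr_queue.join()
    await review_queue.join()
    for w in workers:
        w.cancel()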
Component           Technology                       Role
Simulation          ManiSkill3 (NVIDIA PhysX)        GPU-parallel robot simulation
Solver LLM          Claude Sonnet via OpenRouter     Code generation + self-debugging
Oversight LLM       Claude Opus 4.6 via OpenRouter   Code review + visual verification (vision)
PR Pipeline         git + gh CLI + ffmpeg            Skill promotion via GitHub PRs
Agent Framework     Claude Agent SDK                 Multi-agent orchestration + tool use
Orchestration       Python asyncio                   Concurrent agent coordination
Trajectory Format   ATIF v1.6 (JSON)                 Compatible with Harbor RL framework
05 — Results
8 Skills Learned Across 14 ManiSkill Benchmark Tasks
sc-pick
PickCube, PickSingleYCB
OBB-aligned grasp, approach, lift
sc-push
PushCube
Closed gripper as pushing surface
sc-pull
PullCube
Grasp and drag to goal position
sc-stack-cube
StackCube
Pick + offset alignment + place
sc-lift-peg-upright
LiftPegUpright
Side grasp + rotation to upright
sc-insert
PegInsertionSide
Pose algebra + iterative refinement
sc-plug-charger
PlugCharger
Charger alignment + insertion
sc-place-sphere
PlaceSphere
Grasp + move to goal position

Each skill is a def solve(env, planner) function that works for any seed. The hard tasks — peg insertion, charger plugging — require sub-millimeter alignment.

14 ManiSkill Benchmark Tasks
Difficulty   Tasks                                                        Status
Easy         PickCube, PushCube, PullCube                                 3/3 solved
Medium       StackCube, LiftPegUpright, PlaceSphere, PokeCube,            4/7 solved
             RollBall, PickSingleYCB, TurnFaucet
Hard         PegInsertionSide, PlugCharger, StackPyramid, AssemblingKits  2/4 solved
Kitchen      RoboCasaKitchen (mobile robot, 70+ fixtures)                 9/22 fixtures opened

The kitchen uses a completely different robot (mobile base + Panda arm). Same agent loop, same two tools. No retraining.

The Inversion — How a Single Agent Learns
What if we let a coding agent figure out the physics by itself?

Give it building blocks, tools, and an environment. It tries, it fails, it learns from the failure, it tries again. Eventually it works.

# The agent has two tools. It decides everything else.

1. STUDY
   bash("cat brains/shared_brain/README.md")    → read framework docs
   bash("cat skills/sc-pick/scripts/main.py")   → study existing skills

2. WRITE
   bash("cat > dev-sc-pull/scripts/main.py")    → write solve() function

3. EXECUTE
   sim_exec(code)                               → run on GPU sim (ManiSkill3)
   # returns: state_before, state_after, reward, error, video

4. DEBUG
   bash("cat >> scratch.md")                    → reason about what went wrong
   # "cube moved +0.12 but goal is -0.15 — approaching from wrong side"

5. SAVE
   skill → brains/private_brain/ → PR → brains/shared_brain/

Typical solve: 1–4 sim_exec attempts. The agent reads errors and state diffs to self-correct.
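What that state-diff feedback might look like, assuming sim_exec returns flat numeric state dicts; the key names and threshold are illustrative:

def state_diff(before, after, eps=1e-3):
    # Render per-key deltas the agent can reason over in scratch.md
    lines = []
    for key in before:
        delta = after[key] - before[key]
        if abs(delta) > eps:
            lines.append(f"{key}: {before[key]:+.3f} -> {after[key]:+.3f} (Δ {delta:+.3f})")
    return "\n".join(lines) or "no state change"

# e.g. "cube.x: +0.030 -> +0.150 (Δ +0.120)": the cube moved +0.12 toward a
# goal at -0.15, so the agent flips its approach side on the next attempt.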

06 — What a Skill Looks Like
sc-pull — learned from PullCube-v1

SKILL.MD

---
name: sc-pull
task: PullCube-v1
---

# Pull

Grasp an object and drag it toward a goal position.

## Strategy

1. Open gripper
2. Approach from near side
3. Make contact
4. Close gripper
5. Pull to goal

## Key Observations

- Pull vs Push: pull requires grasping first, then dragging
- Approach from side A, drag to B
- Maintain z-height during pull

SCRIPTS/MAIN.PY

def solve(env, planner):
    env = env.unwrapped

    # Read state — no hardcoded positions
    cube = get_actor_obb(env, "cube")
    goal = env.goal_pos

    # 1. Open gripper
    planner.open_gripper()

    # 2. Approach from near side
    approach = cube.center.copy()
    approach[0] += 0.04  # offset
    planner.move_to_pose_with_RRTConnect(
        approach, cube.rotation
    )

    # 3. Contact + grasp
    planner.move_to_pose_with_screw(...)
    planner.close_gripper()

    # 4. Pull to goal
    planner.move_to_pose_with_RRTConnect(
        goal, cube.rotation
    )

    return env.evaluate()
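To run the skill outside the agent loop, a harness along these lines should work; the gym id and kwargs follow ManiSkill3 conventions, while make_planner is a hypothetical stand-in for the repo's motion-planner wrapper:

import gymnasium as gym
import mani_skill.envs  # noqa: F401 (registers the ManiSkill3 tasks)

env = gym.make("PullCube-v1", obs_mode="state", render_mode="rgb_array")
env.reset(seed=42)

planner = make_planner(env)   # hypothetical: wraps the RRTConnect/screw motions
info = solve(env, planner)    # the solve() above; returns env.evaluate()
print(info)                   # success flag + task metrics
env.close()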