novitalabs/rft-tinker
Python
Description: RL training demo with tinker + sandbox
Language: Python
Stars: 0
Forks: 0
Open issues: 0
Created: 2026-01-19T08:22:11Z
Pushed: 2026-01-26T03:06:14Z
Default branch: main
Fork: no
Archived: no
README:
RFT-Tinker: R2E-Gym Training with Tinker API + Agent Sandbox
Overview
Experimental setup for training code generation models on the R2E-Gym dataset using:
- Tinker API for RL model training
- Agent Sandbox for safe code execution
- R2E-Gym Dataset (4.5K real-world GitHub issues)
Goal: reproduce the DeepSWE experiments (42.2% Pass@1 on SWE-Bench-Verified).
Quick Start
1. Clone Repository
git clone https://github.com/novitalabs/rft-tinker.git
cd rft-tinker
2. Install Dependencies
python3 -m venv venv
source venv/bin/activate
pip install datasets huggingface-hub novita-sandbox tinker torch transformers
3. Configure API Keys
Copy the example environment file:
cp .env.example .env.local
Edit .env.local with your API keys:
# Agent Sandbox API Key (get from https://novita.ai)
NOVITA_API_KEY=your_novita_api_key_here
# Tinker API Token (get from the Tinker platform)
TINKER_API_TOKEN=your_tinker_api_token_here
# Template IDs
NOVITA_TEMPLATE_BASE=vn9xnp3cm92x6rmqlgwc
Warning: Never commit `.env.local` with real credentials!
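The Quick Start does not say how `.env.local` is read at runtime; the scripts may use their own loader. As a minimal sketch, assuming the file holds plain `KEY=VALUE` lines with `#` comments as in the example above, a hypothetical `load_env_file` helper could parse it and export the keys so code can read them from `os.environ`:

```python
import os
from pathlib import Path


def load_env_file(path: str = ".env.local", override: bool = False) -> dict[str, str]:
    """Parse KEY=VALUE lines and export them to os.environ (hypothetical helper).

    Blank lines and lines starting with '#' are skipped; surrounding quotes
    on values are stripped. Existing environment variables win unless
    override=True.
    """
    loaded: dict[str, str] = {}
    for raw in Path(path).read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        value = value.strip().strip('"').strip("'")
        loaded[key] = value
        if override or key not in os.environ:
            os.environ[key] = value
    return loaded
```

Calling `load_env_file()` once at startup, then checking that `NOVITA_API_KEY` and `TINKER_API_TOKEN` are non-empty, gives an early failure instead of an authentication error mid-run.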
4. Run Tests
Test Agent Sandbox connectivity:
python -m tests.integration.test_novita_basic
Test R2E-Gym workflow:
python -m tests.integration.test_r2e_gym_workflow
5. Prepare Dataset
Download R2E-Gym sample (50 instances):
python scripts/prepare_data/prepare_r2e_sample.py
Test dataset loading:
python -m tests.unit.test_dataset_loading
Project Structure
rft-tinker/
├── src/                      # Core source code
│   ├── datasets/             # Dataset utilities and repo mapping
│   ├── environments/         # Sandbox environment wrappers
│   ├── rollout/              # Multi-turn rollout pipeline
│   └── utils/                # Utility functions
├── tests/                    # All test files
│   ├── integration/          # Integration tests
│   ├── rollout/              # Rollout pipeline tests
│   └── unit/                 # Unit tests
├── scripts/                  # Utility scripts
├── templates/                # Agent Sandbox Dockerfile templates
├── docs/                     # Documentation
├── data/                     # Datasets (gitignored)
├── outputs/                  # Generated outputs (gitignored)
├── tinker_r2e_training.py    # RL training script
├── tinker_sft_training.py    # SFT training script
└── .env.example              # API keys template
Training Scripts
RL Training (GRPO)
python tinker_r2e_training.py
Configuration (set in the script):

| Parameter | Value | Purpose |
|-----------|-------|---------|
| GROUP_SIZE | 10 | Parallel sandboxes per problem |
| MAX_STEPS | 40 | Max actions per episode |
| SAVE_INTERVAL | 2 | Checkpoint frequency (batches) |
| TEMPERATURE | 1.0 | Sampling temperature |
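GRPO scores each rollout relative to the other rollouts for the same problem, which is why GROUP_SIZE sandboxes run per problem. The training script's exact normalization is not shown here; this sketch assumes the common GRPO form (reward minus group mean, divided by group standard deviation, using the population standard deviation), and `group_advantages` is a hypothetical name:

```python
import statistics

GROUP_SIZE = 10  # rollouts (one sandbox each) per problem, from the table above


def group_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Group-relative advantages: (r - mean) / (std + eps).

    If every rollout in a group gets the same reward (all solved or all
    failed), every advantage is 0 and the problem contributes no gradient.
    """
    if len(rewards) != GROUP_SIZE:
        raise ValueError(f"expected {GROUP_SIZE} rewards, got {len(rewards)}")
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]
```

With binary rewards, a group where 1 of 10 rollouts passes gives the solving rollout an advantage of about +3.0 and each failing rollout about -0.33, so rare successes are weighted strongly.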
SFT Training (Optional Warm-Start)
python tinker_sft_training.py
Converts gold patches into edit trajectories for a supervised fine-tuning (SFT) warm-start.
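One way to turn a patch into edit steps is to emit one line-range edit per diff hunk. This is a sketch, not the script's implementation: it assumes the gold patch is available as a unified diff string (the dataset schema shows `gold_patch` only as `{...}`), and the action fields (`tool`, `path`, `start_line`, `end_line`, `new_content`) are hypothetical stand-ins for the rollout generator's `edit` tool arguments:

```python
import re

# Unified-diff hunk header, e.g. "@@ -10,2 +10,3 @@"; lengths default to 1.
HUNK_RE = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def patch_to_edit_actions(diff_text: str) -> list[dict]:
    """Convert a unified diff into hypothetical line-based `edit` actions.

    Each hunk replaces original lines [start_line, end_line] (1-indexed,
    inclusive) with the hunk's post-patch lines. Within a file, hunks are
    emitted bottom-up so earlier line numbers stay valid when replayed in order.
    """
    actions: list[dict] = []
    pending: list[dict] = []  # edits for the current file, in diff order
    path = None
    old_left = new_left = 0  # lines still expected in the current hunk

    for line in diff_text.splitlines():
        if old_left > 0 or new_left > 0:
            tag, body = line[:1], line[1:]
            if tag in (" ", ""):  # context line (empty if trailing space was stripped)
                pending[-1]["new_content"].append(body)
                old_left -= 1
                new_left -= 1
            elif tag == "-":
                old_left -= 1
            elif tag == "+":
                pending[-1]["new_content"].append(body)
                new_left -= 1
            continue  # "\ No newline at end of file" markers are ignored
        if line.startswith("+++ "):
            actions.extend(reversed(pending))  # finish the previous file
            pending = []
            path = line[4:].split("\t")[0].removeprefix("b/")
        elif m := HUNK_RE.match(line):
            old_start, old_len = int(m[1]), int(m[2] or 1)
            old_left, new_left = old_len, int(m[4] or 1)
            pending.append({
                "tool": "edit",
                "path": path,
                # An empty range (end_line < start_line) means a pure insertion.
                "start_line": old_start if old_len else old_start + 1,
                "end_line": old_start + old_len - 1 if old_len else old_start,
                "new_content": [],
            })
    actions.extend(reversed(pending))
    for action in actions:
        action["new_content"] = "\n".join(action["new_content"])
    return actions
```

Each resulting action could then be paired with the preceding `read` or `search` steps a model would plausibly take, but how the SFT script builds full trajectories is not described here.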
Weight Validation
python validate_sft_weights.py
Validates SFT checkpoint weights before RL training.
Agent Sandbox Templates
r2e-gym-base (vn9xnp3cm92x6rmqlgwc)
- Python 3.8.10, pytest 8.3.5, numpy 1.24.4
- Core: scipy, sympy, requests, pillow
- For most Python repositories
r2e-gym-scientific
- Adds: pandas, scikit-learn, matplotlib, seaborn, h5py
- For scientific computing
r2e-gym-pillow
- Pillow 10.4.0 with full image processing
- For image-heavy repositories
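The repository's own repo-to-template mapping (under `src/datasets/`) is not shown here. As an illustrative heuristic only, a hypothetical `choose_template` could pick a template from a repository's declared dependencies, using the extra packages listed for `r2e-gym-scientific`; the `image_heavy` flag is an assumption the caller would have to set:

```python
# Extra packages that r2e-gym-scientific adds, per the template list above.
SCIENTIFIC_PACKAGES = {"pandas", "scikit-learn", "matplotlib", "seaborn", "h5py"}


def choose_template(required_packages: set[str], image_heavy: bool = False) -> str:
    """Return a template name for a repo (hypothetical heuristic).

    Only r2e-gym-base has a published ID (vn9xnp3cm92x6rmqlgwc); the other
    names would still need to be mapped to their template IDs.
    """
    packages = {p.lower() for p in required_packages}
    if image_heavy:
        return "r2e-gym-pillow"
    if packages & SCIENTIFIC_PACKAGES:
        return "r2e-gym-scientific"
    return "r2e-gym-base"
```

Falling back to `r2e-gym-base` matches its description as the template "for most Python repositories".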
Agent Sandbox API
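import os

from novita_sandbox.core import Sandbox

# Assumes the keys from .env.local are exported to the environment
api_key = os.environ["NOVITA_API_KEY"]
template_id = os.environ.get("NOVITA_TEMPLATE_BASE", "vn9xnp3cm92x6rmqlgwc")

# Create sandbox
sandbox = Sandbox.create(
    api_key=api_key,
    template=template_id,
    timeout=3600
)

# Run commands (synchronous - no await)
result = sandbox.commands.run("echo 'Hello World'")
print(result.stdout)
print(result.exit_code)

# Write files
content = "print('hello from the sandbox')\n"
sandbox.files.write("/path/to/file.py", content.encode())
R2E-Gym Workflow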
Standard evaluation workflow:
# repo_url, base_commit, fail_tests, pass_tests come from the instance; patch_content from the model
# 1. Clone repo at base commit
sandbox.commands.run(f"git clone {repo_url} /tmp/testbed")
sandbox.commands.run(f"cd /tmp/testbed && git checkout {base_commit}")
# 2. Apply model-generated patch
sandbox.files.write("/tmp/patch.diff", patch_content)
sandbox.commands.run("cd /tmp/testbed && git apply /tmp/patch.diff")
# 3. Run tests that should now pass (FAIL_TO_PASS)
f2p_result = sandbox.commands.run(f"cd /tmp/testbed && pytest {fail_tests}")
# 4. Run tests that should remain passing (PASS_TO_PASS)
p2p_result = sandbox.commands.run(f"cd /tmp/testbed && pytest {pass_tests}")
# 5. Compute binary reward: 1.0 only if both test runs exit with code 0
all_tests_passed = f2p_result.exit_code == 0 and p2p_result.exit_code == 0
reward = 1.0 if all_tests_passed else 0.0
Dataset Schema
Each R2E-Gym instance contains:
{
"instance_id": "orange3__2d9617bd",
"repo": "orange3",
"commit_hash": "2d9617bd0cb1f0ba61771258410ab8fae8e7e24d",
"problem_statement": "[ISSUE] ...",
"modified_files": [...],
"test_files": ["test_1.py"],
"test_codes": ["..."],
"old_commit_exit_code": 1, # Tests fail before fix
"new_commit_exit_code": 0, # Tests pass after fix
"gold_patch": {...}
}
Available Actions (in Rollout Generator)
The rollout generator provides 8 tools for the model:
1. bash - Execute shell commands
2. read - Read file content (with line range support)
3. search - Pattern search (grep -rn)
4. find_file - Locate files by pattern
5. list_dir - Directory listing (ls -lah)
6. edit - Line-based file editing
7. run_test - Execute test commands
8. submit - Submit solution
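Several of these tools map directly onto shell commands that can be run with `sandbox.commands.run`. As a sketch under assumptions, a hypothetical `tool_to_command` translates the shell-backed tools into one command string; the argument names (`path`, `start`, `end`, `pattern`, `target`, `command`) and the working directory `/tmp/testbed` (taken from the R2E-Gym workflow) are assumptions, not the rollout generator's actual schema:

```python
import shlex

WORKDIR = "/tmp/testbed"  # assumed repo checkout location, as in the R2E-Gym workflow


def tool_to_command(tool: str, args: dict) -> str:
    """Translate a shell-backed tool call into a command string (hypothetical args)."""
    q = shlex.quote  # quote model-supplied values so they stay single arguments
    path = q(args.get("path", "."))
    if tool == "bash":
        cmd = args["command"]  # run verbatim; the sandbox is the isolation boundary
    elif tool == "read":
        start = int(args.get("start", 1))
        end = args.get("end")
        span = f"{start},{int(end)}p" if end is not None else f"{start},$p"
        cmd = f"sed -n {q(span)} {path}"
    elif tool == "search":
        cmd = f"grep -rn -- {q(args['pattern'])} {path}"
    elif tool == "find_file":
        cmd = f"find {path} -name {q(args['pattern'])}"
    elif tool == "list_dir":
        cmd = f"ls -lah {path}"
    elif tool == "run_test":
        cmd = f"python -m pytest {q(args['target'])}"
    else:
        # edit and submit modify files or end the episode; they are not plain shell commands
        raise ValueError(f"{tool!r} is not a shell-backed tool")
    return f"cd {WORKDIR} && {cmd}"
```

A rollout step would then be something like `sandbox.commands.run(tool_to_command("search", {"pattern": "def parse"}))`, with the resulting `stdout` fed back to the model as the observation.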
Performance Notes
Based on actual training measurements:
| Phase | Duration | % of Batch |
|-------|----------|------------|
| Sandbox creation (10×) | ~21 s | 1.2% |
| Repository setup (10×) | ~2 min | 6.7% |
| Rollout execution | ~25-28 min | ~90% |
| Training update | ~30 s | 1.7% |
| Sandbox cleanup | ~15 s | 0.8% |
Key metrics:
- Sandbox hot-start latency: 60-100 ms per task
- Concurrent sandboxes: Up to 150 per account
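A quick back-of-envelope check ties these numbers together. Assuming the rollout phase takes the midpoint of its measured range (26.5 min) and that the 150-sandbox cap applies account-wide, the per-batch total and phase shares come out as follows, along with how many GROUP_SIZE=10 problem groups can run at once:

```python
# Per-batch phase durations in seconds (rollout uses the midpoint of ~25-28 min).
PHASES_S = {
    "sandbox_creation": 21,
    "repo_setup": 120,
    "rollout": 26.5 * 60,
    "training_update": 30,
    "sandbox_cleanup": 15,
}
GROUP_SIZE = 10       # sandboxes per problem, from the configuration table
MAX_SANDBOXES = 150   # concurrent sandboxes per account

batch_s = sum(PHASES_S.values())
shares = {name: 100 * s / batch_s for name, s in PHASES_S.items()}
max_parallel_problems = MAX_SANDBOXES // GROUP_SIZE

print(f"Batch time: {batch_s / 60:.1f} min")
for name, pct in shares.items():
    print(f"{name:>16}: {pct:.1f}%")
print(f"Problem groups that fit under the sandbox cap: {max_parallel_problems}")
```

The computed shares match the table to within about 0.1 percentage point. Because rollout execution dominates at roughly 90% of a batch, speeding up sandbox creation or training updates barely changes batch time; running more problem groups concurrently (up to the cap) is the larger lever.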
DeepSWE Comparison
| Aspect | DeepSWE | This Setup |
|--------|---------|------------|
| Model | Qwen3-32B | Qwen3-30B-A3B |
| Hardware | 64 H100 | Tinker |
| Dataset | R2E-Gym (4.5K) | Same ✅ |
| Sandbox | Kubernetes + Docker | Agent Sandbox ✅ |
| Pass@1 | 42.2% (SOTA) | TBD |
Documentation
- [Technical Blog](docs/novita-sandbox-rl-training.md) - Detailed guide on RL training with Agent Sandbox
- [Progress Report](docs/PROGRESS.md) - Detailed development progress
References
- DeepSWE Paper: https://www.together.ai/blog/deepswe
- **R2E-Gym…