What does this repo signal mean?

Baseten published basetenlabs/baseten-skills (Python). Repository signals like this one expose tooling, eval, infrastructure, or model-adjacent work before it appears in a launch post. High-signal details: repo basetenlabs/baseten-skills · language Python · Reusable AI skills for Baseten model deployment. onlylabs links this event to 1 captured evidence page and 6 related repo signals.

Baseten Repo: basetenlabs/baseten-skills

Captured source


GitHub: github.com/basetenlabs/baseten-skills

basetenlabs/baseten-skills repository metadata


published Apr 17, 2026 · seen Jun 5 · captured Jun 11 · http 200 · method plain

basetenlabs/baseten-skills

Description: Skills for using Baseten effectively

Language: Python

License: MIT

Stars: 6

Forks: 1

Open issues: 0

Created: 2026-04-17T12:53:11Z

Pushed: 2026-06-04T18:50:36Z

Default branch: main

Fork: no

Archived: no

README:

Baseten Skills

Agent DX bundle — [baseten skill](skills/baseten/) tuned for Baseten backend MCP, Docs MCP and CLI.

The MCP reduces token usage and wall time: our evals (below) show this, although agents can still achieve their goals through raw REST API calls at a similar pass rate. In addition, MCP tool annotations let agent harnesses formally gate destructive operations, which provides an extra safeguard.

What you can do without leaving the chat:

- Debug live: "Why do I see this log line?" or "Fix my deploy" → agent pulls logs, finds the stack trace, proposes a fix.
- Operate: Promote dev → prod, bump autoscaling for a traffic spike, run a test predict.
- Keep the overview: "What's deployed, healthy, cold?" One-shot status across your account, easy cleanups.
- Skip the doc dive: Agent gets pointers to Baseten docs, blog posts, and more in context.
- Wire up automations: Plug it into your own agents or internal tools for reactive ops without glue code.
- Install once, works everywhere: npx add-mcp, your API key, done. Uniform setup across 14+ coding agents.
- Read-only by default, mutations gated via harness policy check.
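The gating mentioned in the last item can be pictured with a minimal sketch. It assumes the harness reads the standard MCP tool annotation hints (`readOnlyHint`, `destructiveHint`) and applies a simple three-way policy. The tool names, the `Tool` class, and the `decide` function are hypothetical illustrations, not part of the Baseten MCP or of any particular harness.

```python
from dataclasses import dataclass, field


@dataclass
class Tool:
    """Hypothetical stand-in for an MCP tool descriptor."""
    name: str
    annotations: dict = field(default_factory=dict)


def decide(tool: Tool, allow_destructive: bool = False) -> str:
    """Return 'allow', 'confirm', or 'deny' for a tool call.

    Annotations are hints. Following the MCP defaults, a missing
    readOnlyHint counts as False and a missing destructiveHint as True,
    so an unannotated tool is treated as potentially destructive.
    """
    ann = tool.annotations
    if ann.get("readOnlyHint", False):
        return "allow"  # declared read-only: run without prompting
    if ann.get("destructiveHint", True) and not allow_destructive:
        return "deny"  # destructive and not explicitly enabled
    return "confirm"  # mutation that needs user approval


# Hypothetical tool names, for illustration only.
tools = [
    Tool("list_deployments", {"readOnlyHint": True}),
    Tool("update_autoscaling", {"readOnlyHint": False, "destructiveHint": False}),
    Tool("delete_deployment", {"readOnlyHint": False, "destructiveHint": True}),
    Tool("unannotated_tool"),
]

for tool in tools:
    print(f"{tool.name}: {decide(tool)}")
```

Because the decision depends only on declared annotations, the same policy can be applied to any MCP server without per-tool rules. A real harness may use a different or stricter policy.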

Set Up

Requirements:

To interact with your Baseten workspace, provide an API key with management permissions (available from the webapp). We recommend a dedicated key for this purpose, so it can be revoked independently without affecting other workstreams.

Node >= 18 (for the install tools)

Installation

export BASETEN_MCP_KEY=...

{ [ -n "$BASETEN_MCP_KEY" ] && [ "$BASETEN_MCP_KEY" != "..." ]; } || { echo "Error: set BASETEN_MCP_KEY first"; false; } && \
npx add-mcp https://api.baseten.co/mcp -g -y --header "Authorization: Bearer ${BASETEN_MCP_KEY}" && \
npx add-mcp https://docs.baseten.co/mcp -n "baseten_docs" -g -y && \
npx skills add basetenlabs/baseten-skills -g -y

-g installs it globally on your host.
-y confirms selection for all detected harnesses.
If your harness supports environment variable interpolation, you can instead edit the MCP config file so that it references the variable, and set the key in the shell that starts the agent.
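For readers who script the setup, the guard and the three install steps above can be mirrored roughly as follows. This is a sketch under stated assumptions: the function names `auth_header` and `install_commands` are hypothetical, the commands are only assembled and printed rather than executed, and the flags are copied verbatim from the shell snippet.

```python
import os


def auth_header(env=None) -> str:
    """Build the Authorization header, refusing an unset or placeholder key."""
    env = os.environ if env is None else env
    key = env.get("BASETEN_MCP_KEY", "")
    if not key or key == "...":
        # Same condition the shell guard checks before running npx.
        raise RuntimeError("set BASETEN_MCP_KEY first")
    return f"Authorization: Bearer {key}"


def install_commands(header: str) -> list:
    """Argument lists matching the three npx invocations in the shell snippet."""
    return [
        ["npx", "add-mcp", "https://api.baseten.co/mcp", "-g", "-y", "--header", header],
        ["npx", "add-mcp", "https://docs.baseten.co/mcp", "-n", "baseten_docs", "-g", "-y"],
        ["npx", "skills", "add", "basetenlabs/baseten-skills", "-g", "-y"],
    ]


# Dummy key for illustration; a real run would read the actual environment.
header = auth_header({"BASETEN_MCP_KEY": "example-key"})
for cmd in install_commands(header):
    print(" ".join(cmd[:3]), "...")
```

Passing each list to `subprocess.run(cmd, check=True)` in order would stop at the first failure, which matches the `&&` chaining in the shell version.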

The truss CLI is separate and needed only for deployment authoring (not pure ops work). See CLI docs. For example, with pip (other package managers are similar):

pip install truss --upgrade

You can install only some of the components or modify the commands, but the best user experience comes from combining them.

Getting started & Usage

After installation, most agents require a restart.

Use /mcp or /mcps to check that the MCP servers are connected (if they are not, verify the BASETEN_MCP_KEY in the harness config file).

You can then ask questions or assign tasks related to Baseten: chatting about the docs, brainstorming solution approaches, deploying and iterating on models, or managing your workspace. Most agents trigger the skill automatically as needed; alternatively, you can invoke it with /baseten.

Evaluation results

We measured the baseten skill against the bare Claude Opus 4.7 baseline across 16 tasks spanning model authoring, integration, operate, debug, and tune workflows. Five configurations × 4 runs × 16 evals = 320 runs.

| Configuration                               | Pass rate | Wall (s) | Cost ($) |
|---------------------------------------------|-----------|----------|----------|
| Naked model (no skill, no MCP, no docs)     | 0.89      | 107      | 0.56     |
| + docs MCP                                  | 0.85      | 110      | 0.66     |
| + docs MCP + skill                          | 0.87      | 136      | 0.73     |
| + docs MCP + baseten MCP                    | 0.91      | 99       | 0.54     |
| + docs MCP + baseten MCP + skill (full kit) | 0.97      | 99       | 0.55     |
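To read the table as differences against the naked baseline, a small sketch like the following can help. The numbers are copied from the table; the short configuration labels and the `deltas_vs` helper are hypothetical names introduced here for illustration.

```python
# (pass rate, wall seconds, cost in dollars) per configuration, copied from the table.
RESULTS = {
    "naked": (0.89, 107, 0.56),
    "docs_mcp": (0.85, 110, 0.66),
    "docs_mcp+skill": (0.87, 136, 0.73),
    "docs_mcp+baseten_mcp": (0.91, 99, 0.54),
    "full_kit": (0.97, 99, 0.55),
}


def deltas_vs(baseline: str, results=RESULTS) -> dict:
    """Difference of each configuration from the baseline, rounded to table precision."""
    b_pass, b_wall, b_cost = results[baseline]
    return {
        name: (round(p - b_pass, 2), w - b_wall, round(c - b_cost, 2))
        for name, (p, w, c) in results.items()
        if name != baseline
    }


for name, (d_pass, d_wall, d_cost) in deltas_vs("naked").items():
    print(f"{name:22s} pass {d_pass:+.2f}  wall {d_wall:+d}s  cost {d_cost:+.2f}")
```

These are differences of the reported means only. They say nothing about significance, which the README addresses separately with confidence intervals.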

Highlights (95% CIs from cluster bootstrap over evals):

- Full kit lifts pass rate from 0.89 to 0.97 vs. naked Opus 4.7 (Δ +0.08, CI excludes 0). Quality gains compound when skill and MCP are paired: adding either on top of the other is significant on its own.

- The baseten MCP cuts wall and cost roughly in half on backend-heavy tasks with no quality cost. On operate tasks (promote, autoscale, status), wall drops from 124s → 53s and cost from $0.82 → $0.35 when MCP is added to a skill-loaded agent. Similar magnitudes on debug and tune.

- Opus has strong baseline Baseten knowledge: most authoring tasks pass without the toolkit. The toolkit's measurable value concentrates on tasks that need live workspace state (operate, debug, tune).

Full methodology, marginal effects across all metrics, per-eval breakdowns, and per-group analysis: [Full eval report](evals/baseten/README.md).
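For readers unfamiliar with the phrase "cluster bootstrap over evals", here is a minimal sketch of the general technique. It resamples whole evals (clusters) with replacement, keeping each eval's repeated runs together, and takes a percentile interval of the pass-rate difference. The outcome data below is synthetic and made up for illustration; it is not the repository's data, and the repo's actual methodology may differ in its details.

```python
import random


def cluster_bootstrap_ci(base, kit, n_boot=2000, alpha=0.05, seed=0):
    """Percentile CI for mean(kit) - mean(base), resampling whole evals.

    base, kit: dicts mapping eval id -> list of 0/1 run outcomes.
    """
    rng = random.Random(seed)
    evals = sorted(base)
    deltas = []
    for _ in range(n_boot):
        # Draw evals with replacement; all runs of a drawn eval move together.
        sample = [rng.choice(evals) for _ in evals]
        b = [x for e in sample for x in base[e]]
        k = [x for e in sample for x in kit[e]]
        deltas.append(sum(k) / len(k) - sum(b) / len(b))
    deltas.sort()
    lo = deltas[int((alpha / 2) * n_boot)]
    hi = deltas[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi


# Synthetic outcomes: 16 evals x 4 runs per configuration (NOT the real results).
evals = [f"eval_{i:02d}" for i in range(16)]
base = {e: [1, 1, 1, 1] for e in evals}
kit = {e: [1, 1, 1, 1] for e in evals}
base["eval_03"] = [0, 0, 1, 1]
base["eval_07"] = [0, 1, 1, 1]
base["eval_11"] = [0, 0, 0, 1]
kit["eval_11"] = [0, 1, 1, 1]

lo, hi = cluster_bootstrap_ci(base, kit)
print(f"95% CI for pass-rate difference: [{lo:.3f}, {hi:.3f}]")
```

Resampling at the eval level rather than the run level reflects that runs of the same eval are correlated, so treating all 64 runs as independent would make the interval too narrow.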

Notability

notability 1.0/10

Low-traction new repo