DigitalOcean Serverless Inference: A Deep Dive

By smehta

Updated: June 3, 2026 · 17 min read

"}], "max_tokens": 1024 }'

Model Context Protocol (MCP)

Connect to remote MCP servers — authenticated or unauthenticated — for live data access:

Shell
# The JSON body is single-quoted, so the API token is spliced in as '"$VAR"' to let the shell expand it.
curl -X POST https://inference.do-ai.run/v1/chat/completions \
  -H "Authorization: Bearer $MODEL_ACCESS_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai-gpt-4o",
    "messages": [{"role": "user", "content": "Fetch my DigitalOcean account info."}],
    "tools": [{
      "type": "mcp",
      "server_label": "digitalocean",
      "server_url": "https://accounts.mcp.digitalocean.com/mcp",
      "authorization": "Bearer '"$DIGITALOCEAN_API_TOKEN"'",
      "allowed_tools": ["account-get-information"]
    }],
    "tool_choice": "required",
    "max_tokens": 512
  }'
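For programmatic use, the same MCP request can be assembled in code so that the DigitalOcean API token is read from the environment at call time. The Python sketch below is a minimal illustration, not an official client: `build_mcp_request` and `make_http_request` are hypothetical helper names, and the payload fields simply mirror the curl example above. Nothing is sent until the returned object is passed to `urllib.request.urlopen`.

```python
import json
import os
import urllib.request

ENDPOINT = "https://inference.do-ai.run/v1/chat/completions"


def build_mcp_request(do_api_token: str) -> dict:
    """Return the chat-completions payload with one remote MCP server attached."""
    return {
        "model": "openai-gpt-4o",
        "messages": [
            {"role": "user", "content": "Fetch my DigitalOcean account info."}
        ],
        "tools": [
            {
                "type": "mcp",
                "server_label": "digitalocean",
                "server_url": "https://accounts.mcp.digitalocean.com/mcp",
                "authorization": f"Bearer {do_api_token}",
                "allowed_tools": ["account-get-information"],
            }
        ],
        "tool_choice": "required",
        "max_tokens": 512,
    }


def make_http_request(payload: dict, model_access_key: str) -> urllib.request.Request:
    """Wrap the payload in a POST request; the caller decides when to send it."""
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {model_access_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Example (performs a real network call, so it is left commented out):
# req = make_http_request(
#     build_mcp_request(os.environ["DIGITALOCEAN_API_TOKEN"]),
#     os.environ["MODEL_ACCESS_KEY"],
# )
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```

Keeping payload construction separate from sending makes it easy to inspect or log the body (with the token redacted) before any request leaves the machine.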

Web Search

Give models access to real-time web content:

Shell
curl -X POST https://inference.do-ai.run/v1/responses \
  -H "Authorization: Bearer $MODEL_ACCESS_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai-gpt-4o",
    "input": "What are the latest DigitalOcean Droplet pricing changes?",
    "tools": [{"type": "web_search", "max_uses": 3, "max_results": 5}],
    "max_output_tokens": 1024
  }'

Agentic Workflows (Claude Code)

We offer full Anthropic tool-use compatibility through /v1/messages. Set ANTHROPIC_BASE_URL to https://inference.do-ai.run/v1/messages to run Claude Code and other agentic workflows on DigitalOcean:

Shell
curl https://inference.do-ai.run/v1/messages \
  -H "x-api-key: $MODEL_ACCESS_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -H "content-type: application/json" \
  -d '{
    "model": "anthropic-claude-4.6-sonnet",
    "max_tokens": 4096,
    "tools": [{
      "name": "read_file",
      "description": "Read a file from the local filesystem.",
      "input_schema": {
        "type": "object",
        "properties": {"path": {"type": "string"}},
        "required": ["path"]
      }
    }],
    "messages": [{"role": "user", "content": "Refactor the authentication logic in src/auth.ts."}]
  }'
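A tool-use request like the one above is only the first half of an agent loop: when the model decides to call `read_file`, the client has to run the tool and send the result back. The Python sketch below shows that step. It assumes the endpoint returns the standard Anthropic Messages response shape (`content` blocks with `"type": "tool_use"`, answered by `tool_result` blocks), which is what the compatibility claim above implies. `run_tools` and `TOOL_HANDLERS` are hypothetical names used for illustration.

```python
from typing import Callable, Dict, Optional


def read_file(path: str) -> str:
    """Local implementation of the read_file tool declared in the request."""
    with open(path, encoding="utf-8") as fh:
        return fh.read()


# Map tool names from the request's "tools" array to local callables.
TOOL_HANDLERS: Dict[str, Callable[..., str]] = {"read_file": read_file}


def run_tools(
    response: dict,
    handlers: Dict[str, Callable[..., str]] = TOOL_HANDLERS,
) -> Optional[dict]:
    """Execute every tool_use block and return the follow-up user message.

    Returns None when the model did not request any tool.
    """
    results = []
    for block in response.get("content", []):
        if block.get("type") != "tool_use":
            continue  # skip plain text blocks
        handler = handlers.get(block["name"])
        if handler is None:
            output, is_error = f"unknown tool: {block['name']}", True
        else:
            try:
                output, is_error = handler(**block["input"]), False
            except Exception as exc:  # report failures to the model instead of crashing
                output, is_error = str(exc), True
        results.append(
            {
                "type": "tool_result",
                "tool_use_id": block["id"],  # ties the result to the model's call
                "content": output,
                "is_error": is_error,
            }
        )
    return {"role": "user", "content": results} if results else None
```

To continue the conversation, append the assistant's `content` to `messages` as an assistant turn, append the message returned by `run_tools`, and POST to `/v1/messages` again until no `tool_use` blocks remain.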

Pricing

(current as of May 2026)

Knowledge base retrieval and MCP incur no additional charges beyond standard per-token inference costs. Web search is $10 per 1,000 requests.
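As a quick worked example of the web search rate: at $10 per 1,000 requests, each search adds $0.01 on top of token charges. The Python sketch below assumes billing is prorated linearly per request (the text above only states the per-1,000 rate, and does not say exactly what counts as a request). `web_search_cost` is a hypothetical helper name.

```python
from decimal import Decimal

WEB_SEARCH_USD_PER_1000 = Decimal("10")  # rate quoted above, as of May 2026


def web_search_cost(requests: int) -> Decimal:
    """Estimated web search surcharge in USD, excluding per-token inference costs."""
    if requests < 0:
        raise ValueError("requests must be non-negative")
    cost = WEB_SEARCH_USD_PER_1000 * requests / Decimal(1000)
    return cost.quantize(Decimal("0.01"))  # assumption: linear proration, rounded to cents


print(web_search_cost(2500))  # 25.00
```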

Inference Router

We mentioned the Inference Router earlier as a key differentiator. Here’s how it works in practice.

The Inference Router classifies each incoming request against your configured tasks, then selects the best model from the matched task's pool. Each task has up to three models and one selection policy: Cost Efficiency (cheapest by token cost), Speed Optimization (fastest by time to first token, TTFT), Manual Ranking (your specified order), or Optimal (DigitalOcean’s benchmarking, for pre-configured tasks).
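To make the selection policies concrete, here is a small Python sketch of how a policy could pick from a task's pool. It illustrates the description above and is not DigitalOcean's implementation: the `Candidate` type, the `select_model` function, the policy identifiers, and all cost and latency figures are made up for the example. The Optimal policy is omitted because it depends on DigitalOcean's own benchmarks, and the request classification step is not modeled.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Candidate:
    name: str
    usd_per_million_tokens: float  # placeholder figure, not real pricing
    ttft_ms: float                 # placeholder time to first token


def select_model(pool: List[Candidate], policy: str) -> Candidate:
    """Pick one model from a task's pool according to the selection policy."""
    if not 1 <= len(pool) <= 3:
        raise ValueError("a task holds between one and three models")
    if policy == "cost_efficiency":
        return min(pool, key=lambda c: c.usd_per_million_tokens)
    if policy == "speed_optimization":
        return min(pool, key=lambda c: c.ttft_ms)
    if policy == "manual_ranking":
        return pool[0]  # pool order is treated as the user's ranking
    raise ValueError(f"unsupported policy: {policy}")


pool = [
    Candidate("openai-gpt-4o", usd_per_million_tokens=5.0, ttft_ms=400.0),
    Candidate("anthropic-claude-4.6-sonnet", usd_per_million_tokens=3.0, ttft_ms=600.0),
]
print(select_model(pool, "cost_efficiency").name)     # anthropic-claude-4.6-sonnet
print(select_model(pool, "speed_optimization").name)  # openai-gpt-4o
print(select_model(pool, "manual_ranking").name)      # openai-gpt-4o
```

The same pool yields different winners under different policies, which is why the policy is set per task.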

Using it is a one-line change — prefix the router name with router: in the model field:

Shell
curl -X POST https://inference.do-ai.run/v1/chat/completions \
  -H "Authorization: Bearer $MODEL_ACCESS_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "router:my-support-router",
    "messag
