Rails testing on autopilot: Building an agent that writes what developers won't
Mistral AI Solutions, March 11, 2026
By Maxime Langelier & Mathis Grosmaitre, Applied AI, Proto team
In most large Rails monoliths, organizations prioritize writing new features over writing tests for them. Over time, more and more code goes untested, forcing teams to spend more time debugging painful bugs.

We built an autonomous agent that closes that gap. It reads Rails source files, generates or improves RSpec tests, validates them against style rules and coverage targets, and runs inside a CI/CD pipeline with no human intervention. To operate on codebases at this scale, it runs in parallel: multiple instances working on different files simultaneously.

RSpec through an agent's eyes

Ruby is dynamically typed: there is no compilation step, so errors surface at runtime. For our agent, this means the only way to verify test syntax is to execute it. RSpec, the standard Rails testing framework, makes tests expressive and readable, but its domain-specific language is easy to get wrong.

When the agent reads a Ruby on Rails codebase, it reads five main file types (models, serializers, controllers, mailers, helpers), each structured differently and therefore tested in different ways. The agent needs distinct instructions for each type.

One benefit: the mapping from source file to spec file is nearly 1:1. The general convention is:
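As a minimal sketch of that general convention, assuming the common rspec-rails layout in which a file under app/ is mirrored under spec/ with a _spec suffix. The helper names `spec_path_for` and `untested_files` are hypothetical and not part of the agent:

```ruby
# Hypothetical helper: derive the conventional spec path for a Rails source file.
# Assumes app/<dir>/<name>.rb mirrors to spec/<dir>/<name>_spec.rb.
def spec_path_for(source_path)
  relative = source_path.delete_prefix('app/')
  File.join('spec', relative.sub(/\.rb\z/, '_spec.rb'))
end

# Hypothetical helper: source files whose conventional spec file is missing.
def untested_files(source_paths, existing_specs)
  source_paths.reject { |path| existing_specs.include?(spec_path_for(path)) }
end

sources = ['app/models/user.rb', 'app/helpers/date_helper.rb']
specs   = ['spec/models/user_spec.rb']

puts spec_path_for('app/models/user.rb')      # spec/models/user_spec.rb
puts untested_files(sources, specs).inspect   # ["app/helpers/date_helper.rb"]
```

The same lookup run in reverse is what makes it cheap to list files with no spec at all, which is where a test-writing agent would start.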
There are a few exceptions to that rule, however: files in app/controllers/ are sometimes mapped to spec/requests/, and a single source file can sometimes have multiple spec files, in which case the convention is:
This straightforward mapping makes it easy to locate the tests for any given file, or to identify files that lack tests entirely.

Where it gets harder for our agent is that, to avoid duplicating code, RSpec relies heavily on shared context: factories, fixtures, database schemas...

Factories: Reusable templates for creating test objects with predefined attributes, making it easy to generate consistent test data.
Fixtures: Static data files that preload test database records, providing a fixed baseline for tests.
If a factory file doesn't exist, the agent creates it; if it does, the agent reuses it. Because factories are shared across many tests (unlike spec files), careless changes can easily break tests elsewhere, so updates to these files must be made with caution.

Building the agent with Vibe

We built the agent on top of Vibe, Mistral's open-source coding assistant. The default system prompt was sufficient for this project, so we focused on three levers: repository-level context, specialized skills, and custom tools.

Context engineering

Context engineering was central to our approach. Vibe supports a repository-level AGENTS.md file: when running on a repository with this file at its root, its contents are automatically appended to the system prompt. The AGENTS.md we used provided basic details about the target repositories, but mostly, it provided the agent with a step-by-step execution plan:

1. Read the source file
2. Read the documentation (if it exists)
3. Check if a spec already exists
4. Choose and read exactly one skill based on the source file location
5. Find existing patterns, factories, and helpers
6. Execute the skill (Extract → Factory → Generate tests)
7. Validate with Rubocop tool
8. Validate with SimpleCov tool
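Step 4 of the plan can be pictured as a lookup from the source file's directory to a single skill. The sketch below uses hypothetical skill file names and a hypothetical `skill_for` helper. In the setup described here the choice is an instruction the agent follows from AGENTS.md, not Ruby code like this:

```ruby
# Hypothetical mapping from source directory to skill file.
# The directories follow the five Rails file types; the file names are illustrative only.
SKILLS = {
  'app/models/'      => 'model_spec_skill.md',
  'app/serializers/' => 'serializer_spec_skill.md',
  'app/controllers/' => 'request_spec_skill.md',
  'app/mailers/'     => 'mailer_spec_skill.md',
  'app/helpers/'     => 'helper_spec_skill.md'
}.freeze

# Fallback for files outside those categories (plain Ruby files).
DEFAULT_SKILL = 'plain_ruby_spec_skill.md'

# Returns exactly one skill for a given source path.
def skill_for(source_path)
  _dir, skill = SKILLS.find { |dir, _| source_path.start_with?(dir) }
  skill || DEFAULT_SKILL
end

puts skill_for('app/controllers/admin/users_controller.rb')  # request_spec_skill.md
puts skill_for('lib/billing/invoice_number.rb')              # plain_ruby_spec_skill.md
```

Returning a single value rather than a list mirrors the "exactly one skill" wording of the step: the agent never mixes instructions written for different file types.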
Each step included details about what to do and what the success criteria are. We also included RSpec best practices in areas where we felt the agent needed guidance. Example:
- NEVER use be_present, be_truthy, be_between, or include(:key). These are vague. Use eq(exact_value) always.
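A rule like this is easy to check mechanically. The following is a minimal sketch of such a check in plain Ruby, assuming a simple regex scan over the spec text. It is not the Rubocop tool the agent uses, and `banned_matchers_in` is a hypothetical helper:

```ruby
# Matchers the rule above forbids, with a pattern for each.
BANNED_MATCHERS = {
  'be_present'    => /\bbe_present\b/,
  'be_truthy'     => /\bbe_truthy\b/,
  'be_between'    => /\bbe_between\b/,
  'include(:key)' => /\binclude\(:\w+\)/
}.freeze

# Hypothetical helper: returns [line_number, matcher_name] pairs
# for every banned matcher found in the given spec source.
def banned_matchers_in(spec_source)
  spec_source.each_line.with_index(1).flat_map do |line, number|
    BANNED_MATCHERS.select { |_, pattern| line.match?(pattern) }
                   .keys
                   .map { |name| [number, name] }
  end
end

spec = <<~RUBY
  it 'returns the user' do
    expect(json[:name]).to be_present
    expect(json[:email]).to eq('ada@example.com')
    expect(json).to include(:id)
  end
RUBY

banned_matchers_in(spec).each { |number, name| puts "line #{number}: #{name}" }
# line 2: be_present
# line 4: include(:key)
```

A text scan like this is crude (it would also flag a matcher name inside a string or comment), which is one reason a real setup would lean on a proper linter instead.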
We found the agent would sometimes skip methods or leave edge cases untested: it would generate a spec that looked complete but quietly ignored a few public methods from the source file. To counter this, the AGENTS.md ends with a forced self-review: the agent must re-read the source file and explicitly ask itself "Did I test every public method? Count them." before finishing. If anything is missing, it goes back.

With this generic AGENTS.md file forcing the agent to follow strict planning, our quality score went from 0.68 to 0.74, all from a single markdown file with framework-level instructions.

Using SKILLS files

Recall step 4 of our AGENTS.md:

4. Choose and read exactly one skill based on the source file location

A single generic skill would produce mediocre results: the instructions precise enough for testing a model file are the wrong instructions for a controller file. What worked was creating a separate skills file for each category, plus one for plain Ruby files. Here is an example of a basic skills file for testing controllers:

---
name: "Generate Request Spec"
description: "Generate RSpec request tests for a Rails controller. Use when the source file is in app/controllers/."
---
Generate Request Spec
File Scope
- spec/requests//_spec.rb: drop _controller from the filename
- spec/factories/.rb: create or update if needed
Example tests for Controllers
# frozen_string_literal: true

require 'rails_helper'

RSpec.describe 'Admin::Users', type: :request do
  # Default to an admin user; nested groups override this where needed.
  let(:user) { create(:user, :admin) }

  before { sign_in user }

  # Unauthorized access: one test per action
  describe '#authorized?' do
    let(:user) { create(:user) }

    it 'GET /admin/users redirects' do
      get '/admin/users'
      expect(response).to have_http_status(:redirect)
    end
  end

  # Each action: happy + sad paths
  describe 'POST /admin/users' do
    let(:valid_params) { { user: attributes_for(:user) } }
    let(:invalid_params) { { user: { email: '' } } }

    context 'with valid params' do
      it 'creates a record' do
        expect { post '/admin/users', params: valid_params }.to change(User, :count).by(1)
        expect(response).to have_http_status(:created)
      end
    end

    context 'with invalid params' do
      it 'returns unprocessable entity with errors' do
        post '/admin/users', params: invalid_params
        expect(response).to have_http_status(:unprocessable_entity)
        json = JSON.parse(response.body, symbolize_names: true)
        expect(json[:errors]).to include("Email can't be blank")
      end
    end
  end
end
Critical Rules
- Assert content, not just status: always parse JSON and verify exact values
- **Test exact error...