What does this writing signal mean?

Mistral AI published "Rails testing on autopilot: Building an agent that writes what developers won't". This writing signal gives public context for research themes, product direction, policy, or launch framing. High-signal details: low-traction blog post about a testing agent. onlylabs links this event to 1 captured evidence page and 6 related writing signals. It also maps to Evals and quality in the data-business radar.

Mistral AI Writing: Rails testing on autopilot: Building an agent that writes what developers won't

Captured source

source ↗

mistral.ai/mistral.ai/news/rails-testing-on-autopilot-building-an-agent-that-writes-what-developers-wont

published Mar 11, 2026 · seen 2h · captured 2h · http 200 · method plain

Rails testing on autopilot: Building an agent that writes what developers won't

March 11, 2026

By Maxime Langelier & Mathis Grosmaitre - Applied AI - Proto team

9 min read

In most large Rails monoliths, organizations prioritize writing new features over writing tests for them. Over time, more and more code goes untested, forcing teams to spend more time debugging painful bugs.

We built an autonomous agent that closes that gap. It reads Rails source files, generates or improves RSpec tests, validates them against style rules and coverage targets, and runs inside a CI/CD pipeline with no human intervention. To operate on codebases at this scale, it runs in parallel: multiple instances working on different files simultaneously.

RSpec through an agent's eyes

Ruby is dynamically typed: there is no compilation step, so errors surface at runtime. For our agent, this means the only way to verify test syntax is to execute it. RSpec, the standard Rails testing framework, makes tests expressive and readable, but its domain-specific language is easy to get wrong.

When the agent reads a Ruby on Rails codebase, it reads five main file types (models, serializers, controllers, mailers, helpers), each structured differently (therefore tested in different ways). The agent needs distinct instructions for each type.

One benefit: the mapping from source file to spec file is nearly 1:1. The general convention is:

There are a few exceptions to that rule, however: files in app/controllers/ are sometimes mapped to spec/requests/, and a single source file can sometimes have multiple spec files, in which case the convention is:

This straightforward mapping makes it easy to locate the tests for any given file, or to identify files that lack tests entirely.

Where it gets harder for our agent is that, to avoid duplicating code, RSpec relies heavily on shared context: factories, fixtures, database schemas...

Factories: Reusable templates for creating test objects with predefined attributes, making it easy to generate consistent test data.

Fixtures: Static data files that preload test database records, providing a fixed baseline for tests.
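
The source-to-spec mapping described earlier lends itself to a small helper. What follows is a minimal sketch, assuming the common rspec-rails layout in which a file under app/ is tested by a file at the same relative path under spec/ with a _spec.rb suffix. The method names spec_path_for and missing_specs and the controllers_as_requests option are hypothetical and do not come from the original post.

```ruby
# Sketch only: derive the conventional spec path for a Rails source file.
# Assumption: app/<type>/<name>.rb is tested by spec/<type>/<name>_spec.rb.
def spec_path_for(source_path, controllers_as_requests: false)
  relative = source_path.sub(%r{\Aapp/}, "")

  # Optional exception: controllers tested through request specs,
  # dropping the _controller suffix from the filename.
  if controllers_as_requests && relative.start_with?("controllers/")
    relative = relative.sub(%r{\Acontrollers/}, "requests/")
    relative = relative.sub(/_controller\.rb\z/, ".rb")
  end

  File.join("spec", relative.sub(/\.rb\z/, "_spec.rb"))
end

# List the source files whose conventional spec file is not present.
def missing_specs(source_paths, existing_spec_paths)
  source_paths.reject do |path|
    existing_spec_paths.include?(spec_path_for(path))
  end
end
```

A caller could feed missing_specs with the output of Dir.glob over app/ and spec/ to list untested files. The case where one source file has several spec files would need an extra lookup, which this sketch leaves out.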

If a factory file doesn't exist, the agent creates it; if it does, the agent reuses it. Because factories are shared across many tests (unlike spec files), careless changes can easily break tests elsewhere, so updates to these files must be made with caution.

Building the agent with Vibe

We built the agent on top of Vibe, Mistral's open-source coding assistant. The default system prompt was sufficient for this project, so we focused on three levers: repository-level context, specialized skills, and custom tools.

Context engineering

Context engineering was central to our approach. Vibe supports a repository-level AGENTS.md file: when running on a repository with this file at its root, its contents are automatically appended to the system prompt. The AGENTS.md we used provided basic details about the target repositories, but mostly, it provided the agent with a step-by-step execution plan:

1. Read the source file

2. Read the documentation (if it exists)

3. Check if a spec already exists

4. Choose and read exactly one skill based on the source file location

5. Find existing patterns, factories, and helpers

6. Execute the skill (Extract → Factory → Generate tests)

7. Validate with Rubocop tool

8. Validate with SimpleCov tool
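
Step 4 of the plan picks exactly one skill from the source file's location. In the post the agent makes this choice itself by reading instructions. The lookup below is only a minimal sketch of the same decision written as plain Ruby. The skill names, the constant names, and the skill_for method are hypothetical. Only the idea of one skill per file category, plus one for plain Ruby files, comes from the text.

```ruby
# Sketch only: choose one skill based on where the source file lives.
# The skill names are hypothetical placeholders.
SKILLS_BY_DIRECTORY = {
  "app/models/"      => "model_spec",
  "app/serializers/" => "serializer_spec",
  "app/controllers/" => "request_spec",
  "app/mailers/"     => "mailer_spec",
  "app/helpers/"     => "helper_spec"
}.freeze

# Used for plain Ruby files that match none of the Rails directories.
FALLBACK_SKILL = "plain_ruby_spec"

def skill_for(source_path)
  _directory, skill = SKILLS_BY_DIRECTORY.find do |directory, _name|
    source_path.start_with?(directory)
  end
  skill || FALLBACK_SKILL
end
```

Returning a single value mirrors the "exactly one skill" constraint: a file never receives instructions written for a different file type.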

Each step included details about what to do and what the success criteria are. We also included RSpec best practices in areas where we felt the agent needed explicit guidance. Example:

NEVER use be_present, be_truthy, be_between, or include(:key)

These are vague. Use eq(exact_value) always
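
A rule like this can also be checked mechanically once a spec has been generated. The scan below is a minimal sketch, assuming a plain-text pass over the spec source is acceptable. It is not the authors' tooling (the post validates with a Rubocop tool), and the names BANNED_PATTERN and vague_matcher_lines are hypothetical.

```ruby
# Sketch only: flag lines that use the matchers banned by the rule above.
BANNED_MATCHERS = %w[be_present be_truthy be_between].freeze

# include(...) is flagged only when its argument is a bare symbol key.
BANNED_PATTERN = /\b(?:#{BANNED_MATCHERS.join('|')})\b|\binclude\(\s*:\w+\s*\)/

# Returns [line_number, stripped_line] pairs for every offending line.
def vague_matcher_lines(spec_source)
  spec_source.each_line.with_index(1)
             .select { |line, _number| line.match?(BANNED_PATTERN) }
             .map { |line, number| [number, line.strip] }
end
```

A regular-expression scan can misfire on comments or strings, so a real check would more likely be a custom cop working on the syntax tree.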

We found the agent would sometimes skip methods or leave edge cases untested: it would generate a spec that looked complete but quietly ignored a few public methods from the source file. To counter this, the AGENTS.md ends with a forced self-review: the agent must re-read the source file and explicitly ask itself "Did I test every public method? Count them." before finishing. If anything is missing, it goes back.

With this generic AGENTS.md file forcing the agent to follow strict planning, our quality score went from 0.68 to 0.74, all from a single markdown file with framework-level instructions.

Using SKILLS files

Recall step 4 of our AGENTS.md:

4. Choose and read exactly one skill based on the source file location

A single generic skill would produce mediocre results: the instructions precise enough for testing a model file are the wrong instructions for a controller file. What worked was creating a separate skills file for each category, plus one for plain Ruby files.

Here is an example of a basic skills file for testing controllers:

---

name: "Generate Request Spec"

description: "Generate RSpec request tests for a Rails controller. Use when the source file is in app/controllers/."

---

Generate Request Spec

File Scope

spec/requests//_spec.rb — drop _controller from the filename

spec/factories/.rb — create or update if needed

Example tests for Controllers

    # frozen_string_literal: true

    require 'rails_helper'

    describe 'Admin::Users', type: :request do
      let(:user) { create(:user, :admin) }

      before { sign_in user }

      # Unauthorized access: one test per action
      describe '#authorized?' do
        let(:user) { create(:user) }

        it 'GET /admin/users redirects' do
          get '/admin/users'
          expect(response).to have_http_status(:redirect)
        end
      end

      # Each action: happy + sad paths
      describe 'POST /admin/users' do
        let(:valid_params) { { user: attributes_for(:user) } }
        let(:invalid_params) { { user: { email: '' } } }

        context 'with valid params' do
          it 'creates a record' do
            expect { post '/admin/users', params: valid_params }.to change(User, :count).by(1)
            expect(response).to have_http_status(:created)
          end
        end

        context 'with invalid params' do
          it 'returns unprocessable entity with errors' do
            post '/admin/users', params: invalid_params
            expect(response).to have_http_status(:unprocessable_entity)
            json = JSON.parse(response.body, symbolize_names: true)
            expect(json[:errors]).to include("Email can't be blank")
          end
        end
      end
    end

Critical Rules

Assert content, not just status: always parse JSON and verify exact values

**Test exact error...

Excerpt shown — open the source for the full document.

Notability

notability 4.0/10

Low traction blog post about a testing agent