microsoft/waza azd-ext-microsoft-azd-waza_0.31.0
Waza azd Extension v0.31.0
Repository: microsoft/waza
Tag: azd-ext-microsoft-azd-waza_0.31.0
Published: 2026-04-28T20:08:51Z
Prerelease: no
Release notes:
Changelog
All notable changes to waza will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
[Unreleased]
[0.31.0] - 2026-04-28
Added
- Custom agent (`.agent.md`) eval support — Discover `.agent.md` files alongside `SKILL.md`, parse agent-specific frontmatter (`tools`, `model`, `handoffs`, `mcp-servers`, `agents`), auto-inject `tool_constraint` grader from agent `tools:` field, complete worked example under `examples/custom-agent/`, and new "Evaluating Custom Agents" docs guide (#226, closes #225)
Fixed
- Mock engine echoes file content — `_output_contains` expectations against file contents now work in CI without a real model. Mock response includes task metadata, file paths, and a 1KB content preview per resource (#228, closes #227)
- `waza serve` no longer crashes when stdin isn't a terminal — MCP stdio server only starts when `term.IsTerminal()` is true; piped input or background mode no longer kills the HTTP dashboard (#224)
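The `waza serve` fix above comes down to gating the stdio server on whether stdin is an interactive terminal. The following is a minimal Python sketch of that gating idea only; it is not waza's code (the release note names Go's `term.IsTerminal()`), and the function name `should_start_stdio_server` is hypothetical.

```python
import sys


def should_start_stdio_server(stdin) -> bool:
    """Return True only when stdin is attached to an interactive terminal.

    Hypothetical helper: piped input, redirected files, and detached
    background processes all report isatty() == False, so the stdio
    server would be skipped and the HTTP side left running.
    """
    isatty = getattr(stdin, "isatty", None)
    return bool(isatty and isatty())


if __name__ == "__main__":
    if should_start_stdio_server(sys.stdin):
        print("stdin is a terminal: stdio server would start")
    else:
        print("stdin is not a terminal: skipping stdio server")
```

Running the script as `echo hi | python sketch.py` takes the second branch, which mirrors the piped-input case described in the note.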
Changed
- Vocabulary renames — Internal types renamed: `BenchmarkSpec` → `EvalSpec`, `TestRunner` → `EvalRunner`. Not a breaking change for external consumers (types live in `internal/`) (#222)
Documentation
- Cross-reference audit for recent renames + custom agent feature: added `.agent.md` coverage to quickstart, getting-started, GUIDE, TUTORIAL, examples README; updated mock engine descriptions in INTEGRATION-TESTING and eval-yaml guide (#230)
Dependencies
- Bump postcss from 8.5.6 to 8.5.12 in /site (#229)
[0.30.1] - 2026-04-22
Documentation
- Updated README with missing CLI commands — Added documentation for recently-added CLI commands that were missing from the README (#220)
[0.30.0] - 2026-04-22
Added
- `waza quality` command — LLM-as-Judge skill quality scoring that evaluates skill output quality using a configurable judge model (#218)
- Scope-reduction advisory check — `waza check` now includes an advisory that flags skills with overly broad scope, helping authors tighten skill definitions (#219)
[0.29.0] - 2026-04-22
Added
- `--keep-workspace` flag — Preserve the temporary workspace after task execution for debugging agent output (#123, #217)
- `--no-skills` flag and `disabled_skills` config — Disable specific skills during evaluation to isolate behavior (#126, #216)
- Non-blocking version update check — CLI now checks for newer waza versions in the background without slowing startup (#104, #214)
- Per-task `skill_directories` — Specify different skill directories for individual tasks in eval YAML (#156, #215)
Dependencies
- Bump astro and @astrojs/starlight in /site (#212)
[0.28.0] - 2026-04-21
Added
- Follow-up prompts in eval YAML — Tasks can now include pre-written follow-up prompts for multi-turn evaluation conversations (#189, #209)
- `waza models` command — List all available models supported by the configured engine (#208)
- Early termination for trigger tests — Trigger tests can now stop early once the target skill is invoked, reducing evaluation time (#207)
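The early-termination entry above can be pictured as a loop that stops consuming agent activity as soon as the target skill shows up. This is a rough sketch under assumptions: the stream of invoked skill names and the function `run_trigger_test` are hypothetical and do not reflect waza's internals.

```python
from typing import Iterable, Tuple


def run_trigger_test(invocations: Iterable[str], target_skill: str) -> Tuple[bool, int]:
    """Consume skill invocations until the target skill is seen.

    Returns (triggered, events_seen). Because the loop returns on the
    first match, any later events in the stream are never pulled.
    """
    seen = 0
    for skill in invocations:
        seen += 1
        if skill == target_skill:
            return True, seen  # stop early: the trigger is confirmed
    return False, seen
```

With a lazy stream (a generator wrapping a live agent session, for instance), returning on the first match is what saves evaluation time, since the rest of the session is never executed.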
Fixed
- Stricter YAML validation — Audited all YAML parsers; unknown fields in `TestCase` definitions are now properly rejected (#132, #206)
- Test fixture assertion syntax — Fixed invalid Python expression in a test fixture assertion (#197)
- CI integration test stability — CI integration tests now correctly handle expected eval failures when using the mock executor (#210)
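To illustrate what rejecting unknown fields means in practice, here is a small sketch that checks an already-parsed mapping against an allow-list. The helper name and the field names in the usage note are hypothetical; the actual `TestCase` schema is not shown in these notes.

```python
from typing import Mapping, Set


def reject_unknown_fields(raw: Mapping[str, object], allowed: Set[str]) -> None:
    """Raise ValueError if the parsed mapping has keys outside the allow-list.

    Strict validation of this kind turns a silently ignored typo
    (for example `promt:` instead of `prompt:`) into a hard error.
    """
    unknown = sorted(set(raw) - allowed)
    if unknown:
        raise ValueError("unknown field(s): " + ", ".join(unknown))
```

For example, `reject_unknown_fields({"name": "t1", "promt": "hi"}, {"name", "prompt"})` raises `ValueError`, while the same call with the key spelled `prompt` returns without error.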
Documentation
- Added Quick Start guide to the documentation site (#205)
[0.27.0] - 2026-04-21
Added
- `output_contains_any` expectation — New expectation field that passes when the agent response contains any one of the specified strings (#203)
- `max_response_time_ms` behavior rule — Enforce maximum response time constraints on agent execution (#201)
- Task prompt from file — Task `prompt` field can now reference an external file path instead of inline text (#157, #200)
- `tool_calls` grader — New grader type that validates the specific tool calls an agent makes during execution (#187, #202)
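The `output_contains_any` expectation listed above is described as passing when the response contains any one of the given strings. A minimal sketch of that rule follows; it assumes plain, case-sensitive substring matching, which the release notes do not specify, and the function name simply mirrors the field name for readability.

```python
from typing import Sequence


def output_contains_any(response: str, candidates: Sequence[str]) -> bool:
    """Pass when at least one candidate string occurs in the response.

    Assumption: case-sensitive substring match. An empty candidate
    list never passes, because there is nothing to match.
    """
    return any(candidate in response for candidate in candidates)
```

Under these assumptions, `output_contains_any("Deployed to staging", ["staging", "production"])` passes, while the same response checked against `["production"]` alone does not.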
Fixed
- Webserver test resilience — Webserver tests now skip gracefully when frontend assets are not built (#204)
[0.26.0] - 2026-04-21
Changed
- Timestamped output directories — `run --output-dir` now groups result files by timestamp for cleaner organization (#153)
- Improved debug logging — Debug output is now more structured and useful for troubleshooting (#152)
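Grouping results by timestamp, as in the first entry above, usually means placing each run in its own subdirectory of the output directory. The sketch below shows one way that could look; the UTC `YYYYMMDDTHHMMSSZ` folder name and the helper `timestamped_dir` are assumptions, since the notes do not state the actual layout or format.

```python
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


def timestamped_dir(output_dir: str, now: Optional[datetime] = None) -> Path:
    """Return a per-run subdirectory path under output_dir.

    Assumed format: UTC timestamp such as 20260421T120000Z, which sorts
    chronologically when directory names are listed alphabetically.
    """
    now = now or datetime.now(timezone.utc)
    return Path(output_dir) / now.strftime("%Y%m%dT%H%M%SZ")
```

A caller would then create the directory with `timestamped_dir("results").mkdir(parents=True, exist_ok=True)` before writing result files into it, so repeated runs never overwrite each other.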
Fixed
- `--discover` finds eval.yaml in nested layout — Skill discovery now correctly locates `eval.yaml` files in `evals/{name}/` directories at the project root (#44)
- Diff grader reads post-execution workspace — The diff grader now reads files from the workspace after agent execution completes, not before (#165, #196)
- Grader config validation — Required grader configuration fields are now validated before evaluation starts (#195)
- macOS install and trigger test count — Fixed macOS binary installation and an off-by-one error in trigger test counting (#164, #184, #193)
Documentation
- Added cache command reference, prompt mode documentation, and complete YAML schema reference (#198)
- Updated demo guide and added CI/CD integration guide (#112, #89, #194)
Dependencies
- Bump defu from 6.1.4 to 6.1.6 in /site (#181)
- Bump vite from 6.4.1 to 6.4.2 in /site and /web (#182, #192)
- Bump go.opentelemetry.io/otel/sdk from 1.42.0 to 1.43.0 (#185)
- Bump astro from 5.17.3 to 5.18.1 in /site (#163)
- Bump picomatch from 4.0.3 to 4.0.4 in /site and /web (#159, #160)
- Bump smol-toml from 1.6.0 to 1.6.1 in /site (#158)
[0.25.0] - 2026-04-21
Added
- Eval coverage grid generator — New coverage output that visualizes which skills have eval coverage across grader types (#92)
Fixed
- SKILL.md injection and trigger fixture loading — `waza run` now correctly injects SKILL.md content into the evaluation context, loads trigger test fixtures, and passes MCP server configuration to the engine (#191)
Dependencies
- Bump h3 from 1.15.5 to 1.15.8 in /site (#144)
[0.24.0] - 2026-03-25
###…