Release · Microsoft · published May 21, 2026

microsoft/waza azd-ext-microsoft-azd-waza_0.33.0


Waza azd Extension v0.33.0

Repository: microsoft/waza

Tag: azd-ext-microsoft-azd-waza_0.33.0

Published: 2026-05-21T19:54:43Z

Prerelease: no

Release notes:

Changelog

All notable changes to waza will be documented in this file.

The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.

[Unreleased]

[0.33.0] - 2026-05-21

Note: This release includes the changes previously prepared under 0.32.0, which was not published.

Added

  • Configurable eval file naming — `.waza.yaml` can now configure files.evalFile, files.taskGlob, and files.taskFileSuffix, with the new naming carried through scaffolding, workspace discovery, discovery mode, schemas, and docs while preserving the existing eval.yaml and tasks/*.yaml defaults (#254, closes #232)
  • Instruction files in eval runs — Eval-level config.instruction_files and task-level instruction_files now copy files from the active context into task workspaces and append path-labeled contents to the Copilot system message (#248, closes #239)
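
To illustrate the configurable naming described above, here is a minimal Python sketch of how overrides might fall back to the documented defaults (eval.yaml and tasks/*.yaml). It is not waza's implementation: the function names, the shape of the `files` mapping, and the use of shell-style glob matching are assumptions, and files.taskFileSuffix is left out because the notes do not describe its semantics.

```python
from fnmatch import fnmatchcase

# Defaults named in the release notes; everything else is illustrative.
DEFAULT_EVAL_FILE = "eval.yaml"
DEFAULT_TASK_GLOB = "tasks/*.yaml"


def resolve_naming(files_config: dict) -> dict:
    """Merge a hypothetical `files` mapping with the documented defaults."""
    return {
        "evalFile": files_config.get("evalFile", DEFAULT_EVAL_FILE),
        "taskGlob": files_config.get("taskGlob", DEFAULT_TASK_GLOB),
    }


def classify(path: str, naming: dict) -> str:
    """Label a workspace-relative path as 'eval', 'task', or 'other'."""
    if path == naming["evalFile"]:
        return "eval"
    # Assumption: shell-style matching; note that `*` also matches `/` here.
    if fnmatchcase(path, naming["taskGlob"]):
        return "task"
    return "other"
```

With an empty mapping the sketch treats eval.yaml and tasks/basic.yaml as before; with overrides such as `{"evalFile": "suite.eval.yaml", "taskGlob": "cases/*.task.yaml"}` only the new names are recognized.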

Fixed

  • Prompt graders use the execution engine — Prompt graders now route judge turns through CopilotEngine instead of constructing a Copilot client directly, keeping grader execution aligned with engine configuration and preserving follow-up recovery behavior (#258, closes #54)
  • Prompt grader follow-up recovery — Prompt grading now preserves collected grades when a follow-up turn fails after successful grader collection (#251)
  • Bundled Copilot CLI updated — Embedded copilot-cli bundles are updated from 1.0.2 to 1.0.49 across supported platforms, with reproducible pinned bundle generation via COPILOT_CLI_VERSION (#260, closes #244)
  • Spec-aligned skill scaffolding — `waza new skill` no longer asks for a nonstandard skill type or emits type: frontmatter, and the wizard now rejects early exits that omit required name or description fields (#261, closes #243)
  • `waza check` eval discovery — Nested skills and separated evals are discovered consistently in multi-skill workspaces (#247, closes #238)
  • Skill body routing markers — Compliance scoring now detects trigger, anti-trigger, and routing markers in SKILL.md body sections as well as frontmatter descriptions (#236, closes #223)
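
The follow-up recovery fix above amounts to keeping grades that were already collected when a later turn fails. A rough Python sketch of that pattern follows; the function and parameter names are hypothetical and do not reflect waza's Go code.

```python
from typing import Callable, Optional, Tuple


def grade_with_followup(
    collect_grades: Callable[[], dict],
    run_followup: Callable[[], None],
) -> Tuple[dict, Optional[str]]:
    """Return (grades, followup_error).

    A failure while collecting grades still propagates; a failure in the
    follow-up turn is reported alongside the grades instead of discarding them.
    """
    grades = collect_grades()
    try:
        run_followup()
    except Exception as exc:  # keep the collected grades, surface the error text
        return grades, str(exc)
    return grades, None
```

The design choice is that the follow-up turn is treated as best-effort once grading has succeeded, so its error becomes data rather than an exception.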

Changed

  • Copilot SDK v0.3.0 migration — Updated github.com/github/copilot-sdk/go to v0.3.0, migrated session event handling to typed payloads, and refreshed transcript, logging, web API, usage collection, suggestion trace, and test coverage for the new API (#255, closes #253)
  • Dashboard validation coverage — Added coverage for dashboard lint and end-to-end validation (#249)
  • Install documentation — Replaced unsupported go install guidance and clarified Windows/WSL install behavior (#246, closes #242; #245, closes #241)
  • Dependencies — Bump devalue in /site, postcss in /web, and astro in /site (#237, #235, #234)

[0.31.0] - 2026-04-28

Added

  • Custom agent (`.agent.md`) eval support — Discover .agent.md files alongside SKILL.md, parse agent-specific frontmatter (tools, model, handoffs, mcp-servers, agents), auto-inject tool_constraint grader from agent tools: field, complete worked example under examples/custom-agent/, and new "Evaluating Custom Agents" docs guide (#226, closes #225)
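
As a rough picture of the auto-injection described above, the Python sketch below reads a tools: entry from .agent.md frontmatter and derives a grader from it. Everything beyond the names tools and tool_constraint is an assumption: the inline-list frontmatter form, the hand-rolled parser (a real implementation would use a YAML library), and the `type` / `allowed_tools` keys of the grader mapping are hypothetical.

```python
from typing import List, Optional


def frontmatter_tools(agent_md: str) -> List[str]:
    """Extract `tools: [a, b]` from a leading `---` frontmatter block."""
    lines = agent_md.splitlines()
    if not lines or lines[0].strip() != "---":
        return []
    for line in lines[1:]:
        if line.strip() == "---":  # end of frontmatter; ignore the body
            break
        key, sep, value = line.partition(":")
        if sep and key.strip() == "tools":
            inner = value.strip().strip("[]")
            return [t.strip().strip("'\"") for t in inner.split(",") if t.strip()]
    return []


def injected_grader(agent_md: str) -> Optional[dict]:
    """Build a hypothetical tool_constraint grader, or None without tools."""
    tools = frontmatter_tools(agent_md)
    if not tools:
        return None
    # Assumption: this grader shape is illustrative, not waza's schema.
    return {"type": "tool_constraint", "allowed_tools": tools}
```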

Fixed

  • Mock engine echoes file content — `output_contains` expectations against file contents now work in CI without a real model. The mock response includes task metadata, file paths, and a 1KB content preview per resource (#228, closes #227)
  • `waza serve` no longer crashes when stdin isn't a terminal — MCP stdio server only starts when term.IsTerminal() is true; piped input or background mode no longer kills the HTTP dashboard (#224)
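
The `waza serve` fix gates the stdio server on a terminal check (term.IsTerminal() in Go). The same idea can be sketched in Python with `isatty()`; this is an analog for illustration only, and `should_start_stdio_server` is a hypothetical name.

```python
import io
import sys


def should_start_stdio_server(stream=None) -> bool:
    """Return True only when the input stream is an interactive terminal."""
    stream = sys.stdin if stream is None else stream
    isatty = getattr(stream, "isatty", None)  # stream may be None or lack isatty
    return bool(isatty and isatty())


# Piped or redirected input behaves like this in-memory stream: not a terminal,
# so only the HTTP side would be started.
piped_input = io.StringIO('{"jsonrpc": "2.0"}')
start_stdio = should_start_stdio_server(piped_input)
```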

Changed

  • Vocabulary renames — Internal types renamed: BenchmarkSpec → EvalSpec, TestRunner → EvalRunner. Not a breaking change for external consumers (types live in internal/) (#222)

Documentation

  • Cross-reference audit for recent renames + custom agent feature: added .agent.md coverage to quickstart, getting-started, GUIDE, TUTORIAL, examples README; updated mock engine descriptions in INTEGRATION-TESTING and eval-yaml guide (#230)

Dependencies

  • Bump postcss from 8.5.6 to 8.5.12 in /site (#229)

[0.30.1] - 2026-04-22

Documentation

  • Updated README with missing CLI commands — Added documentation for recently-added CLI commands that were missing from the README (#220)

[0.30.0] - 2026-04-22

Added

  • `waza quality` command — LLM-as-Judge skill quality scoring that evaluates skill output quality using a configurable judge model (#218)
  • Scope-reduction advisory check — `waza check` now includes an advisory that flags skills with overly broad scope, helping authors tighten skill definitions (#219)

[0.29.0] - 2026-04-22

Added

  • `--keep-workspace` flag — Preserve the temporary workspace after task execution for debugging agent output (#123, #217)
  • `--no-skills` flag and `disabled_skills` config — Disable specific skills during evaluation to isolate behavior (#126, #216)
  • Non-blocking version update check — CLI now checks for newer waza versions in the background without slowing startup (#104, #214)
  • Per-task `skill_directories` — Specify different skill directories for individual tasks in eval YAML (#156, #215)
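
A non-blocking update check like the one listed above is typically a background task whose result is read only if it is ready. The Python sketch below shows one way to do that with a daemon thread; the class, the injected `fetch_latest` callable, the message text, and the plain dotted-integer version comparison are all assumptions, not waza's behavior.

```python
import threading
from typing import Callable, Optional


def _parse(version: str) -> tuple:
    """Parse 'X.Y.Z' into integers; raises ValueError on anything else."""
    return tuple(int(part) for part in version.split("."))


class UpdateCheck:
    def __init__(self, current: str, fetch_latest: Callable[[], str]):
        self.current = current
        self._fetch_latest = fetch_latest
        self.latest: Optional[str] = None
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self) -> "UpdateCheck":
        self._thread.start()  # returns immediately; startup is not delayed
        return self

    def _run(self) -> None:
        try:
            self.latest = self._fetch_latest()
        except Exception:
            self.latest = None  # a failed check must never break the CLI

    def notice(self, timeout: float = 0.0) -> Optional[str]:
        """Return a message if a newer version is known by now, else None."""
        self._thread.join(timeout)
        if self._thread.is_alive() or self.latest is None:
            return None
        try:
            newer = _parse(self.latest) > _parse(self.current)
        except ValueError:
            return None
        if not newer:
            return None
        return f"waza {self.latest} is available (you have {self.current})"
```

A caller would `start()` the check first, do its normal work, and call `notice()` just before exiting.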

Dependencies

  • Bump astro and @astrojs/starlight in /site (#212)

[0.28.0] - 2026-04-21

Added

  • Follow-up prompts in eval YAML — Tasks can now include pre-written follow-up prompts for multi-turn evaluation conversations (#189, #209)
  • `waza models` command — List all available models supported by the configured engine (#208)
  • Early termination for trigger tests — Trigger tests can now stop early once the target skill is invoked, reducing evaluation time (#207)
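
Early termination for trigger tests can be pictured as consuming a stream of skill invocations lazily and stopping at the first match. The sketch below is a simplified Python model under that assumption; representing events as plain skill names and the function name itself are hypothetical.

```python
from typing import Iterable, Tuple


def run_trigger_test(events: Iterable[str], target_skill: str) -> Tuple[bool, int]:
    """Return (triggered, events_consumed), stopping at the first invocation
    of `target_skill` so later events are never requested."""
    consumed = 0
    for skill in events:
        consumed += 1
        if skill == target_skill:
            return True, consumed
    return False, consumed
```

Because the loop returns on the first match, an iterator backed by a live session would not be advanced any further, which is where the time saving would come from.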

Fixed

  • Stricter YAML validation — Audited all YAML parsers; unknown fields in TestCase definitions are now properly rejected (#132, #206)
  • Test fixture assertion syntax — Fixed invalid Python expression in a test fixture assertion (#197)
  • CI integration test stability — CI integration tests now correctly handle expected eval failures when using the mock executor (#210)
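
Rejecting unknown fields, as in the stricter YAML validation above, comes down to comparing parsed keys against an allow-list. A minimal Python sketch follows; the field names in `ALLOWED_TEST_CASE_FIELDS` are placeholders, not the real TestCase schema, and the error type is hypothetical.

```python
# Placeholder allow-list; the actual TestCase fields are not given in the notes.
ALLOWED_TEST_CASE_FIELDS = frozenset({"name", "prompt", "expected"})


class UnknownFieldError(ValueError):
    """Raised when a parsed mapping contains keys outside the allow-list."""


def unknown_fields(raw: dict) -> list:
    """Return the sorted keys of `raw` that are not allowed."""
    return sorted(set(raw) - ALLOWED_TEST_CASE_FIELDS)


def validate_test_case(raw: dict) -> dict:
    """Return `raw` unchanged if every key is known, else raise."""
    extra = unknown_fields(raw)
    if extra:
        raise UnknownFieldError("unknown field(s) in TestCase: " + ", ".join(extra))
    return raw
```

The practical benefit is that a typo such as `promt` fails loudly instead of being silently ignored.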

Documentation

  • Added Quick Start guide to the documentation site (#205)

##…

