Release: Microsoft, published Apr 22, 2026

microsoft/waza azd-ext-microsoft-azd-waza_0.30.1

microsoft/waza

Waza azd Extension v0.30.1

Repository: microsoft/waza

Tag: azd-ext-microsoft-azd-waza_0.30.1

Published: 2026-04-22T20:59:45Z

Prerelease: no

Release notes:

Changelog

All notable changes to waza will be documented in this file.

The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.

[Unreleased]

[0.24.0] - 2026-03-25

Changed

  • Strict YAML validation — All YAML parsers now use KnownFields(true) to reject unknown fields, catching typos and misconfigurations early (#132, #133)
  • `max_workers` renamed to `workers` — Config YAML key renamed for consistency across all config types (breaking change)
  • Unified token counting: `waza check` and `waza tokens count` now share the same counting logic for consistent results (#146)
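
The strict-validation change above is easiest to see with the renamed key. The sketch below is not waza's parser: it uses the standard library's `encoding/json` and its `DisallowUnknownFields` option as a stand-in for the `KnownFields(true)` setting named in the changelog, and the `Config` struct is a hypothetical shape that borrows only the `workers` key from the notes.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// Config is a hypothetical config shape; only the "workers" key
// comes from the changelog.
type Config struct {
	Workers int `json:"workers"`
}

// decodeStrict rejects any key that does not map to a Config field.
// DisallowUnknownFields is the JSON analogue (an assumption made for
// this sketch) of enabling KnownFields(true) on a YAML decoder.
func decodeStrict(input string) (Config, error) {
	var cfg Config
	dec := json.NewDecoder(strings.NewReader(input))
	dec.DisallowUnknownFields()
	err := dec.Decode(&cfg)
	return cfg, err
}

func main() {
	// The old key is now an unknown field, so decoding fails loudly
	// instead of silently ignoring it.
	if _, err := decodeStrict(`{"max_workers": 4}`); err != nil {
		fmt.Println("rejected:", err)
	}

	// The renamed key decodes normally.
	cfg, err := decodeStrict(`{"workers": 4}`)
	fmt.Println(cfg.Workers, err)
}
```

With a lenient decoder the first input would have produced a zero-valued config and no error, which is the class of typo this change is meant to surface.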

Fixed

  • Typo in prompt grader — Fixed "prmopt" → "prompt" in error message

Dependencies

  • Bump h3 from 1.15.8 to 1.15.9 in /site (#155)
  • Bump github.com/buger/jsonparser from 1.1.1 to 1.1.2 (#149)

[0.21.0] - 2026-03-12

Added

  • `waza new task from-prompt` command — Record Copilot sessions into task YAML files for eval creation (#110)
  • Trigger heuristic grader — New grader type that scores based on trigger/anti-trigger matching heuristics (#90)
  • Eval scaffolding command: `waza eval new` generates eval.yaml scaffolding for skills (#94)
  • Multi-trial flakiness detection — Detect flaky evals across multiple trial runs (#103)
  • Snapshot auto-update workflow — Diff grader can now auto-update snapshot files on mismatch (#95)
  • Per-file token budget configuration — Configure token budgets per-file in .waza.yaml (#96)
  • Skill-aware thresholds: `waza tokens compare` supports skill-specific threshold configuration (#93)
  • Sensei scoring parity — WHEN triggers, spec-security, invalid level, and advisory checks 16-18 (#79)
  • CI/CD integration guide — GitHub Actions and Azure DevOps integration documentation (#100)
  • FileWriter service — Refactored waza init inventory with FileWriter abstraction (#63)
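
The changelog does not say how multi-trial flakiness is decided. One plausible reading, sketched here purely as an assumption, is that a task counts as flaky when repeated trials of it disagree (some pass, some fail). The `isFlaky` helper and the boolean trial representation are hypothetical.

```go
package main

import "fmt"

// isFlaky reports whether a task's trials disagree with each other.
// Assumption for this sketch: all-pass and all-fail are both stable
// outcomes; only a mix of the two is treated as flaky.
func isFlaky(trials []bool) bool {
	passes := 0
	for _, ok := range trials {
		if ok {
			passes++
		}
	}
	return passes > 0 && passes < len(trials)
}

func main() {
	fmt.Println(isFlaky([]bool{true, true, true}))  // stable pass
	fmt.Println(isFlaky([]bool{true, false, true})) // mixed outcomes
}
```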

Fixed

  • `waza suggest` deadlock: `Execute()` now applies the request timeout before calling `Start()`, preventing goroutine deadlock (#43)
  • `ResourceFile.Content` type — Changed from string to []byte for proper binary file handling (#117)
  • `tokens compare` in subdirectory — No longer shows all files as "added" when run from a subdirectory (#105)
  • `--output-dir` ignored — Fixed --output-dir having no effect for single-skill runs (#109)
  • Web dashboard build order — Build dashboard assets before Go compilation (#107)
  • Test file leak — Fixed test that leaked files into the repo (#120)
  • Config schema defaults — Aligned config.schema.json defaults with Go source of truth (#65)
  • Skill discovery path — Discover skills under .github/skills/ directory (#69)

Changed

  • Renamed config node `max_workers` to `workers` for consistency across all config types (breaking change)
  • Custom YAML deserializers for config types (#106)
  • Validate only known fields in YAML decoders. (#132)
  • Token limits priority inverted to .waza.yaml first (#64)
  • @wbreza added to CODEOWNERS (#111)
  • Go 1.26+ noted in agent instruction files (#108)

[0.9.0] - 2026-02-23

Added

  • A/B baseline testing: `--baseline` flag runs each task with and without the skill and computes weighted improvement scores across quality, tokens, turns, time, and task completion (#307)
  • Pairwise LLM judging: `pairwise` mode on prompt grader with position-swap bias mitigation. Three modes: pairwise, independent, both. Magnitude scoring from much-better to much-worse (#310)
  • Tool constraint grader — New tool_constraint grader type with expect_tools, reject_tools, max_turns, max_tokens constraints. Validates agent tool usage behavior (#391)
  • Auto skill discovery: `--discover` flag walks directory trees for SKILL.md + eval.yaml pairs. `--strict` mode fails if any skill lacks eval coverage (#392)
  • Releases page — New docs site page at reference/releases with platform download links, install commands, and azd extension info (#383)
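
Position-swap bias mitigation, mentioned for pairwise judging above, generally means asking the judge twice with the two answers in opposite order and trusting only a verdict that survives the swap. The sketch below illustrates that general idea and is not waza's implementation: the `judge` callback and `pairwise` helper are hypothetical, and the verdict is reduced to a sign rather than the magnitude scale the notes describe.

```go
package main

import "fmt"

// judge is a hypothetical callback: a positive result prefers the
// answer shown first, a negative result prefers the second, 0 is a tie.
type judge func(first, second string) int

// pairwise asks the judge twice with the order swapped. It returns 1
// if a wins in both orders, -1 if b wins in both, and 0 when the two
// orderings disagree, which suggests position bias.
func pairwise(j judge, a, b string) int {
	forward := j(a, b)   // a shown first
	backward := -j(b, a) // b shown first; negated so the sign is relative to a
	switch {
	case forward > 0 && backward > 0:
		return 1
	case forward < 0 && backward < 0:
		return -1
	default:
		return 0
	}
}

func main() {
	// A judge that always prefers whatever it sees first.
	biased := func(first, second string) int { return 1 }
	// A judge that prefers the longer answer regardless of position.
	byLength := func(first, second string) int { return len(first) - len(second) }

	fmt.Println(pairwise(biased, "short", "a longer answer"))   // orderings disagree
	fmt.Println(pairwise(byLength, "short", "a longer answer")) // second answer wins both times
}
```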

Fixed

  • Lint warnings — Resolved errcheck (webserver) and ineffassign (utils) lint warnings

Changed

  • Competitive research — Added OpenAI Evals analysis (docs/research/waza-vs-openai-evals.md), skill-validator analysis (docs/research/waza-vs-skill-validator.md), and eval registry design doc (docs/research/waza-eval-registry-design.md)
  • Mermaid diagrams — Converted remaining ASCII diagrams to Mermaid across all markdown files. Added Mermaid directive to AGENTS.md

[0.8.0] - 2026-02-21

Added

  • MCP Server: `waza serve` now includes an always-on MCP server with 10 tools (eval.list, eval.get, eval.validate, eval.run, task.list, run.status, run.cancel, results.summary, results.runs, skill.check) via stdio transport (#286)
  • `waza suggest` command — LLM-powered eval suggestions: reads SKILL.md, proposes test cases, graders, and fixtures. Flags: --model, --dry-run, --apply, --output-dir, --format (#287)
  • Interactive workflow skill: `skills/waza-interactive/SKILL.md` with 5 workflow scenarios for conversational eval orchestration (#288)
  • Grader weighting: `weight` field on grader configs, `ComputeWeightedRunScore` method, dashboard weighted scores column (#299)
  • Statistical confidence intervals — Bootstrap CI with 10K resamples, 95% confidence, normalized gain. Dashboard CI bands and significance badges (#308)
  • Judge model support: `--judge-model` flag and `judge_model` config for separate LLM-as-judge model (#309)
  • Spec compliance checks — 8 agentskills.io compliance checks in waza check and waza dev (#314)
  • SkillsBench advisory — 5 advisory checks (module-count, complexity, negative-delta, procedural, over-specificity) (#315)
  • MCP integration scoring — 4 MCP integration checks in waza dev (#316)
  • Batch skill processing: `waza dev` processes multiple skills in one run (#317)
  • Token compare --strict — Budget enforcement mode for waza tokens compare (#318)
  • Scaffold trigger tests — Auto-generate trigger test YAML from SKILL.md frontmatter (#319)
  • Skill profile: `waza tokens profile` for static analysis of skill token distribution (#311)
  • JUnit XML reporter —…
