microsoft/waza azd-ext-microsoft-azd-waza_0.30.1
Waza azd Extension v0.30.1
Repository: microsoft/waza
Tag: azd-ext-microsoft-azd-waza_0.30.1
Published: 2026-04-22T20:59:45Z
Prerelease: no
Release notes:
Changelog
All notable changes to waza will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
[Unreleased]
[0.24.0] - 2026-03-25
Changed
- Strict YAML validation — All YAML parsers now use `KnownFields(true)` to reject unknown fields, catching typos and misconfigurations early (#132, #133)
- `max_workers` renamed to `workers` — Config YAML key renamed for consistency across all config types (breaking change)
- Unified token counting — `waza check` and `waza tokens count` now share the same counting logic for consistent results (#146)
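
The two config changes above interact: once unknown fields are rejected, a config that still uses the old `max_workers` key fails to parse instead of being silently ignored. The sketch below illustrates that behavior under stated assumptions. It uses the standard library's `encoding/json` decoder with `DisallowUnknownFields()` as a stand-in for a YAML decoder with `KnownFields(true)` (the method name matches `gopkg.in/yaml.v3`, but the changelog does not name the library). The `Config` type and `decodeStrict` helper are hypothetical and are not waza's actual schema or code.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// Config is a hypothetical config type for illustration only.
type Config struct {
	Workers int `json:"workers"`
}

// decodeStrict decodes input into Config and rejects any field
// that Config does not declare.
func decodeStrict(input string) (Config, error) {
	var cfg Config
	dec := json.NewDecoder(strings.NewReader(input))
	// Stdlib analogue of a YAML decoder's KnownFields(true).
	dec.DisallowUnknownFields()
	err := dec.Decode(&cfg)
	return cfg, err
}

func main() {
	// The renamed key decodes cleanly.
	cfg, err := decodeStrict(`{"workers": 4}`)
	fmt.Println(cfg.Workers, err)

	// The old key is now an unknown field, so decoding returns an error.
	_, err = decodeStrict(`{"max_workers": 4}`)
	fmt.Println(err)
}
```

With lenient decoding, the second input would parse without error and leave `Workers` at zero, which is the kind of silent misconfiguration strict validation is meant to catch.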
Fixed
- Typo in prompt grader — Fixed "prmopt" → "prompt" in error message
Dependencies
- Bump h3 from 1.15.8 to 1.15.9 in /site (#155)
- Bump github.com/buger/jsonparser from 1.1.1 to 1.1.2 (#149)
[0.21.0] - 2026-03-12
Added
- `waza new task from-prompt` command — Record Copilot sessions into task YAML files for eval creation (#110)
- Trigger heuristic grader — New grader type that scores based on trigger/anti-trigger matching heuristics (#90)
- Eval scaffolding command — `waza eval new` generates eval.yaml scaffolding for skills (#94)
- Multi-trial flakiness detection — Detect flaky evals across multiple trial runs (#103)
- Snapshot auto-update workflow — Diff grader can now auto-update snapshot files on mismatch (#95)
- Per-file token budget configuration — Configure token budgets per-file in `.waza.yaml` (#96)
- Skill-aware thresholds — `waza tokens compare` supports skill-specific threshold configuration (#93)
- Sensei scoring parity — WHEN triggers, spec-security, invalid level, and advisory checks 16-18 (#79)
- CI/CD integration guide — GitHub Actions and Azure DevOps integration documentation (#100)
- FileWriter service — Refactored `waza init` inventory with FileWriter abstraction (#63)
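
The changelog does not say how multi-trial flakiness detection decides that an eval is flaky. One common definition, sketched here as an assumption, treats a task as flaky when its trials disagree: some pass and some fail. The `isFlaky` function is hypothetical and is not taken from waza.

```go
package main

import "fmt"

// isFlaky reports whether a task's trial outcomes disagree with each other.
// A task that always passes or always fails is consistent, not flaky.
func isFlaky(passed []bool) bool {
	if len(passed) == 0 {
		return false
	}
	for _, p := range passed[1:] {
		if p != passed[0] {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(isFlaky([]bool{true, true, true}))   // consistent pass
	fmt.Println(isFlaky([]bool{true, false, true}))  // mixed outcomes
	fmt.Println(isFlaky([]bool{false, false, false})) // consistent fail
}
```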
Fixed
- `waza suggest` deadlock — `Execute()` now applies the request timeout before calling `Start()`, preventing goroutine deadlock (#43)
- `ResourceFile.Content` type — Changed from `string` to `[]byte` for proper binary file handling (#117)
- `tokens compare` in subdirectory — No longer shows all files as "added" when run from a subdirectory (#105)
- `--output-dir` ignored — Fixed `--output-dir` having no effect for single-skill runs (#109)
- Web dashboard build order — Build dashboard assets before Go compilation (#107)
- Test file leak — Fixed test that leaked files into the repo (#120)
- Config schema defaults — Aligned `config.schema.json` defaults with Go source of truth (#65)
- Skill discovery path — Discover skills under `.github/skills/` directory (#69)
Changed
- Renamed `config` node `max_workers` to `workers` for consistency across all config types
  - This is a breaking change
- Custom YAML deserializers for config types (#106)
- Validate only known fields in YAML decoders. (#132)
- Token limits priority inverted to `.waza.yaml` first (#64)
- @wbreza added to CODEOWNERS (#111)
- Go 1.26+ noted in agent instruction files (#108)
[0.9.0] - 2026-02-23
Added
- A/B baseline testing — `--baseline` flag runs each task with and without skill, computes weighted improvement scores across quality, tokens, turns, time, and task completion (#307)
- Pairwise LLM judging — `pairwise` mode on `prompt` grader with position-swap bias mitigation. Three modes: pairwise, independent, both. Magnitude scoring from much-better to much-worse (#310)
- Tool constraint grader — New `tool_constraint` grader type with `expect_tools`, `reject_tools`, `max_turns`, `max_tokens` constraints. Validates agent tool usage behavior (#391)
- Auto skill discovery — `--discover` flag walks directory trees for SKILL.md + eval.yaml pairs. `--strict` mode fails if any skill lacks eval coverage (#392)
- Releases page — New docs site page at `reference/releases` with platform download links, install commands, and azd extension info (#383)
Fixed
- Lint warnings — Resolved errcheck (webserver) and ineffassign (utils) lint warnings
Changed
- Competitive research — Added OpenAI Evals analysis (`docs/research/waza-vs-openai-evals.md`), skill-validator analysis (`docs/research/waza-vs-skill-validator.md`), and eval registry design doc (`docs/research/waza-eval-registry-design.md`)
- Mermaid diagrams — Converted remaining ASCII diagrams to Mermaid across all markdown files. Added Mermaid directive to AGENTS.md
[0.8.0] - 2026-02-21
Added
- MCP Server — `waza serve` now includes an always-on MCP server with 10 tools (eval.list, eval.get, eval.validate, eval.run, task.list, run.status, run.cancel, results.summary, results.runs, skill.check) via stdio transport (#286)
- `waza suggest` command — LLM-powered eval suggestions: reads SKILL.md, proposes test cases, graders, and fixtures. Flags: `--model`, `--dry-run`, `--apply`, `--output-dir`, `--format` (#287)
- Interactive workflow skill — `skills/waza-interactive/SKILL.md` with 5 workflow scenarios for conversational eval orchestration (#288)
- Grader weighting — `weight` field on grader configs, `ComputeWeightedRunScore` method, dashboard weighted scores column (#299)
- Statistical confidence intervals — Bootstrap CI with 10K resamples, 95% confidence, normalized gain. Dashboard CI bands and significance badges (#308)
- Judge model support — `--judge-model` flag and `judge_model` config for separate LLM-as-judge model (#309)
- Spec compliance checks — 8 agentskills.io compliance checks in `waza check` and `waza dev` (#314)
- SkillsBench advisory — 5 advisory checks (module-count, complexity, negative-delta, procedural, over-specificity) (#315)
- MCP integration scoring — 4 MCP integration checks in `waza dev` (#316)
- Batch skill processing — `waza dev` processes multiple skills in one run (#317)
- Token compare --strict — Budget enforcement mode for `waza tokens compare` (#318)
- Scaffold trigger tests — Auto-generate trigger test YAML from SKILL.md frontmatter (#319)
- Skill profile — `waza tokens profile` for static analysis of skill token distribution (#311)
- JUnit XML reporter —…