Agent Studio: Release notes

Highlights

  • What shipped: MetaPrompter, Agent Development Lifecycle (ADLC) enhancements (build flow and template import/export), Marketplace & Templates, and evals with regression testing

  • Why it matters: Improves predictability and testability of Digital Workers, accelerates iteration on instructions and configurations, and strengthens confidence for production rollouts

  • Who benefits: Builders and operators of Digital Workers who need consistent outputs, faster iteration, and higher-confidence releases

New features

Usability

Agent Development Lifecycle (ADLC)

This release delivers new and enhanced capabilities for building Digital Workers (DWs) and managing their templates, deployment, testing, and versioning. Agent Studio now provides:

  • Enhanced Digital Worker build flow

    • Guided build flow for Digital Workers: Create → Instructions → Tools → Skills → Triggers → Settings

    • Instructions as the core control plane: define persona/role, context, guardrails, tool usage, response structure, and multi-step reasoning

    • Simplified tool management: connector-based standard tools, plus custom tool creation as needed

    • Skills as reusable reasoning patterns (prompt, reasoning, and domain skills) to standardize high-quality outputs

  • Import/Export of Digital Workers

    • Export as Template workflow with guided checklist (e.g., overview → integrations → branding & metadata → review & export) to make installations predictable and secure

    • Export summary provides an at-a-glance inventory of what’s included (e.g., number of skills, tools, and triggers) and allows editing the system description shown to installers

    • Clear scoping during export: exporting affects the template package only (does not modify the live system)

    • Marketplace import includes an install readiness checklist and surfaces required integrations/connectors that must be configured before the agent can be installed
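
The install readiness check described above can be pictured as a comparison between the connectors a template requires and the connectors already configured in the target environment. The following is a minimal sketch only: `TemplateManifest`, `missing_connectors`, `is_install_ready`, and the connector names are hypothetical illustrations, not Agent Studio's package format or API.

```python
from dataclasses import dataclass, field


@dataclass
class TemplateManifest:
    """Hypothetical summary of a template package (field names are illustrative)."""
    name: str
    required_connectors: list[str] = field(default_factory=list)


def missing_connectors(manifest: TemplateManifest, configured: set[str]) -> list[str]:
    """Return required connectors that are not yet configured, in manifest order."""
    return [c for c in manifest.required_connectors if c not in configured]


def is_install_ready(manifest: TemplateManifest, configured: set[str]) -> bool:
    """A template is ready to install only when no required connector is missing."""
    return not missing_connectors(manifest, configured)


# Hypothetical example data.
manifest = TemplateManifest(
    name="invoice-triage",
    required_connectors=["erp", "email", "document-store"],
)
configured = {"email", "erp"}

print(missing_connectors(manifest, configured))  # ['document-store']
print(is_install_ready(manifest, configured))    # False
```

Returning the list of missing connectors, rather than only a yes/no answer, mirrors the idea of surfacing what must be configured before installation.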

Marketplace & Templates
  • New marketplace experience to discover industrial AI agents and deploy enterprise-grade solutions

  • Browse and filter by categories (e.g., Quality Control, Predictive Maintenance, Automation, Supply Chain, Safety & Compliance, Process Optimization, Operations, Logistics)

  • Search across agents, categories, and features; preview and view agent details before deployment

  • Agent cards highlight key signals such as ratings, last-updated date, usage/downloads, and badges (e.g., Trending, Popular, New, Stable)

MetaPrompter
  • Scope (Q1 2026): available for editing existing Agents only

  • Interactive prompt designer that generates structured instructions

  • Enforces state-based output formats for consistent, predictable behavior

  • Supports iterative instruction updates to accommodate customer-specific rule changes without code
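
To illustrate what a state-based output format can look like from a consumer's point of view, the sketch below validates a reply against the fields its declared state requires. Everything here is an assumption for illustration: the state names, the required fields, the JSON encoding, and the `normalize_output` helper are hypothetical and do not describe MetaPrompter's actual format.

```python
import json

# Hypothetical states and the fields each state must carry.
REQUIRED_FIELDS = {
    "NEEDS_INFO": {"question"},
    "COMPLETED": {"result"},
    "ESCALATED": {"reason"},
}


def normalize_output(raw: str) -> dict:
    """Parse a reply and check it against the state it declares.

    Raises ValueError when the reply is not valid JSON, is not a JSON object,
    names an unknown state, or lacks a field required for that state.
    """
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"reply is not valid JSON: {exc}") from exc
    if not isinstance(payload, dict):
        raise ValueError("reply must be a JSON object")
    state = payload.get("state")
    if not isinstance(state, str) or state not in REQUIRED_FIELDS:
        raise ValueError(f"unknown state: {state!r}")
    missing = REQUIRED_FIELDS[state] - set(payload)
    if missing:
        raise ValueError(f"state {state} is missing fields: {sorted(missing)}")
    return payload


reply = normalize_output('{"state": "COMPLETED", "result": "approved"}')
print(reply["state"])  # COMPLETED
```

A downstream system can branch on `state` alone, which is the practical benefit of fixing the set of states and their fields in advance.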

Memory Management

Long-term memory per entity (Memory maps)
  • Dual-path memory initialization: combine continuous entity-specific learning with optional user-defined Rule Books for bootstrapping

  • Captures learnings from human-in-the-loop (HITL) feedback to evolve instructions and long-term memory over time

  • Designed to improve resilience to variable inputs (e.g., changing document formats and incomplete data)

Monitoring

Dashboards
  • Digital Worker Activity overview with trend charts and KPI tiles (e.g., total events, HITL conversations, total tests)

  • Current Activity snapshot (active Digital Workers, active conversations)

  • Flow monitoring with completion visibility (Digital Worker flows and close rate)

  • Leaderboards for operational triage (Top Digital Workers; HITL conversations by agent)

  • Governance/quality surfaces (Eval failures) and cost visibility (token usage by agent)

Evals
  • Dataset-based eval runs with scorecards across key dimensions (Accuracy, Relevance, Helpfulness), plus operational metrics (tests run, success rate, failures, average latency)

  • Per-test-case drill-down: view the agent plan, conversation transcript, and detailed rubric scoring with reasoning to support faster debugging and iteration

  • Run history tracking to compare results over time and spot regressions
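
As a rough picture of how the scorecard and operational metrics listed above relate to individual test results, the sketch below aggregates a run in plain Python. It assumes per-test rubric scores in the range 0 to 1; `CaseResult`, `summarize_run`, and the field names are hypothetical and are not taken from Agent Studio.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class CaseResult:
    """One test-case outcome; scores are assumed to lie in [0, 1]."""
    passed: bool
    latency_s: float
    scores: dict[str, float]  # dimension name -> rubric score


def summarize_run(results: list[CaseResult]) -> dict:
    """Aggregate per-test results into operational metrics and a scorecard."""
    if not results:
        raise ValueError("an eval run needs at least one test result")
    dimensions = sorted({d for r in results for d in r.scores})
    passed = sum(r.passed for r in results)
    return {
        "tests_run": len(results),
        "failures": len(results) - passed,
        "success_rate": passed / len(results),
        "avg_latency_s": mean(r.latency_s for r in results),
        # Average each dimension over the tests that reported it.
        "scorecard": {
            d: mean(r.scores[d] for r in results if d in r.scores)
            for d in dimensions
        },
    }


# Hypothetical example data.
run = [
    CaseResult(True, 2.0, {"accuracy": 1.0, "relevance": 1.0, "helpfulness": 0.5}),
    CaseResult(False, 4.0, {"accuracy": 0.5, "relevance": 1.0, "helpfulness": 0.5}),
]
summary = summarize_run(run)
print(summary["success_rate"], summary["avg_latency_s"])  # 0.5 3.0
print(summary["scorecard"]["accuracy"])                    # 0.75
```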

Regression testing
  • Organize tests into datasets with individual test cases and pass/fail status tracking

  • Run history captures when an eval was executed and by whom, making it easier to compare runs and detect regressions

  • Performance signals included alongside quality (e.g., per-test latency and average runtime) to catch speed/timeout failures early
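
Comparing two entries in the run history to detect regressions can be sketched as follows. This is an illustration under stated assumptions, not the product's implementation: each run is modeled as a mapping from test-case ID to a `(passed, latency_s)` tuple, and `find_regressions` and `latency_budget_s` are hypothetical names.

```python
from typing import Optional

# Test-case ID -> (passed, latency in seconds); a hypothetical run representation.
Run = dict[str, tuple[bool, float]]


def find_regressions(
    baseline: Run,
    candidate: Run,
    latency_budget_s: Optional[float] = None,
) -> dict[str, list[str]]:
    """Report test cases that newly fail or exceed a latency budget.

    Only test cases present in both runs are compared.
    """
    report: dict[str, list[str]] = {"newly_failing": [], "over_latency_budget": []}
    for case_id in sorted(baseline.keys() & candidate.keys()):
        base_passed, _ = baseline[case_id]
        cand_passed, cand_latency = candidate[case_id]
        if base_passed and not cand_passed:
            report["newly_failing"].append(case_id)
        if latency_budget_s is not None and cand_latency > latency_budget_s:
            report["over_latency_budget"].append(case_id)
    return report


# Hypothetical example data.
baseline = {"tc-1": (True, 1.2), "tc-2": (True, 0.8), "tc-3": (False, 2.0)}
candidate = {"tc-1": (True, 1.1), "tc-2": (False, 0.9), "tc-3": (False, 6.5)}

print(find_regressions(baseline, candidate, latency_budget_s=5.0))
# {'newly_failing': ['tc-2'], 'over_latency_budget': ['tc-3']}
```

Keeping quality regressions and latency violations in separate lists reflects the distinction the notes draw between pass/fail status and performance signals.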

Additional notes

Improvements
  • Deterministic, intent-based agent output

  • Predictable responses for machine interaction (schema/state-based output normalization)

  • Microsoft Teams Adaptive Cards support

Security & compliance
  • None
