Agentic Studio: Release notes
Highlights
What shipped: MetaPrompter, ADLC enhancements (build flow + template import/export), marketplace/templates, and evaluation + regression testing
Why it matters: Improves predictability and testability of Digital Workers, accelerates iteration on instructions and configurations, and strengthens confidence for production rollouts
Who benefits: Builders and operators of Digital Workers who need consistent outputs, faster iteration, and higher-confidence releases
New features
Usability
Agent Development Lifecycle (ADLC)
This release delivers new and enhanced capabilities for building and managing Digital Worker (DW) templates, deployment, testing, and versioning. Agentic Studio now provides:
Enhanced Digital Worker build flow
Guided build flow for Digital Workers: Create → Instructions → Tools → Skills → Triggers → Settings
Instructions as the core control plane: define persona/role, context, guardrails, tool usage, response structure, and multi-step reasoning
Simplified tool management: connector-based standard tools, plus custom tool creation as needed
Skills as reusable reasoning patterns (prompt, reasoning, and domain skills) to standardize high-quality outputs
Import/Export of Digital Workers
Export as Template workflow with guided checklist (e.g., overview → integrations → branding & metadata → review & export) to make installations predictable and secure
Export summary provides an at-a-glance inventory of what’s included (e.g., number of skills, tools, and triggers) and allows editing the system description shown to installers
Clear scoping during export: exporting affects the template package only (does not modify the live system)
Marketplace import includes an install readiness checklist and surfaces required integrations/connectors that must be configured before the agent can be installed
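The export summary and install readiness check described above can be sketched as follows. This is a minimal illustration, not the product's implementation: the TemplatePackage shape, its field names, and both helper functions are hypothetical and are not part of Agentic Studio.

```python
from dataclasses import dataclass, field


@dataclass
class TemplatePackage:
    """Hypothetical shape of an exported Digital Worker template."""
    name: str
    skills: list[str] = field(default_factory=list)
    tools: list[str] = field(default_factory=list)
    triggers: list[str] = field(default_factory=list)
    required_integrations: list[str] = field(default_factory=list)


def export_summary(pkg: TemplatePackage) -> dict[str, int]:
    """Count what the package contains, as an at-a-glance inventory."""
    return {
        "skills": len(pkg.skills),
        "tools": len(pkg.tools),
        "triggers": len(pkg.triggers),
    }


def missing_integrations(pkg: TemplatePackage, configured: set[str]) -> list[str]:
    """Return required integrations not yet configured, in package order."""
    return [name for name in pkg.required_integrations if name not in configured]


# Illustrative data only; names are made up for this example.
pkg = TemplatePackage(
    name="invoice-triage",
    skills=["extract-line-items", "summarize"],
    tools=["erp-lookup"],
    triggers=["email-received"],
    required_integrations=["erp", "email"],
)
summary = export_summary(pkg)
blockers = missing_integrations(pkg, configured={"email"})
ready = not blockers  # installable only when nothing is missing
print(summary, blockers, ready)
```

In this sketch the package is read but never modified, which mirrors the scoping note that exporting affects only the template package.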
Marketplace & Templates
New marketplace experience to discover industrial AI agents and deploy enterprise-grade solutions
Browse and filter by categories (e.g., Quality Control, Predictive Maintenance, Automation, Supply Chain, Safety & Compliance, Process Optimization, Operations, Logistics)
Search across agents, categories, and features; preview and view agent details before deployment
Agent cards highlight key signals such as ratings, last-updated date, usage/downloads, and badges (e.g., Trending, Popular, New, Stable)
MetaPrompter
Scope (Q1 2026): available for editing existing agents only
Interactive prompt designer that generates structured instructions
Enforces state-based output formats for consistent, predictable behavior
Supports iterative instruction updates to accommodate customer-specific rule changes without code
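One way to picture a state-based output format is a check that each response carries the fields its state requires. The sketch below is an assumption for illustration only: the state names, field names, and validate_output function are hypothetical and do not describe MetaPrompter's actual schema or API.

```python
# Hypothetical mapping from a worker state to the fields its output must carry.
REQUIRED_FIELDS = {
    "needs_review": {"summary", "question_for_reviewer"},
    "completed": {"summary", "result"},
    "failed": {"summary", "error"},
}


def validate_output(output: dict) -> list[str]:
    """Return a list of problems; an empty list means the output conforms."""
    state = output.get("state")
    if state not in REQUIRED_FIELDS:
        return [f"unknown state: {state!r}"]
    missing = REQUIRED_FIELDS[state] - set(output)
    return [f"missing field: {name}" for name in sorted(missing)]


ok = validate_output({"state": "completed", "summary": "done", "result": 42})
bad = validate_output({"state": "failed", "summary": "could not parse"})
print(ok, bad)
```

Keeping the rules in a plain mapping means a customer-specific rule change is a data edit, not a code change, which is the spirit of the no-code iteration described above.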
Memory Management
Long-term memory per entity (Memory maps)
Dual-path memory initialization: combine continuous entity-specific learning with optional user-defined Rule Books for bootstrapping
Captures learnings from HITL feedback to evolve instructions and long-term memory over time
Designed to improve resilience to variable inputs (e.g., changing document formats and incomplete data)
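The dual-path initialization above can be illustrated with a small sketch: an entity's memory is optionally seeded from a Rule Book, then refined by HITL feedback. All names here are hypothetical, and the precedence rule (reviewer feedback overrides seeded values, and re-seeding never overwrites learnings) is an assumption of this example, not documented product behavior.

```python
from typing import Optional

# Hypothetical memory map: entity id -> learned key/value facts.
MemoryMap = dict[str, dict[str, str]]


def init_entity_memory(
    memory: MemoryMap,
    entity_id: str,
    rule_book: Optional[dict[str, str]] = None,
) -> None:
    """Seed an entity's memory from an optional Rule Book.

    Existing learnings are kept; the Rule Book only fills gaps.
    """
    entry = memory.setdefault(entity_id, {})
    for key, value in (rule_book or {}).items():
        entry.setdefault(key, value)


def record_hitl_feedback(memory: MemoryMap, entity_id: str, key: str, value: str) -> None:
    """Store a reviewer correction; assumed here to override seeded values."""
    memory.setdefault(entity_id, {})[key] = value


memory: MemoryMap = {}
rules = {"date_format": "DD/MM/YYYY", "currency": "EUR"}
init_entity_memory(memory, "supplier-17", rules)                        # bootstrapped path
record_hitl_feedback(memory, "supplier-17", "date_format", "MM/DD/YYYY")
init_entity_memory(memory, "supplier-17", rules)                        # re-seeding keeps the learning
init_entity_memory(memory, "supplier-42")                               # learning-only path
print(memory)
```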
Monitoring
Dashboards
Digital Worker Activity overview with trend charts and KPI tiles (e.g., total events, HITL conversations, total tests)
Current Activity snapshot (active Digital Workers, active conversations)
Flow monitoring with completion visibility (Digital Worker flows and close rate)
Leaderboards for operational triage (Top Digital Workers; HITL conversations by agent)
Governance/quality surfaces (Eval failures) and cost visibility (token usage by agent)
Evals
Dataset-based eval runs with scorecards across key dimensions (Accuracy, Relevance, Helpfulness), plus operational metrics (tests run, success rate, failures, average latency)
Per-test-case drill-down: view the agent plan, conversation transcript, and detailed rubric scoring with reasoning to support faster debugging and iteration
Run history tracking to compare results over time and spot regressions
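As a rough illustration of how a scorecard could be derived from per-test results, the sketch below averages rubric scores per dimension and computes the operational metrics listed above. The record layout, the 1 to 5 score scale, and the scorecard function are assumptions for this example and do not reflect the product's data model.

```python
from statistics import mean

DIMENSIONS = ("accuracy", "relevance", "helpfulness")

# Illustrative per-test results; the structure is hypothetical.
results = [
    {"case": "t1", "passed": True, "latency_s": 1.0,
     "scores": {"accuracy": 4, "relevance": 5, "helpfulness": 4}},
    {"case": "t2", "passed": False, "latency_s": 3.0,
     "scores": {"accuracy": 2, "relevance": 3, "helpfulness": 2}},
]


def scorecard(results: list[dict]) -> dict:
    """Aggregate one eval run into quality and operational metrics."""
    if not results:
        raise ValueError("an eval run needs at least one test result")
    passed = sum(1 for r in results if r["passed"])
    return {
        "tests_run": len(results),
        "failures": len(results) - passed,
        "success_rate": passed / len(results),
        "avg_latency_s": mean(r["latency_s"] for r in results),
        "dimensions": {d: mean(r["scores"][d] for r in results) for d in DIMENSIONS},
    }


card = scorecard(results)
print(card)
```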
Regression testing
Organize tests into datasets with individual test cases and pass/fail status tracking
Run history captures when an eval was executed and by whom, making it easier to compare runs and detect regressions
Performance signals included alongside quality (e.g., per-test latency and average runtime) to catch speed/timeout failures early
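Comparing two runs to detect regressions might look like the sketch below, which flags test cases that passed before and now fail, or that became noticeably slower. The run format, the find_regressions function, and the 1.5x slowdown threshold are all assumptions chosen for illustration.

```python
def find_regressions(
    baseline: dict[str, dict],
    candidate: dict[str, dict],
    max_slowdown: float = 1.5,
) -> list[tuple[str, str]]:
    """Compare two runs keyed by test case id and list what got worse."""
    regressions = []
    for case_id, before in baseline.items():
        after = candidate.get(case_id)
        if after is None:
            continue  # case missing from the newer run; nothing to compare
        if before["passed"] and not after["passed"]:
            regressions.append((case_id, "now failing"))
        elif after["latency_s"] > before["latency_s"] * max_slowdown:
            regressions.append((case_id, "slower"))
    return regressions


# Illustrative runs; ids and numbers are made up.
baseline = {
    "t1": {"passed": True, "latency_s": 1.0},
    "t2": {"passed": True, "latency_s": 2.0},
    "t3": {"passed": False, "latency_s": 1.0},
}
candidate = {
    "t1": {"passed": False, "latency_s": 1.0},
    "t2": {"passed": True, "latency_s": 4.0},
    "t3": {"passed": False, "latency_s": 1.0},
}
found = find_regressions(baseline, candidate)
print(found)
```

A case that failed in both runs (t3 here) is not reported, since it did not get worse between runs.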