# Evaluating Digital Workers: Scorecards, Rubrics, and Dashboards

Agentic Studio includes a built-in evaluation framework that scores every digital worker interaction after it closes. This framework lets you define what good looks like, measure whether the digital worker is meeting that standard, and identify where instructions, skills, or tools need adjustment.

The evaluation approach mirrors Auto-QA: structured scorecards and rubrics produce consistent, auditable scores that roll up into dashboards for trend monitoring.

### **Scorecards**

A scorecard defines what gets measured and how each dimension is weighted. It is the top-level structure that ties your evaluation criteria together.

A typical scorecard includes:

* A defined purpose, for example Support Quality, Dispatch Triage Quality, or Sales Qualification.
* Score categories (high-level measurement buckets) such as Policy and Process Adherence, Accuracy and Resolution Quality, Communication Quality, or Safety and Compliance.
* Weights for each category, for example 40% accuracy, 30% process adherence, 20% communication, 10% compliance.
* One or more rubric criteria attached to each category.
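
To make that structure concrete, here is a minimal Python sketch of a scorecard as plain data. The class and field names (`Scorecard`, `Category`, `weight`, `criteria`) are illustrative assumptions, not Agentic Studio's actual schema or API. The weights reuse the example split above, and the rule that category weights must sum to 100% is likewise an assumption about how a weighted scorecard would typically be validated.

```python
from dataclasses import dataclass, field


@dataclass
class Category:
    name: str
    weight: float  # fraction of the overall score, e.g. 0.40 for 40%
    criteria: list[str] = field(default_factory=list)


@dataclass
class Scorecard:
    purpose: str
    categories: list[Category]

    def validate(self) -> None:
        """Raise ValueError unless weights sum to 1 and every category has criteria."""
        total = sum(c.weight for c in self.categories)
        if abs(total - 1.0) > 1e-9:
            raise ValueError(f"category weights sum to {total:.2f}, expected 1.00")
        for c in self.categories:
            if not c.criteria:
                raise ValueError(f"category {c.name!r} has no rubric criteria")


support_quality = Scorecard(
    purpose="Support Quality",
    categories=[
        Category("Accuracy and Resolution Quality", 0.40,
                 ["Was the final answer accurate and complete?"]),
        Category("Policy and Process Adherence", 0.30,
                 ["Did the digital worker follow the required verification steps?"]),
        Category("Communication Quality", 0.20,
                 ["Was the tone professional and helpful?"]),
        # Hypothetical criterion, invented for this sketch.
        Category("Safety and Compliance", 0.10,
                 ["Did the digital worker avoid sharing restricted data?"]),
    ],
)

support_quality.validate()  # passes: weights sum to 1.0 and no category is empty
```

Validating before publishing catches the two configuration mistakes that would make scores misleading in this model: weights that do not add up, and a weighted category with nothing to score.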

Once configured, a scorecard is published to one or more digital workers in a specific environment. Publish to UAT first, then to Production.

### **Rubrics**

Rubrics define the specific criteria used to score an interaction. Each rubric item is a clear, observable check or question that a reviewer, human or automated, can answer consistently.

Examples of rubric criteria:

* Did the digital worker follow the required verification steps?
* Was the final answer accurate and complete?
* Did the digital worker use the correct tool or configuration?
* Was the tone professional and helpful?

For each criterion, you set a scoring scale (for example, Pass/Fail, 1 to 5, or 0 to 2) and add guidance for reviewers: what good looks like, common failure modes, and what evidence to look for in the conversation transcript.
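
Because criteria can use different scales, combining or comparing them usually means mapping each raw score onto a common range first. The sketch below assumes a simple linear mapping onto 0.0 to 1.0. The `Scale` type and its `normalize` method are hypothetical names, and Agentic Studio may normalize scores differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scale:
    low: int   # worst possible raw score
    high: int  # best possible raw score

    def normalize(self, raw: int) -> float:
        """Map a raw score linearly onto 0.0 (worst) to 1.0 (best)."""
        if not self.low <= raw <= self.high:
            raise ValueError(f"score {raw} is outside scale {self.low}..{self.high}")
        return (raw - self.low) / (self.high - self.low)


# The three example scales from the text, with Fail = 0 and Pass = 1.
PASS_FAIL = Scale(0, 1)
ONE_TO_FIVE = Scale(1, 5)
ZERO_TO_TWO = Scale(0, 2)

print(PASS_FAIL.normalize(1))    # 1.0
print(ONE_TO_FIVE.normalize(4))  # 0.75
print(ZERO_TO_TWO.normalize(1))  # 0.5
```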

Rubric versions should be maintained over time so that historical interactions remain comparable and auditable even as criteria evolve.
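
One way to keep historical scores comparable is to store the rubric version alongside every score and only aggregate within a version. The record layout below (`interaction_id`, `rubric_version`, `score`) and the sample values are assumptions made for illustration, not the product's storage format.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical score records: each one remembers which rubric version produced it.
records = [
    {"interaction_id": "a1", "rubric_version": "v1", "score": 0.70},
    {"interaction_id": "a2", "rubric_version": "v1", "score": 0.80},
    {"interaction_id": "b1", "rubric_version": "v2", "score": 0.60},
]

# Group scores by rubric version so that averages never mix criteria sets.
by_version: dict[str, list[float]] = defaultdict(list)
for record in records:
    by_version[record["rubric_version"]].append(record["score"])

averages = {version: round(mean(scores), 2) for version, scores in by_version.items()}
print(averages)  # {'v1': 0.75, 'v2': 0.6}
```

With the version recorded, a drop in score after a rubric change can be attributed to the stricter criteria rather than mistaken for a regression in the digital worker.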

### **Interaction-Level Results**

After a conversation closes, it is scored automatically. The rubric evaluation is displayed directly on the conversation view.

* Open any digital worker interaction to see criterion-level scores, category rollups, and the overall score.
* Drill into failures to understand which rubric criteria drove the score down and what evidence in the conversation supports the rating.
* Use the interaction view for spot-checking, incident review, and debugging specific changes to instructions, skills, or tools.
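
The sketch below shows one plausible way the three levels relate: normalized criterion scores are averaged within each category, and the category rollups are then combined using the scorecard weights. This aggregation rule is an assumption for illustration, since the actual rollup logic is not described here. The function names, dictionary keys, and sample scores are all hypothetical.

```python
from statistics import mean


def category_rollups(criterion_scores: dict[str, dict[str, float]]) -> dict[str, float]:
    """Average the normalized criterion scores (0.0 to 1.0) within each category."""
    return {cat: mean(scores.values()) for cat, scores in criterion_scores.items()}


def overall_score(rollups: dict[str, float], weights: dict[str, float]) -> float:
    """Combine category rollups into one score using the category weights."""
    return sum(weights[cat] * score for cat, score in rollups.items())


# Example weights from the Scorecards section: 40 / 30 / 20 / 10.
weights = {"accuracy": 0.40, "process": 0.30, "communication": 0.20, "compliance": 0.10}

# Hypothetical normalized criterion scores for a single closed interaction.
interaction = {
    "accuracy": {"answer_accurate": 1.0, "answer_complete": 0.5},
    "process": {"verification_steps": 0.0},
    "communication": {"tone": 1.0},
    "compliance": {"no_restricted_data": 1.0},
}

rollups = category_rollups(interaction)
score = overall_score(rollups, weights)
weakest = min(rollups, key=rollups.get)  # the category that drove the score down

print(round(score, 2), weakest)  # 0.6 process
```

Here a single failed verification check zeroes out a category worth 30% of the total, which is the kind of failure the drill-down view is meant to surface.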

### **Evaluation Dashboards**

Dashboards aggregate scores across interactions so you can monitor quality at scale and detect regressions over time.

Dashboard capabilities include:

* Trend tracking: overall score trends by day or week.
* Slicing: filter by digital worker, workspace, environment, channel, score category, or individual rubric criterion.
* Distribution monitoring: percentage of interactions below a threshold, pass/fail rates, and category breakdowns.
* Release validation: compare scores before and after a new version is promoted to Production.

Use dashboards to understand score distributions, not just to track averages. A high average can hide a tail of very low-scoring interactions, and that tail usually points to a systematic failure mode rather than good overall performance.
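
A small worked example illustrates the point. The scores below are invented, and the 0.5 threshold and the `summarize` helper are assumptions for this sketch rather than dashboard defaults. The second release has a slightly higher average, yet one in five of its interactions falls below the threshold.

```python
from statistics import mean


def summarize(scores: list[float], threshold: float = 0.5) -> dict[str, float]:
    """Return the mean score and the share of interactions below the threshold."""
    below = sum(1 for s in scores if s < threshold)
    return {
        "mean": round(mean(scores), 3),
        "below_threshold": round(below / len(scores), 3),
    }


# Hypothetical overall scores (0.0 to 1.0) for ten interactions per release.
before = [0.80, 0.82, 0.78, 0.81, 0.79, 0.80, 0.83, 0.77, 0.80, 0.80]
after = [0.95, 0.96, 0.94, 0.95, 0.97, 0.95, 0.96, 0.94, 0.20, 0.25]

print(summarize(before))  # {'mean': 0.8, 'below_threshold': 0.0}
print(summarize(after))   # {'mean': 0.807, 'below_threshold': 0.2}
```

Judged on the average alone, the second release looks like an improvement. The below-threshold share shows that it introduced a cluster of failures worth investigating before the release is treated as validated.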

