User feedback and annotation is now built into the Contextual AI Platform

AI teams can now audit and improve agent performance entirely within the Contextual AI platform. Collect, categorize, and visualize human feedback to resolve quality concerns around accuracy and use-case specialization, all without juggling separate spreadsheets, annotation tools, and dashboards.
Why human evaluation breaks down
After an agent is initially set up, the challenge of optimizing the AI workflow for production begins. For many of our customers, this “hill-climbing” to deliver a production-quality AI agent often takes the most effort. The process starts out highly iterative and manual, and without the right tooling, excessive context switching adds cognitive load. Many teams struggle with human evaluation because:
- Manual reviews are time-consuming: Annotators spend hours recording user feedback and categorizing issues in spreadsheets.
- Fragmented tools break workflows: Scores, outputs, and logs live in separate systems, forcing constant context-switching to piece together what went wrong.
- Lack of analytics slows improvements: Without automated analytics, teams struggle to identify recurring failures, track resolution progress, or prove improvements over time.
From feedback to resolution in a few steps
The human evaluation workflow in the platform includes tools to:
- Collect: Configure the feedback UI that enables users to grade agent outputs and retrievals
- Categorize: Create custom labels and use them to annotate and organize user feedback
- Investigate: Diagnose the root cause of quality issues by inspecting both agent outputs and the underlying retrieved documents
- Track: Filter feedback, view automated trend visualizations, and update issue resolution statuses
“Contextual’s new feedback collection and annotation workflow brings everything into one place—letting us annotate queries, track resolution, and visualize results seamlessly. It’s improved our team’s productivity and reduced the number of tools we need to manage QA.”
— Natasha Gangal, AI Strategy Lead, ShipBob
From user feedback to error resolution
Kick off user evaluation by setting up the feedback workflow in the agent configuration settings. After submitting a query, users are prompted to rate the agent’s response and select optional issue labels. You can customize these labels and make feedback required.
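
If your team prefers keeping agent settings in code, the same setup can be expressed as a small configuration object. The sketch below is purely illustrative: the field names (`enabled`, `feedback_required`, `issue_labels`) are assumptions, not the platform's actual schema, and the equivalent options live in the agent configuration UI.

```python
import json

# Hypothetical sketch of a feedback-UI configuration.
# Field names are illustrative assumptions, not the platform's actual schema;
# the real settings are managed in the agent configuration interface.
feedback_config = {
    "enabled": True,             # show the rating widget after each response
    "feedback_required": False,  # whether users must rate before continuing
    "issue_labels": [            # custom labels users can attach to a rating
        "Incorrect answer",
        "Missing or wrong source",
        "Off-topic retrieval",
    ],
}

print(json.dumps(feedback_config, indent=2))
```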

Collecting user feedback
Once you've configured the feedback UI, users can start using intuitive 👍 / 👎 / 🚩 buttons to grade the quality of responses. Negative grades automatically trigger a pop-up for detailed feedback.
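
The same signal can also be recorded programmatically, which is handy when ratings originate in your own frontend rather than the built-in UI. The sketch below assumes a REST endpoint shaped roughly like `POST /agents/{agent_id}/feedback` with `message_id`, `feedback`, and `explanation` fields; treat the path and field names as assumptions and confirm the exact schema in the API reference.

```python
import os
import requests

# Hypothetical sketch of submitting user feedback via the REST API.
# Endpoint path and field names are assumptions -- confirm against the API docs.
API_BASE = "https://api.contextual.ai/v1"
API_KEY = os.environ["CONTEXTUAL_API_KEY"]
AGENT_ID = "your-agent-id"  # placeholder

def submit_feedback(message_id: str, rating: str, explanation: str = "") -> None:
    """Record a thumbs-up / thumbs-down / flag for a specific agent response."""
    resp = requests.post(
        f"{API_BASE}/agents/{AGENT_ID}/feedback",
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={
            "message_id": message_id,    # the response being graded
            "feedback": rating,          # e.g. "thumbs_up", "thumbs_down", "flagged"
            "explanation": explanation,  # optional free-text detail
        },
        timeout=30,
    )
    resp.raise_for_status()

# Example: grade a response that cited the wrong document
submit_feedback("msg_123", "thumbs_down", "Cited the 2022 policy instead of 2024.")
```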

Exploring user feedback
Graded or flagged outputs immediately appear in the human evaluation dashboard. You can access the dashboard through the agent menu.

Categorizing user feedback
After reviewing initial feedback, developers can set up annotation workflows by creating custom category fields in the agent configuration interface. Out of the box, you can create customized categories for queries, errors, response quality, and resolution status.

For query categorization, you can use our intelligent prediction system to automatically classify queries into the most relevant category based on your defined labels and descriptions.
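
Automatic classification works best when each label has a concise, non-overlapping description. The structure below is a hypothetical illustration of that idea (the dict format is not the platform's schema); clearer descriptions give the prediction system more signal to separate similar-looking queries.

```python
# Hypothetical example of query-category labels with descriptions.
# The dict structure is illustrative only; the labels and descriptions
# would be defined in the agent configuration interface.
query_categories = {
    "order_status":  "Questions about where an order is or when it will arrive.",
    "returns":       "Requests to return, exchange, or refund an item.",
    "product_specs": "Questions about product dimensions, materials, or compatibility.",
    "account":       "Login, password, or billing-profile issues.",
}
```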

If you don't like the default category names, you can change them to whatever makes sense for your team.

Organizing reviewed and flagged outputs is seamless. Developers can annotate each piece of feedback with the previously defined labels.

Tracking and resolving quality issues
Manage annotated outputs with granular filtering and status tracking tools. Filters surface outputs that need attention by evaluation criteria, time period, or resolution status.

Get instant root cause analysis by clicking any agent citation to see the prompt, output, and source document chunks.

Automatically generated charts and graphs provide instant visibility into annotation category trends within a selected time period.

Export data as a CSV anytime to share with other teams or integrate with external systems.
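
Once exported, the CSV folds easily into whatever reporting stack your team already uses. Here is a minimal sketch with pandas, assuming hypothetical column names such as `category`, `rating`, and `resolution_status`; the actual export schema may differ, so adjust the names to match your file.

```python
import pandas as pd

# Hypothetical sketch: summarize an exported feedback CSV.
# Column names ("category", "rating", "resolution_status") are assumptions;
# adjust them to match the actual export schema.
df = pd.read_csv("agent_feedback_export.csv")

# Share of negative ratings per annotation category
negative_rate = (
    df.assign(is_negative=df["rating"].eq("thumbs_down"))
      .groupby("category")["is_negative"]
      .mean()
      .sort_values(ascending=False)
)
print(negative_rate)

# Feedback items that still need attention
open_issues = df[df["resolution_status"] != "resolved"]
print(f"{len(open_issues)} feedback items still unresolved")
```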

Start improving and try it out
Human evaluation is critical for agent improvement but shouldn't require manual work and constant tool-switching. This new workflow unifies fragmented processes in one platform, delivering granular customization, automation, and intuitive annotation.
Read the documentation and get started!


