Build the evaluation harness, scorecards, and policy-aware audit plus redaction flow #67

## Problem
Operators need a usable trust layer that combines transcript evaluation, readable scorecards, and safe audit exports with redaction control.

## Scope
- Track evaluation harness inputs, transcript scoring, scorecard history, audit exports, and redaction policy behavior
- Cover how quality and trust state appears per agent profile or session
- Keep sensitive data handling explicit
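
To make the scope concrete, here is a minimal sketch of how transcript scores might roll up into a scorecard history per agent profile and session. Every name in it (`TranscriptScore`, `Scorecard`, the `helpfulness` metric, the 0.0 to 1.0 range) is an assumption made for illustration; the issue does not prescribe a data model.

```python
from dataclasses import dataclass, field
from statistics import mean


@dataclass(frozen=True)
class TranscriptScore:
    """One harness result for a single transcript (hypothetical shape)."""
    session_id: str
    metric: str   # illustrative metric name, e.g. "helpfulness"
    value: float  # assumed to be normalized to the range 0.0 to 1.0


@dataclass
class Scorecard:
    """Scorecard history for one agent profile (hypothetical shape)."""
    agent_profile: str
    history: list[TranscriptScore] = field(default_factory=list)

    def record(self, score: TranscriptScore) -> None:
        """Append a score so the history stays available for later review."""
        self.history.append(score)

    def summary(self) -> dict[str, float]:
        """Average each metric across the recorded history."""
        by_metric: dict[str, list[float]] = {}
        for score in self.history:
            by_metric.setdefault(score.metric, []).append(score.value)
        return {metric: mean(values) for metric, values in by_metric.items()}

    def for_session(self, session_id: str) -> list[TranscriptScore]:
        """Return only the scores recorded for one session."""
        return [s for s in self.history if s.session_id == session_id]


card = Scorecard(agent_profile="support-agent")
card.record(TranscriptScore("s1", "helpfulness", 0.8))
card.record(TranscriptScore("s2", "helpfulness", 0.6))
print(card.summary())
print(card.for_session("s1"))
```

Keeping the raw history and deriving summaries on demand is one possible design choice; it lets the same records back both a per-profile view and a per-session view.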

## Out of scope
- Low-level telemetry emission that belongs in earlier observability issues
- Toolchain installation and provider health flows

## Implementation notes
- Make scorecards and audit exports actionable for operators
- Align redaction with provider secrets, prompt data, tool arguments, and results
- Keep the issue compatible with the official evaluation packages
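
As a rough illustration of the redaction note above, the sketch below masks sensitive fields in an audit record before export. The key names in `SENSITIVE_KEYS`, the `SECRET_PATTERN` regex, and the `[REDACTED]` marker are all hypothetical; the actual policy categories (provider secrets, prompt data, tool arguments, results) would need to be mapped to real field names during implementation.

```python
import re

# Hypothetical policy: keys whose values are always masked on export.
SENSITIVE_KEYS = {"api_key", "authorization", "prompt", "tool_arguments", "tool_result"}
# Hypothetical pattern for secret-looking tokens embedded in free text.
SECRET_PATTERN = re.compile(r"\b(sk|key|token)-[A-Za-z0-9]{8,}\b")
MASK = "[REDACTED]"


def redact(value, sensitive_keys=SENSITIVE_KEYS):
    """Return a redacted copy of an audit record; the input is not modified."""
    if isinstance(value, dict):
        return {
            key: MASK
            if isinstance(key, str) and key.lower() in sensitive_keys
            else redact(item, sensitive_keys)
            for key, item in value.items()
        }
    if isinstance(value, list):
        return [redact(item, sensitive_keys) for item in value]
    if isinstance(value, str):
        # Mask secret-looking tokens that appear inside otherwise safe text.
        return SECRET_PATTERN.sub(MASK, value)
    return value


record = {
    "session": "s1",
    "api_key": "sk-abc",
    "events": [
        {"tool_arguments": {"q": "x"}, "note": "used token-ABCDEFGH1234 here"},
    ],
}
print(redact(record))
```

Masking by key name and by pattern are two separate layers here: the first covers structured fields, the second catches secrets that leak into free-text notes. Whether both are needed is a policy decision this issue leaves open.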

## Definition of Done
- The issue defines the minimum trust surfaces: harness, scorecards, audit, and redaction
- Later implementation can proceed without re-deciding how evaluations surface in the product

## Verification
- Review the issue against the feature spec, evaluation-package issue, and approval flows

## Dependencies
- Parent epic: #20
- Related epic: #16
