Agent Eval Lab | Mukunda Rao Katta

Public Research Artifacts

Scenario design, expected behavior, failure modes, and readiness scorecards.

Zenodo DOI

Mixed-check regression testing for LLM and agent workflows.

Zenodo DOI

Small-rule checks for prompt injection and vector poisoning risks.

Figshare DOI

Area	Question	Signal
Task fit	Does the agent know the boundary of the task?	Clear goal, inputs, and stop condition
Tool use	Are tool calls traceable and justified?	Observable calls and recoverable failures
Reliability	Can the workflow be repeated?	Fixtures, expected behavior, and regression checks
Readiness	Is it safe to widen access?	Known risks, review notes, and rollout decision