Workflow Intelligence Layer
Create an evidence loop before AI changes hit production.
Mycroft is the evaluation and release-readiness layer for AI workflows. It combines offline and model-backed evals with keyword checks, latency measurements, and report outputs.
Status
Open-source eval harness
Domain
Retrieval, prompt, and workflow evaluation
Primary outcome
Objective pass/fail signals before a workflow ships or relaunches
Best paired with
Rapid Build or Pilot Rescue Sprint
Overview
Mycroft exists because teams often tweak prompts, retrieval, and orchestration without any stable release gate. It gives Baker Street and client teams a small, auditable harness for proving whether a change actually improved the workflow.
Artifacts
These are the delivery artifacts the repo is designed to produce, not just internal implementation details.
A release-review artifact with case results, pass rate, and tag breakdown.
Representative cases for the workflow slice that matters commercially.
Explicit rules for what must pass before a change is accepted.
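To make the report artifact concrete, here is a minimal sketch of how case results could be rolled up into a pass rate and a tag breakdown. The `CaseResult` record, the `summarize` function, and the report keys are assumptions made for illustration; they are not Mycroft's actual schema or API.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class CaseResult:
    # Hypothetical record for one evaluated case (not Mycroft's real schema).
    case_id: str
    passed: bool
    tags: list[str] = field(default_factory=list)


def summarize(results: list[CaseResult]) -> dict:
    """Roll case results up into an overall pass rate and a per-tag breakdown."""
    total = len(results)
    passed = sum(r.passed for r in results)

    # Count passed/total per tag; a case with several tags counts under each.
    by_tag: dict[str, dict[str, int]] = defaultdict(lambda: {"passed": 0, "total": 0})
    for r in results:
        for tag in r.tags:
            by_tag[tag]["total"] += 1
            by_tag[tag]["passed"] += int(r.passed)

    return {
        "total": total,
        "passed": passed,
        "pass_rate": passed / total if total else 0.0,
        "tags": dict(by_tag),
        "cases": [{"id": r.case_id, "passed": r.passed} for r in results],
    }
```

A reviewer would typically read the per-tag counts first, since a healthy overall pass rate can hide a tag where most cases fail.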
Workflow
Each product is designed to slot into a fixed-scope Baker Street engagement rather than sit as a disconnected side project.
Define the representative case set and the thresholds that matter.
Run offline or model-backed evals against the dataset.
Use the generated report to decide whether the change is fit to release.
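The three steps above can be sketched as a small offline loop: a case set with thresholds, a run that records keyword hits and latency, and a gate that turns the results into a release decision. Everything named here (`CASES`, `THRESHOLDS`, `run_evals`, `fit_to_release`, and the example cases and values) is hypothetical and only illustrates the shape of the workflow, not Mycroft's real interface.

```python
import time
from typing import Callable

# Step 1: hypothetical case set. Each case lists keywords the output must contain.
CASES = [
    {"id": "refund-policy", "input": "What is the refund window?", "keywords": ["30 days"]},
    {"id": "shipping", "input": "Do you ship abroad?", "keywords": ["international"]},
]

# Hypothetical thresholds; real values depend on the workflow being gated.
THRESHOLDS = {"min_pass_rate": 0.9, "max_latency_s": 2.0}


def run_evals(workflow: Callable[[str], str], cases: list[dict]) -> list[dict]:
    """Step 2: run every case through the workflow, recording pass/fail and latency."""
    results = []
    for case in cases:
        start = time.perf_counter()
        output = workflow(case["input"])
        latency = time.perf_counter() - start
        # A case passes only if every expected keyword appears (case-insensitive).
        hit = all(k.lower() in output.lower() for k in case["keywords"])
        results.append({"id": case["id"], "passed": hit, "latency_s": latency})
    return results


def fit_to_release(results: list[dict], thresholds: dict) -> bool:
    """Step 3: apply the thresholds to decide whether the change may ship."""
    if not results:
        return False  # no evidence means no release
    pass_rate = sum(r["passed"] for r in results) / len(results)
    slowest = max(r["latency_s"] for r in results)
    return (
        pass_rate >= thresholds["min_pass_rate"]
        and slowest <= thresholds["max_latency_s"]
    )
```

Calling `fit_to_release(run_evals(my_workflow, CASES), THRESHOLDS)` with any callable that maps an input string to an output string gives a single boolean to gate on. A model-backed eval would swap the keyword check for a model-based grader and keep the same gate.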
Next Move
Move from product detail into the related package, workflow, or delivery stack page.
Product Intake
Mycroft fits when a team already has a workflow in motion but needs evidence around quality, latency, and failure conditions before it expands or relaunches.