Workflow Intelligence Layer

Mycroft

Create an evidence loop before AI changes hit production.

Mycroft is the evaluation and release-readiness layer for AI workflows. It combines offline and model-backed evals with keyword checks, latency gates, and Markdown and JSON reports.

Offline and OpenAI modes
Keyword and latency gates
Markdown and JSON reports

Status

Open-source eval harness

Domain

Retrieval, prompt, and workflow evaluation

Primary outcome

Objective pass/fail signals before a workflow ships or relaunches

Best paired with

Rapid Build or Pilot Rescue Sprint

Evidence before confidence theatre

Mycroft exists because teams often tweak prompts, retrieval, and orchestration without any stable release gate. It gives Baker Street and client teams a small, auditable harness for proving whether a change actually improved the workflow.

Who this product is built for

The outputs teams actually use

These are the delivery artifacts the repo is designed to produce, not just internal implementation details.

Artifact

Eval report

A release-review artifact with case results, pass rate, and tag breakdown.

Artifact

Dataset pack

Representative cases for the workflow slice that matters commercially.

Artifact

Threshold config

Explicit rules for what must pass before a change is accepted.
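
To make the three artifacts concrete, here is a minimal sketch of how case results might roll up into a pass rate and tag breakdown, and how a threshold config might be checked against that summary. Everything in it is an assumption for illustration: `CaseResult`, `summarize`, `check_thresholds`, and the `min_pass_rate` and `min_tag_pass_rate` keys are hypothetical names, not Mycroft's actual schema or API.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class CaseResult:
    """One evaluated case. Hypothetical shape, not Mycroft's real schema."""
    case_id: str
    passed: bool
    tags: List[str] = field(default_factory=list)


def summarize(results: List[CaseResult]) -> dict:
    """Compute the overall pass rate and a per-tag pass-rate breakdown."""
    total = len(results)
    passed = sum(r.passed for r in results)
    by_tag: Dict[str, List[bool]] = defaultdict(list)
    for r in results:
        for tag in r.tags:
            by_tag[tag].append(r.passed)
    return {
        "total": total,
        "passed": passed,
        "pass_rate": passed / total if total else 0.0,
        "tags": {t: sum(v) / len(v) for t, v in sorted(by_tag.items())},
    }


def check_thresholds(summary: dict, thresholds: dict) -> List[str]:
    """Return readable threshold failures; an empty list means the gate passes."""
    failures = []
    floor = thresholds.get("min_pass_rate", 0.0)
    if summary["pass_rate"] < floor:
        failures.append(f"overall pass rate {summary['pass_rate']:.0%} is below {floor:.0%}")
    for tag, tag_floor in thresholds.get("min_tag_pass_rate", {}).items():
        rate = summary["tags"].get(tag)
        if rate is None or rate < tag_floor:
            failures.append(f"tag '{tag}' is below {tag_floor:.0%}")
    return failures


# Illustrative data only: four cases across two hypothetical tags.
results = [
    CaseResult("a", True, ["retrieval"]),
    CaseResult("b", False, ["prompt"]),
    CaseResult("c", True, ["retrieval", "prompt"]),
    CaseResult("d", True),
]
summary = summarize(results)
failures = check_thresholds(
    summary, {"min_pass_rate": 0.7, "min_tag_pass_rate": {"prompt": 0.9}}
)
```

Keeping the thresholds in plain data rather than in code is one way to make the acceptance rules reviewable alongside the dataset they apply to.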

How it gets used in real delivery

Each product is designed to slot into a fixed-scope Baker Street engagement rather than sit as a disconnected side project.

Step 1

Define the representative case set and the thresholds that matter.

Step 2

Run offline or model-backed evals against the dataset.

Step 3

Use the generated report to decide whether the change is fit to release.
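
The three steps above can be sketched as a small offline loop: define cases with a keyword gate and a latency gate, run them against the workflow, and read a release decision off the report. This is a sketch under stated assumptions, not the harness itself: `run_case`, `run_eval`, `to_markdown`, the case fields, and the canned `offline_workflow` stub are all hypothetical stand-ins.

```python
import json
import time
from typing import Callable, Dict, List


def run_case(workflow: Callable[[str], str], case: dict) -> dict:
    """Run one case, then apply a keyword gate and a latency gate."""
    start = time.perf_counter()
    output = workflow(case["input"])
    latency_ms = (time.perf_counter() - start) * 1000.0
    # Keyword gate: every expected keyword must appear (case-insensitive).
    missing = [k for k in case["expected_keywords"] if k.lower() not in output.lower()]
    # Latency gate: the call must finish within the case's budget.
    latency_ok = latency_ms <= case["max_latency_ms"]
    return {
        "id": case["id"],
        "passed": not missing and latency_ok,
        "missing_keywords": missing,
        "latency_ms": round(latency_ms, 2),
    }


def run_eval(workflow: Callable[[str], str], cases: List[dict], min_pass_rate: float) -> dict:
    """Step 2 and step 3: run every case, then derive a release decision."""
    results = [run_case(workflow, c) for c in cases]
    pass_rate = sum(r["passed"] for r in results) / len(results) if results else 0.0
    return {
        "pass_rate": pass_rate,
        "release_ready": pass_rate >= min_pass_rate,
        "results": results,
    }


def to_markdown(report: dict) -> str:
    """Render the report as a small Markdown summary for release review."""
    lines = [
        f"Pass rate: {report['pass_rate']:.0%}",
        f"Release ready: {'yes' if report['release_ready'] else 'no'}",
        "",
        "| Case | Passed | Missing keywords |",
        "| --- | --- | --- |",
    ]
    for r in report["results"]:
        missing = ", ".join(r["missing_keywords"]) or "none"
        lines.append(f"| {r['id']} | {'yes' if r['passed'] else 'no'} | {missing} |")
    return "\n".join(lines)


# Step 1: a representative case set (illustrative data, not a real dataset pack).
CASES = [
    {"id": "refund-policy", "input": "What is the refund window?",
     "expected_keywords": ["30 days"], "max_latency_ms": 5000},
    {"id": "escalation", "input": "How do I escalate?",
     "expected_keywords": ["support lead"], "max_latency_ms": 5000},
]

# Offline mode stand-in: canned answers instead of a model call.
CANNED: Dict[str, str] = {
    "What is the refund window?": "Refunds are accepted within 30 days.",
    "How do I escalate?": "Please email us.",
}


def offline_workflow(question: str) -> str:
    return CANNED[question]


report = run_eval(offline_workflow, CASES, min_pass_rate=0.8)
markdown_report = to_markdown(report)
json_report = json.dumps(report, indent=2)
```

In a model-backed run, the same loop would apply with `offline_workflow` swapped for a function that calls the model, which is why the workflow is passed in as a plain callable here.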

Use it alongside the rest of the Baker Street system

Need a release gate for a live AI workflow?

Mycroft fits when a team already has a workflow in motion but needs evidence around quality, latency, and failure conditions before it expands or relaunches.