Baker Street | Delivery Stack

Overview

What this page should tell you quickly

Where we work

AI workflow systems
Legacy service upgrades
Operator and executive dashboards

What we set up first

Clear telemetry and core metrics
Service contracts for legacy systems
Safe access into business tools

What clients get

Better executive visibility
Faster delivery decisions
A cleaner path to production

Main Products

The Sherlock product set is the main product layer

These are the named Baker Street products that package the method itself: case shaping, repo intelligence, evals, rescue, and adversarial hardening. The partner tools below support delivery, but this is the core product set clients actually buy into.

Repo Intelligence Hub

221B

Open-source hub

The operating layer for fast software investigation.

221B gives Baker Street investigations a shared operating system: repo intake, evidence grading, reusable templates, and MCP access to the underlying knowledge base.

Primary outcome: Faster orientation in unfamiliar systems with clearer evidence and handoff

MCP-enabled knowledge base
Repo registry and intake workflow
Report and decision templates

Strategic Insight Layer

Adler

Open-source investigation toolkit

Turn an ambiguous AI opportunity into a scoped next move.

Adler packages the Baker Street investigation sprint into reusable templates, example outputs, and executive-ready decision support for teams that need clarity before build work starts.

Primary outcome: A ranked recommendation and implementation brief in days, not weeks

Hypothesis backlog
Executive readout templates
Worked customer-support case

Workflow Intelligence Layer

Mycroft

Open-source eval harness

Create an evidence loop before AI changes hit production.

Mycroft is the evaluation and release-readiness layer for AI workflows. It combines offline and model-backed evals with keyword checks, latency gates, and Markdown and JSON reports.

Primary outcome: Objective pass/fail signals before a workflow ships or relaunches

Offline and OpenAI modes
Keyword and latency gates
Markdown and JSON reports
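
To make the idea of keyword and latency gates concrete, here is a minimal Python sketch. It is not Mycroft's actual API: `GateResult`, `check_gates`, the sample response, and the threshold values are all hypothetical names and numbers chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class GateResult:
    """Outcome of one eval case (hypothetical structure, not Mycroft's schema)."""
    passed: bool
    failures: list[str] = field(default_factory=list)


def check_gates(response: str, latency_ms: float,
                required_keywords: list[str], max_latency_ms: float) -> GateResult:
    """Fail the case if a required keyword is missing or the call was too slow."""
    failures = []
    lowered = response.lower()
    # Keyword gate: every required phrase must appear (case-insensitive).
    for keyword in required_keywords:
        if keyword.lower() not in lowered:
            failures.append(f"missing keyword: {keyword}")
    # Latency gate: the measured latency must stay within the budget.
    if latency_ms > max_latency_ms:
        failures.append(f"latency {latency_ms:.0f} ms exceeds {max_latency_ms:.0f} ms")
    return GateResult(passed=not failures, failures=failures)


# Assumed sample case: a support reply that should mention the refund details.
result = check_gates(
    response="Your refund was issued to the original payment method.",
    latency_ms=850.0,
    required_keywords=["refund", "payment method"],
    max_latency_ms=1200.0,
)
print(result.passed, result.failures)
```

A gate like this yields a pass/fail signal plus a list of reasons, which is the kind of result that can feed a Markdown or JSON report.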

Pilot Doctor

Watson

Open-source rescue playbook

Recover the AI pilot before it turns into an account problem.

Watson is a practical rescue system for stalled pilots, combining triage, stabilization, relaunch gates, KPI recovery tracking, and production handoff assets.

Primary outcome: An evidence-backed decision to relaunch, rescope, or stop

Failure-mode triage
KPI recovery scorecard
Production handoff templates

Adversarial Testing Layer

Moriarty

Open-source red-team starter kit

Pressure-test the workflow before the workflow embarrasses you.

Moriarty is the red-team and failure-seeking layer in the Sherlock product set, built for prompt injection, policy evasion, tool misuse, and remediation verification work.

Primary outcome: A clearer view of exploit paths, severity, and whether fixes actually worked

Scenario packs
Rules-of-engagement checklists
Worked support-assistant red-team example

Core Stack

The infrastructure and access layers that support the products

These are the foundations we reach for early because they make the Sherlock products easier to instrument, integrate, and govern in real delivery.

Analytics

PostHog

Our default for analytics and product telemetry, so workflow metrics that would otherwise be overlooked become visible early.

Track cycle time, delay, and drop-off
See how operators actually use the workflow
Roll out new features with better evidence
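
As a rough illustration of what cycle time and drop-off mean in practice, the sketch below computes both from a list of workflow events. It uses plain Python and does not call PostHog. The event shape, case IDs, timestamps, and the `summarize` helper are assumptions made up for this example.

```python
from datetime import datetime

# Hypothetical workflow events: (case_id, step, ISO timestamp).
events = [
    ("case-1", "opened", "2024-05-01T09:00:00"),
    ("case-1", "closed", "2024-05-01T11:30:00"),
    ("case-2", "opened", "2024-05-01T10:00:00"),
    ("case-3", "opened", "2024-05-01T10:15:00"),
    ("case-3", "closed", "2024-05-01T10:45:00"),
]


def summarize(events):
    """Return (average cycle time in minutes, drop-off rate) for opened cases."""
    opened, closed = {}, {}
    for case_id, step, stamp in events:
        if step == "opened":
            opened[case_id] = datetime.fromisoformat(stamp)
        elif step == "closed":
            closed[case_id] = datetime.fromisoformat(stamp)
    # Cycle time is measured only for cases that were both opened and closed.
    durations = [
        (closed[c] - opened[c]).total_seconds() / 60
        for c in opened if c in closed
    ]
    avg_cycle = sum(durations) / len(durations) if durations else 0.0
    # Drop-off is the share of opened cases that never reached "closed".
    drop_off = 1 - len(durations) / len(opened) if opened else 0.0
    return avg_cycle, drop_off


avg_minutes, drop_off_rate = summarize(events)
print(f"average cycle time: {avg_minutes:.0f} min, drop-off: {drop_off_rate:.0%}")
```

In a real engagement these numbers would come from the analytics tool's own event data rather than a hand-written list.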

Service contracts

OpenAPI

Our standard way to modernize legacy services and make them usable by new products, agents, and internal tools.

Document older systems properly
Create stable contracts for integration work
Reduce friction between teams and tools

Tool access

MCP

Our preferred pattern for giving AI systems and teams structured access to core business tools.

Connect workflows to real systems safely
Set clearer boundaries for agents and reviewers
Make executive and operator access easier to govern

Delivery Tools

The supporting tools we use to move faster

Once the core stack is clear, these tools help us tighten testing, Python setup, dashboard quality, and product pricing without adding unnecessary complexity.

Testing

Blacksmith

Speeds up CI runs so builds, tests, and container changes do not slow delivery down.

Faster GitHub Actions runs
Less time lost to queues and caches
Quicker feedback during active sprints

Python tooling

Astral

Improves Python setup, linting, and environment consistency across development and CI.

Cleaner Python environment setup
Faster local development loops
More reliable backend and eval workflows

Front-end dashboards

TanStack

Powers data-heavy front-end products such as review queues, reporting views, and operator dashboards.

Stronger query and caching patterns
Better tables and operational data views
A more robust base for internal tools

Pricing

Polar

Useful when a workflow becomes a product and needs pricing, subscriptions, or usage-based billing.

Supports packaged pricing models
Helps with usage-based billing setup
Makes product monetization less bespoke

Next Step

Need to modernize an operational workflow or legacy service?

Bring the workflow, the tooling constraint, and the outcome you need to prove. We can shape the first sprint around analytics, access, and one production-grade slice of the system.

Start a case triage
See partner model