Where we work
- AI workflow systems
- Legacy service upgrades
- Operator and executive dashboards
Delivery Stack
We help teams modernize workflow products, legacy services, and internal dashboards with a simpler delivery stack that makes metrics, integrations, and tool access easier to manage.
Overview
Main Products
These are the named Baker Street products that package the method itself: case shaping, repo intelligence, evals, rescue, and adversarial hardening. The partner tools below support delivery, but this is the core product set clients actually buy into.
Open-source hub
The operating layer for fast software investigation.
221B gives Baker Street investigations a shared operating system: repo intake, evidence grading, reusable templates, and MCP access to the underlying knowledge base.
Primary outcome: Faster orientation in unfamiliar systems with clearer evidence and handoff
Open-source investigation toolkit
Turn an ambiguous AI opportunity into a scoped next move.
Adler packages the Baker Street investigation sprint into reusable templates, example outputs, and executive-ready decision support for teams that need clarity before build work starts.
Primary outcome: A ranked recommendation and implementation brief in days, not weeks
Open-source eval harness
Create an evidence loop before AI changes hit production.
Mycroft is the evaluation and release-readiness layer for AI workflows, combining offline and model-backed evals with keyword checks, latency measurements, and report outputs.
Primary outcome: Objective pass/fail signals before a workflow ships or relaunches
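To make the pass/fail idea concrete, here is a minimal sketch of an offline release gate that checks recorded outputs for required keywords and a latency budget. It is an illustration only, not Mycroft's actual interface: `EvalCase`, `evaluate`, `release_ready`, and every field name are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class EvalCase:
    """One offline eval case (hypothetical shape, not Mycroft's schema)."""
    name: str
    output: str                         # recorded workflow output to check
    latency_ms: float                   # recorded latency for that output
    required_keywords: tuple[str, ...]  # terms the output must contain
    max_latency_ms: float               # latency budget for this case


def evaluate(case: EvalCase) -> dict:
    """Return a pass/fail result together with the reasons behind it."""
    text = case.output.lower()
    # Case-insensitive substring match for each required keyword.
    missing = [kw for kw in case.required_keywords if kw.lower() not in text]
    too_slow = case.latency_ms > case.max_latency_ms
    return {
        "name": case.name,
        "passed": not missing and not too_slow,
        "missing_keywords": missing,
        "too_slow": too_slow,
    }


def release_ready(cases: list[EvalCase]) -> tuple[bool, list[dict]]:
    """The gate passes only if every individual case passes."""
    results = [evaluate(c) for c in cases]
    return all(r["passed"] for r in results), results
```

Under these assumptions, a single slow or incomplete case blocks the release, and the per-case results can feed a report that shows what failed and why.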
Open-source rescue playbook
Recover the AI pilot before it turns into an account problem.
Watson is a practical rescue system for stalled pilots, combining triage, stabilization, relaunch gates, KPI recovery tracking, and production handoff assets.
Primary outcome: A credible decision to relaunch, rescope, or stop with evidence
Open-source red-team starter kit
Pressure-test the workflow before the workflow embarrasses you.
Moriarty is the red-team and failure-seeking layer in the Sherlock product set, built for prompt injection, policy evasion, tool misuse, and remediation verification work.
Primary outcome: A clearer view of exploit paths, severity, and whether fixes actually worked
Core Stack
These are the foundations we reach for early because they make the Sherlock products easier to instrument, integrate, and govern in real delivery.
Our default for analytics and product telemetry, so overlooked workflow metrics become visible early.
Our standard way to modernize legacy services and make them usable by new products, agents, and internal tools.
Our preferred pattern for giving AI systems and teams structured access to core business tools.
Delivery Tools
Once the core stack is clear, these tools help us tighten testing, Python setup, dashboard quality, and product pricing without adding unnecessary complexity.
Speeds up CI runs so builds, tests, and container changes do not slow delivery down.
Improves Python setup, linting, and environment consistency across development and CI.
Powers data-heavy front-end products such as review queues, reporting views, and operator dashboards.
Useful when a workflow becomes a product and needs pricing, subscriptions, or usage-based billing.
Next Step
Bring the workflow, the tooling constraint, and the outcome you need to prove. We can shape the first sprint around analytics, access, and one production-grade slice of the system.