Is it worth adopting? A 20-minute rubric for any open-source AI tool

Every week it shows up in someone's channel: a link, and the question "should we use this?" On the other end is a new open-source AI tool — a serving runtime, an eval harness, a vector store, yet another agent framework — and the honest default answer is a skim of the README, a glance at the star count, and a gut call. Gut calls don't scale. Stars measure hype, not whether the project will still be standing when you get paged at 2am because of it.

There's a better move, and it isn't a week-long bake-off. It's one repeatable, time-boxed pass you can run on anything before it earns a place in your stack — the same instinct behind everything here: keep the good stuff, skip the noise. Six axes, a scorecard, and a go/no-go checklist, finished in about twenty minutes. Here it is, as a template you run on your own shortlist.

Why a rubric beats a vibe

A dependency is a long-term liability that arrives disguised as a quick win. The README shows you the project on its best day; you inherit it on its worst — the breaking change with no migration guide, the maintainer who went quiet, the license clause legal finds the week before launch. The cost of adopting the wrong one isn't the afternoon you spent integrating it. It's the quarter you'll spend ripping it out, plus the incident that forced the issue.

So the check is worth doing. The reason it gets skipped is that "evaluate it properly" sounds like a week of work, so it loses to "looks fine, ship it." The fix is to make the check small. Time-box it to twenty minutes: short enough that it happens every time, structured enough that it still counts as a check and not a skim. You're not auditing the codebase. You're sampling six signals that predict regret.

The six axes

Six questions. Score each from 0 to 2 — red, yellow, green — against the signals below.

Axis	The question it answers
License	Can I legally use, modify, and ship this?
Maintenance velocity	Is it alive, or quietly coasting?
Bus factor	What happens if the maintainer walks away?
Escape hatch	How expensive is it to rip out?
Docs honesty	Do the docs match reality?
Prod fit	Will it survive my scale, latency, and ops?

Axis 1 · License

What to inspect: the LICENSE file, at the root and any nested ones, because monorepos and open-core projects love a stricter license under /enterprise. Read the SPDX identifier and check it against the OSI-approved list, not the marketing word "open." Source-available is not open source. The question that matters: can you use, modify, and ship this commercially the way you actually intend to?

Green (2) — a single, clear, OSI-approved permissive or weak-copyleft license (MIT, Apache-2.0, BSD-3-Clause, MPL-2.0), ideally one with an explicit patent grant, as Apache-2.0 and MPL-2.0 have.
Yellow (1) — strong copyleft (GPL/AGPL) that's fine internally but needs legal eyes before you distribute, or a dual-license whose open tier has real limits.
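Red (0) — source-available licenses (the Business Source License, SPDX identifier BUSL-1.1, or SSPL), no LICENSE file at all, or a custom license no lawyer has seen. Don't confuse BUSL-1.1 with BSL-1.0, which is the permissive Boost license. Treat missing or bespoke as red, not yellow.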
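If your shortlist is long, the three buckets above are easy to turn into a first-pass filter. Here's a minimal sketch, assuming you already have an SPDX identifier for each repo. The function name and the exact contents of each bucket are ours, not a standard, and anything you haven't reviewed deliberately falls through to red until a human reads it.

```python
from typing import Optional

# Buckets mirror the green/yellow signals above; extend them to match your own policy.
GREEN = {"MIT", "Apache-2.0", "BSD-3-Clause", "MPL-2.0"}
YELLOW_PREFIXES = ("GPL-", "AGPL-")  # strong copyleft: needs legal eyes before you distribute


def license_score(spdx_id: Optional[str]) -> int:
    """Return 2 (green), 1 (yellow), or 0 (red) for an SPDX identifier."""
    if not spdx_id:
        return 0  # no LICENSE file, or nothing recognizable in it
    if spdx_id in GREEN:
        return 2
    if spdx_id.startswith(YELLOW_PREFIXES):
        return 1
    # Source-available (BUSL-1.1, SSPL-1.0), bespoke, or simply unreviewed: red, not yellow.
    return 0
```

It won't catch a stricter license hiding in a nested directory, so it speeds up the first three minutes without replacing them.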

Axis 2 · Maintenance velocity

What to inspect: commit cadence, release rhythm, and how fast issues and PRs get a human response. You're answering one thing: is this alive, or quietly coasting toward abandonment?

Green (2) — commits within the last few weeks, regular tagged releases, maintainers replying to issues in days.
Yellow (1) — activity exists but it's lumpy; releases are sporadic; issues get answered eventually.
Red (0) — last release over a year ago, a wall of stale issues, PRs rotting unreviewed. A popular-but-dead project is still dead.
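To keep yourself honest across several candidates, you can pin those signals to numbers. A rough sketch follows. The year cutoff for red comes from the rubric above, while the 90-day and 7-day thresholds for green are our assumptions standing in for "a few weeks" and "days", so tune them to taste.

```python
from datetime import date


def velocity_score(last_release: date, median_reply_days: float, today: date) -> int:
    """Score maintenance velocity 0-2 from release recency and issue response time."""
    age_days = (today - last_release).days
    if age_days > 365:
        return 0  # red: last release over a year ago
    if age_days <= 90 and median_reply_days <= 7:
        return 2  # green: recent release and maintainers replying within days
    return 1  # yellow: alive, but lumpy


# Example: a release a month ago and replies in two days lands in green.
score = velocity_score(date(2024, 5, 1), 2.0, today=date(2024, 6, 1))
```

Passing today explicitly keeps the function deterministic, which matters if you ever rerun an old evaluation.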

Axis 3 · Bus factor

What to inspect: contributor concentration. Open the contributors or insights view and look at who actually merges code. The bus factor is how many people would have to get hit by a bus before the project stalls — and for a frightening number of beloved tools, it's one.

Green (2) — multiple active maintainers, or a foundation/company with real governance behind it (a GOVERNANCE.md, a security policy).
Yellow (1) — one dominant maintainer, but a genuine contributor community and some sign of succession.
Red (0) — a single hero account doing nearly all the work, no governance, no plan for the day they burn out.
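If you'd rather count than eyeball the contributor graph, one common proxy is the smallest number of people who together account for at least half of recent commits. The sketch below assumes you've already pulled commit counts per author (for example from git shortlog). The 50% cutoff is a convention we picked, not a law, and commit counts are only a stand-in for who actually reviews and merges.

```python
from typing import Mapping


def bus_factor(commits_by_author: Mapping[str, int], share: float = 0.5) -> int:
    """Smallest number of top contributors covering at least `share` of all commits."""
    total = sum(commits_by_author.values())
    if total == 0:
        return 0  # no commits to analyze
    covered = 0
    ranked = sorted(commits_by_author.values(), reverse=True)
    for rank, count in enumerate(ranked, start=1):
        covered += count
        if covered / total >= share:
            return rank
    return len(ranked)


# Hypothetical repo: one account does nearly all the work.
solo = bus_factor({"alice": 90, "bob": 6, "carol": 4})
```

A result of 1 is the red case above: a single account carrying the project.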

Axis 4 · Escape hatch

What to inspect: the cost of ripping it out. Does it sit behind a standard interface (an OpenAI-compatible endpoint, the OpenTelemetry protocol, plain SQL) or its own bespoke API? Can you get your data back out? Are the license and codebase such that you could fork and self-host if upstream goes sideways? This is exactly the "thin glue, swap any piece" instinct behind the open-source AI stack we'd actually build on.

Green (2) — standard, portable interfaces, clean data export, a forkable license. Leaving is a config change.
Yellow (1) — some lock-in, but a documented migration path and exportable state.
Red (0) — proprietary formats, no export, deep API entanglement. Adopting it is a one-way door.
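One way to keep yourself in green territory is to put your own thin interface between your code and the tool, so the tool never leaks past one adapter. Below is a small sketch of that idea. DocStore, InMemoryStore, and migrate are hypothetical names for illustration, not any real library's API, and a real adapter would wrap the vendor client where the in-memory dict sits.

```python
from typing import Dict, Protocol


class DocStore(Protocol):
    """The only surface application code is allowed to touch (hypothetical)."""

    def put(self, doc_id: str, text: str) -> None: ...

    def export(self) -> Dict[str, str]: ...


class InMemoryStore:
    """Stand-in backend; a real adapter would wrap the vendor client here."""

    def __init__(self) -> None:
        self._docs: Dict[str, str] = {}

    def put(self, doc_id: str, text: str) -> None:
        self._docs[doc_id] = text

    def export(self) -> Dict[str, str]:
        return dict(self._docs)  # return a copy so callers can't mutate internal state


def migrate(old: DocStore, new: DocStore) -> int:
    """Copy every document from one backend to another; returns how many moved."""
    docs = old.export()
    for doc_id, text in docs.items():
        new.put(doc_id, text)
    return len(docs)
```

If you can't write the export method for a candidate tool during the spike, that's your escape-hatch score telling you something.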

Axis 5 · Docs honesty

What to inspect: whether the docs describe the software that actually exists. Run the quickstart on a clean machine. Are limitations and failure modes named, or is it all happy path and aspirational roadmap?

Green (2) — quickstart works first try on a fresh environment, limitations stated plainly, docs versioned to releases.
Yellow (1) — mostly right but stale in spots; you fill the gaps from issues or source.
Red (0) — quickstart fails, docs describe a version that doesn't exist, or "coming soon" is doing the heavy lifting.

Axis 6 · Prod fit

What to inspect: your reality — scale, latency, security, ops — not the demo's. Is there a real deployment story (config, upgrades, resource needs, a security contact) and a way to see inside it in production (logs, metrics, health checks)?

Green (2) — documented production deployments, observability hooks, sane defaults, a stated security policy.
Yellow (1) — runs in prod with effort; you'll wire up your own monitoring and harden the defaults.
Red (0) — demo-grade only, no ops story, no way to observe it when it misbehaves.

The scorecard

Score each axis against the signals above and write the number down — the act of committing to a number is what turns a vibe into a decision you can defend in a doc.

Axis	Score (0–2)
License	___
Maintenance velocity	___
Bus factor	___
Escape hatch	___
Docs honesty	___
Prod fit	___
Total	___ / 12

How to read the total: 10–12 is a strong yes; 7–9 is a yes with eyes open — you have named gaps to mitigate; below 7 is a no for anything you'd genuinely depend on. But the total is a sorting tool, not the verdict. The checklist below is the gate.

How to weight it for your context: decide which axes are veto axes before you score, not after. A CLI you'll run in CI and could swap in an afternoon? The license is the only thing that can truly sink it. A vector store that will hold production data for the next three years? Escape hatch and prod fit become non-negotiable, and a 0 on either ends the conversation regardless of the total. Weighting isn't multiplying scores — it's deciding in advance which zeros you refuse to live with.
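The bands and the veto rule fit in a few lines if you want the scorecard to live next to your decision doc. Here's a sketch under the thresholds stated above. The axis keys and the function name are ours, and the veto axes are passed in up front because you're meant to choose them before scoring.

```python
from typing import Dict, Iterable, Tuple

AXES = ("license", "velocity", "bus_factor", "escape_hatch", "docs", "prod_fit")


def read_scorecard(scores: Dict[str, int], veto_axes: Iterable[str] = ()) -> Tuple[int, str]:
    """Return (total out of 12, verdict). A zero on any veto axis overrides the total."""
    for axis in AXES:
        if scores.get(axis) not in (0, 1, 2):
            raise ValueError(f"{axis} needs a score of 0, 1, or 2")
    total = sum(scores[axis] for axis in AXES)
    vetoed = [axis for axis in veto_axes if scores.get(axis) == 0]
    if vetoed:
        return total, "no: zero on veto axis " + ", ".join(vetoed)
    if total >= 10:
        return total, "strong yes"
    if total >= 7:
        return total, "yes, eyes open"
    return total, "no"


# A vector store scoring 10/12 still fails if escape hatch was declared a veto axis.
scores = {axis: 2 for axis in AXES}
scores["escape_hatch"] = 0
total, verdict = read_scorecard(scores, veto_axes=("escape_hatch", "prod_fit"))
```

The same scores with no veto axes would read as a strong yes, which is the whole argument for picking your vetoes first.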

The 12-point go/no-go checklist

The scorecard ranks; the checklist vetoes. Twelve yes/no questions, two per axis. The rule is blunt on purpose: any hard "no" is a stop, not a deduction. You don't average a "no" away — a single one ends it, because these are the things you can't fix after you've adopted the tool.

License — Is there a LICENSE file whose SPDX identifier maps to an OSI-approved license? (An SPDX identifier alone isn't enough: the SPDX list also covers source-available licenses.)
License — Does it permit commercial use, modification, and distribution the way you intend to ship?
Velocity — Has there been meaningful activity — a release or real commits — in the last six months?
Velocity — Do recent issues and PRs get a response from a maintainer?
Bus factor — Would the project survive its top contributor walking away tomorrow (multiple maintainers or governance backing)?
Bus factor — Is there a stated process for reporting bugs and security issues?
Escape hatch — Could you replace it within a sprint — standard interfaces, exportable data, no hard lock-in?
Escape hatch — Could you fork and self-host it if upstream stalled?
Docs — Did the quickstart actually work on a clean machine?
Docs — Do the docs name real limitations, not just the happy path?
Prod fit — Is there a real deployment and operations story, not just a demo?
Prod fit — Can you observe it in production when it misbehaves?

Twelve yeses and you have a real candidate. One honest no — especially on license or escape hatch — and you keep looking, no matter how good the demo felt.
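The difference between a veto and a deduction is easy to see in code: there's no arithmetic at all. A minimal sketch, with a made-up function name and shortened question labels of our own:

```python
from typing import Dict, List, Tuple


def gate(answers: Dict[str, bool]) -> Tuple[bool, List[str]]:
    """Pass only if every answer is yes; otherwise return the questions that blocked."""
    blockers = [question for question, yes in answers.items() if not yes]
    return len(blockers) == 0, blockers


# Hypothetical run: everything is a yes except the fork question.
answers = {
    "license: clear and OSI-approved": True,
    "license: permits how we ship": True,
    "escape hatch: replaceable within a sprint": True,
    "escape hatch: forkable and self-hostable": False,
}
go, blockers = gate(answers)
```

Here go is False and blockers names the one question that stopped it, which is the line you paste into the decision doc.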

Run it in 20 minutes

The whole point is a check fast enough that you actually do it. Order of operations, worst-case first so you fail cheap:

Minutes 0–3 · LICENSE. Open the LICENSE file (and any nested ones), confirm the SPDX identifier, check it against how you'll ship. If the license doesn't fit, stop here — you just saved yourself seventeen minutes.
Minutes 3–7 · Releases. Skim the releases, tags, and changelog. Recency and rhythm tell you whether it's alive. Real, noted releases beat a pile of untagged commits.
Minutes 7–12 · Issues and PRs. Sort by recently updated. Are maintainers replying? How old is the oldest unaddressed serious bug? Glance at the contributor graph for bus factor.
Minutes 12–20 · A short spike. Clone it into a clean container or fresh virtualenv, run the quickstart, then do the one thing you'd actually use it for. Docs honesty and prod fit reveal themselves in eight minutes of hands-on far more honestly than any README. Scope the spike tight — the same discipline that makes handing work to coding agents work: shrink it until success is checkable.
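The fail-cheap ordering is just a loop that stops at the first failed step. Here's a sketch with the four time budgets taken from the schedule above. The function name and the stand-in checks are ours; in practice each check is you, looking at the repo.

```python
from typing import Callable, List, Optional, Tuple

Step = Tuple[str, int, Callable[[], bool]]  # (label, minute budget, check)


def run_pass(steps: List[Step]) -> Tuple[int, Optional[str]]:
    """Run steps in order; return (minutes spent, label of the failed step or None)."""
    spent = 0
    for label, budget, check in steps:
        spent += budget
        if not check():
            return spent, label  # stop at the first failure: fail cheap
    return spent, None


# Stand-in checks: the license doesn't fit, so nothing after it runs.
steps: List[Step] = [
    ("LICENSE", 3, lambda: False),
    ("Releases", 4, lambda: True),
    ("Issues and PRs", 5, lambda: True),
    ("Spike", 8, lambda: True),
]
spent, failed_at = run_pass(steps)
```

With a license that doesn't fit, you're out after three minutes and the other seventeen never get spent.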

Twenty minutes, six scores, twelve answers, and a decision you can put in writing.

The cache

A few things worth keeping — examples, not endorsements, the way we keep everything else here:

OpenSSF Scorecard automates a chunk of the maintenance-velocity and bus-factor axes: it runs health and security checks against a repo and hands you a number. Useful for triaging a long shortlist before you spend human minutes on it.
The SPDX license list and OSI — plus choosealicense.com for plain-English summaries — are ground truth for Axis 1. When in doubt about a license, trust the identifier, not the marketing.
CHAOSS has spent years defining community-health metrics: the established vocabulary for "velocity" and "contributor concentration" if you want to make those axes rigorous rather than eyeballed.

Use them to go faster, not to outsource the judgment. The tools you're evaluating will change — half the names on your current shortlist will be gone in two years. The rubric won't. A license you can ship, a project that's alive, a maintainer base that isn't one tired person, a door you can walk back out of, docs that tell the truth, and an ops story that survives contact with production — those six questions are what separate a dependency you keep from one you come to regret.

Run the pass. Write the numbers down. Then decide.