May 13, 2026 | tags: cli, team, automation

Dependency license auditor for commercial projects

A CLI tool that scans package manifests across npm and PyPI, resolves each dependency's SPDX license, and flags violations against a policy file — so teams shipping commercial software catch license drift before it ships.

The idea

A command-line tool that reads your project's dependency manifests (package.json and requirements.txt first, with Cargo.toml and go.mod as later targets) and resolves the SPDX license for each dependency, starting with direct dependencies and extending to transitive ones later. It compares each license against a configurable policy file and, when any dependency violates the policy, exits non-zero with a violation report. It is designed to drop into CI so license drift is caught before code ships, not during a legal audit six months later.
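The first step, turning manifests into a flat list of dependencies, could look roughly like the sketch below. It uses only the standard library; the names Dependency, parse_package_json, and parse_requirements_txt are hypothetical, and the requirements.txt handling is deliberately simplified (it skips pip options such as -r and -e, and only lowercases names rather than applying full PyPI name normalization).

```python
import json
import re
from typing import NamedTuple


class Dependency(NamedTuple):
    ecosystem: str  # "npm" or "pypi"
    name: str
    spec: str       # version specifier exactly as written in the manifest


def parse_package_json(text: str) -> list[Dependency]:
    """Collect direct dependencies from the text of a package.json file."""
    manifest = json.loads(text)
    deps = []
    for section in ("dependencies", "devDependencies"):
        for name, spec in manifest.get(section, {}).items():
            deps.append(Dependency("npm", name, spec))
    return deps


# Project name, optional [extras], then whatever specifier follows.
_REQ_LINE = re.compile(r"^([A-Za-z0-9][A-Za-z0-9._-]*)\s*(\[[^\]]*\])?\s*(.*)$")


def parse_requirements_txt(text: str) -> list[Dependency]:
    """Collect direct dependencies from the text of a requirements.txt file."""
    deps = []
    for raw in text.splitlines():
        line = raw.split(" #", 1)[0].strip()  # drop trailing comments
        if not line or line.startswith(("#", "-")):
            continue  # skip blanks, comment lines, and pip options (-r, -e, ...)
        match = _REQ_LINE.match(line)
        if match:
            name, _extras, spec = match.groups()
            spec = spec.split(";", 1)[0].strip()  # drop environment markers
            deps.append(Dependency("pypi", name.lower(), spec))
    return deps
```

Each Dependency still carries a version specifier rather than a resolved version; pinning it to a concrete release (from a lockfile or the registry) would be a separate step before any license lookup.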

Why build this

Companies shipping commercial software need to know whether their dependency tree contains copyleft licenses such as GPL-3.0, which can require releasing the source of a combined work when it is distributed. Most teams discover this late: during an acquisition due-diligence review, or when a lawyer asks. Existing tools like FOSSA and Snyk address this, but they are SaaS products whose pricing can put them out of reach for small teams. The raw data is freely available: npm, PyPI, and crates.io all expose license metadata via public APIs. An open-source CLI that runs in CI without sending your dependency list to a third party fills a real gap.

Stack sketch

  • Language: Python. json and tomllib (Python 3.11+) are in the standard library, and requirements-parser is a small third-party package on PyPI, so multi-ecosystem parsing stays straightforward
  • License resolution: npm registry JSON API (registry.npmjs.org/<pkg>/<version>), PyPI JSON API (pypi.org/pypi/<pkg>/<version>/json), crates.io REST API for Rust
  • SPDX parsing: the license-expression Python library (which spdx-tools relies on for license expressions) for parsing and comparing SPDX expressions, including compound expressions like MIT OR Apache-2.0
  • Policy file: a single YAML file listing either allowed licenses (allowlist mode) or denied licenses (denylist mode)
  • Output: human-readable table, JSON, and JUnit XML for CI integration (GitHub Actions, GitLab CI, Jenkins)
  • Caching: local SQLite cache keyed by package + version, invalidated after 7 days, with a --cache-dir flag for a shared CI cache volume
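As a rough illustration of the caching item, here is a minimal sketch of a SQLite-backed cache with a 7-day expiry, using only the standard library. The class name LicenseCache, the table layout, and the choice to include the ecosystem in the key (so an npm package and a PyPI package with the same name do not collide) are assumptions, not a fixed design.

```python
import sqlite3
import time

TTL_SECONDS = 7 * 24 * 60 * 60  # entries older than 7 days count as stale


class LicenseCache:
    """Hypothetical cache of resolved licenses, keyed by ecosystem + package + version."""

    def __init__(self, path=":memory:"):
        # In the real tool, path would be derived from --cache-dir.
        self.db = sqlite3.connect(str(path))
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS licenses ("
            " ecosystem TEXT NOT NULL, name TEXT NOT NULL, version TEXT NOT NULL,"
            " license TEXT, fetched_at REAL NOT NULL,"
            " PRIMARY KEY (ecosystem, name, version))"
        )

    def get(self, ecosystem, name, version, now=None):
        """Return (True, license) on a fresh hit, (False, None) on a miss or stale entry."""
        now = time.time() if now is None else now
        row = self.db.execute(
            "SELECT license, fetched_at FROM licenses"
            " WHERE ecosystem = ? AND name = ? AND version = ?",
            (ecosystem, name, version),
        ).fetchone()
        if row is None or now - row[1] > TTL_SECONDS:
            return (False, None)
        return (True, row[0])

    def put(self, ecosystem, name, version, license_id, now=None):
        """Store a resolved license; license_id may be None for 'no declared license'."""
        now = time.time() if now is None else now
        self.db.execute(
            "INSERT OR REPLACE INTO licenses VALUES (?, ?, ?, ?, ?)",
            (ecosystem, name, version, license_id, now),
        )
        self.db.commit()
```

Returning a (hit, value) pair lets the caller tell "not cached" apart from "cached, and the package declares no license", which matters because the second case should not trigger another registry request.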

Scope for v1

  • Scan package.json (npm) and requirements.txt (PyPI); together these two ecosystems cover a large share of mixed projects
  • Resolve direct dependencies only; log a warning that transitive resolution is skipped, with a --transitive flag as a follow-up stub
  • YAML policy with an allowlist of SPDX identifiers (MIT, Apache-2.0, BSD-2-Clause, etc.)
  • Three distinct exit codes: 0 for pass, 1 for policy violations, 2 for resolution errors — so CI can distinguish "you have a GPL dep" from "the registry was unreachable"
  • A --unknown-licenses flag accepting warn, fail, or ignore to handle packages with no declared license
  • No Cargo or Go support in v1
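The allowlist check, the three exit codes, and the --unknown-licenses modes from the list above can be sketched together as follows. Everything named here (ExitCode, Report, RESOLUTION_FAILED, evaluate) is hypothetical, the policy is passed in as an already-loaded set rather than read from YAML, and only single SPDX identifiers are handled, not compound expressions. One more assumption: when a run has both resolution errors and violations, this sketch reports exit code 2, on the reasoning that an incomplete scan should be flagged as such.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class ExitCode(IntEnum):
    PASS = 0
    VIOLATION = 1
    RESOLUTION_ERROR = 2


# Sentinel for "the registry lookup itself failed" (distinct from "no license declared").
RESOLUTION_FAILED = object()


@dataclass
class Report:
    violations: list = field(default_factory=list)  # (package, license or reason)
    warnings: list = field(default_factory=list)    # packages with no declared license
    errors: list = field(default_factory=list)      # packages that could not be resolved

    @property
    def exit_code(self) -> ExitCode:
        if self.errors:  # assumption: resolution errors take precedence
            return ExitCode.RESOLUTION_ERROR
        if self.violations:
            return ExitCode.VIOLATION
        return ExitCode.PASS


def evaluate(resolved: dict, allowed: set, unknown_mode: str = "warn") -> Report:
    """Check each resolved license against an allowlist of SPDX identifiers.

    resolved maps package name to an SPDX id, None (no declared license),
    or RESOLUTION_FAILED.
    """
    if unknown_mode not in ("warn", "fail", "ignore"):
        raise ValueError(f"unsupported --unknown-licenses mode: {unknown_mode}")
    report = Report()
    for name, license_id in resolved.items():
        if license_id is RESOLUTION_FAILED:
            report.errors.append(name)
        elif license_id is None:
            if unknown_mode == "fail":
                report.violations.append((name, "no declared license"))
            elif unknown_mode == "warn":
                report.warnings.append(name)
            # "ignore": record nothing
        elif license_id not in allowed:
            report.violations.append((name, license_id))
    return report
```

A CLI entry point would then print the report in the requested format and call sys.exit(int(report.exit_code)).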

Where it could go

Add Cargo and Go module support to cover Rust and Go shops. crates.io exposes a license field through its public API; Go modules declare no license in go.mod, so Go support would likely need to detect licenses from the LICENSE files shipped with each module. A --suggest flag could propose permissively licensed alternatives (MIT, for example) for violating packages where a known drop-in exists, turning the tool from a linter into an active dependency advisor.

A lightweight self-hosted dashboard that ingests the JSON report over time would let legal and engineering share a single compliance view without both needing CLI access. At that point the project crosses from a developer utility into a fundable niche: open-core compliance tooling for teams that can't afford enterprise SaaS but have real legal obligations.

Watch out for

Many packages declare no license field at all, or use a string that is not an SPDX identifier. On npm, for example, "UNLICENSED" marks a package as proprietary and "SEE LICENSE IN LICENSE.md" points to a custom license file; neither maps to an SPDX identifier. Handle this from day one with the --unknown-licenses flag, or teams will see noisy failures on their first run and disable the check entirely.
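One way to keep those cases from surfacing as raw failures is to classify each license string before it reaches the policy check. The sketch below is an illustration under stated assumptions: classify_license and its category names are hypothetical, KNOWN_SPDX is a tiny sample rather than the full SPDX license list, and compound expressions are only detected, not parsed.

```python
import re

# Sample subset for illustration; a real tool would load the full SPDX license list.
KNOWN_SPDX = {
    "MIT", "Apache-2.0", "BSD-2-Clause", "BSD-3-Clause", "ISC",
    "GPL-3.0-only", "GPL-3.0-or-later",
}
_BY_LOWER = {spdx_id.lower(): spdx_id for spdx_id in KNOWN_SPDX}


def classify_license(raw):
    """Return ("spdx", id), ("expression", text), or ("unknown", reason)."""
    if raw is None or not raw.strip():
        return ("unknown", "missing")
    text = raw.strip()
    upper = text.upper()
    if upper == "UNLICENSED":
        return ("unknown", "unlicensed")
    if upper.startswith("SEE LICENSE IN"):
        return ("unknown", "see-file")
    if re.search(r"\s(AND|OR|WITH)\s", text):
        # Compound SPDX expression; leave parsing to a dedicated library.
        return ("expression", text)
    canonical = _BY_LOWER.get(text.lower())  # match identifiers case-insensitively
    if canonical is not None:
        return ("spdx", canonical)
    return ("unknown", "unrecognized")
```

Keeping the reason alongside the "unknown" category lets the report say why a package was flagged, which tends to make a first run easier to triage than a bare failure.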