Setting Up a Python Project Skeleton That Scales

Most Python projects start with mkdir myproject && touch myproject/__init__.py. Six months later you’re untangling circular imports, debating whether to add a utils/ directory, and wondering why mypy is angry about a transitive dependency.

The membox project — a local knowledge graph and RAG memory layer for coding agents — had a hard constraint from day one: the core must run on the Python standard library plus SQLite. No network calls, no LLM APIs, no heavy frameworks. Here is how we set up the foundation to honor that constraint while keeping the developer experience productive.

Three Dependencies, Not Thirty

The runtime dependency list in pyproject.toml is short:

# pyproject.toml — [project]
dependencies = [
    "pydantic>=2",
    "typer>=0.15",
    "rich>=13",
]

[project.optional-dependencies]
llm  = ["openai>=1"]
ast  = ["tree-sitter>=0.24", "tree-sitter-python>=0.23"]
dev  = ["pytest>=8", "pytest-cov>=6", "ruff>=0.11",
        "pre-commit>=4", "mypy>=1.15"]

pydantic validates data at the boundary. When an LLM returns malformed JSON, validation raises a ValidationError before the bad data can poison the knowledge graph. The ExtractedGraph model is the single source of truth for what “valid extraction output” means.
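A minimal sketch of that boundary check, assuming pydantic v2. The field names on Entity, Relation, and ExtractedGraph are hypothetical stand-ins, and parse_extraction is an illustrative helper, not a membox function.

```python
from typing import Optional

from pydantic import BaseModel, Field, ValidationError


# Hypothetical shapes; the real membox models may carry different fields.
class Entity(BaseModel):
    name: str
    kind: str


class Relation(BaseModel):
    source: str
    predicate: str
    target: str


class ExtractedGraph(BaseModel):
    entities: list[Entity] = Field(default_factory=list)
    relations: list[Relation] = Field(default_factory=list)


def parse_extraction(raw: str) -> Optional[ExtractedGraph]:
    """Validate raw LLM output; return None instead of storing bad data."""
    try:
        return ExtractedGraph.model_validate_json(raw)
    except ValidationError:
        return None


good = parse_extraction('{"entities": [{"name": "membox", "kind": "project"}]}')
bad = parse_extraction('{"entities": [{"name": 42}]}')  # wrong type, missing "kind"
truncated = parse_extraction('{"entities": [')  # not valid JSON at all
```

Both failure modes, a schema mismatch and unparseable JSON, surface as the same ValidationError, so the caller needs only one except clause at the boundary.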

typer turns type annotations into a CLI. A str parameter with no default becomes a positional argument; a bool parameter with a default becomes a --flag/--no-flag option. Help text renders via rich automatically. No argparse boilerplate; the function signature is the CLI spec.

rich renders terminal output — tables, progress bars, colored text. It is a typer dependency anyway, but we use it directly for confirmation messages and structured entity listings.

Everything else is opt-in. OpenAI goes behind the llm extra; tree-sitter behind ast. A coding agent that embeds membox can use the core without paying the LLM dependency cost.
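One way to keep an extra truly optional is to probe for it instead of importing it at module load. The sketch below uses only the standard library; extra_available and require_extra are hypothetical helper names, and the install hint assumes the package is published as membox.

```python
import importlib.util


def extra_available(module_name: str) -> bool:
    """Return True if a top-level module can be found, without importing it."""
    return importlib.util.find_spec(module_name) is not None


def require_extra(module_name: str, extra: str) -> None:
    """Fail with an actionable message when an optional dependency is missing."""
    if not extra_available(module_name):
        raise RuntimeError(
            f"'{module_name}' is not installed; "
            f"try: pip install 'membox[{extra}]'"  # assumed distribution name
        )


# The core path only probes for the extra; nothing here imports openai.
use_llm = extra_available("openai")
```

A command that needs the LLM path could call require_extra("openai", "llm") up front, while everything else keeps working without the package installed.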

The Tooling Trifecta

Tool	Replaces	Role
ruff	black + flake8 + isort	Lint and format; 14 rule sets configured in `pyproject.toml`
mypy --strict	—	Enforces type contracts at every Protocol boundary
pre-commit	—	Runs ruff + mypy + file checks on every commit

One detail worth calling out: ruff is a single Rust binary that replaces three Python tools. It runs in under 100 ms on this codebase, fast enough that the linter never feels like a bottleneck.

Module Boundaries

The src/ layout prevents accidental imports from the working directory. Without it, import membox might resolve to ./membox/ instead of the installed package — a class of bugs that is invisible until you try to package for distribution.
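The difference can be demonstrated with two throwaway project roots. This sketch puts each root on sys.path, which is roughly what happens when Python is started from the project directory, and asks whether the package resolves. The names demo_flat_pkg and demo_src_pkg are made up for the demonstration.

```python
import importlib
import importlib.util
import sys
import tempfile
from pathlib import Path


def importable_from(directory: Path, package: str) -> bool:
    """Report whether `package` resolves with `directory` added to sys.path."""
    sys.path.insert(0, str(directory))
    try:
        importlib.invalidate_caches()
        return importlib.util.find_spec(package) is not None
    finally:
        sys.path.remove(str(directory))


with tempfile.TemporaryDirectory() as tmp:
    flat_root = Path(tmp) / "flat"
    src_root = Path(tmp) / "with_src"

    # Flat layout: the package sits directly in the project root.
    (flat_root / "demo_flat_pkg").mkdir(parents=True)
    (flat_root / "demo_flat_pkg" / "__init__.py").touch()

    # src layout: the package is one level down, under src/.
    (src_root / "src" / "demo_src_pkg").mkdir(parents=True)
    (src_root / "src" / "demo_src_pkg" / "__init__.py").touch()

    flat_leaks = importable_from(flat_root, "demo_flat_pkg")
    src_leaks = importable_from(src_root, "demo_src_pkg")
```

Here flat_leaks is True and src_leaks is False: with the src layout, the only way to import the package is to install it, so tests exercise the installed copy rather than whatever happens to be in the working directory.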

Inside src/membox/, modules split by responsibility:

src/membox/
├── cli/                # typer commands, one file per verb
│   └── commands/       # ingest, query, list-entities, version
├── config.py           # MemboxConfig (pydantic models for provider settings)
├── core/               # domain logic, no network imports
│   ├── agent.py        # MemoryAgent orchestration
│   ├── normalize.py    # name/predicate normalization
│   └── store/          # SQLite operations, split by concern
├── model/              # pydantic schemas (Entity, Relation, ExtractedGraph…)
├── providers/          # wire-level Protocol adapters (ChatClient, EmbedClient)
└── services/           # domain-level Protocols + implementations
    ├── embedding.py    # Embedder Protocol, DummyEmbedder, OpenAIEmbedder
    ├── extraction.py   # LLMExtractor Protocol, DummyExtractor, OpenAIExtractor
    └── prompts/        # LLM prompt templates

The critical boundary: core/ never imports from providers/ or services/ at runtime. services/ composes providers/ adapters to satisfy domain Protocols. providers/ speaks HTTP and nothing else. This means adding a new LLM provider is one adapter file and zero changes to the rest of the codebase.
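A boundary like this tends to erode unless something checks it. The sketch below is one hypothetical way to do that with the standard library's ast module; it is not part of membox. It flags absolute imports of membox.providers or membox.services and skips anything guarded by TYPE_CHECKING, since those imports never execute at runtime. Relative imports are not handled.

```python
import ast

# Packages that core/ must not import at runtime (names taken from the tree above).
FORBIDDEN = ("membox.providers", "membox.services")


def _is_type_checking_guard(node: ast.If) -> bool:
    """Match `if TYPE_CHECKING:` and `if typing.TYPE_CHECKING:`."""
    test = node.test
    return (isinstance(test, ast.Name) and test.id == "TYPE_CHECKING") or (
        isinstance(test, ast.Attribute) and test.attr == "TYPE_CHECKING"
    )


def runtime_violations(source: str) -> list[str]:
    """List forbidden names that `source` imports outside TYPE_CHECKING blocks."""
    found: list[str] = []
    stack: list[ast.AST] = [ast.parse(source)]
    while stack:
        node = stack.pop()
        if isinstance(node, ast.If) and _is_type_checking_guard(node):
            stack.extend(node.orelse)  # the else branch still runs at runtime
            continue
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            # Join module and name so `from membox import services` is caught too.
            names = [f"{node.module}.{alias.name}" for alias in node.names]
        else:
            names = []
        found.extend(
            name
            for name in names
            if any(name == p or name.startswith(p + ".") for p in FORBIDDEN)
        )
        stack.extend(ast.iter_child_nodes(node))
    return found
```

A test could read every file under src/membox/core/ and assert that runtime_violations returns an empty list for each one.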

Protocols Before Implementations

Every external dependency hides behind a typing.Protocol. There are two layers.

Wire level (providers/base.py) — the raw HTTP contract:

from typing import Protocol

from pydantic import BaseModel

class ChatClient(Protocol):
    def complete(
        self,
        system: str,
        user: str,
        json_schema: type[BaseModel] | None = None,
    ) -> str: ...

class EmbedClient(Protocol):
    def embed(self, texts: list[str]) -> list[list[float]]: ...

Domain level (services/extraction.py) — what the agent actually calls:

class LLMExtractor(Protocol):
    def extract(self, text: str) -> ExtractedGraph: ...
    def extract_query_entities(self, query: str) -> list[str]: ...

The MemoryAgent depends on LLMExtractor and Embedder — never on openai.OpenAI or any concrete class. When OPENAI_API_KEY is unset, the factory returns a no-op backend and the entire system still boots:

# src/membox/cli/_common.py
def make_agent(db: str, no_llm: bool = False, warn: bool = False) -> MemoryAgent:
    extractor, embedder = create_default_extractor(use_llm=not no_llm)
    if warn and isinstance(extractor, DummyExtractor):
        typer.echo("No OPENAI_API_KEY — using no-op extractor", err=True)
    return MemoryAgent(extractor=extractor, embedder=embedder, db_path=db)

This graceful degradation is not a workaround — it is a feature. Tests run without network access. CI runs without secrets. The knowledge graph works offline.
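The fallback can be sketched in isolation with nothing but the standard library. Everything below is illustrative: the embed signature on Embedder is assumed to mirror EmbedClient, the zero-vector behaviour of DummyEmbedder is a guess at what "no-op" means, and RemoteEmbedder and create_embedder are hypothetical names.

```python
import os
from typing import Optional, Protocol


class Embedder(Protocol):
    # Assumed signature, mirroring the wire-level EmbedClient.
    def embed(self, texts: list[str]) -> list[list[float]]: ...


class DummyEmbedder:
    """Offline stand-in: fixed-size zero vectors, no network."""

    def __init__(self, dim: int = 8) -> None:
        self.dim = dim

    def embed(self, texts: list[str]) -> list[list[float]]:
        return [[0.0] * self.dim for _ in texts]


class RemoteEmbedder:
    """Placeholder for a network-backed implementation (hypothetical)."""

    def __init__(self, api_key: str) -> None:
        self.api_key = api_key

    def embed(self, texts: list[str]) -> list[list[float]]:
        raise NotImplementedError("a real adapter would call the provider here")


def create_embedder(env: Optional[dict[str, str]] = None) -> Embedder:
    """Pick a backend from the environment; fall back to the dummy."""
    env = dict(os.environ) if env is None else env
    key = env.get("OPENAI_API_KEY")
    return RemoteEmbedder(key) if key else DummyEmbedder()


offline = create_embedder({})  # no key in this environment
vectors = offline.embed(["first note", "second note"])
```

Neither class inherits from Embedder; a type checker accepts both because they match the Protocol structurally. Passing the environment in as a plain dict also makes the selection logic testable without touching os.environ.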

When Is the Skeleton “Done”?

Phase 1 has a concrete exit criterion: tests/test_skeleton.py passes. These tests verify the skeleton itself — import chains, CLI commands, Protocol stubs — not the implementations behind them:

# tests/test_skeleton.py (excerpt)
from typer.testing import CliRunner

def test_all_public_imports() -> None:
    from membox import (
        DummyEmbedder, DummyExtractor, Embedder, Entity,
        ExtractedGraph, KnowledgeStore, MemoryAgent, ...
    )

def test_cli_help_shows_all_commands() -> None:
    from membox.cli import app
    runner = CliRunner()
    result = runner.invoke(app, ["--help"])
    for cmd in ["ingest", "ingest-file", "query",
                "list-entities", "list-relations", "version"]:
        assert cmd in result.output

If any of these fail — a missing export, a renamed command, a broken Protocol — the skeleton is incomplete. This contract lets later phases proceed without wondering “did I break the CLI?”
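A single failing import statement reports only the first missing name. As a possible complement, a small helper can list every gap at once. This is a sketch with a hypothetical function name, demonstrated on a standard library module so that it runs anywhere.

```python
import importlib


def missing_exports(module_name: str, expected: list[str]) -> list[str]:
    """Return the expected names that the module does not expose."""
    module = importlib.import_module(module_name)
    return [name for name in expected if not hasattr(module, name)]


# A skeleton test would pass "membox" and the project's public names instead.
gaps = missing_exports("json", ["dumps", "loads", "not_a_real_export"])
```

Asserting that the returned list is empty gives a failure message that names every missing export in one run.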

What’s Next

With the skeleton validated, the next step is filling in core/store/: the SQLite schema, the INSERT OR IGNORE deduplication, the BFS query engine. Because the Protocols are already in place, the store can be developed and tested in complete isolation from the LLM layer.
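To make those two ideas concrete, here is a rough sketch using only sqlite3 and an in-memory database. The table and column names are assumptions for illustration, not the membox schema, and add_relation and neighbors_within are hypothetical helpers.

```python
import sqlite3
from collections import deque

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE entities (name TEXT PRIMARY KEY);
    CREATE TABLE relations (
        source    TEXT NOT NULL,
        predicate TEXT NOT NULL,
        target    TEXT NOT NULL,
        UNIQUE (source, predicate, target)
    );
    """
)


def add_relation(source: str, predicate: str, target: str) -> None:
    """Insert a triple; rows that violate a uniqueness constraint are skipped."""
    conn.executemany(
        "INSERT OR IGNORE INTO entities (name) VALUES (?)",
        [(source,), (target,)],
    )
    conn.execute(
        "INSERT OR IGNORE INTO relations (source, predicate, target) "
        "VALUES (?, ?, ?)",
        (source, predicate, target),
    )


def neighbors_within(start: str, max_hops: int) -> list[str]:
    """Breadth-first walk over outgoing relations, up to max_hops away."""
    seen = {start}
    order: list[str] = []
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_hops:
            continue  # do not expand past the hop limit
        rows = conn.execute(
            "SELECT target FROM relations WHERE source = ? ORDER BY target",
            (node,),
        )
        for (target,) in rows:
            if target not in seen:
                seen.add(target)
                order.append(target)
                queue.append((target, depth + 1))
    return order


add_relation("membox", "uses", "sqlite")
add_relation("membox", "uses", "sqlite")  # duplicate, silently ignored
add_relation("sqlite", "written_in", "c")
```

The deduplication lives in the schema rather than in Python: the UNIQUE constraint defines what counts as a duplicate, and INSERT OR IGNORE turns a conflict into a no-op instead of an error.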