Development

This page describes the development workflow and the conventions used in this repository.

Workflow overview

Use the Makefile targets for everything. They wrap uv run ... so you do not need to activate a virtual environment manually.

Typical day-to-day:

make final

Documentation-only build:

make docs

Clean caches and build artifacts:

make clean

Remove the virtual environment (full reset):

make clean-venv

Makefile workflow

The Makefile is the single entry point for development tasks (env setup, quality checks, docs builds, and running experiments).

  • Prefer make final before pushing.

  • Prefer make docs to validate documentation changes.

  • Use make run EXP=<id> to execute an experiment and write artifacts to out/<id>/.

Dependency groups

This summary is included from the Makefile documentation:

uv installs dependencies from your pyproject.toml. This project uses “extras”:

  • default: runtime dependencies (needed to run the package)

  • dev: developer tools (ruff, mypy, pytest, …)

  • docs: documentation tools (sphinx, theme, myst, bibtex, …)
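
A hedged sketch of how such extras might be declared in pyproject.toml (the package lists here are illustrative; the authoritative lists live in the repository's pyproject.toml):

```toml
[project.optional-dependencies]
# dev: developer tools
dev = ["ruff", "mypy", "pytest", "pytest-cov", "pytest-xdist"]
# docs: documentation tools
docs = ["sphinx", "myst-parser", "sphinxcontrib-bibtex"]
```

uv resolves these with uv sync --extra dev, uv sync --extra docs, or uv sync --all-extras.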

What the Makefile uses

  • Most developer commands run via:

    uv run --extra dev ...
    
  • Documentation build runs via:

    uv run --extra docs ...
    

Why docs-deps uses --all-extras

make docs-deps runs:

uv sync --all-extras

because the docs pipeline also runs a small helper under the dev extra before invoking Sphinx.


Virtual environment behavior

Where is the venv?

The venv is always created at:

  • .venv/

Minimum Python version

The Makefile checks PYTHON_MIN (default is 3.14). If your Python is older, make python-check fails.

Common “reset” if your venv is broken

make clean-venv
make install-dev

Formatting (ruff)

Formatting is purely about how code looks, not what it does.

Targets

  • make format: formats selected repo paths (use for normal formatting)

  • make format-check: checks formatting only, makes no changes (use for a CI-like check)

  • make fmt: broad auto-fix, ruff fixes + formats everything (use when you want the repo cleaned up quickly)


Typical usage

Before committing:

make fmt

If CI says “formatting changed”:

make format

Linting (ruff)

Linting looks for potential bugs and bad patterns, for example:

  • unused imports

  • variables shadowing other names (e.g. built-ins or outer-scope variables)

  • common correctness issues ruff knows how to detect

Targets

  • make lint: ruff in check-only mode, does not edit files

  • make lint-fix: ruff with auto-fix, edits files

Typical usage

  • Use make lint when you only want to see problems.

  • Use make lint-fix when you want ruff to fix safe issues automatically.


Typing (mypy)

Typing checks help catch issues like:

  • calling functions with wrong argument types

  • returning the wrong types

  • forgetting to handle None

Target

make mypy

It runs:

  • mypy mathxlab tests experiments

Common tip for juniors

If mypy errors look scary, start with the first error in the output. Many later errors are “follow-up noise” caused by an earlier wrong type.


Tests (pytest): fast / slow / perf

This repo separates tests using pytest markers:

  • fast tests: not slow and not perf

  • slow tests: slow and not perf

  • perf tests: perf

Markers are set in tests like:

import pytest

@pytest.mark.slow
def test_big_case():
    ...
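
Custom markers must also be registered so pytest does not warn about unknown marks; one minimal sketch of how that can be done (the real project may register them in pyproject.toml instead):

```python
# conftest.py (sketch): register the custom markers used above
def pytest_configure(config):
    config.addinivalue_line("markers", "slow: long-running tests")
    config.addinivalue_line("markers", "perf: performance/benchmark tests")
```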

Coverage and threshold

Tests collect coverage for the library packages (not for experiment scripts). Coverage fails the target if it drops below 80%.

Stable temp paths

pytest is invoked with stable temp directories:

  • temp_pytest_cache

  • temp_pytest

They are cleaned by make clean.


pytest (fast tests)

make pytest

What it does:

  • runs only fast tests (not slow and not perf)

  • runs with coverage

  • fails if coverage < 80%

This is the main “developer loop” test target.


pytest-xdist (fast tests in parallel)

make pytest-xdist

Runs fast tests using xdist:

  • -n auto --dist=load

Use this when the test suite gets bigger and you want faster local feedback.


pytest-slow (two-phase: fast then slow)

make pytest-slow

What it does (important detail):

  1. deletes .coverage

  2. runs fast tests first in best-effort mode (failures in this phase do not fail the target — this is intentional in the Makefile)

  3. runs slow tests with:

    • xdist by default (PYTEST_XDIST_SLOW=-n auto --dist=load)

    • --cov-append to combine coverage from both phases

    • coverage threshold (80%)

Warning

If you want “fast tests must pass”, run make pytest separately. make pytest-slow is designed to always get through the slow suite even if fast tests are currently failing.


Performance tests (pytest-perf): what they are and how to use them

Performance tests are still pytest tests, but marked with @pytest.mark.perf. They exist to catch accidental slowdowns (e.g. a function becomes 10× slower).

Why the Makefile forces thread counts to 1

Math/scientific libraries sometimes use multiple CPU threads automatically (BLAS/OpenMP/etc.). That makes timings noisy and hard to compare.

So perf targets set:

  • OMP_NUM_THREADS=1

  • MKL_NUM_THREADS=1

  • OPENBLAS_NUM_THREADS=1

  • NUMEXPR_NUM_THREADS=1

This makes timings more reproducible across machines and runs.
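
If you run perf code outside the Makefile (e.g. in a notebook), the same pinning can be sketched in Python, with the caveat that it only works when the variables are set before the math libraries are imported:

```python
import os

# Must run BEFORE importing numpy/scipy/etc.: the BLAS/OpenMP thread
# pools are sized when those libraries initialize, not afterwards.
for var in ("OMP_NUM_THREADS", "MKL_NUM_THREADS",
            "OPENBLAS_NUM_THREADS", "NUMEXPR_NUM_THREADS"):
    os.environ[var] = "1"
```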


pytest-perf (run perf-marked tests)

make pytest-perf

Runs only:

  • -m "perf"

and shows progress output.

Use this when:

  • you changed a performance-sensitive function

  • you want to ensure you didn’t introduce an obvious regression

pytest-perf-baseline (update accepted baseline numbers)

make pytest-perf-baseline

Same as pytest-perf, but also passes:

  • --perf-update-baseline

Use this only when:

  • you intentionally changed performance (e.g. algorithm changed)

  • and you want to accept the new timings as the baseline

Warning

Baseline updates should be done on a reasonably idle machine. If you run baseline updates while your system is busy, you may “bake in” slow/noisy numbers.


Performance suite (non-pytest): snapshots and comparisons

In addition to pytest-perf, the Makefile also provides a small “perf runner” pipeline:

perf (dev snapshot)

make perf

Runs:

  • python mathxlab/tools/run_perf.py --mode dev --overwrite

This is typically used during development to write/update a performance snapshot.

perf-release (release snapshot)

make perf-release

Runs:

  • python mathxlab/tools/run_perf.py --mode release --overwrite

This is usually for “release-ish” measurements (more stable, fewer surprises).

perf-compare (compare two snapshots)

make perf-compare A=v0.1.0 B=v0.2.0

Runs:

  • python mathxlab/tools/compare_perf.py --a $(A) --b $(B)

Use this when you want a readable report of “what got faster/slower” between two saved snapshots.

Tip

If you’re not sure what valid values for A and B are, run make perf once and look at the output written by the script. It usually prints the snapshot identifiers/paths it created.


Running experiments

Experiments are Python modules like:

  • mathxlab/experiments/e001.py

  • mathxlab/experiments/e002.py

Forward CLI arguments

make run EXP=e001 ARGS="--seed 123 --n 200000"

Note

On Windows, always quote ARGS="..." if it contains spaces.

Run all experiments

make out

This finds all mathxlab/experiments/e???.py files and runs them sequentially.


Full reference

For the complete target-by-target reference and troubleshooting, see Makefile.

Formatting, linting, typing, tests

  • Formatting: Ruff formatter

  • Linting: Ruff

  • Typing: mypy

  • Tests: pytest

CI formatting behavior

In CI, formatting runs in check mode (ruff format --check). Locally it formats in place.

Experiment authoring guidelines

When adding a new experiment:

  1. Add a new module under mathxlab/experiments/, e.g. e002_...py.

  2. Prefer deterministic outputs:

    • --seed argument if randomness is involved

    • write results to a single --out directory

  3. Keep the experiment runnable as a module:

    • python -m mathxlab.experiments.e002

  4. Update the docs:

    • add a short entry to Experiments Gallery

    • optionally add a dedicated page under docs/experiments/ later

Report contract for algorithmic experiments (Phase 2 and later)

For experiments involving algorithms (primality tests, factorization, explicit bounds), the out/e###/report.md file is part of the experiment’s scientific contract.

It must state (when applicable):

  • Deterministic vs probabilistic. Always label this explicitly.

  • Probability of error for probabilistic methods (bases / repetitions).

  • Correctness cross-checks against a trusted reference for CI-safe ranges.

  • Known counterexamples / failure modes (e.g., Carmichael numbers for Fermat).

  • Finite-range behavior vs asymptotics (do not oversell small N).

Use this drop-in template for report sections (copy/paste into report.md or generate it in code):

Algorithmic guarantees

  • Method: (name the algorithm)

  • Status: DETERMINISTIC | PROBABILISTIC

If probabilistic

  • Randomness / bases: (list bases used or how randomness was sampled)

  • Conservative error statement:

    • For Miller-Rabin, a common bound is: P(false prime) <= 4^{-k} after k independent random bases.

    • If you use a fixed base set (engineering choice), state the intended input range.

Correctness cross-check

State how you validated correctness for a CI-safe range.

  • Reference: (e.g., sieve ground truth, deterministic trial division)

  • Checked range: (e.g., n <= 1_000_000)

  • Result: mismatches = 0 (or list the smallest mismatch as a witness)

Known counterexamples / failure modes (when applicable)

Use this section when the method is known to fail on structured inputs.

  • Fermat test: Carmichael numbers pass for all coprime bases (smallest: 561).

  • Fermat base-2 pseudoprime: 341 = 11 * 31 satisfies 2^(n-1) mod n = 1 but is composite.

  • Miller-Rabin: specific bases can be fooled by strong pseudoprimes (state the bases used).

  • Pollard rho: may stall for unlucky seeds; retries and parameter changes are expected.
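
The first two bullets can be verified in a few lines, using only the numbers already stated above:

```python
import math

# 341 = 11 * 31 passes the base-2 Fermat test despite being composite.
assert pow(2, 340, 341) == 1
assert 341 == 11 * 31

# 561, the smallest Carmichael number, passes the Fermat test
# for EVERY base coprime to it.
assert all(pow(a, 560, 561) == 1
           for a in range(2, 561) if math.gcd(a, 561) == 1)
```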

Runtime knobs (CI-safe)

List the knobs that keep runtime bounded in CI.

  • n_max: (upper bound)

  • sample_size: (how many candidates were tested)

  • max_rounds / max_retries: (for randomized algorithms)

  • seed: (if randomness is involved)

Finite-range behavior (for asymptotics / explicit bounds)

When referencing an asymptotic statement (e.g., PNT), explicitly separate:

  • Theory statement: what is true as x -> infinity.

  • Finite range used here: x in [A, B].

  • Where it becomes meaningful: state a measurable criterion (e.g., relative error < 5%) and the smallest x where that holds.
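
A concrete illustration of this theory-vs-finite-range split for the PNT (the sieve and the sample points are chosen for illustration; real experiments use their own ranges):

```python
import math


def pi_table(limit: int) -> list[int]:
    """Cumulative prime counts pi(0..limit) via a simple sieve."""
    is_prime = bytearray([1]) * (limit + 1)
    is_prime[0:2] = b"\x00\x00"
    for p in range(2, int(limit**0.5) + 1):
        if is_prime[p]:
            is_prime[p * p :: p] = b"\x00" * len(is_prime[p * p :: p])
    counts, total = [], 0
    for flag in is_prime:
        total += flag
        counts.append(total)
    return counts


# Theory statement: pi(x) ~ x / ln x as x -> infinity.
# Finite range used here: x in [10**3, 10**6].
pi = pi_table(10**6)
for x in (10**3, 10**4, 10**5, 10**6):
    rel_err = abs(pi[x] - x / math.log(x)) / pi[x]
    print(f"x={x:>8}  pi(x)={pi[x]:>6}  rel err of x/ln x = {rel_err:.1%}")
```

Even at x = 10**6 the leading-order approximation x/ln x still has a relative error near 8%, which is exactly the kind of finite-range caveat the report should state explicitly.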

Documentation

Docs are built with Sphinx + MyST.

Build locally:

make install-docs
make docs

Deployed website:

  • GitHub Pages from the docs workflow

Contributing (high-level)

  • Create a feature branch.

  • Open a PR against main.

  • CI must pass before merge.

  • Keep PRs small and well-scoped.