Development¶
This page describes the development workflow and the conventions used in this repository.
Workflow overview¶
Use the Makefile targets for everything. They wrap uv run ... so you do not need to activate a virtual environment manually.
Typical day-to-day:
make final
Documentation-only build:
make docs
Clean caches and build artifacts:
make clean
Remove the virtual environment (full reset):
make clean-venv
Makefile workflow¶
The Makefile is the single entry point for development tasks (env setup, quality checks, docs builds, and running experiments).
Prefer make final before pushing.
Prefer make docs to validate documentation changes.
Use make run EXP=<id> to execute an experiment and write artifacts to out/<id>/.
Dependency groups¶
This summary is included from the Makefile documentation:
uv installs dependencies from your pyproject.toml. This project uses “extras”:
default: runtime dependencies (needed to run the package)
dev: developer tools (ruff, mypy, pytest, …)
docs: documentation tools (sphinx, theme, myst, bibtex, …)
What the Makefile uses¶
Most developer commands run via:
uv run --extra dev ...
Documentation build runs via:
uv run --extra docs ...
Why docs-deps uses --all-extras¶
make docs-deps runs:
uv sync --all-extras
because the docs pipeline also runs a small helper under the dev extra before invoking Sphinx.
Virtual environment behavior¶
Where is the venv?¶
The venv is always created at:
.venv/
Minimum Python version¶
The Makefile checks PYTHON_MIN (default is 3.14). If your Python is older, make python-check fails.
Why UV_LINK_MODE=copy exists¶
On Windows, hardlinks can fail or warn on multi-drive setups (e.g. repo on D: but cache on C:).
So the Makefile defaults to:
UV_LINK_MODE=copy
You can override it, but the default is chosen to reduce Windows pain.
Common “reset” if your venv is broken¶
make clean-venv
make install-dev
Formatting (ruff)¶
Formatting is purely about how code looks, not what it does.
Targets¶
| Target | What it does | When to use it |
|---|---|---|
| | formats selected repo paths | normal formatting |
| | checks formatting only (no changes) | CI-like check |
| | “broad auto-fix”: ruff fixes + formats everything | when you want the repo cleaned up quickly |
Typical usage¶
Before committing:
make fmt
If CI says “formatting changed”:
make format
Linting (ruff)¶
Linting looks for potential bugs and bad patterns, for example:
unused imports
variables shadowing names
common correctness issues ruff knows how to detect
Targets¶
| Target | What it does | Does it edit files? |
|---|---|---|
| make lint | ruff check-only | no |
| make lint-fix | ruff with auto-fix | yes |
Typical usage¶
Use make lint when you only want to see problems.
Use make lint-fix when you want ruff to fix safe issues automatically.
Typing (mypy)¶
Typing checks help catch issues like:
calling functions with wrong argument types
returning the wrong types
forgetting to handle None
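As an illustration of the last item, here is a minimal sketch of a function that may return None and a caller that handles it. The function names are made up for this example and are not taken from the repository.

```python
from __future__ import annotations


def first_even(values: list[int]) -> int | None:
    """Return the first even value, or None if there is none."""
    for value in values:
        if value % 2 == 0:
            return value
    return None


def half_of_first_even(values: list[int]) -> int:
    """Halve the first even value; fall back to 0 when none exists."""
    found = first_even(values)
    # Without this check, mypy reports that `found` may be None
    # when it is used in the arithmetic below.
    if found is None:
        return 0
    return found // 2
```

Removing the `if found is None` branch is a quick way to see what the corresponding mypy error looks like.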
Target¶
make mypy
It runs:
mypy mathxlab tests experiments
Common tip for juniors¶
If mypy errors look scary, start with the first error in the output. Many later errors are “follow-up noise” caused by an earlier wrong type.
Tests (pytest): fast / slow / perf¶
This repo separates tests using pytest markers:
fast tests: not slow and not perf
slow tests: slow and not perf
perf tests: perf
Markers are set in tests like:
import pytest

@pytest.mark.slow
def test_big_case():
    ...
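Custom markers are usually registered so that pytest does not warn about unknown marks. The repository may already do this in its own configuration; the following is only a sketch of one way to register them from a conftest.py, and the marker descriptions are assumptions.

```python
# conftest.py (sketch; the descriptions below are assumed, not copied from the repo)

MARKER_DESCRIPTIONS = {
    "slow": "long-running test, excluded from the fast loop",
    "perf": "performance test, run only by the perf targets",
}


def pytest_configure(config):
    """Register the custom markers with pytest at startup."""
    for name, description in MARKER_DESCRIPTIONS.items():
        # "markers" is the ini option pytest uses for registered marks.
        config.addinivalue_line("markers", f"{name}: {description}")
```

With the markers registered, selections such as -m "not slow and not perf" work without unknown-mark warnings.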
Coverage and threshold¶
Tests collect coverage for the library packages (not for experiment scripts). The target fails if coverage drops below 80%.
Stable temp paths¶
pytest is invoked with stable temp directories:
temp_pytest_cache
temp_pytest
They are cleaned by make clean.
pytest (fast tests)¶
make pytest
What it does:
runs only fast tests (not slow and not perf)
runs with coverage
fails if coverage < 80%
This is the main “developer loop” test target.
pytest-xdist (fast tests in parallel)¶
make pytest-xdist
Runs fast tests using xdist:
-n auto --dist=load
Use this when the test suite gets bigger and you want faster local feedback.
pytest-slow (two-phase: fast then slow)¶
make pytest-slow
What it does (important detail):
deletes .coverage
runs fast tests first in best-effort mode (failures in this phase do not fail the target; this is intentional in the Makefile)
runs slow tests with:
    xdist by default (PYTEST_XDIST_SLOW=-n auto --dist=load)
    --cov-append to combine coverage from both phases
    coverage threshold (80%)
Warning
If you want “fast tests must pass”, run make pytest separately.
make pytest-slow is designed to always get through the slow suite even if fast tests are currently failing.
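The two-phase behavior can be pictured with the sketch below. It is not the Makefile's actual recipe: the function names, the --cov package argument, and the flag order are assumptions. Only the marker expressions, the xdist options, --cov-append, and the 80% threshold come from the description above.

```python
from __future__ import annotations

import subprocess
import sys
from pathlib import Path

FAST = "not slow and not perf"
SLOW = "slow and not perf"


def build_phases(package: str = "mathxlab") -> list[tuple[list[str], bool]]:
    """Return (command, must_pass) pairs for the fast and slow phases."""
    base = [sys.executable, "-m", "pytest", f"--cov={package}"]
    fast = base + ["-m", FAST]
    slow = base + [
        "-m", SLOW,
        "-n", "auto", "--dist=load",  # xdist options
        "--cov-append",               # add to the coverage from phase 1
        "--cov-fail-under=80",        # coverage threshold
    ]
    # Phase 1 is best-effort (must_pass=False); phase 2 decides the result.
    return [(fast, False), (slow, True)]


def run_phases() -> int:
    """Run both phases; only a failing must-pass phase fails the run."""
    Path(".coverage").unlink(missing_ok=True)  # start from fresh coverage data
    exit_code = 0
    for command, must_pass in build_phases():
        result = subprocess.run(command, check=False)
        if must_pass and result.returncode != 0:
            exit_code = result.returncode
    return exit_code
```

The must_pass flag is what makes the fast phase unable to fail the run, which is why make pytest remains the target to use when fast tests have to pass.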
Performance tests (pytest-perf): what they are and how to use them¶
Performance tests are still pytest tests, but marked with @pytest.mark.perf.
They exist to catch accidental slowdowns (e.g. a function becomes 10× slower).
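To make the idea of an accidental slowdown concrete, here is a small sketch of a timing helper and a baseline comparison. It only illustrates the principle; the repository's perf tests may measure and compare differently, and the names best_of, is_regression, and max_ratio are hypothetical.

```python
import time
from typing import Callable


def best_of(func: Callable[[], object], repeats: int = 5) -> float:
    """Return the fastest wall-clock time in seconds over several runs."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        timings.append(time.perf_counter() - start)
    # The minimum is less sensitive to background noise than the mean.
    return min(timings)


def is_regression(baseline_s: float, measured_s: float, max_ratio: float = 2.0) -> bool:
    """Flag a slowdown when the measured time exceeds max_ratio times the baseline."""
    return measured_s > baseline_s * max_ratio
```

A function that became 10× slower would be flagged with the default max_ratio of 2.0, while ordinary run-to-run jitter would not.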
Why the Makefile forces thread counts to 1¶
Math/scientific libraries sometimes use multiple CPU threads automatically (BLAS/OpenMP/etc.). That makes timings noisy and hard to compare.
So perf targets set:
OMP_NUM_THREADS=1
MKL_NUM_THREADS=1
OPENBLAS_NUM_THREADS=1
NUMEXPR_NUM_THREADS=1
This makes timings more reproducible across machines and runs.
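If you time code outside the Makefile (for example in a notebook or an ad-hoc script), the same settings can be applied from Python. This is a sketch under the assumption that it runs before any numeric library is imported; pin_threads is a hypothetical helper, not part of the repository.

```python
from __future__ import annotations

import os

THREAD_VARS = (
    "OMP_NUM_THREADS",
    "MKL_NUM_THREADS",
    "OPENBLAS_NUM_THREADS",
    "NUMEXPR_NUM_THREADS",
)


def pin_threads(count: int = 1) -> dict[str, str]:
    """Set the thread-count variables and return what was set."""
    settings = {name: str(count) for name in THREAD_VARS}
    # Set these before importing numeric libraries: the variables are
    # typically read when the underlying library is first loaded.
    os.environ.update(settings)
    return settings
```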
pytest-perf (run perf-marked tests)¶
make pytest-perf
Runs only:
-m "perf"
and shows progress output.
Use this when:
you changed a performance-sensitive function
you want to ensure you didn’t introduce an obvious regression
pytest-perf-baseline (update accepted baseline numbers)¶
make pytest-perf-baseline
Same as pytest-perf, but also passes:
--perf-update-baseline
Use this only when:
you intentionally changed performance (e.g. algorithm changed)
and you want to accept the new timings as the baseline
Warning
Baseline updates should be done on a reasonably idle machine. If you run baseline updates while your system is busy, you may “bake in” slow/noisy numbers.
Performance suite (non-pytest): snapshots and comparisons¶
In addition to pytest-perf, the Makefile also provides a small “perf runner” pipeline:
perf (dev snapshot)¶
make perf
Runs:
python mathxlab/tools/run_perf.py --mode dev --overwrite
This is typically used during development to write/update a performance snapshot.
perf-release (release snapshot)¶
make perf-release
Runs:
python mathxlab/tools/run_perf.py --mode release --overwrite
This is usually for “release-ish” measurements (more stable, fewer surprises).
perf-compare (compare two snapshots)¶
make perf-compare A=v0.1.0 B=v0.2.0
Runs:
python mathxlab/tools/compare_perf.py --a … --b …
Use this when you want a readable report of “what got faster/slower” between two saved snapshots.
Tip
If you’re not sure what valid values for A and B are, run make perf once and look at the output written by the script.
It usually prints the snapshot identifiers/paths it created.
Run logs¶
Experiments are Python modules like:
mathxlab/experiments/e001.py
mathxlab/experiments/e002.py
…
Run one experiment (recommended: via Make)¶
make run EXP=e001
This:
creates out/e001/ (if missing)
creates out/e001/logs/
writes a log file:
out/e001/logs/run_e001.log
runs the experiment with:
python -m mathxlab.experiments.e001 --out out/e001 -v
Forward CLI arguments¶
make run EXP=e001 ARGS="--seed 123 --n 200000"
Note
On Windows, always quote ARGS="..." if it contains spaces.
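For orientation, an experiment module that accepts the arguments used above could parse them roughly as follows. This is a sketch, not the repository's code: --out and -v appear in the run command and --seed and --n appear in the ARGS example, while the long option --verbose, the defaults, and the help texts are assumptions.

```python
import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the options shown in the run examples."""
    parser = argparse.ArgumentParser(description="Example experiment (sketch)")
    parser.add_argument("--out", type=Path, required=True, help="output directory")
    parser.add_argument("-v", "--verbose", action="store_true", help="verbose logging")
    parser.add_argument("--seed", type=int, default=0, help="random seed (assumed default)")
    parser.add_argument("--n", type=int, default=1000, help="problem size (assumed default)")
    return parser
```

Calling build_parser().parse_args(["--out", "out/e001", "-v", "--seed", "123", "--n", "200000"]) mirrors what the module would receive from make run EXP=e001 ARGS="--seed 123 --n 200000".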
Run all experiments¶
make out
This finds all mathxlab/experiments/e???.py files and runs them sequentially.
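The discovery step can be sketched in Python as follows. The Makefile may implement it differently (for example with a shell glob); find_experiments is a hypothetical helper that only shows which file names the e???.py pattern selects.

```python
from __future__ import annotations

from pathlib import Path


def find_experiments(root: Path) -> list[str]:
    """Return experiment ids (such as 'e001') for files matching e???.py, sorted."""
    # '?' matches exactly one character, so 'e1234.py' is not selected.
    return sorted(path.stem for path in root.glob("e???.py"))
```

Each returned id corresponds to one make run EXP=<id> invocation.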
Full reference¶
For the complete target-by-target reference and troubleshooting, see Makefile.
Formatting, linting, typing, tests¶
Formatting: Ruff formatter
Linting: Ruff
Typing: mypy
Tests: pytest
CI formatting behavior¶
In CI, formatting runs in check mode (ruff format --check). Locally it formats in place.
Algorithmic guarantees¶
Method: (name the algorithm)
Status: DETERMINISTIC | PROBABILISTIC
If probabilistic¶
Randomness / bases: (list bases used or how randomness was sampled)
Conservative error statement:
For Miller-Rabin, a common bound is:
P(false prime) <= 4^{-k} after k independent random bases.
If you use a fixed base set (engineering choice), state the intended input range.
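As a point of reference for the bound, here is a compact Miller-Rabin sketch with k random bases. It is a textbook version written for this page, not the repository's implementation; the function names, the default k, and the small-prime pre-check are choices made for the example.

```python
from __future__ import annotations

import random


def is_probable_prime(n: int, k: int = 20, rng: random.Random | None = None) -> bool:
    """Miller-Rabin with k independent random bases.

    A composite n is reported as prime with probability at most 4**(-k).
    """
    if n < 2:
        return False
    for p in (2, 3, 5, 7):
        if n % p == 0:
            return n == p
    rng = rng or random.Random()
    # Write n - 1 = d * 2**s with d odd.
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    for _ in range(k):
        a = rng.randrange(2, n - 1)  # random base in [2, n - 2]
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False  # a is a witness: n is definitely composite
    return True


def error_bound(k: int) -> float:
    """Upper bound on the false-prime probability after k random bases."""
    return 4.0 ** (-k)
```

Passing a seeded random.Random makes a run reproducible, which is useful when the bases have to be reported alongside the result.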
Correctness cross-check¶
State how you validated correctness for a CI-safe range.
Reference: (e.g., sieve ground truth, deterministic trial division)
Checked range: (e.g., n <= 1_000_000)
Result: mismatches = 0 (or list the smallest mismatch as a witness)
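A cross-check of this kind can be sketched as follows, using a sieve as ground truth and trial division as the method under test. The function names are hypothetical, and a real experiment would plug in its own primality routine as the candidate.

```python
from __future__ import annotations

from typing import Callable


def sieve_flags(limit: int) -> list[bool]:
    """Sieve of Eratosthenes: flags[i] is True iff i is prime, for 0 <= i <= limit."""
    flags = [True] * (limit + 1)
    for i in range(min(2, limit + 1)):
        flags[i] = False  # 0 and 1 are not prime
    i = 2
    while i * i <= limit:
        if flags[i]:
            for multiple in range(i * i, limit + 1, i):
                flags[multiple] = False
        i += 1
    return flags


def is_prime_trial(n: int) -> bool:
    """Deterministic trial division."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def find_mismatches(candidate: Callable[[int], bool], limit: int) -> list[int]:
    """Return every n <= limit where the candidate disagrees with the sieve."""
    reference = sieve_flags(limit)
    return [n for n in range(limit + 1) if candidate(n) != reference[n]]
```

An empty list corresponds to mismatches = 0; otherwise the first element is the smallest mismatch and can be reported as the witness.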
Known counterexamples / failure modes (when applicable)¶
Use this section when the method is known to fail on structured inputs.
Fermat test: Carmichael numbers pass for all coprime bases (smallest: 561).
Fermat base-2 pseudoprime: 341 = 11 * 31 passes 2^(n-1) mod n = 1 but is composite.
Miller-Rabin: specific bases can be fooled by strong pseudoprimes (state the bases used).
Pollard rho: may stall for unlucky seeds; retries and parameter changes are expected.
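The two Fermat examples above can be checked directly with modular exponentiation. The helper names are made up for this sketch.

```python
from math import gcd


def fermat_passes(n: int, base: int) -> bool:
    """True if n passes the Fermat test to the given base."""
    return pow(base, n - 1, n) == 1


def passes_all_coprime_bases(n: int) -> bool:
    """True if n passes the Fermat test for every base coprime to n."""
    return all(fermat_passes(n, a) for a in range(2, n) if gcd(a, n) == 1)


# 341 = 11 * 31 is composite but passes base 2.
# 561 = 3 * 11 * 17 is a Carmichael number: it passes every coprime base.
```

Here fermat_passes(341, 2) and passes_all_coprime_bases(561) are both true, whereas passes_all_coprime_bases(341) is false because 341 is not a Carmichael number.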
Runtime knobs (CI-safe)¶
List the knobs that keep runtime bounded in CI.
n_max: (upper bound)
sample_size: (how many candidates were tested)
max_rounds / max_retries: (for randomized algorithms)
seed: (if randomness is involved)
Finite-range behavior (for asymptotics / explicit bounds)¶
When referencing an asymptotic statement (e.g., PNT), explicitly separate:
Theory statement: what is true as x -> infinity.
Finite range used here: x in [A, B].
Where it becomes meaningful: state a measurable criterion (e.g., relative error < 5%) and the smallest x where that holds.
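For the PNT example, the finite-range part can be sketched as below: compute pi(x) on a range, measure the relative error of x / ln(x), and report where the criterion starts to hold. The function names are hypothetical. One design choice is an assumption of this sketch: it reports the smallest x from which the criterion holds through the end of the range, because an isolated small x can satisfy it by coincidence (at x = 5, x / ln(x) is about 3.11 against pi(5) = 3).

```python
from __future__ import annotations

import math


def prime_counts(limit: int) -> list[int]:
    """Return counts with counts[x] == pi(x) for 0 <= x <= limit (limit >= 2)."""
    flags = bytearray([1]) * (limit + 1)
    flags[0] = flags[1] = 0
    for i in range(2, math.isqrt(limit) + 1):
        if flags[i]:
            # Clear every multiple of i starting at i * i.
            flags[i * i :: i] = bytearray(len(range(i * i, limit + 1, i)))
    counts, running = [], 0
    for flag in flags:
        running += flag
        counts.append(running)
    return counts


def relative_error(x: int, counts: list[int]) -> float:
    """Relative error of x / ln(x) against pi(x), for x >= 2."""
    return abs(counts[x] - x / math.log(x)) / counts[x]


def smallest_stable_x(threshold: float, counts: list[int]) -> int | None:
    """Smallest x0 >= 2 with error below threshold on all of [x0, B], or None."""
    x0 = None
    for x in range(len(counts) - 1, 1, -1):
        if relative_error(x, counts) >= threshold:
            break
        x0 = x
    return x0
```

A result of None means the criterion is not met even at the upper end B of the range, which is itself worth stating in the write-up.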
Documentation¶
Docs are built with Sphinx + MyST.
Build locally:
make install-docs
make docs
Deployed website:
GitHub Pages from the docs workflow
Contributing (high-level)¶
Create a feature branch.
Open a PR against main.
CI must pass before merge.
Keep PRs small and well-scoped.