# Factorization pipelines (trial division + Pollard rho)

Phase 2 includes experiments that do not only test primality but also attempt to **factor** integers.
In practice you rarely use a single algorithm; you build a pipeline:

1. Cheap deterministic filters (small primes, gcd checks).
2. A fast primality test for the remaining cofactor (often Miller-Rabin).
3. A randomized factor finder for hard cases (Pollard rho), with retries.

This page describes a practical, experiment-friendly pipeline and how to report it.

## What is being computed

Given an integer $n \ge 2$, factorization aims to write

$$n = p_1^{e_1} p_2^{e_2} \cdots p_k^{e_k}$$

with distinct primes $p_i$ and exponents $e_i \ge 1$.

## Deterministic front-end: trial division

Trial division tries to find small prime factors by dividing by primes $p \le B$.

- It is **deterministic**.
- It is extremely effective as a first stage.
- It gives you natural runtime knobs (`B`, number of primes).

Implementation notes:

- Precompute primes up to $B$ with a sieve.
- Divide repeatedly to capture multiplicities.
- After removing small primes, the remaining cofactor can be 1, prime, or composite.

## Probabilistic core: Pollard rho

Pollard's rho is a randomized method that often finds a nontrivial factor quickly.
It is the standard next step after trial division for medium-size integers.
See {cite}`pollard1975montecarlofactorization` and the practical improvements in {cite}`brent1980improvedmontecarlofactorization`.

### Idea (high level)

Work modulo $n$ and iterate a polynomial map, commonly

$$f(x) = x^2 + c \pmod n.$$

The sequence $x_{i+1} = f(x_i)$ eventually cycles modulo a hidden prime divisor $p$ of $n$.
Using gcd computations on differences, you can often extract a factor:

$$d = \gcd(|x_i - x_j|, n).$$

If $1 < d < n$ you found a nontrivial factor.

### Why it is probabilistic

- The runtime depends on the chosen function parameter $c$, the start value, and luck.
- For some choices, the method can stall; you must retry with different parameters.

Because of this, Pollard rho is **probabilistic / heuristic**. You must report it that way.

## A practical pipeline

A good experiment pipeline is:

1. **Handle trivial cases:** $n < 2$, even numbers, perfect powers if you want.
2. **Trial division:** divide by primes up to $B$.
3. **Primality check on the cofactor:** if the remaining cofactor is 1 or prime, stop.
4. **Pollard rho for the remaining composite cofactor:** find a factor $d$.
5. **Recurse:** factor $d$ and $n/d$ using the same pipeline.
6. **Normalize:** sort factors, combine multiplicities.
7. **Verify:** multiply the result back and check you recover the original $n$.

The key property for correctness is the final verification step: it turns implementation bugs into test failures.

## Reporting checklist

In your report, always include:

- **Deterministic vs probabilistic:**
  - Trial division: deterministic.
  - Pollard rho: probabilistic / heuristic.
- **Runtime knobs:**
  - trial division bound $B$,
  - max rho iterations per attempt,
  - number of rho restarts,
  - random seed (if used).
- **Correctness cross-check:**
  - for small $n$ compare against a trusted reference (full trial division or a library factorization),
  - verify the product of returned factors equals the original input.

## Common pitfalls

- **Repeated factors:** always divide repeatedly (e.g. $n=12$ has factor 2 twice).
- **Prime cofactors:** do not feed a prime into rho without checking; you can loop unnecessarily.
- **gcd edge cases:** rho can return $d = n$ (failure); treat as "retry".
- **Non-reproducible results:** randomized algorithms should record a seed or a full parameter list.

## How this connects to experiments

- Semiprimes and factorization difficulty: {doc}`semiprimes`
- Primality tests used inside pipelines: {doc}`primality-testing`
- Modular arithmetic building blocks: {doc}`divisibility-and-modular-arithmetic`

## References

See {doc}`../references`.