## The matrix stage of integer factorization sieves

Modern general-purpose methods of integer factorization consist of two major stages: sieving, and linear algebra. In the sieving stage, a large set of "relations" is produced, each of the form "product of primes = something squared modulo N" (the number field sieve uses prime ideals in a number field, but the general idea is the same). In the linear algebra stage, a combination of relations is found which, when multiplied together, produces an equation of the form "something squared = something else squared modulo N" -- in other words, an equation in which all of the primes on the left side are raised to even powers -- and from this it is easy to find factors of N. The linear algebra stage is so named because finding the correct combination of relations to multiply together is equivalent to solving a large sparse system of linear equations modulo 2.
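To make the final step concrete, here is a minimal sketch; the congruence of squares below is contrived for illustration (it comes from Fermat's classic example, not from an actual sieve):

```python
from math import gcd

# Contrived congruence of squares: 90^2 - 7^2 = 8100 - 49 = 8051,
# so 90^2 ≡ 7^2 (mod 8051).  Illustrative values, not sieved output.
N = 8051            # = 83 * 97
x, y = 90, 7
assert (x * x - y * y) % N == 0
f = gcd(x - y, N)   # a nontrivial factor of N
print(f, N // f)    # 83 97
```

Since N divides (x - y)(x + y) but, with luck, neither factor alone, the gcd splits N.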
Of the two stages, the linear algebra is generally considered
to be the harder. Whereas the sieving can be distributed across a large
number of machines, current methods for solving the sparse linear system
(for example, block Lanczos) require tightly-coupled processors:
The time taken to solve a matrix with *n* rows and weight *w*
on a network with interprocessor latency *L* is *O(n w + n L)*,
and with the matrices often having tens of millions of rows, even a
network latency of a millisecond will add hours to the computation.
Clearly, if the size of the matrix can be reduced via some preprocessing,
this will make finding the solution much faster -- as long as the matrix
weight doesn't increase too much.
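As a back-of-envelope check on the latency term, with illustrative numbers (not taken from any particular computation):

```python
# Rough cost of the n*L latency term in the O(nw + nL) model above.
# Both numbers here are illustrative assumptions.
n = 30_000_000   # matrix rows ("tens of millions")
L = 1e-3         # interprocessor latency in seconds (one millisecond)
extra_hours = n * L / 3600
print(round(extra_hours, 1))   # roughly 8 hours added by latency alone
```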

The oldest method for solving systems of linear equations is Gaussian elimination: find a non-zero element of the matrix, and add multiples of its row to the other rows in order to eliminate all the other non-zero entries in the same column. Unfortunately, this naive method results in the weight of the matrix increasing rapidly -- so-called "catastrophic fill-in".

A somewhat more intelligent approach, "structured Gaussian elimination", is also available: the matrix is split into two parts, labeled "dense" and "sparse", and each pivot is chosen to minimize the increase in the weight of the sparse part. This approach also ends up causing rapid fill-in, but at least makes some useful progress first: depending upon the source of the original matrix, it can often reduce the matrix size by a factor of 10 or more before it reaches the point where continuing would be disadvantageous due to the increasing matrix weight. As a result, structured Gaussian elimination is useful as a method of matrix preprocessing, but not as a method of matrix solution.
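To make the mod-2 setting concrete, here is a toy sketch of my own (names invented here, not anyone's production code): each row is stored as the set of columns where it has a 1, forward elimination is run, and the routine reports which input rows sum to zero mod 2 -- exactly the kind of dependency the linear algebra stage is after.

```python
def find_dependency(rows):
    """Forward GF(2) elimination on rows given as sets of column
    indices; returns the set of input row indices whose mod-2 sum
    is zero, or None if no such dependency is found."""
    rows = [set(r) for r in rows]
    combo = [{i} for i in range(len(rows))]  # inputs making up each row
    for i in range(len(rows)):
        if not rows[i]:              # row cancelled out: a dependency
            return combo[i]
        pivot = min(rows[i])
        for j in range(i + 1, len(rows)):
            if pivot in rows[j]:
                rows[j] ^= rows[i]   # mod-2 row addition is XOR
                combo[j] ^= combo[i]
    return None

# Three rows over columns {0,1,2}; they XOR to zero, so all three
# together form a dependency (i.e. their product is a square).
print(find_dependency([{0, 1}, {1, 2}, {0, 2}]))  # {0, 1, 2}
```

Storing rows as sets of column indices is also how the sparsity of these matrices is exploited in practice: most entries are zero, so only the positions of the ones are kept.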

Now for the interesting question: Why are we trying to *eliminate*
columns at each stage? What if, instead, we view the problem as one of
lattice reduction, and add rows to each other not with a view to reducing
the number of active columns directly, but instead with the intention of
decreasing the matrix weight?

If we consider the rows of the matrix as zero-one vectors, then any time
the dot product of two rows is more than half of the number of nonzero entries
in the lighter row, we can add the lighter row to the heavier row and
reduce the total matrix weight: adding a row *a* into a row *b* changes
the weight of *b* by wt(*a*) - 2(*a*·*b*), which is negative precisely when
the dot product exceeds half the weight of *a*. So instead of taking a row
and adding it to *all* the other rows with nonzero entries in a particular
column, just add it to a few other rows -- the ones where doing so decreases
the matrix weight. Of course, this relies upon being able to find rows which
are not too close to orthogonal, but there is a good reason to think that
they might exist: If two medium-sized primes *p* and *q* appear
in a relation which is used to eliminate a large prime from several other
relations, then unless either of those primes also appeared in the other
relations, all of the new relations will contain both of those primes.
As a result, groups of primes tend to "stick together", which increases
the largest dot products of pairs of rows.
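A minimal sketch of this weight-reduction idea (my own toy illustration, with invented names; a greedy pass, not a tuned algorithm): whenever two rows overlap in more than half of the lighter row's entries, XOR the lighter into the heavier.

```python
def reduce_weight(rows):
    """Greedy weight reduction over GF(2).  Rows are sets of column
    indices; whenever two rows overlap in more than half of the
    lighter row's entries, XOR the lighter into the heavier.  Each
    such step strictly reduces total weight (by 2|a&b| - |a| > 0),
    so the loop terminates."""
    rows = [set(r) for r in rows]
    changed = True
    while changed:
        changed = False
        for i in range(len(rows)):
            for j in range(len(rows)):
                a, b = rows[i], rows[j]
                if i == j or not a or len(a) > len(b):
                    continue
                if 2 * len(a & b) > len(a):
                    rows[j] = b ^ a   # wt(b) drops by 2|a&b| - |a|
                    changed = True
    return rows

# The second row contains the first, so the first gets XORed into it:
# total weight drops from 7 to 4.
print(reduce_weight([{0, 1, 2}, {0, 1, 2, 3}]))  # [{0, 1, 2}, {3}]
```

Note that, unlike elimination, this never tries to empty a column; it only ever trades a heavy row for a lighter one, which is why the total weight can only go down.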

Can this sort of approach provide a significant benefit over existing methods? It's too early to be certain, but on some "toy problems" -- matrices arising from applying the quadratic sieve to 18-digit numbers using a factor base of a few thousand primes -- I found that structured Gaussian elimination ended up with catastrophic fill-in (not that filling in a matrix of a few hundred rows is particularly catastrophic), while this lattice reduction approach steadily reduced the matrix weight.