MAJOR credit to NachiketUN on github for serving as the inspiration for the program itself. I couldn't have programmed it without his detailed, streamlined implementation of a similar Quadratic Sieve Algorithm.
RSA has been one of the most prominent asymmetric-key (meaning one key is public and one is private) encryption algorithms since its inception in 1977 by MIT colleagues Ron Rivest, Adi Shamir, and Leonard Adleman. As we'll see, RSA is predicated on the difficulty of factoring very large semi-prime numbers. Semi-prime numbers are defined as numbers with only 4 factors: 1, itself, and two primes p and q. Looking at the semi-prime \( N = 12193263122374638001 \) (that's 20 digits!), it's tough to 'see' that its two prime factors are \( p = 1234567891 \) and \( q = 9876543211 \). A computer could exhaustively check divisibility of \( N \) by every integer from \( 2 \) to \( \sqrt{12193263122374638001} = 3.49 \) billion to find \( p \) and \( q \), but there ought to be a better way...
... And there is. Its called the Quadratic Sieve (QS) Algorithm, and it's currently the second most efficient way to factor large semi-primes (behind the General Number Field Sieve, which is far more complex). Of course, once quantum computers advance far enough, Shor's Algorithm will blow it out of the water, but for now, QS remains a useful algorithm for factoring, hence my decision to code it in python to break the RSA encryption scheme.
NOTE: RSA uses semi-primes upwards of 300 decimal digits, my QS Algorithm can only factor semi-primes of no more than 25 decimal digits in reasonable time. If I had a supercomputer, I could've managed the upper limit of QS of about 100 decimal digits, but no one is breaking \( real \) RSA encryption as it's used on the internet.
Sit back, relax, and enjoy some math and computer science!
p = 1234567891
q = 9876543211
N = p*q
def Totient(p, q):
return (p-1)*(q-1)
def mod_inverse(a, mod):
return pow(a, -1, mod)
def KeyGeneration(p, q):
phi = Totient(p, q)
e = 65537 #select a public key 1 < e < phi such that gcd(e, phi) = 1
d = mod_inverse(e, phi) #private key is the modular inverse of e
return e, d
def encrypt(m, e, N):
return pow(m, e, N)
def decrypt(c, d, N):
return pow(c, d, N)
def main():
print("Welcome to the RSA Encryption Program!")
print()
message = int(input("Enter a number: "))
public_key, private_key = KeyGeneration(p, q)
c = encrypt(message, public_key, N)
print(f"Encrypted Message: {c}")
print(f"Decrypted Message: {decrypt(c, private_key, N)}")
main()
Again, the core of RSA is the difficulty of factoring a large semi-prime \( N \), in this case, \( N = 12193263122374638001 \). Since we know the the prime factors of \( N \), \( p = 1234567891 \) and \( q = 9876543211 \), we can compute \( \phi(p, q) \) from Euler's Totient function. \( \phi(p, q) \) represents all of the integers less than \( N \) that are coprime with \( N \). Then we select a public key \( e \) for \( 1 < e < \phi \) such that \( e \) is coprime with \( \phi(p, q) \). \( 65537 \) is a popular choice for \( e \) because it's prime. Encrypting an input message \( m \) in integer form entails raising \( m \) to the power of \( e \) and then taking the remainder of the resulting value modulo \( N \). We'll call this encrypted number \( c \). If we know \( p \) and \( q \), we can find the private key (used for decryption) by finding the modular inverse of \( e \) mod \( N \) such that \( ed \equiv 1 \) mod \( \phi(p, q) \), given in the code by pow(a, -1, mod)
.
To factor a semi-prime \( N \), you can find \( a \) and \( b \), \( a \neq \pm b \), such that $$ a^2 - b^2 = (a-b)(a+b) \equiv 0 \text{ (mod N)}$$ This implies that \( (a-b)(a+b) = kN \text{ for k} \in \mathbb{N} \), meaning that \( gcd(a \pm b, N) \) will yield a factor of \( N \). The chances that this factor is nontrival are around \( \frac{1}{2} \)
So, we have \( a^2 - b^2 \equiv 0 \text{ (mod N)} \rightarrow a^2 \equiv b^2 \text{ (mod N)}\).
We can define a function \( Q(x) = (x + \lfloor{\sqrt{N}}\rfloor)^2 - N\) and compute \( Q(x_1), Q(x_2), Q(x_3)...,Q(x_k)\) for integers \( x_k \) over some sieving interval \( [1, I] \). Ideally, we find a \( Q(x_i) \) such that $$ Q(x_i) = (x_i + \lfloor{\sqrt{N}}\rfloor)^2 - N = (p_1^{e_1}p_2^{e_2}p_3^{e_3}...p_n^{e_n})^2 \rightarrow (x_i + \lfloor{\sqrt{N}}\rfloor)^2 \equiv (p_1^{e_1}p_2^{e_2}p_3^{e_3}...p_n^{e_n})^2 \text{ (mod N)} \rightarrow a^2 \equiv b^2 \text{ (mod N)}$$
However, \( x_i \) that satisfy this condition are rare, so instead we typically rely on a sieving process that idenitfies so-called smooth \( Q(x_j) \) that factor completely over some pre-defined prime factor base, and then a composition process that finds a perfect square of prime factors modulo \( N \) for some combination of \( (x_j + \lfloor{\sqrt{N}}\rfloor)^2 \) values (each associated with a smooth \( Q(x_j) \)). In symbolic notation (for integers \(e_n\)), $$ Q(x_1)Q(x_2)Q(x_3)...Q(x_j) = (p_1^{e_1}p_2^{e_2}p_3^{e_3}...p_n^{e_n})^2 \rightarrow (x_1 + \lfloor{\sqrt{N}}\rfloor)^2(x_2 + \lfloor{\sqrt{N}}\rfloor)^2(x_3 + \lfloor{\sqrt{N}}\rfloor)^2...(x_j + \lfloor{\sqrt{N}}\rfloor)^2 \equiv (p_1^{e_1}p_2^{e_2}p_3^{e_3}...p_n^{e_n})^2 \rightarrow a^2 \equiv b^2 \text{ (mod N)} $$
Generating a factor base over which to factor values of \( Q(x) \) is essential for QS because it forms the basis for deciding which values of \( Q(x) \) to eliminate from consideration for possible combination to produce a congruence of squares modulo \( N \).
In order to limit the complexity of the calculations to follow, we'll only consider prime factors up to a smoothness bound \( B \) in our factor base. \( B \) is arbituary, but must be large enough to generate a sufficient amount of smooth \( Q(x) \) that are completely factorable over the factor base. Otherwise, we may not find a combination of \( Q(x) \) that yields a congruence of squares modulo \( N \).
First, we need an efficient way to generate primes up to \( B \), so we'll use the Sieve of Eratosthenes, which generates the first \( n \) primes by excluding all multiples of integers from 2 up to \( \frac{n}{2} \). Here's the python implementation below:
def sieve_of_eratosthenes(n):
if n < 2:
return []
nums = []
isPrime = []
for i in range(0, n+1):
nums.append(i)
isPrime.append(True)
isPrime[0]=False
isPrime[1]=False
for j in range(2,int(n/2)):
if isPrime[j] == True:
for i in range(2*j,n+1,j):
isPrime[i] = False
primes = []
for i in range(0, n+1):
if isPrime[i] == True:
primes.append(nums[i])
return primes
We can also ignore many primes lower than the smoothness bound, though. This is because for a prime to divide \( Q(x) \), $$ (x + \lfloor{\sqrt{N}}\rfloor)^2 - N \equiv 0 \text{ (mod p)} \rightarrow (x + \lfloor{\sqrt{N}}\rfloor)^2 \equiv N \text{ (mod p)} \rightarrow s^2 \equiv N \text{ (mod p)} $$ for some integer \( s \). If there is one solution to this congruence, there are two, since if \( s_1^2 \equiv N \text{ (mod p)} \) then there must be an \( s_2 = p - s_1 \) such that $$ s_2^2 = (p - s_1)^2 = p^2 -2ps_1 + s_1^2 \equiv s_1^2 \equiv N \text{ (mod p)} $$
Therefore, only primes p for which \( N \) is quadratic residue (mod p) should be included in the factor base. We can determine whether or not \( N \) is quadratic residue (mod p) by evaluating the Legendre symbol \( (\frac{N}{p}) = N^\frac{p-1}{2} \text{ (mod p)}\). The following python funtion checks if \( N \) is a quadratic residue mod p by evaluating the Legendre symbol with \( N \) and \( p \):
def quad_residue(N, p):
l=1
q=(p-1)//2
x = q**l
if x==0:
return 1
N = N%p
z=1
while x!= 0:
if x%2==0:
N=(N **2) % p
x//= 2
else:
x-=1
z=(z*N) % p
return z
And the final function of this section generates the factor base:
def factor_base_gen(B):
factor_base = []
primes = sieve_of_eratosthenes(B)
for p in primes:
if quad_residue(N, p) == 1:
factor_base.append(p)
return factor_base
The sieving process itself involves finding values of \( Q(x_s) \) that are completely factorable over the factor base, meaning that they have no other prime factors besides those that occupy the factor base. The function below, smooth_relations
attempts to find these so-called smooth values of \( Q(x) \):
def smooth_relations(factor_base, N, I):
Q_values = []
rootN = int(sqrt(N))
for x in range(rootN, rootN + I):
Q_values.append(pow(x, 2) - N)
Q_values_OG = Q_values.copy()
if factor_base[0] == 2:
i = 0
while Q_values[i] % 2 != 0:
i += 1
for j in range(i, len(Q_values), 2):
while Q_values[j] % 2 == 0:
Q_values[j] //= 2
for p in factor_base[1:]:
residues = STonelli(N, p)
for r in residues:
for i in range((r-rootN) % p, len(Q_values), p):
while Q_values[i] % p == 0:
Q_values[i] //= p
xlist = []
smooth_nums = []
indices = []
T = 1
for i in range(len(Q_values)):
if len(smooth_nums) >= len(factor_base)+T:
break
if Q_values[i] == 1:
smooth_nums.append(Q_values_OG[i])
xlist.append(i+rootN)
indices.append(i)
return smooth_nums, xlist, indices
Q_values = []
rootN = int(sqrt(N))
for x in range(rootN, rootN + I):
Q_values.append(pow(x, 2) - N)
This section creates an empty list to store values of \( Q(x) \) and then evalutes \( Q(x) = (x + \lfloor{\sqrt{N}}\rfloor)^2 - N \) for all \( x \) in the sieving interval \( [0, I] \)
Q_values_OG = Q_values.copy()
if factor_base[0] == 2:
i = 0
while Q_values[i] % 2 != 0:
i += 1
for j in range(i, len(Q_values), 2):
while Q_values[j] % 2 == 0:
Q_values[j] //= 2
This segment is the first step in determining which values of \( Q(x) \) are smooth. The code divides every even value of \( Q(x) \) in our list by 2 until the result is odd, given that 2 is in our factor base to begin with. This process effectively removes every factor of 2 from all the elements of our list that contain them. We take a copy to ensure that our manipulations of the list containing values of \( Q(x) \) don't make us lose these values.
for p in factor_base[1:]:
residues = STonelli(N, p)
for r in residues:
for i in range((r-rootN) % p, len(Q_values), p):
while Q_values[i] % p == 0:
Q_values[i] //= p
This segment continues the process of determining which \( Q(x) \) are smooth by dviding out odd primes p from values of \( Q(x) \) that contain them. The STonelli
function, which is listed below, leverages the fact that all odd primes p in our factor base satisfy the condition that \( N \) is a quadratic residue modulo p to find periodic \( Q(x) \) that contain a certain prime p. In symbolic notation, since \( s^2 \equiv N \text{ (mod p)} \) has exactly two solutions for s, \( s_1 \) and \( s_2 = p - s_1 \), then there must also be \( x_{s_1} = s_1 - \lfloor{\sqrt{N}}\rfloor, x_{s_2} = s_2 - \lfloor{\sqrt{N}}\rfloor\) such that $$ Q(x_{s_1}) = (x_{s_1} + \lfloor{\sqrt{N}}\rfloor)^2 -N = s_1^2 - N \equiv 0 \text{ (mod p)} \rightarrow Q(x_{s_1} + kp) = (x_{s_1} + kp + \lfloor{\sqrt{N}}\rfloor)^2 - N = (s_1 + kp)^2 - N \equiv 0 \text{ (mod p)} $$ and, $$ Q(x_{s_2}) = (x_{s_2} + \lfloor{\sqrt{N}}\rfloor)^2 -N = s_2^2 - N \equiv 0 \text{ (mod p)} \rightarrow Q(x_{s_2} + kp) = (x_{s_2} + kp + \lfloor{\sqrt{N}}\rfloor)^2 - N = (s_2 + kp)^2 - N \equiv 0 \text{ (mod p)} $$ for \( k \in \mathbb{Z} \). This implies that \( Q(x_{s_1}) \equiv Q(x_{s_1} + kp) \equiv 0 \text{ (mod p)}\) and \( Q(x_{s_2}) \equiv Q(x_{s_2} + kp) \equiv 0 \text{ (mod p)}\), or that \( Q(x_{s_1} + kp) \) and \( Q(x_{s_2} + kp) \) are divsibile by p. Once \( z \), where \( z \) = (r-rootN) % p
is one of \( x_{s_1} \) or \( x_{s_2} \) in the code, is found for each prime p, we can iterate through values of \( Q(x) \), going p places each iteration, and divide by p until a given value of \( Q(x) \) is no longer divisble by p.
The STonelli
function is given below:
def STonelli(n, p):
assert quad_residue(n, p) == 1, "not a square (mod p)"
q = p - 1
s = 0
while q % 2 == 0:
q //= 2
s += 1
if s == 1:
r = pow(n, (p + 1) // 4, p)
return r,p-r
for z in range(2, p):
if p - 1 == quad_residue(z, p):
break
c = pow(z, q, p)
r = pow(n, (q + 1) // 2, p)
t = pow(n, q, p)
m = s
t2 = 0
while (t - 1) % p != 0:
t2 = (t * t) % p
for i in range(1, m):
if (t2 - 1) % p == 0:
break
t2 = (t2 * t2) % p
b = pow(c, 1 << (m - i - 1), p)
r = (r * b) % p
c = (b * b) % p
t = (t * c) % p
m = i
return (r,p-r)
xlist = []
smooth_nums = []
indices = []
T = 1
for i in range(len(Q_values)):
if len(smooth_nums) >= len(factor_base)+T:
break
if Q_values[i] == 1:
smooth_nums.append(Q_values_OG[i])
xlist.append(i+rootN)
indices.append(i)
return smooth_nums, xlist, indices
This final segment iterates through the values of \( Q(x) \) from the copied list and determies which ones have been completely factored over the factor base, which would imply that all divsors have been divided out, resulting in a value of 1. Note too that the number of smooth \( Q(x) \) collected doesn't need to exceed the size of factor base itself past some tolerance interval T, since finding a combination of \( Q(x_i) \) becomes possible as soon as the number of smooth \( Q(x) \) is the same as the number of factors in the factor base.
Now that we have all smooth values of \( Q(x) \), we either need to find a \( Q(x_i) \) such that $$ Q(x_i) = (x_i + \lfloor{\sqrt{N}}\rfloor)^2 - N = (p_1^{e_1}p_2^{e_2}p_3^{e_3}...p_n^{e_n})^2 \rightarrow (x_i + \lfloor{\sqrt{N}}\rfloor)^2 \equiv (p_1^{e_1}p_2^{e_2}p_3^{e_3}...p_n^{e_n})^2 \text{ (mod N)} \rightarrow a^2 \equiv b^2 \text{ (mod N)}$$Which is unfortunitely quite unlikely, or find a combination of \( Q(x) \) such that $$ Q(x_1)Q(x_2)Q(x_3)...Q(x_j) = (p_1^{e_1}p_2^{e_2}p_3^{e_3}...p_n^{e_n})^2 \rightarrow (x_1 + \lfloor{\sqrt{N}}\rfloor)^2(x_2 + \lfloor{\sqrt{N}}\rfloor)^2(x_3 + \lfloor{\sqrt{N}}\rfloor)^2...(x_j + \lfloor{\sqrt{N}}\rfloor)^2 \equiv (p_1^{e_1}p_2^{e_2}p_3^{e_3}...p_n^{e_n})^2 \rightarrow a^2 \equiv b^2 \text{ (mod N)} $$Which is doable with linear algebra.
Now that we have all smooth values of \( Q(x) \), we will construct a matrix with rows corresponding to each smooth \( Q(x) \), and columns corresponding to each factor in the prime factor base. The values occupying each position in the matrix will reflect the exponent on a given prime factor of a given smooth \( Q(x) \) modulo 2. This has the effect of depicting whether or not the exponent on a prime factor of a given \( Q(x) \) is even or odd. We want to find a combination of rows that yield 0's across the entire column space, which would imply that all exponents of all prime factors of this composite value are even, yielding a perfect square modulo \( N \). The following function exponent_matrix
creates said matrix with the help of a factor
function that records each prime factor and its number of occurences in a list for each smooth \( Q(x) \).
def factor(s, factor_base):
factors = []
for p in factor_base:
while s % p == 0:
factors.append(p)
s //= p
return factors
def exponent_matrix(smooth_nums, factor_base):
A = []
for s in smooth_nums:
exp_vector = [0]*(len(factor_base))
s_factors = factor(s ,factor_base)
for i in range(len(factor_base)):
if factor_base[i] in s_factors:
exp_vector[i] = s_factors.count(factor_base[i]) % 2
if 1 not in exp_vector:
return True, s
else:
pass
A.append(exp_vector)
return [False], transpose(A)
NOTES: (1) returns True
if a column with all 0's is found (in which case we have a solution), (2) Transposition of matrix A is done to aid calculations in the next step, Gaussian Elimination.
We have our exponent matrix modulo 2, now we need to solve for a combination of rows corresponding to smooth values of \( Q(x) \) that, when multiplied, yield a solution. Gaussian Elimination over GF(2) (ie. working modulo 2) does this by first identifying pivot elements in the matrix that can be used to eliminate variables and reduce the system to a simpler form. This step is crucial as it helps us to isolate the rows (i.e., smooth numbers) that can be combined to potentially form a perfect square. Here's the code:
def gaussian_elimination(A):
marks = [False]*len(A[0])
for i in range(len(A)):
row = A[i]
for num in row:
if num == 1:
j = row.index(num)
marks[j] = True
for k in chain(range(0,i),range(i+1,len(A))):
if A[k][j] == 1:
for i in range(len(A[k])):
A[k][i] = (A[k][i] + row[i])%2
break
A = transpose(A)
sol_rows = []
for i in range(len(marks)):
if marks[i]== False:
free_row = [A[i],i]
sol_rows.append(free_row)
if not sol_rows:
return("No solution found. Need more smooth numbers.")
print("Found {} potential solutions".format(len(sol_rows)))
return sol_rows,marks,A
Explination:
marks
of length equal to the number of columns in matrix \(A\) is initialized to False
. This array tracks which columns have been used to eliminate other 1s in the same column, effectively marking the pivot columns.marks[j]
is set to True
, signifying that column \(j\) has been used for elimination.Following the Gaussian Elimination and the transposition of \(A\), we now look for columns (which are originally rows before transposition) that were not marked by any pivot. These unmarked columns, now rows in the transposed matrix, represent the free variables in our system. Each free variable offers a potential solution vector, which is a combination of smooth numbers that, when multiplied together, might yield a perfect square modulo \(N\).
solve_row
FunctionAfter Gaussian Elimination and identifying potential solution vectors, the solve_row
function isolates specific combinations of smooth numbers that may lead to a successful factorization of \(N\). This function takes the solution rows identified earlier and pinpoints which rows in the original matrix correspond to a viable solution vector. Here's the code:
def solve_row(sol_rows,A,marks,K=0):
solution_vec, indices = [],[]
free_row = sol_rows[K][0]
for i in range(len(free_row)):
if free_row[i] == 1:
indices.append(i)
for r in range(len(A)):
for i in indices:
if A[r][i] == 1 and marks[r]:
solution_vec.append(r)
break
solution_vec.append(sol_rows[K][1])
return solution_vec
The solve_row
function operates as follows:
solve_row
scans the entire matrix to find rows that share these indices. It constructs a solution vector comprising rows that contain a 1 in any of the marked positions. This vector signifies a group of smooth numbers that, collectively, are likely to produce a square when multiplied together.With our solution vector, we perform one final step to find the factors of \( N \).
This is the final step in the sieving process! And don't worry, it's not too hard. The task is left to the solve
function, which aims to find \( a= (p_1^{e_1}p_2^{e_2}p_3^{e_3}...p_n^{e_n}) \) and \( b = (x_1 + \lfloor{\sqrt{N}}\rfloor)(x_2 + \lfloor{\sqrt{N}}\rfloor)(x_3 + \lfloor{\sqrt{N}}\rfloor)...(x_j + \lfloor{\sqrt{N}}\rfloor) \) in the congruence: $$ Q(x_1)Q(x_2)Q(x_3)...Q(x_j) = (p_1^{e_1}p_2^{e_2}p_3^{e_3}...p_n^{e_n})^2 \rightarrow (x_1 + \lfloor{\sqrt{N}}\rfloor)^2(x_2 + \lfloor{\sqrt{N}}\rfloor)^2(x_3 + \lfloor{\sqrt{N}}\rfloor)^2...(x_j + \lfloor{\sqrt{N}}\rfloor)^2 \equiv (p_1^{e_1}p_2^{e_2}p_3^{e_3}...p_n^{e_n})^2 \rightarrow a^2 \equiv b^2 \text{ (mod N)} $$ This implies that \( (a-b)(a+b) = kN \text{ for k} \in \mathbb{N} \), meaning that \( gcd(a \pm b, N) \) will yield a factor of \( N \)
def solve(solution_vec, smooth_nums, xlist, N):
solution_nums = [smooth_nums[i] for i in solution_vec]
x_nums = [xlist[i] for i in solution_vec]
square = 1
for n in solution_nums:
square *= n
a = int(sqrt(square))
b = 1
for n in x_nums:
b *= n
factor = gcd(b-a, N)
return factor
With our solution vector, we know exactly which smooth values of \( Q(x) \) should be multiplied to yield a perfect square modulo \( N \). The following snippet of code extracts these values of \( Q(x) \) and their corresponding \(x + \lfloor{\sqrt{N}}\rfloor \) values.
solution_nums = [smooth_nums[i] for i in solution_vec]
x_nums = [xlist[i] for i in solution_vec]
\( a = (p_1^{e_1}p_2^{e_2}p_3^{e_3}...p_n^{e_n}) \) is found simply by taking the square root of the multiplied smooth values of \( Q(x) \) in the solution vector, \( Q(x_1)Q(x_2)Q(x_3)...Q(x_j) \):
square = 1
for n in solution_nums:
square *= n
a = int(sqrt(square))
\(b = (x_1 + \lfloor{\sqrt{N}}\rfloor)(x_2 + \lfloor{\sqrt{N}}\rfloor)(x_3 + \lfloor{\sqrt{N}}\rfloor)...(x_j + \lfloor{\sqrt{N}}\rfloor) \) is found by multiplying each \(x_j + \lfloor{\sqrt{N}}\rfloor \) value associated with a given solution vector component \( Q(x_j) \):
b = 1
for n in x_nums:
b *= n
Finally, \( gcd(b-a, N) \) is computed (yields same factor as gcd(a-b, N)), yielding a (hopefully non-trivial) factor of N
Now that we have an algorithm to factor large semi-primes, lets test it on \( N = 12193263122374638001 \). Here's the terminal logs:
abelobsenz@Abes-MacBook-Air desktop % python3 QuadraticSieveAlgorithm.py
Enter a semiprime number: 12193263122374638001
Enter an appropriate smoothness bound: 100000
Enter an appropriate interval for smoothness testing (should be F + 1 smooth numbers): 100000
Factoring 12193263122374638001 now
Generating B-smooth factor base...
Looking for smooth numbers...
Building exponent matrix...
Unlucky, you need Gaussian elimination...
Found 1407 potential solutions
Solving congruence of squares...
Didn't work. Trying different solution vector...
Didn't work. Trying different solution vector...
Found factors!
Factors of N: (9876543211, 1234567891)
What to do with these factors, what to do? I think we should find the private key in RSA by computing the inverse of the public key (65537) mod \( \phi(9876543211, 1234567891)\)!
It's 6751626181311963136. He he, time to cause some trouble.