# Digital Media Concepts/RSA (cryptosystem)

## What is RSA

It is one of the first asymmetric cryptography algorithms which uses a pair of keys to encrypt and decrypt data. This algorithm is now widely used to transmit sensitive data over insecure network like the Internet.

## History

Before mid-1970s, people used symmetric cryptography algorithms to encrypt data. With these algorithms, data encrypted with a particular cryptographic key could only be decrypted with the same key. Anyone without the key cannot decrypt the data. Therefore people could safely send sensitive information through insecure communication channel. However, people could not find a way to safely exchange their keys between them. Asymmetric cryptography algorithms were invented to solve this problem, and RSA is one of them. In April 1977, RSA was invented by three people from Massachusetts Institute of Technology, including two computer scientists, Ron Rivest, Adi Shamir, and a mathematician Leonard Adleman, and was later publicized in August of the same year.

RSA is now in public domain, and can be freely implemented and used by anyone.

## Steps

#### Generate the keys

As an asymmetric cryptography algorithm, RSA cryptosystem involves two keys, the public key and the private key. Data encrypted by one key can only be decrypted with the other.

1. Randomly choose two prime numbers, $p$ and $q$ .
2. Multiply them together:
$n=p\cdot q$ 3. Get the least common multiple of $p-1$ and $q-1$ :
$\phi =lcm(p-1,q-1)$ 4. Randomly choose a positive integer $e$ which is less than $\phi$ and coprime to $\phi$ .
5. Calculate modular multiplicative inverse $d$ of $e$ modulo $\phi$ . Which means, find a positive integer $d$ which is less than $\phi$ that satisfies:
$e\cdot d\equiv 1{\pmod {\phi }}$ 6. Now the key pair is generated. $(d,n)$ is the private key, and $(e,n)$ is the public key
###### Sample Code

Here's a sample implementation of generating RSA keys written in Python 3.6

Code
import secrets

def gcd(x: int, y: int) -> tuple:
''' Extended Euclidean Algorithm

return: a, b, gcd
ax + by = gcd
'''
def f(x: int, y: int) -> tuple:
l = []
while y != 0:
q, r = divmod(x, y)
l.append(q)
x, y = y, r
x, y, g = (1, 0, x) if x >= 0 else (-1, 0, -x)
while l:
x, y = y, x - l.pop() * y
return x, y, g
# below is a recursive approach. easier to understand
# but it may exceeds python recursion limit with large numbers
# if y == 0:
#     return (1, 0, x) if x >= 0 else (-1, 0, -x)
# quot, rem = divmod(x, y)         # rem + quot * y = x . . . . . (1)
# _a, _b, _gcd = f(y, rem)         # _a * y + _b * rem = _gcd . . (2)
# return _b, _a - quot * _b, _gcd  # <- plug rem from (1) into (2)
assert isinstance(x, int) and isinstance(y, int), 'Integers expected'
return f(x, y)

def miller_rabin(n: int, k: int=100):
''' Miller-Rabin primality test

https://en.wikipedia.org/wiki/Miller%E2%80%93Rabin_primality_test :

Input #1: n > 3, an odd integer to be tested for primality
Input #2: k, the number of rounds of testing to perform
Output: “composite” if n is found to be composite,
“probably prime” otherwise

write n as 2^r·d + 1 with d odd (by factoring out powers of 2 from n − 1)
WitnessLoop: repeat k times:
pick a random integer a in the range [2, n − 2]
x ← a^d mod n
if x = 1 or x = n − 1 then
continue WitnessLoop
repeat r − 1 times:
x ← x^2 mod n
if x = n − 1 then
continue WitnessLoop
return “composite”
return “probably prime”
'''
assert isinstance(n, int) and isinstance(k, int) and k > 0
if n < 4:
return n > 1
if n & 1 == 0:   # even numbers
return False
r = 0
n_1 = n - 1
d = n_1
while d & 1 == 0:
d >>= 1
r += 1
for i in range(k):
a = secrets.randbelow(n - 3) + 2
x = pow(a, d, n)
if x == 1 or x == n_1:
continue
for j in range(r):
x = pow(x, 2, n)
if x == n_1:
break
else:
return False
return True

def get_prime(bits: int) -> int:
''' Get a pseudo-prime with given bits

This function randomly generates a number with given bits, and test its
primality using the miller-rabin test.
'''
assert isinstance(bits, int) and bits > 1
low = 1 << bits - 1
while True:
r = secrets.randbits(bits - 1) + low
if miller_rabin(r):
return r

def generate_rsa_keys(bits: int):
''' Generate RSA key pair

It returns e, d, n, which makes
(m^e mod n)^d mod n == m
'''
def lcm(x: int, y: int) -> int:
a, b, g = gcd(x, y)
return x * y // g
assert isinstance(bits, int) and bits > 5
p = q = 0
while p == q:
p = get_prime(bits // 2)
q = get_prime(bits // 2)
n = p * q
l = lcm(p - 1, q - 1)
_g = 0
while _g != 1:
e = secrets.randbelow(l - 3) + 3  # [3, l)
d, _not_used, _g = gcd(e, l)
if d < 0: # Got negative d
d += l
return e, d, n


#### Send the public key to others

Let's say Alice and Bob want to have a secure communication. They may exchange their public keys without encryption. After that, a sender should always encrypt the data with the receiver's public key before sending it.

For example, Bob's data is encrypted with Bob's public key, and only people who know Alice's private key can decrypt the data. So Alice is the only person that meets this requirement. The same works with Bob.

#### Encryption and Decryption

1. Let's say the original data is an positive integer $m$ which should be less than $n$ . Encrypted data, positive integer $c$ , can be generated with a public key:
$c=m^{e}{\bmod {n}}$ 2. The encrypted data $c$ can be decrypted with the corresponding private key:
$m=c^{d}{\bmod {n}}$ #### Proof of correctness

Proof

Because the encrypted data $c$ is calculated as:

$c=m^{e}{\bmod {n}}$ When decrypting the encrypted data $c$ , the result is:

$r=c^{d}{\bmod {n}}=(m^{e}{\bmod {n}})^{d}{\bmod {n}}=m^{e\cdot d}{\bmod {n}}$ If $m^{e\cdot d}{\bmod {n}}$ equals to the original unencrypted data $m$ , then this algorithm is correct.

To prove $m^{e\cdot d}{\bmod {n}}=m$ , Chinese remainder theorem is needed.

It says when $a$ and $b$ are coprime ($gcd(a,b)=1$ ), if both of these statements are true:

$x\equiv y{\pmod {a}}$ $x\equiv y{\pmod {b}}$ Then this is also true:

$x\equiv y{\pmod {a\cdot b}}$ As $p$ and $q$ are both prime numbers, they are obviously coprime. So if both of these statements could be proved to be true:

$m^{e\cdot d}\equiv m{\pmod {p}}$ $m^{e\cdot d}\equiv m{\pmod {q}}$ Then this is also true:

$m^{e\cdot d}\equiv m{\pmod {n}}$ Then the correctness of RSA algorithm could be proved:

$r=m^{e\cdot d}{\bmod {n}}=m{\bmod {n}}=m$ Because $p$ and $q$ are identical, only one of $m^{e\cdot d}\equiv m{\pmod {p}}$ and $m^{e\cdot d}\equiv m{\pmod {q}}$ need to be proved. The same works with the other one.

Prove the correctness of $m^{e\cdot d}\equiv m{\pmod {p}}$ :

To accomplish this, Fermat's little theorem is necessary.

It says, if $y$ is a prime, and $x$ is not a multiple of $y$ ($y\nmid x$ ), then this statement is true:

$x^{y-1}\equiv 1{\pmod {y}}$ As the relationship between $m$ and $p$ is unknown, the problem need to be divided into two cases.

Case 1: $m$ and $p$ are not coprime, which means $m$ is a multiple of $p$ ($p\mid m$ ). Obviously,

$m^{e\cdot d}\equiv 0\equiv m{\pmod {p}}$ Case 2: $m$ and $p$ are coprime. In this case, Fermat's little theorem could be used. So $m^{p-1}\equiv 1{\pmod {p}}$ Because:

$e\cdot d\equiv 1{\pmod {\phi }}$ As $\phi$ is a multiple of both $p-1$ and $q-1$ , both of these statements are true:

$e\cdot d\equiv 1{\pmod {p-1}}$ $e\cdot d\equiv 1{\pmod {q-1}}$ So $e\cdot d$ can be expressed as:

$e\cdot d=1+k\cdot (p-1)$ Therefore:

$m^{e\cdot d}\equiv m^{1+k\cdot (p-1)}\equiv m\cdot (m^{p-1})^{k}\equiv m\cdot 1^{k}\equiv m{\pmod {p}}$ Q.E.D.

## Safety

The public key, which is $e$ and $n$ , can be known to everyone. As far as $d$ is kept private, no one except the private key owner is able to decrypt the data encrypted by the public key. However, number $n$ has two prime factors $p$ and $q$ . If one can find $p$ and $q$ by factoring $n$ , then this person can also find out $\phi =lcm(p-1,q-1)$ , and eventually $d$ . So the problem of how safe RSA is, is equivalent to how hard it is to factor $n$ .

It is proved that, currently with traditional (non-quantum) computer, factoring a big number is an NP problem. It may or may not be an NP-complete problem. RSA remains safe when integer factorization can't be solved in polynomial time.