Reed–Solomon codes for coders
Reed–Solomon error correction is widely used in applications such as digital data storage (like CDs) and transmission. However, in these situations, the codes themselves are hidden inside an electronic device, so there's no opportunity to take a peek at them and see how they work. Some of the more complicated barcode designs also incorporate Reed–Solomon codes, which puts everything in plain view. This makes them an interesting subject for a hobbyist who wants to see firsthand how this kind of technology works.
In this essay, I will attempt to introduce the principles of Reed–Solomon codes from the point of view of a programmer rather than a mathematician. I will provide real-world examples taken from the popular QR code barcode system as well as working code samples. I chose to use Python for the samples (mainly because it looks pretty), but I will try to explain any non-obvious features for those who are not familiar with it. The math involved is advanced in the sense that it is not usually taught below the university level, but it should be understandable to someone with a good grasp of high-school algebra.
- 1 QR code structure
- 2 BCH codes
- 3 Finite field arithmetic
- 4 Reed–Solomon codes
- 5 Conclusion
QR code structure
This section introduces the structure of QR codes in detail. The information in this section is deliberately incomplete. Only the more common features of the small 21×21 size symbols (also known as version 1) are presented here, but see the appendix for additional information.
Here is a QR symbol that will be used as an example. It consists of dark and light squares, known as modules in the barcoding world. The three square locator patterns in the corners are a visually distinctive feature of QR symbols.
A masking process is used to avoid features in the symbol that might confuse a scanner, such as misleading shapes that look like the locator patterns and large blank areas. Masking inverts certain modules (white becomes black and black becomes white) while leaving others alone.
In the diagram below, the red areas encode format information and use a fixed masking pattern. The data area (in black and white) is masked with a variable pattern. When the code is created, the encoder tries a number of different masks and chooses the one that minimizes undesirable features in the result. The chosen mask pattern is then indicated in the format information so that the decoder knows which one to use. The light gray areas are fixed patterns which do not encode any information. In addition to the obvious locator patterns, there are also timing patterns which contain alternating light and dark modules.
The masking transformation is easily applied (or removed) using the exclusive-or operation (denoted by a caret ^ in many programming languages). The unmasking of the format information is shown below. Reading counter-clockwise around the upper-left locator pattern, we have the following sequence of bits. White modules represent 0 and black modules represent 1.
Input 101101101001011 Mask ^ 101010000010010 Output 000111101011001
There are two identical copies of the format information, so that the symbol can still be decoded even if it is damaged. The second copy is broken in two pieces and placed around the other two locators, and is also read in a counter-clockwise direction (upwards in the lower-left corner, then left-to-right in the upper-right corner).
The first two bits of format information give the error correction level used for the message data. A QR symbol this size contains 26 bytes of information. Some of these are used to store the message and some are used for error correction, as shown in the table below. The left-hand column is simply a name given to that level.
|Error Correction Level||Level Indicator||Error Correction Bytes||Message Data Bytes|
The next three bits of format information select the masking pattern to be used in the data area. The patterns are illustrated below, including the mathematical formula that tells whether a module is black (i and j are the row and column numbers, respectively, and start with 0 in the upper-left hand corner).
The remaining ten bits of format information are for correcting errors in the format itself. This will be explained in a later section.
Here is a larger diagram showing the "unmasked" QR code. Different regions of the symbol are indicated, including the boundaries of the message data bytes.
Data bits are read starting from the lower-right corner and moving up the two right-hand columns in a zig-zag pattern. The first three bytes are 01000000 11010010 01110101. The next two columns are read in a downward direction, so the next byte is 01000111. Upon reaching the bottom, the two columns after that are read upward. Proceed in this up-and-down fashion all the way to the left side of the symbol (skipping over the timing pattern where necessary). Here is the complete message in hexadecimal notation.
- Message data bytes: 40 d2 75 47 76 17 32 06 27 26 96 c6 c6 96 70 ec
- Error correction bytes: bc 2a 90 13 6b af ef fd 4b e0
The final step is to decode the message bytes into something readable. The first four bits indicate how the message is encoded. QR codes use several different encoding schemes, so that different kinds of messages can be stored efficiently. These are summarized in the table below. After the mode indicator is a length field, which tells how many characters are stored. The size of the length field depends on the specific encoding.
|Mode Name||Mode Indicator||Length Bits||Data Bits|
|Numeric||0001||10||10 bits per 3 digits|
|Alphanumeric||0010||9||11 bits per 2 characters|
|Byte||0100||8||8 bits per character|
|Kanji||1000||8||13 bits per character|
(The length field sizes above are valid only for smaller QR codes.)
Our sample message starts with 0100, indicating that there are 8 bits per character. The next 8 bits are the length field, 00001101, or 13 in decimal notation. After that are the actual characters of the message. The first two are 00100111 and 01010100 (the ASCII codes for apostrophe and T). Interested readers may want to decode the rest of the message for themselves.
After the last of the data bits is another 4-bit mode indicator. It can be different from the first one, allowing different encodings to be mixed within the same QR symbol. When there is no more data to store, the special end-of-message code 0000 is given. (Note that the standard allows the end-of-message code to be omitted if it wouldn't fit in the available number of data bytes.)
The format information is encoded with a BCH code which allows a certain number of bit-errors to be detected and corrected. BCH codes are a generalization of Reed–Solomon codes (all Reed–Solomon codes are also BCH codes). In the case of QR codes, the BCH code used for the format information is much simpler than the Reed–Solomon code used for the message data, so it makes sense to start here.
BCH error detection
The process for checking the encoded information is similar to long division, but uses exclusive-or instead of subtraction. The format code should produce a remainder of zero when it is is "divided" by the so-called generator of the code. QR format codes use the generator 10100110111. This process is demonstrated for the format information in the example code (000111101011001) below.
00011 10100110111 ) 000111101011001 ^ 10100110111 010100110111 ^ 10100110111 00000000000
Here is a Python function which implements this calculation.
def qr_check_format(fmt): g = 0x537 # = 0b10100110111 in python 2.6+ for i in range(4,-1,-1): if fmt & (1 << (i+10)): fmt ^= g << i return fmt
Python note: The range function may not be clear to non-Python programmers. It produces a list of numbers counting down from 4 to 0. In C-derived languages, the for loop might be written as for (i = 4; i >= 0; i--); in Pascal-derived languages, for i := 4 downto 0.
This function can also be used to encode the 5-bit format information.
encoded_format = (format<<10) ^ qr_check_format(format<<10)
Readers may find it an interesting exercise to generalize this function to divide by different numbers. For example, larger QR codes contain six bits of version information with 12 error correction bits using the generator 1111100100101.
In mathematical formalism, these binary numbers are described as polynomials whose coefficients are integers mod 2. Each bit of the number is a coefficient of one term. For example:
- 10100110111 = 1 x10 + 0 x9 + 1 x8 + 0 x7 + 0 x6 + 1 x5 + 1 x4 + 0 x3 + 1 x2 + 1 x + 1 = x10 + x8 + x5 + x4 + x2 + x + 1
BCH error correction
If the remainder produced by qr_check_format is not zero, then the code has been damaged or misread. The next step is to determine which format code is most likely the one that was intended. Although sophisticated algorithms for decoding BCH codes exist, they are probably overkill in this case. Since there are only 32 possible format codes, it's much easier to simply try each one and pick the one that has the smallest number of bits different from the code in question (this is known as the Hamming distance).
def hamming_weight(x): weight = 0 while x > 0: weight += x & 1 x >>= 1 return weight def qr_decode_format(fmt): best_fmt = -1 best_dist = 15 for test_fmt in range(0,32): test_code = (test_fmt<<10) ^ qr_check_format(test_fmt<<10) test_dist = hamming_weight(fmt ^ test_code) if test_dist < best_dist: best_dist = test_dist best_fmt = test_fmt elif test_dist == best_dist: best_fmt = -1 return best_fmt
The function qr_decode_format returns -1 if the format code could not be unambiguously decoded. This happens when two or more format codes have the same distance from the input.
To run this code in Python, first start IDLE, Python's integrated development environment. You should see a version message and the interactive input prompt >>>. Open a new window, copy the functions qr_check_format, hamming_weight, and qr_decode_format into it, and save as qr.py. Return to the prompt and type the lines following >>> below.
>>> from qr import * >>> qr_decode_format(int("000111101011001",2)) # no errors 3 >>> qr_decode_format(int("111111101011001",2)) # 3 bit-errors 3 >>> qr_decode_format(int("111011101011001",2)) # 4 bit-errors -1
You can also start Python by typing python at a command prompt.
Finite field arithmetic
Before discussing the Reed–Solomon codes used for the message, it will be useful to introduce a bit more math. We will define new operations analogous to multiplication and division that act on 8-bit bytes and always produce 8-bit bytes as a result. Many of the familiar rules of arithmetic still apply with the new definitions. For example, one times anything results in that same thing, zero times anything is zero, and division by zero is still not allowed. Other useful mathematical properties (such as the distributive property) also hold. This is known as a finite field or Galois field.
Multiplication is based on the polynomial definition above. Simply write the inputs as polynomials and multiply them out using the distributive law as normal. As an example, 10001001 times 00101010 is calculated as follows.
- (x7 + x3 + 1) (x5 + x3 + x) = x7 (x5 + x3 + x) + x3 (x5 + x3 + x) + 1 (x5 + x3 + x)
- = x12 + x10 + 2 x8 + x6 + x5 + x4 + x3 + x
- = x12 + x10 + x6 + x5 + x4 + x3 + x
The same result can be obtained by a modified version of the standard grade-school multiplication procedure, in which we replace addition with exclusive-or.
10001001 * 00101010 10001001 ^ 10001001 ^ 10001001 1010001111010
Here is a Python function which implements this polynomial multiplication.
def poly_mult(x,y): z = 0 i = 0 while (y>>i) > 0: if y & (1<<i): z ^= x<<i i += 1 return z
Of course, the result no longer fits in an 8-bit byte (in this example, it is 13 bits long), so we need to perform one more step before we are finished. The result is reduced modulo 100011101, using the long division process described previously. This produces the final answer 11000011 in our example.
Multiplication with logarithms
The procedure described above is not the most convenient way to implement Galois field multiplication. Multiplying two numbers takes up to eight iterations of the multiplication loop, followed by up to eight iterations of the division loop (these loops can be combined to make the calculation somewhat simpler). However, we can multiply with no looping by using lookup tables. One solution would be to construct the entire multiplication table in memory, but that would require a bulky 64k table. The solution described below is much more compact.
First, notice that it is particularly easy to multiply by 00000010 (by convention, this number is referred to as α): simply left-shift by one place, then exclusive-or with the modulus 100011101 if necessary. Here are the first few powers of α.
|α0 = 00000001||α4 = 00010000||α8 = 00011101||α12 = 11001101|
|α1 = 00000010||α5 = 00100000||α9 = 00111010||α13 = 10000111|
|α2 = 00000100||α6 = 01000000||α10 = 01110100||α14 = 00010011|
|α3 = 00001000||α7 = 10000000||α11 = 11101000||α15 = 00100110|
If this table is continued in the same fashion, the powers of α do not repeat themselves until α255 = 00000001. Thus, every element of the field except zero is equal to some power of α. The element α is known as a primitive element or generator of the Galois field.
This observation suggests another way to implement multiplication: by adding the exponents of α.
- 10001001 * 00101010 = α74 * α142 = α74 + 142 = α216 = 11000011
The problem is, how do we find the power of α that corresponds to 10001001? This is known as the discrete logarithm problem, and no efficient general solution is known. However, since there are only 256 elements in this field, we can easily construct a table of logarithms. While we're at it, a corresponding table of antilogs (exponentials) will also be useful.
gf_exp =  * 512 # Create list of 512 elements. In Python 2.6+, consider using bytearray gf_log =  * 256 gf_exp = 1 x = 1 for i in range(1,255): x <<= 1 if x & 0x100: x ^= 0x11d gf_exp[i] = x gf_log[x] = i for i in range(255,512): gf_exp[i] = gf_exp[i-255] gf_log[gf_exp] = 255 # Set last missing elements in gf_log
Python note: The range operator's upper bound is exclusive, so gf_exp is not set twice by the above.
The gf_exp table is oversized in order to simplify the multiplication function. This way, we don't have to check to make sure that gf_log[x] + gf_log[y] is within the table size.
def gf_mul(x,y): if x==0 or y==0: return 0 return gf_exp[gf_log[x] + gf_log[y]]
Another advantage of the logarithm table approach is that it allows us to define division using the difference of logarithms. In the code below, 255 is added to make sure the difference isn't negative.
def gf_div(x,y): if y==0: raise ZeroDivisionError() if x==0: return 0 return gf_exp[gf_log[x] + 255 - gf_log[y]]
Python note: The raise statement throws an exception and aborts execution of the gf_div function.
With this definition of division, gf_div(gf_mul(x,y),y)==x for any x and any nonzero y.
Readers who are more advanced programmers may find it interesting to write a class encapsulating Galois field arithmetic. Operator overloading can be used to replace calls to gf_mul and gf_div with the familiar operators * and /, but this can lead to confusion as to exactly what type of operation is being performed. Certain details can be generalized in ways that would make the class more widely useful. For example, Aztec codes use five different Galois fields with element sizes ranging from 4 to 12 bits.
Before moving on to Reed–Solomon codes, we need to define several operations on polynomials whose coefficients are Galois field elements. This is a potential source of confusion, since the elements themselves are described as polynomials; my advice is not to think about it too much. Adding to the confusion is the fact that x is still used as the placeholder. This x has nothing to do with the x mentioned previously, so don't mix them up.
The binary notation used previously for Galois field elements starts to become inconveniently bulky at this point, so I will switch to hexadecimal instead.
- 00000001 x4 + 00001111 x3 + 00110110 x2 + 01111000 x + 01000000 = 01 x4 + 0f x3 + 36 x2 + 78 x + 40
In Python, polynomials will be represented by a list of numbers in descending order of powers of x, so the polynomial above becomes [ 0x01, 0x0f, 0x36, 0x78, 0x40 ]. (The reverse order could have been used instead; both choices have their advantages and disadvantages.)
The first function multiplies a polynomial by a scalar.
def gf_poly_scale(p,x): r =  * len(p) for i in range(0, len(p)): r[i] = gf_mul(p[i], x) return r
Note to Python programmers: This function is not written in a "pythonic" style. It could be expressed quite elegantly as a list comprehension, but I have limited myself to language features that are easier to translate to lesser programming languages.
This function "adds" two polynomials (using exclusive-or, as usual).
def gf_poly_add(p,q): r =  * max(len(p),len(q)) for i in range(0,len(p)): r[i+len(r)-len(p)] = p[i] for i in range(0,len(q)): r[i+len(r)-len(q)] ^= q[i] return r
The next function multiplies two polynomials.
def gf_poly_mul(p,q): r =  * (len(p)+len(q)-1) for j in range(0, len(q)): for i in range(0, len(p)): r[i+j] ^= gf_mul(p[i], q[j]) return r
Finally, we need a function to evaluate a polynomial at a particular value of x, producing a scalar result. Horner's method is used to avoid explicitly calculating powers of x.
- 01 x4 + 0f x3 + 36 x2 + 78 x + 40 = (((01 x + 0f) x + 36) x + 78) x + 40
def gf_poly_eval(p,x): y = p for i in range(1, len(p)): y = gf_mul(y,x) ^ p[i] return y
Now that the preliminaries are out of the way, we are ready to begin looking at Reed–Solomon codes.
RS generator polynomial
Reed–Solomon codes use a generator polynomial similar to BCH codes. The generator is the product of factors (x - αn), starting with n=0 for QR codes. For example:
- g4(x) = (x - α0) (x - α1) (x - α2) (x - α3) = 01 x4 + 0f x3 + 36 x2 + 78 x + 40
Here is a function that computes the generator polynomial for a given number of error correction symbols.
def rs_generator_poly(nsym): g =  for i in range(0,nsym): g = gf_poly_mul(g, [1, gf_exp[i]]) return g
This function is somewhat inefficient in that it allocates successively larger arrays for g. While this is unlikely to be a performance problem in practice, readers who are inveterate optimizers may find it interesting to rewrite it so that g is only allocated once.
Like BCH codes, Reed–Solomon codes are encoded with a process analogous to long division. This example shows the calculation for the message 12 34 56.
12 da df 01 0f 36 78 40 ) 12 34 56 00 00 00 00 ^ 12 ee 2b 23 f4 da 7d 23 f4 00 ^ da a2 85 79 84 df a6 8d 84 00 ^ df 91 6b fc d9 37 e6 78 d9
The remainder is concatenated with the message, so the encoded message is 12 34 56 37 e6 78 d9. Here is a function that implements this encoding procedure.
def rs_encode_msg(msg_in, nsym): gen = rs_generator_poly(nsym) msg_out =  * (len(msg_in) + nsym) for i in range(0, len(msg_in)): msg_out[i] = msg_in[i] for i in range(0, len(msg_in)): coef = msg_out[i] if coef != 0: for j in range(0, len(gen)): msg_out[i+j] ^= gf_mul(gen[j], coef) for i in range(0, len(msg_in)): msg_out[i] = msg_in[i] return msg_out
This example shows the encode function applied to the message in the sample QR code introduced earlier. The calculated error correction symbols (on the second line) match the values decoded from the QR code.
>>> msg_in = [ 0x40, 0xd2, 0x75, 0x47, 0x76, 0x17, 0x32, 0x06, ... 0x27, 0x26, 0x96, 0xc6, 0xc6, 0x96, 0x70, 0xec ] >>> msg = rs_encode_msg(msg_in, 10) >>> for i in range(0,len(msg)): ... print(hex(msg[i]), end=' ') ... 0x40 0xd2 0x75 0x47 0x76 0x17 0x32 0x6 0x27 0x26 0x96 0xc6 0xc6 0x96 0x70 0xec 0xbc 0x2a 0x90 0x13 0x6b 0xaf 0xef 0xfd 0x4b 0xe0
Python version note: The syntax for the print function has changed, and this example uses the Python 3.0+ version. In previous versions of Python, replace the print line with print hex(msg[i]), (including the final comma).
Decoding a Reed–Solomon message involves several steps. The first step is to calculate the "syndrome" of the message. Treat the message as a polynomial and evaluate it at α0, α1, α2, ..., αn. Since these are the zeros of the generator polynomial, the result should be zero if the scanned message is undamaged. If not, the syndromes contain all the information necessary to determine the correction that should be made. It is simple to write a function to calculate the syndromes.
def rs_calc_syndromes(msg, nsym): synd =  * nsym for i in range(0, nsym): synd[i] = gf_poly_eval(msg, gf_exp[i]) return synd
Continuing the example, we see that the syndromes of the message are indeed zero. Introducing a change into the message gives nonzero syndromes.
>>> synd = rs_calc_syndromes(msg, 10) >>> print(synd) [0, 0, 0, 0, 0, 0, 0, 0, 0, 0] >>> msg = 0 # deliberately damage the message >>> synd = rs_calc_syndromes(msg, 10) >>> print(synd) [64, 192, 93, 231, 52, 92, 228, 49, 83, 245]
It is simplest to correct mistakes in the code if the locations of the mistakes are already known. This is known as erasure correction. It is possible to correct one erased symbol for each error-correction symbol added to the code. If the error locations are not known, two EC symbols are needed for each symbol error. This makes erasure correction useful in practice if part of the QR code being scanned is covered or physically torn away. It may be difficult for a scanner to determine that this has happened, though, so not all QR code scanners can perform erasure correction.
The Forney algorithm is used to calculate the correction values. It is implemented in the function below.
def rs_correct_errata(msg, synd, pos): # calculate error locator polynomial q =  for i in range(0, len(pos)): x = gf_exp[len(msg)-1-pos[i]] q = gf_poly_mul(q, [x,1]) # calculate error evaluator polynomial p = synd[0:len(pos)] p.reverse() p = gf_poly_mul(p, q) p = p[len(p)-len(pos):len(p)] # formal derivative of error locator eliminates even terms qprime = q[len(q)&1:len(q):2] # compute corrections for i in range(0, len(pos)): x = gf_exp[pos[i]+256-len(msg)] y = gf_poly_eval(p, x) z = gf_poly_eval(qprime, gf_mul(x,x)) msg[pos[i]] ^= gf_div(y, gf_mul(x,z))
Math note: The denominator of the expression for the error value is the formal derivative of the error locator polynomial q. This is calculated by the usual procedure of replacing each term cn xn with n cn xn-1. Since we're working in a field of characteristic two, n cn is equal to cn when n is odd, and 0 when n is even. Thus, we can simply remove the even coefficients (resulting in the polynomial qprime) and evaluate qprime(x2).
Python note: This function uses array slicing to extract parts of an array. The expression synd[0:len(pos)] returns the first few elements of synd, while p[len(p)-len(pos):len(p)] returns the last few elements of p. The more complicated expression q[len(q)&1:len(q):2] returns every second element of q, skipping the first element if the length of q is odd.
Continuing the example, here we use rs_correct_errata to restore the first byte of the message.
>>> msg = 0 >>> synd = rs_calc_syndromes(msg, 10) >>> rs_correct_errata(msg, synd, ) >>> print(hex(msg)) 0x40
In the more likely situation where the error locations are unknown, the first step is to find them. The Berlekamp–Massey algorithm is used to calculate the error locator polynomial. Then we simply use trial substitution to find the zeros of this polynomial, which identifies the error locations.
def rs_find_errors(synd, nmess): # find error locator polynomial with Berlekamp-Massey algorithm err_poly =  old_poly =  for i in range(0,len(synd)): old_poly.append(0) delta = synd[i] for j in range(1,len(err_poly)): delta ^= gf_mul(err_poly[len(err_poly)-1-j], synd[i-j]) if delta != 0: if len(old_poly) > len(err_poly): new_poly = gf_poly_scale(old_poly, delta) old_poly = gf_poly_scale(err_poly, gf_div(1,delta)) err_poly = new_poly err_poly = gf_poly_add(err_poly, gf_poly_scale(old_poly, delta)) errs = len(err_poly)-1 if errs*2 > len(synd): return None # too many errors to correct # find zeros of error polynomial err_pos =  for i in range(0, nmess): if gf_poly_eval(err_poly, gf_exp[255-i]) == 0: err_pos.append(nmess-1-i) if len(err_pos) != errs: return None # couldn't find error locations return err_pos
Math note: When the error locator polynomial is linear (err_poly has length 2), it can be solved easily without resorting to a brute-force approach. The function presented above does not take advantage of this fact, but the interested reader may wish to implement the more efficient solution. Similarly, when the error locator is quadratic, it can be solved by using a generalization of the quadratic formula. A more ambitious reader may wish to implement this procedure as well.
Here is an example where three errors in the message are corrected.
>>> print(hex(msg)) 0x96 >>> msg = 6 >>> msg = 7 >>> msg = 8 >>> synd = rs_calc_syndromes(msg, 10) >>> pos = rs_find_errors(synd, len(msg)) >>> print(pos) [20, 10, 0] >>> rs_correct_errata(msg, synd, pos) >>> print(hex(msg)) 0x96
Error and erasure correction
In order to correct both errors and erasures, we must prevent the erasures from interfering with the error location process. This can be done by calculating the Forney syndromes, as follows.
def rs_forney_syndromes(synd, pos, nmess): fsynd = list(synd) # make a copy for i in range(0, len(pos)): x = gf_exp[nmess-1-pos[i]] for i in range(0,len(fsynd)-1): fsynd[i] = gf_mul(fsynd[i], x) ^ fsynd[i+1] fsynd.pop() return fsynd
The Forney syndromes can then be used in place of the regular syndromes in the error location process.
The function rs_correct_msg below brings the complete procedure together. Erasures are indicated by negative values in msg_in.
def rs_correct_msg(msg_in, nsym): msg_out = list(msg_in) # copy of message # find erasures erase_pos =  for i in range(0, len(msg_out)): if msg_out[i] < 0: msg_out[i] = 0 erase_pos.append(i) if len(erase_pos) > nsym: return None # too many erasures to correct synd = rs_calc_syndromes(msg_out, nsym) if max(synd) == 0: return msg_out # no errors fsynd = rs_forney_syndromes(synd, erase_pos, len(msg_out)) err_pos = rs_find_errors(fsynd, len(msg_out)) if err_pos == None: return None # error location failed rs_correct_errata(msg_out, synd, erase_pos + err_pos) synd = rs_calc_syndromes(msg_out, nsym) if max(synd) > 0: return None # message is still not right return msg_out
Python note: The lists erase_pos and err_pos are concatenated with the + operator.
This is the last piece needed for a fully operational error-and-erasure correcting Reed–Solomon decoder.
The basic principles of Reed–Solomon codes has been presented in this essay. Working Python code for a particular implementation (QR codes) has been included. Many variations and refinements of these ideas are possible, since coding theory is a very rich field of study.