Reed–Solomon codes for coders

From Wikiversity

Reed–Solomon error correction is widely used in applications such as digital data storage (like CDs) and transmission. However, in these situations, the codes themselves are hidden inside an electronic device, so there's no opportunity to take a peek at them and see how they work. Some of the more complicated barcode designs also incorporate Reed–Solomon codes, which puts everything in plain view. This makes them an interesting subject for a hobbyist who wants to see firsthand how this kind of technology works.

In this essay, I will attempt to introduce the principles of Reed–Solomon codes from the point of view of a programmer rather than a mathematician. I will provide real-world examples taken from the popular QR code barcode system as well as working code samples. I chose to use Python for the samples (mainly because it looks pretty), but I will try to explain any non-obvious features for those who are not familiar with it. The math involved is advanced in the sense that it is not usually taught below the university level, but it should be understandable to someone with a good grasp of high-school algebra.

QR code structure

This section introduces the structure of QR codes in detail. The information in this section is deliberately incomplete. Only the more common features of the small 21×21 size symbols (also known as version 1) are presented here, but see the appendix for additional information.

Here is a QR symbol that will be used as an example. It consists of dark and light squares, known as modules in the barcoding world. The three square locator patterns in the corners are a visually distinctive feature of QR symbols.

QR Code Example.svg


A masking process is used to avoid features in the symbol that might confuse a scanner, such as misleading shapes that look like the locator patterns and large blank areas. Masking inverts certain modules (white becomes black and black becomes white) while leaving others alone.

In the diagram below, the red areas encode format information and use a fixed masking pattern. The data area (in black and white) is masked with a variable pattern. When the code is created, the encoder tries a number of different masks and chooses the one that minimizes undesirable features in the result. The chosen mask pattern is then indicated in the format information so that the decoder knows which one to use. The light gray areas are fixed patterns which do not encode any information. In addition to the obvious locator patterns, there are also timing patterns which contain alternating light and dark modules.

QR Code Masking Example.svg

The masking transformation is easily applied (or removed) using the exclusive-or operation (denoted by a caret ^ in many programming languages). The unmasking of the format information is shown below. Reading counter-clockwise around the upper-left locator pattern, we have the following sequence of bits. White modules represent 0 and black modules represent 1.

Input       101101101001011
Mask      ^ 101010000010010
Output      000111101011001
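
As a quick check, the unmasking above can be reproduced with Python's ^ operator. This is only a sketch; the variable names masked, mask, and unmasked are chosen for illustration.

```python
# Format bits as read from the symbol, and the fixed format mask.
masked = int("101101101001011", 2)
mask = int("101010000010010", 2)

# XOR removes the mask; applying it a second time would put it back.
unmasked = masked ^ mask
print(format(unmasked, "015b"))   # 000111101011001
```

Because exclusive-or is its own inverse, the same operation is used both to apply and to remove the mask.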

Format information

There are two identical copies of the format information, so that the symbol can still be decoded even if it is damaged. The second copy is broken in two pieces and placed around the other two locators, and is also read in a counter-clockwise direction (upwards in the lower-left corner, then left-to-right in the upper-right corner).

The first two bits of format information give the error correction level used for the message data. A QR symbol this size contains 26 bytes of information. Some of these are used to store the message and some are used for error correction, as shown in the table below. The left-hand column is simply a name given to that level.

Error Correction Level   Level Indicator   Error Correction Bytes   Message Data Bytes
L                        01                7                        19
M                        00                10                       16
Q                        11                13                       13
H                        10                17                       9

The next three bits of format information select the masking pattern to be used in the data area. The patterns are illustrated below, including the mathematical formula that tells whether a module is black (i and j are the row and column numbers, respectively, and start with 0 in the upper-left hand corner).

QR Code Mask Patterns.svg

The remaining ten bits of format information are for correcting errors in the format itself. This will be explained in a later section.
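
The three fields described above can be pulled apart with shifts and masks. The following is a minimal sketch; the names EC_LEVELS and split_format are hypothetical, and the input is assumed to be the already unmasked 15-bit format information.

```python
# Level indicator bits -> level name, as listed in the table above.
EC_LEVELS = {0b01: "L", 0b00: "M", 0b11: "Q", 0b10: "H"}

def split_format(fmt):
    '''Split unmasked 15-bit format information into (level name, mask number, check bits).'''
    level = EC_LEVELS[(fmt >> 13) & 0b11]   # first two bits
    mask = (fmt >> 10) & 0b111              # next three bits
    check = fmt & 0x3ff                     # remaining ten bits
    return level, mask, check

print(split_format(int("000111101011001", 2)))   # ('M', 3, 857)
```

For the example symbol this gives level M and mask pattern 3; the value 857 is 1101011001 in binary, the ten error correction bits.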

Message data

Here is a larger diagram showing the "unmasked" QR code. Different regions of the symbol are indicated, including the boundaries of the message data bytes.

QR Code Unmasked.svg

Data bits are read starting from the lower-right corner and moving up the two right-hand columns in a zig-zag pattern. The first three bytes are 01000000 11010010 01110101. The next two columns are read in a downward direction, so the next byte is 01000111. Upon reaching the bottom, the two columns after that are read upward. Proceed in this up-and-down fashion all the way to the left side of the symbol (skipping over the timing pattern where necessary). Here is the complete message in hexadecimal notation.

Message data bytes: 40 d2 75 47 76 17 32 06 27 26 96 c6 c6 96 70 ec
Error correction bytes: bc 2a 90 13 6b af ef fd 4b e0


The final step is to decode the message bytes into something readable. The first four bits indicate how the message is encoded. QR codes use several different encoding schemes, so that different kinds of messages can be stored efficiently. These are summarized in the table below. After the mode indicator is a length field, which tells how many characters are stored. The size of the length field depends on the specific encoding.

Mode Name      Mode Indicator   Length Bits   Data Bits
Numeric        0001             10            10 bits per 3 digits
Alphanumeric   0010             9             11 bits per 2 characters
Byte           0100             8             8 bits per character
Kanji          1000             8             13 bits per character

(The length field sizes above are valid only for smaller QR codes.)

Our sample message starts with 0100, indicating that there are 8 bits per character. The next 8 bits are the length field, 00001101, or 13 in decimal notation. After that are the actual characters of the message. The first two are 00100111 and 01010100 (the ASCII codes for apostrophe and T). Interested readers may want to decode the rest of the message for themselves.
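
For readers who would rather let the computer do the decoding, here is a small sketch. The function name decode_byte_mode is made up for this example; it assumes a single byte-mode segment with an 8-bit length field at the very start of the data, which is the case for the sample message.

```python
def decode_byte_mode(data):
    '''Decode one byte-mode segment located at the start of the message data bytes.'''
    bits = "".join(format(b, "08b") for b in data)   # one long string of 0s and 1s
    if bits[0:4] != "0100":
        raise ValueError("only byte mode is handled in this sketch")
    length = int(bits[4:12], 2)                      # 8-bit length field
    chars = []
    for k in range(length):
        start = 12 + 8 * k                           # characters follow the 12 header bits
        chars.append(chr(int(bits[start:start + 8], 2)))
    return "".join(chars)

# Message data bytes of the example symbol.
sample = [0x40, 0xd2, 0x75, 0x47, 0x76, 0x17, 0x32, 0x06,
          0x27, 0x26, 0x96, 0xc6, 0xc6, 0x96, 0x70, 0xec]
print(decode_byte_mode(sample))
```

Note that the characters are not aligned on byte boundaries: the 4-bit mode indicator shifts everything by half a byte, which is why the sketch works on a bit string rather than on the bytes directly.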

After the last of the data bits is another 4-bit mode indicator. It can be different from the first one, allowing different encodings to be mixed within the same QR symbol. When there is no more data to store, the special end-of-message code 0000 is given. (Note that the standard allows the end-of-message code to be omitted if it wouldn't fit in the available number of data bytes.)

BCH codes

The format information is encoded with a BCH code which allows a certain number of bit-errors to be detected and corrected. BCH codes are a generalization of Reed–Solomon codes (all Reed–Solomon codes are also BCH codes). In the case of QR codes, the BCH code used for the format information is much simpler than the Reed–Solomon code used for the message data, so it makes sense to start with the BCH code for format information.

BCH error detection

The process for checking the encoded information is similar to long division, but uses exclusive-or instead of subtraction. The format code should produce a remainder of zero when it is "divided" by the so-called generator of the code. QR format codes use the generator 10100110111. This process is demonstrated for the format information in the example code (000111101011001) below.

                        00011
10100110111 ) 000111101011001
               ^ 10100110111
                 010100110111
                ^ 10100110111
                  00000000000

Here is a Python function which implements this calculation.

def qr_check_format(fmt):
   g = 0x537 # = 0b10100110111 in python 2.6+
   for i in range(4,-1,-1):
      if fmt & (1 << (i+10)):
         fmt ^= g << i
   return fmt

Python note: The range function may not be clear to non-Python programmers. It produces a list of numbers counting down from 4 to 0. In C-derived languages, the for loop might be written as for (i = 4; i >= 0; i--); in Pascal-derived languages, for i := 4 downto 0.

Python note 2: The & operator performs bitwise and, while << is a left bit-shift. This is consistent with C-like languages.

This function can also be used to encode the 5-bit format information.

encoded_format = (format<<10) ^ qr_check_format(format<<10)
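
As an illustration, the sketch below encodes the 5-bit format value of the example symbol (00011) and checks it against the bits seen earlier. The function qr_check_format is repeated so that the block runs on its own; the name fmt5 is only illustrative.

```python
def qr_check_format(fmt):
    g = 0x537                       # generator 10100110111
    for i in range(4, -1, -1):
        if fmt & (1 << (i + 10)):
            fmt ^= g << i
    return fmt

fmt5 = 0b00011                      # level indicator 00, mask pattern 011
encoded_format = (fmt5 << 10) ^ qr_check_format(fmt5 << 10)

print(format(encoded_format, "015b"))                      # 000111101011001
print(qr_check_format(encoded_format))                     # 0
print(format(encoded_format ^ 0b101010000010010, "015b"))  # 101101101001011
```

The last line applies the fixed format mask, which reproduces the bit sequence read from around the upper-left locator pattern.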

Readers may find it an interesting exercise to generalize this function to divide by different numbers. For example, larger QR codes contain six bits of version information with 12 error correction bits using the generator 1111100100101.
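
One possible approach to this exercise is sketched below. The names bch_remainder and bch_encode are hypothetical, and the sketch relies on int.bit_length(), so it assumes a Python version that provides it.

```python
def bch_remainder(value, generator):
    '''Remainder of the carry-less (XOR) division of value by generator.'''
    gen_len = generator.bit_length()
    while value.bit_length() >= gen_len:
        # Align the generator with the highest set bit of value and cancel it.
        value ^= generator << (value.bit_length() - gen_len)
    return value

def bch_encode(data, generator):
    '''Append the check bits for generator to data.'''
    check_bits = generator.bit_length() - 1
    return (data << check_bits) ^ bch_remainder(data << check_bits, generator)

print(format(bch_encode(0b00011, 0b10100110111), "015b"))     # 000111101011001
print(format(bch_encode(0b000111, 0b1111100100101), "018b"))  # 000111110010010100
```

The first call reproduces the format code of the example symbol; the second encodes the six-bit value 000111 with the version information generator.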

In mathematical formalism, these binary numbers are described as polynomials whose coefficients are integers mod 2. Each bit of the number is a coefficient of one term. For example:

10100110111 = 1 x^10 + 0 x^9 + 1 x^8 + 0 x^7 + 0 x^6 + 1 x^5 + 1 x^4 + 0 x^3 + 1 x^2 + 1 x + 1 = x^10 + x^8 + x^5 + x^4 + x^2 + x + 1
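
The correspondence between bits and polynomial terms can be made concrete with a few lines of code. This is a small sketch; the helper name poly_str is made up for the illustration.

```python
def poly_str(n):
    '''Render the bits of the integer n as a polynomial in x with coefficients mod 2.'''
    terms = []
    for power in range(n.bit_length() - 1, -1, -1):
        if n & (1 << power):        # this bit is the coefficient of x^power
            if power == 0:
                terms.append("1")
            elif power == 1:
                terms.append("x")
            else:
                terms.append("x^%d" % power)
    return " + ".join(terms) if terms else "0"

print(poly_str(0b10100110111))   # x^10 + x^8 + x^5 + x^4 + x^2 + x + 1
```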

Principles of error correction

If the remainder produced by qr_check_format is not zero, then the code has been damaged or misread. The next step is to determine which format code is most likely the one that was intended. Before detailing the code, let's discuss a bit about how error correction generally works.

The main idea that makes error correction work is simple yet clever: instead of using a whole dictionary of words, we use a smaller set of carefully selected words, chosen so that each word is as different as possible from every other.

Let's take a simple example: we have a dictionary with only three words of 4 letters: "this", "that" and "corn". Let's say we receive a corrupted word: "co**", where "*" is an erasure. Since we have only 3 words in our dictionary, we can easily compare our received word with our dictionary to find the word that is the closest. In this case, it's "corn". Thus the missing letters are "rn".

Now let's say we receive the word "th**". Here the problem is that two words in our dictionary match the received word: "this" and "that". In this case, we cannot be sure which one it is, and thus we cannot decode. This means that our dictionary is not very good, and we should replace "that" with a more distinct word, such as "dash", to maximize the difference between the words.

The same principle is used for most error correcting codes: we generate a limited dictionary containing only words with maximum separability (maximum Hamming distance, see next section), and then we communicate using only the words of this limited dictionary. If a word gets corrupted in transit, that's no big deal, since we can fix it by looking in our dictionary for the closest word, which is probably the correct one (there is a chance of choosing the wrong one if the input message is heavily corrupted, but the probability is very small). Also, the longer our words are, the more separable they can be, since more characters can be corrupted without any impact.
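
The dictionary lookup described above can be sketched in a few lines. The function name closest_words is hypothetical, "*" is taken to mark an erasure, and all words are assumed to have the same length.

```python
def closest_words(received, dictionary):
    '''Return the dictionary words that differ from received in the fewest positions.
    A "*" in received marks an erasure and is not counted as a difference.'''
    def distance(word):
        return sum(1 for a, b in zip(received, word) if a != "*" and a != b)
    best = min(distance(w) for w in dictionary)
    return [w for w in dictionary if distance(w) == best]

print(closest_words("co**", ["this", "that", "corn"]))   # ['corn']
print(closest_words("th**", ["this", "that", "corn"]))   # ['this', 'that']
print(closest_words("th**", ["this", "dash", "corn"]))   # ['this']
```

When more than one word is returned, the received word cannot be decoded unambiguously; swapping "that" for "dash" removes the ambiguity.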

BCH error correction

Although sophisticated algorithms for decoding BCH codes exist, they are probably overkill in this case. Since there are only 32 possible format codes, it's much easier to simply try each one and pick the one that has the smallest number of bits different from the code in question (the number of different bits is known as the Hamming distance). This method of finding the closest code is known as exhaustive search, and is possible only because we have very few codes (a code is a valid message; here there are only 32 of them, and every other 15-bit number is invalid).

(Note that Reed–Solomon is also based on this principle, but since the number of possible codewords is simply too big, we can't afford to do an exhaustive search, and that's why clever but complicated algorithms have been devised, such as Berlekamp-Massey.)

def hamming_weight(x):
   weight = 0
   while x > 0:
      weight += x & 1
      x >>= 1
   return weight

def qr_decode_format(fmt):
   best_fmt = -1
   best_dist = 15
   for test_fmt in range(0,32):
      test_code = (test_fmt<<10) ^ qr_check_format(test_fmt<<10)
      test_dist = hamming_weight(fmt ^ test_code)
      if test_dist < best_dist:
         best_dist = test_dist
         best_fmt = test_fmt
      elif test_dist == best_dist:
         best_fmt = -1
   return best_fmt

The function qr_decode_format returns -1 if the format code could not be unambiguously decoded. This happens when two or more format codes have the same distance from the input.

To run this code in Python, first start IDLE, Python's integrated development environment. You should see a version message and the interactive input prompt >>>. Open a new window, copy the functions qr_check_format, hamming_weight, and qr_decode_format into it, and save as qr.py. Return to the prompt and type the lines following >>> below.

>>> from qr import *
>>> qr_decode_format(int("000111101011001",2))  # no errors
>>> qr_decode_format(int("111111101011001",2))  # 3 bit-errors
>>> qr_decode_format(int("111011101011001",2))  # 4 bit-errors

You can also start Python by typing python at a command prompt.

Finite field arithmetic

Before discussing the Reed–Solomon codes used for the message, it will be useful to introduce a bit more math.

We'd like to define addition, subtraction, multiplication, and division for 8-bit bytes and always produce 8-bit bytes as a result, so as to avoid any overflow. Naively, we might attempt to use the normal definitions for these operations, and then mod by 256 to keep results from overflowing. That works for addition, subtraction, and multiplication, but not for division: what is 7/5? What we will use instead is a structure called the Galois Field of 2^8 elements, in which all four operations are well defined.

Here's a brief introduction to Galois Fields: a finite field is a finite set of numbers, together with an addition and a multiplication, that satisfies six properties: closure, associativity, commutativity, distributivity, identity, and inverse. Put more simply, anything that can be derived from these properties alone holds in every field, so familiar algebra carries over unchanged. For example, the set of reals ℝ is a field (although not a finite one).

However, the integers ℤ aren't, because as we said above, not all divisions are defined (such as 7/5), which violates the multiplicative inverse property (there is no integer x such that 5*x=7). One simple way to fix that is to work modulo a prime number: every nonzero number then has a multiplicative inverse, so division is always defined. ℤ modulo 2, for instance, is the Galois Field GF(2). A plain modulo is not enough for 256, which is not prime (2 has no inverse modulo 256), but a finite field exists for every power of a prime. Since 256, the number of values of an 8-bit symbol, is 2^8, there is a field with 256 elements, written GF(2^8); it is built from polynomials with coefficients in GF(2) rather than from integers modulo 256. More information on finite fields can be found here.

Here we will define the usual mathematical operations that you are used to doing on integers, but adapted to GF(2^8): the operations are carried out on polynomials with coefficients modulo 2, and the result is reduced modulo a fixed polynomial of degree 8 so that it still fits in 8 bits.
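
To see concretely why a prime modulus matters for division, here is a small brute-force sketch (the helper name inverses_mod is made up for this illustration). It lists which residues have a multiplicative inverse modulo 7 and counts how many have one modulo 256.

```python
def inverses_mod(n):
    '''Map each nonzero residue that has a multiplicative inverse mod n to that inverse.'''
    inv = {}
    for a in range(1, n):
        for b in range(1, n):
            if (a * b) % n == 1:    # b undoes multiplication by a
                inv[a] = b
                break
    return inv

print(sorted(inverses_mod(7).items()))   # [(1, 1), (2, 4), (3, 5), (4, 2), (5, 3), (6, 6)]
print(len(inverses_mod(256)))            # 128
```

Every nonzero residue modulo 7 can be divided by, whereas modulo 256 only the 128 odd values have an inverse, so ordinary integer arithmetic mod 256 does not form a field.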

Another way to consider the link between GF(2) and GF(2^8) is to think of each element of GF(2^8) as a polynomial with 8 binary coefficients. For example, in GF(2^8), 170 is equivalent to 10101010 = 1*x^7 + 0*x^6 + 1*x^5 + 0*x^4 + 1*x^3 + 0*x^2 + 1*x + 0 = x^7 + x^5 + x^3 + x. Both representations are equivalent: in the first case, 170, the representation is decimal, and in the other it is binary, which by convention can be read as a polynomial (a convention only used in GF(2^p), as explained here). The latter is the representation often used in academic books and in hardware implementations (because of logic gates and registers, which work at the binary level). For a software implementation, the decimal representation may be preferred, since it gives clearer code that stays closer to the maths (this is what we will use for the code in this tutorial, except for some examples that will use the binary representation).

In any case, try not to confuse the polynomial representing a single GF(2^p) symbol (each coefficient is a bit/boolean: either 0 or 1) with the polynomial representing a list of GF(2^p) symbols (in this case the polynomial is equivalent to the message+RScode; each coefficient is a value between 0 and 2^p - 1 and represents one character of the message+RScode). We will first describe operations on a single symbol, then polynomial operations on a list of symbols.

Addition and Subtraction

Both addition and subtraction are replaced with exclusive-or. This is logical: addition modulo 2 is exactly like an XOR, and subtraction modulo 2 is exactly the same as addition modulo 2. This is possible because additions and subtractions in this Galois Field are carry-less.

In binary representation of two single GF(2^8) integers:

   0101 + 0110 = 0011

The same way, thinking of our 8-bit values as polynomials with coefficients mod 2:

(x^2 + 1) + (x^2 + x) = 2 x^2 + x + 1 = 0 x^2 + x + 1 = x + 1

Since (a ^ a) = 0, every number is its own opposite, so (x - y) is the same as (x + y).

Note that in books, you will find additions and subtractions used to define some mathematical operations on GF integers, but in practice, you can just XOR (as long as you are in a Galois Field built on GF(2), such as GF(2^8); this is not true in other fields).
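
In code, both operations reduce to a single XOR. The function names gf_add and gf_sub below are chosen for illustration only.

```python
def gf_add(x, y):
    '''Addition in GF(2^8): coefficient-wise addition mod 2, i.e. XOR.'''
    return x ^ y

def gf_sub(x, y):
    '''Subtraction in GF(2^8): identical to addition, since every element is its own opposite.'''
    return x ^ y

print(format(gf_add(0b0101, 0b0110), "04b"))   # 0011
print(gf_sub(0b0101, 0b0101))                  # 0
```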


Multiplication

Multiplication is likewise based on polynomial multiplication. Simply write the inputs as polynomials and multiply them out using the distributive law as normal. As an example, 10001001 times 00101010 is calculated as follows.

(x^7 + x^3 + 1) (x^5 + x^3 + x) = x^7 (x^5 + x^3 + x) + x^3 (x^5 + x^3 + x) + 1 (x^5 + x^3 + x)
= x^12 + x^10 + 2 x^8 + x^6 + x^5 + x^4 + x^3 + x
= x^12 + x^10 + x^6 + x^5 + x^4 + x^3 + x

The same result can be obtained by a modified version of the standard grade-school multiplication procedure, in which we replace addition with exclusive-or.

       10001001
*      00101010
---------------
      10001001
^   10001001
^ 10001001
---------------
  1010001111010

Here is a Python function which implements this polynomial multiplication on single GF(2^8) integers.

def cl_mul(x,y):
    '''Bitwise carry-less multiplication on integers'''
    z = 0
    i = 0
    while (y>>i) > 0:
        if y & (1<<i):
            z ^= x<<i
        i += 1
    return z

Of course, the result no longer fits in an 8-bit byte (in this example, it is 13 bits long), so we need to perform one more step before we are finished. The result is reduced modulo 100011101, using the long division process described previously. In this instance, this is called "modular reduction", because basically what we do is that we divide and keep only the remainder, using a modulo. This produces the final answer 11000011 in our example.

  1010001111010
^ 100011101
  0010110101010
  ^ 100011101
  0000111011110
    ^ 100011101
  0000011000011

Here is the Python code to do the whole Galois Field multiplication with modular reduction:

def gf_mult_noLUT(x, y, prim=0):
    '''Multiplication in Galois Fields without using a precomputed look-up table (and thus it's slower)'''

    ### Define bitwise carry-less operations as inner functions ###
    def cl_mult(x,y):
        '''Bitwise carry-less multiplication on integers'''
        z = 0
        i = 0
        while (y>>i) > 0:
            if y & (1<<i):
                z ^= x<<i
            i += 1
        return z

    def bit_length(n):
        '''Compute the position of the most significant bit (1) of an integer. Equivalent to int.bit_length()'''
        bits = 0
        while n >> bits: bits += 1
        return bits

    def cl_div(dividend, divisor=None):
        '''Bitwise carry-less long division on integers and returns the remainder'''
        # Compute the position of the most significant bit for each integers
        dl1 = bit_length(dividend)
        dl2 = bit_length(divisor)
        # If the dividend is smaller than the divisor, just exit
        if dl1 < dl2:
            return dividend
        # Else, align the most significant 1 of the divisor to the most significant 1 of the dividend (by shifting the divisor)
        for i in range(dl1-dl2,-1,-1):
            # Check that the dividend is divisible (useless for the first iteration but important for the next ones)
            if dividend & (1 << i+dl2-1):
                # If divisible, then shift the divisor to align the most significant bits and XOR (carry-less subtraction)
                dividend ^= divisor << i
        return dividend
    ### Main GF multiplication routine ###

    # Multiply the gf numbers
    result = cl_mult(x,y)
    # Then do a modular reduction (ie, remainder from the division) with an irreducible primitive polynomial so that it stays inside GF bounds
    if prim > 0:
        result = cl_div(result, prim)

    return result


>>> a = 0b10001001
>>> b = 0b00101010
>>> print(bin(gf_mult_noLUT(a, b, 0))) # multiplication only
0b1010001111010
>>> print(bin(gf_mult_noLUT(a, b, 0x11d))) # multiplication + modular reduction
0b11000011

Why mod 100011101 (in hexadecimal: 0x11d)? The math is a little complicated here, but in short, 100011101 represents an 8th degree polynomial which is "irreducible" (meaning it can't be represented as the product of two smaller polynomials). This number is called a primitive polynomial. This is necessary for division to be well-behaved. There are other numbers we could have chosen, but they're all essentially the same, and 100011101 (0x11d) is a common primitive polynomial for Reed–Solomon codes.

Note for the interested reader: as an example of what you can achieve with clever algorithms, here is another way to achieve multiplication of GF numbers in a more concise and faster way, using the Russian Peasant Multiplication algorithm:

def gf_mult_noLUT(x, y, prim=0):
    '''Galois Field integer multiplication using Russian Peasant Multiplication algorithm'''
    r = 0
    while y: # while y is above 0
        if y & 1: r = r ^ x # y is odd, then add the corresponding x to r (the sum of all x's corresponding to odd y's gives the final product)
        y = y >> 1 # equivalent to y // 2
        x = x << 1 # equivalent to x*2
        if x & 0x100: x = x ^ prim # GF modulo: if x >= 256 then apply modular reduction using the primitive polynomial
                                   # (we just subtract, but since the primitive number can be above 256 then we directly XOR).
                                   # If you comment this line out (and change XORs by addition), you get the same result
                                   # as standard multiplication on integers.

    return r

Multiplication with logarithms

The procedure described above is not the most convenient way to implement Galois field multiplication. Multiplying two numbers takes up to eight iterations of the multiplication loop, followed by up to eight iterations of the division loop. However, we can multiply with no looping by using lookup tables. One solution would be to construct the entire multiplication table in memory, but that would require a bulky 64k table. The solution described below is much more compact.

First, notice that it is particularly easy to multiply by 00000010 (by convention, this number is referred to as α): simply left-shift by one place, then exclusive-or with the modulus 100011101 if necessary (why xor is sufficient for taking the mod in this case is an exercise left to the reader). Here are the first few powers of α.

α^0 = 00000001   α^4 = 00010000   α^8  = 00011101   α^12 = 11001101
α^1 = 00000010   α^5 = 00100000   α^9  = 00111010   α^13 = 10000111
α^2 = 00000100   α^6 = 01000000   α^10 = 01110100   α^14 = 00010011
α^3 = 00001000   α^7 = 10000000   α^11 = 11101000   α^15 = 00100110

If this table is continued in the same fashion, the powers of α do not repeat themselves until α^255 = 00000001. Thus, every element of the field except zero is equal to some power of α. The element α is known as a primitive element or generator of the Galois field.
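
The table of powers can be generated, and the claim about repetition checked, with a short loop. This is only a sketch; the function name alpha_powers is hypothetical, and the modulus 100011101 (0x11d) is the one used above.

```python
def alpha_powers(prim=0x11d):
    '''Return ([alpha^0, ..., alpha^254], alpha^255) in GF(2^8) for the modulus prim.'''
    powers = []
    x = 1
    for _ in range(255):
        powers.append(x)
        x <<= 1              # multiply by alpha (00000010)
        if x & 0x100:        # the result no longer fits in 8 bits
            x ^= prim        # so reduce it with the modulus
    return powers, x

powers, wrapped = alpha_powers()
print(format(powers[8], "08b"))    # 00011101
print(len(set(powers)))            # 255
print(wrapped)                     # 1
```

All 255 powers are distinct and none of them is zero, so they cover every nonzero element of the field exactly once before wrapping around to 1.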

This observation suggests another way to implement multiplication: by adding the exponents of α.

10001001 * 00101010 = α^74 * α^142 = α^(74 + 142) = α^216 = 11000011

The problem is, how do we find the power of α that corresponds to 10001001? This is known as the discrete logarithm problem, and no efficient general solution is known. However, since there are only 256 elements in this field, we can easily construct a table of logarithms. While we're at it, a corresponding table of antilogs (exponentials) will also be useful.

gf_exp = [0] * 512 # Create list of 512 elements. In Python 2.6+, consider using bytearray
gf_log = [0] * 256

def init_tables(prim):
    '''Precompute the logarithm and anti-log tables for faster computation later, using the provided primitive polynomial.'''
    # prim is the primitive (binary) polynomial. Since it's a polynomial in the binary sense,
    # it is a single integer (such as 0x11d) whose bits are the coefficients, and not a list of gf values.
    global gf_exp, gf_log
    gf_exp = [1] * 512 # anti-log (exponential) table
    gf_log = [0] * 256 # log table
    # For each possible value in the galois field 2^8, we will pre-compute the logarithm and anti-logarithm (exponential) of this value
    x = 1
    for i in range(1, 255):
        x <<= 1
        if x & 0x100: # equivalent to x >= 256, but faster (because 0x100 == 256)
            x ^= prim # subtract the primitive polynomial from the current value (modular reduction, which keeps
                      # x inside the field), this is the core of the tables generation
        gf_exp[i] = x # compute anti-log for this value and store it in a table
        gf_log[x] = i # compute log at the same time
    # Optimization: double the size of the anti-log table so that we don't need to mod 255 to
    # stay inside the bounds (because we will mainly use this table for the multiplication of two GF numbers, no more).
    for i in range(255, 512):
        gf_exp[i] = gf_exp[i - 255]
    return [gf_log, gf_exp]

Python note: The range operator's upper bound is exclusive, so gf_exp[255] is not set twice by the above.

The gf_exp table is oversized in order to simplify the multiplication function. This way, we don't have to check to make sure that gf_log[x] + gf_log[y] is within the table size.

def gf_mul(x,y):
   if x==0 or y==0:
      return 0
   return gf_exp[gf_log[x] + gf_log[y]]


Another advantage of the logarithm table approach is that it allows us to define division using the difference of logarithms. In the code below, 255 is added to make sure the difference isn't negative.

def gf_div(x,y):
   if y==0:
      raise ZeroDivisionError()
   if x==0:
      return 0
   return gf_exp[gf_log[x] + 255 - gf_log[y]]

Python note: The raise statement throws an exception and aborts execution of the gf_div function.

With this definition of division, gf_div(gf_mul(x,y),y)==x for any x and any nonzero y.
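
This property, and the agreement between the table-based multiplication and the shift-and-XOR method, can be checked exhaustively, since there are only 65536 pairs. The sketch below repeats the table construction in compact form so that it runs on its own; build_tables and slow_mul are names made up for this illustration, and the modulus 0x11d is assumed.

```python
def build_tables(prim=0x11d):
    '''Compact version of the table construction: returns (exp, log).'''
    exp = [0] * 512
    log = [0] * 256
    x = 1
    for i in range(255):
        exp[i] = x
        log[x] = i
        x <<= 1
        if x & 0x100:
            x ^= prim
    for i in range(255, 512):       # second copy, so sums of two logs stay in range
        exp[i] = exp[i - 255]
    return exp, log

gf_exp, gf_log = build_tables()

def gf_mul(x, y):
    if x == 0 or y == 0:
        return 0
    return gf_exp[gf_log[x] + gf_log[y]]

def gf_div(x, y):
    if y == 0:
        raise ZeroDivisionError()
    if x == 0:
        return 0
    return gf_exp[gf_log[x] + 255 - gf_log[y]]

def slow_mul(x, y, prim=0x11d):
    '''Shift-and-XOR multiplication, used here only as an independent reference.'''
    r = 0
    while y:
        if y & 1:
            r ^= x
        y >>= 1
        x <<= 1
        if x & 0x100:
            x ^= prim
    return r

same = all(gf_mul(x, y) == slow_mul(x, y) for x in range(256) for y in range(256))
round_trip = all(gf_div(gf_mul(x, y), y) == x for x in range(256) for y in range(1, 256))
print(same)         # True
print(round_trip)   # True
```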

Readers who are more advanced programmers may find it interesting to write a class encapsulating Galois field arithmetic. Operator overloading can be used to replace calls to gf_mul and gf_div with the familiar operators * and /, but this can lead to confusion as to exactly what type of operation is being performed. Certain details can be generalized in ways that would make the class more widely useful. For example, Aztec codes use five different Galois fields with element sizes ranging from 4 to 12 bits.


Polynomials

Before moving on to Reed–Solomon codes, we need to define several operations on polynomials whose coefficients are Galois field elements. This is a potential source of confusion, since the elements themselves are described as polynomials; my advice is not to think about it too much. Adding to the confusion is the fact that x is still used as the placeholder. This x has nothing to do with the x mentioned previously, so don't mix them up.

The binary notation used previously for Galois field elements starts to become inconveniently bulky at this point, so I will switch to hexadecimal instead.

00000001 x^4 + 00001111 x^3 + 00110110 x^2 + 01111000 x + 01000000 = 01 x^4 + 0f x^3 + 36 x^2 + 78 x + 40

In Python, polynomials will be represented by a list of numbers in descending order of powers of x, so the polynomial above becomes [ 0x01, 0x0f, 0x36, 0x78, 0x40 ]. (The reverse order could have been used instead; both choices have their advantages and disadvantages.)

The first function multiplies a polynomial by a scalar.

def gf_poly_scale(p,x):
   r = [0] * len(p)
   for i in range(0, len(p)):
      r[i] = gf_mul(p[i], x)
   return r

Note to Python programmers: This function is not written in a "pythonic" style. It could be expressed quite elegantly as a list comprehension, but I have limited myself to language features that are easier to translate to other programming languages.

This function "adds" two polynomials (using exclusive-or, as usual).

def gf_poly_add(p,q):
   r = [0] * max(len(p),len(q))
   for i in range(0,len(p)):
      r[i+len(r)-len(p)] = p[i]
   for i in range(0,len(q)):
      r[i+len(r)-len(q)] ^= q[i]
   return r

The next function multiplies two polynomials.

def gf_poly_mul(p,q):
   r = [0] * (len(p)+len(q)-1)
   for j in range(0, len(q)):
      for i in range(0, len(p)):
         r[i+j] ^= gf_mul(p[i], q[j])
   return r

Finally, we need a function to evaluate a polynomial at a particular value of x, producing a scalar result. Horner's method is used to avoid explicitly calculating powers of x.

01 x^4 + 0f x^3 + 36 x^2 + 78 x + 40 = (((01 x + 0f) x + 36) x + 78) x + 40

def gf_poly_eval(p,x):
   y = p[0]
   for i in range(1, len(p)):
      y = gf_mul(y,x) ^ p[i]
   return y

Reed–Solomon codes

Now that the preliminaries are out of the way, we are ready to begin looking at Reed–Solomon codes.

Insight of the coding theory

But first, why did we have to learn about finite fields and polynomials? Because this is the main insight of error-correcting codes like Reed–Solomon: instead of just seeing a message as a series of (ASCII) numbers, we see it as a polynomial following the very well-defined rules of finite field arithmetic. In other words, by representing the data using polynomials and finite fields arithmetic, we added a structure to the data. This structure, that we always know because it's outside and independent of the data, is what allows us to repair a corrupted message.

Thus, even if in your code implementation you may choose to not explicitly represent the polynomials and the finite field arithmetic, these notions are essential for the error-correcting codes to work, and you will find these notions to underlie (even if implicitly) any implementation.

And now we will put these notions into practice!

RS generator polynomial

Reed–Solomon codes use a generator polynomial similar to BCH codes. The generator is the product of factors (x - α^n), starting with n=0 for QR codes. For example:

g_4(x) = (x - α^0) (x - α^1) (x - α^2) (x - α^3) = 01 x^4 + 0f x^3 + 36 x^2 + 78 x + 40

Here is a function that computes the generator polynomial for a given number of error correction symbols.

def rs_generator_poly(nsym):
   g = [1]
   for i in range(0,nsym):
      g = gf_poly_mul(g, [1, gf_exp[i]])
   return g
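
As a sanity check, the sketch below recomputes the generator for four error correction symbols and confirms that α^0 through α^3 are its roots. To keep the block self-contained it uses a shift-and-XOR multiplication instead of the lookup tables, and a helper alpha_pow that is made up for this example; the modulus 0x11d is assumed.

```python
def gf_mul(x, y, prim=0x11d):
    '''GF(2^8) multiplication by shift-and-XOR (no lookup tables).'''
    r = 0
    while y:
        if y & 1:
            r ^= x
        y >>= 1
        x <<= 1
        if x & 0x100:
            x ^= prim
    return r

def alpha_pow(n):
    '''alpha^n, where alpha is the element 00000010.'''
    x = 1
    for _ in range(n):
        x = gf_mul(x, 2)
    return x

def gf_poly_mul(p, q):
    r = [0] * (len(p) + len(q) - 1)
    for j in range(len(q)):
        for i in range(len(p)):
            r[i + j] ^= gf_mul(p[i], q[j])
    return r

def gf_poly_eval(p, x):
    y = p[0]
    for i in range(1, len(p)):
        y = gf_mul(y, x) ^ p[i]     # Horner's method
    return y

def rs_generator_poly(nsym):
    g = [1]
    for i in range(nsym):
        g = gf_poly_mul(g, [1, alpha_pow(i)])   # multiply by (x - alpha^i)
    return g

g4 = rs_generator_poly(4)
print([format(c, "02x") for c in g4])                        # ['01', '0f', '36', '78', '40']
print([gf_poly_eval(g4, alpha_pow(i)) for i in range(4)])    # [0, 0, 0, 0]
```

Since subtraction is XOR, the factor (x - α^i) is written as the list [1, alpha_pow(i)].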

This function is somewhat inefficient in that it allocates successively larger arrays for g. While this is unlikely to be a performance problem in practice, readers who are inveterate optimizers may find it interesting to rewrite it so that g is only allocated once.

RS encoding

Like BCH codes, Reed–Solomon codes are encoded by dividing the polynomial representing the message by the generator polynomial. The remainder is the RS code, which we simply append to the original message.

Why? We previously said that the principle behind BCH codes, and most other error correcting codes, is to use a limited dictionary of very different words so as to maximize the distance between words, and that longer words have greater distance. The same principle applies here: first, we lengthen the original message with additional symbols (the remainder), which raises the distance; second, the remainder is almost unique to the message (thanks to the carefully designed generator polynomial), so clever algorithms can exploit it to deduce parts of the original message.

Several algorithms for polynomial division exist, the simplest one that is often taught in high school is long division. This example shows the calculation for the message 12 34 56.

                             12 da df
01 0f 36 78 40 ) 12 34 56 00 00 00 00
               ^ 12 ee 2b 23 f4
                    da 7d 23 f4 00
                  ^ da a2 85 79 84
                       df a6 8d 84 00
                     ^ df 91 6b fc d9
                          37 e6 78 d9

The remainder is concatenated with the message, so the encoded message is 12 34 56 37 e6 78 d9.
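The worked division above can be reproduced with a few lines of code. This is only a sketch: the function name long_division is hypothetical, the generator g4(x) is hard-coded from the earlier example, and the field tables assume GF(2^8) with primitive polynomial 0x11d.

```python
# Assumed field setup (restated so the snippet is self-contained):
# GF(2^8) with primitive polynomial 0x11d and alpha = 2.
gf_exp = [0] * 512
gf_log = [0] * 256
value = 1
for power in range(255):
    gf_exp[power] = value
    gf_log[value] = power
    value <<= 1
    if value & 0x100:
        value ^= 0x11d
for power in range(255, 512):
    gf_exp[power] = gf_exp[power - 255]

def gf_mul(a, b):
    if a == 0 or b == 0:
        return 0
    return gf_exp[gf_log[a] + gf_log[b]]

GEN4 = [0x01, 0x0f, 0x36, 0x78, 0x40]   # g4(x) from the example above

def long_division(msg, gen):
    '''Schoolbook long division; returns (quotient digits, remainder).'''
    work = list(msg) + [0] * (len(gen) - 1)   # message padded with zeros
    quotient = []
    for i in range(len(msg)):
        digit = work[i]          # next quotient digit (the divisor is monic)
        quotient.append(digit)
        if digit != 0:
            # subtract (XOR) digit * gen, aligned at position i
            for j in range(len(gen)):
                work[i + j] ^= gf_mul(gen[j], digit)
    return quotient, work[len(msg):]

quotient, remainder = long_division([0x12, 0x34, 0x56], GEN4)
print(' '.join('%02x' % b for b in quotient))    # 12 da df
print(' '.join('%02x' % b for b in remainder))   # 37 e6 78 d9
```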

However, long division is quite slow because it takes many iterations to terminate. More efficient strategies can be devised, such as synthetic division (also called Horner's method; a good tutorial video can be found on Khan Academy). Here is a function that implements extended synthetic division of GF(2^p) polynomials (extended because the divisor is a polynomial instead of a monomial):

def gf_poly_div(dividend, divisor):
    '''Fast polynomial division by using Extended Synthetic Division and optimized for GF(2^p) computations
    (doesn't work with standard polynomials outside of this galois field, see the Wikipedia article for generic algorithm).'''

    msg_out = list(dividend) # Copy the dividend; the division is then carried out in place on this copy
    for i in range(len(dividend)-(len(divisor)-1)):
        coef = msg_out[i] # precaching
        if coef != 0: # log(0) is undefined, so we need to avoid that case explicitly (and it's also a good optimization)
            for j in range(1, len(divisor)): # the divisor is assumed to be monic, thus we can skip the first coefficient
                msg_out[i + j] ^= gf_mul(divisor[j], coef) # equivalent to the more mathematically correct
                                                           # (but xoring is faster): msg_out[i + j] += -divisor[j] * coef

    # The resulting msg_out contains both the quotient and the remainder. The remainder has a degree strictly lower
    # than the divisor, so it occupies the last len(divisor)-1 coefficients. We compute the index where this
    # separation is, and return the quotient and remainder.
    separator = -(len(divisor)-1)
    return msg_out[:separator], msg_out[separator:] # return quotient, remainder.

And now, here's how to encode a message to get its RS code:

def rs_encode_msg(msg_in, nsym):
    '''Reed-Solomon main encoding function'''
    gen = rs_generator_poly(nsym)

    # Pad the message, then divide it by the generator polynomial
    _, remainder = gf_poly_div(list(msg_in) + [0] * (len(gen)-1), gen)
    # The remainder is our RS code! Just append it to our original message to get our full codeword (this represents a polynomial of max 255 terms)
    msg_out = list(msg_in) + list(remainder)
    # Return the codeword
    return msg_out

Simple, isn't it? Encoding is in fact the easiest part in Reed–Solomon, and it's always the same approach (polynomial division). Decoding is the tough part of Reed–Solomon, and you will find a lot of different algorithms depending on your needs, but we will touch on that later on.

This function is quite fast, but since encoding is quite critical, here is an enhanced encoding function that inlines the polynomial synthetic division, which is the form that you will most often find in Reed–Solomon software libraries:

def rs_encode_msg(msg_in, nsym):
   '''Reed-Solomon main encoding function, using polynomial division (algorithm Extended Synthetic Division)'''
   gen = rs_generator_poly(nsym)
   # Init msg_out with the values inside msg_in and pad with len(gen)-1 bytes (which is the number of ecc symbols).
   msg_out = [0] * (len(msg_in) + len(gen)-1)
   # Initializing the Synthetic Division with the dividend (= input message polynomial)
   msg_out[:len(msg_in)] = msg_in

   # Synthetic division main loop
   for i in range(len(msg_in)):
      # Note that it's msg_out here, not msg_in. Thus, we reuse the updated value at each iteration
      # (this is how Synthetic Division works, but instead of storing in a temporary register the intermediate values,
      # we directly commit them to the output).
      coef = msg_out[i]

      # log(0) is undefined, so we need to manually check for this case. There's no need to check
      # the divisor here because we know it can't be 0 since we generated it.
      if coef != 0:
         # We can skip the first coefficient of the divisor here because it's always monic (meaning that it's always 1).
         for j in range(1, len(gen)):
            msg_out[i+j] ^= gf_mul(gen[j], coef) # equivalent to msg_out[i+j] += gf_mul(gen[j], coef)

   # At this point, the Extended Synthetic Division is done, msg_out contains the quotient in msg_out[:len(msg_in)]
   # and the remainder in msg_out[len(msg_in):]. Here for RS encoding, we don't need the quotient but only the remainder
   # (which represents the RS code), so we can just overwrite the quotient with the input message, so that we get
   # our complete codeword composed of the message + code.
   msg_out[:len(msg_in)] = msg_in

   return msg_out

This algorithm is faster, but it's still quite slow for practical use, particularly in Python. There are several ways to speed it up: inlining (replacing the gf_mul call with the operation itself to avoid a function call), precomputing (the logarithms of gen and of coef, or even a full multiplication table, although the latter does not seem to work well in Python), using statically typed constructs (assigning gf_log and gf_exp to array.array('i', [...])), using bytearrays instead of lists (which also makes memoryviews possible), running it with PyPy, or converting the algorithm into a Cython or C extension.
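To make the inlining and precomputing ideas concrete, here is a rough sketch of an encoder that looks up the logarithms of the generator coefficients once and replaces each gf_mul call by a direct table lookup. The name rs_encode_msg_precomp is hypothetical, no timing claim is made, and the field tables assume GF(2^8) with primitive polynomial 0x11d.

```python
# Assumed field setup (restated so the snippet is self-contained):
# GF(2^8) with primitive polynomial 0x11d and alpha = 2.
gf_exp = [0] * 512
gf_log = [0] * 256
value = 1
for power in range(255):
    gf_exp[power] = value
    gf_log[value] = power
    value <<= 1
    if value & 0x100:
        value ^= 0x11d
for power in range(255, 512):
    gf_exp[power] = gf_exp[power - 255]

def gf_mul(a, b):
    if a == 0 or b == 0:
        return 0
    return gf_exp[gf_log[a] + gf_log[b]]

def rs_generator_poly(nsym):
    g = [1]
    for i in range(nsym):
        # g + [0] is g*x, and [0] + g lines g up for the alpha^i term
        g = [a ^ gf_mul(b, gf_exp[i]) for a, b in zip(g + [0], [0] + g)]
    return g

def rs_encode_msg_precomp(msg_in, nsym):
    '''Hypothetical encoder variant with precomputed logs and inlined multiplication.'''
    gen = rs_generator_poly(nsym)
    # log of each generator coefficient, computed once (None marks a zero coefficient)
    gen_log = [gf_log[c] if c != 0 else None for c in gen]
    msg_out = list(msg_in) + [0] * nsym
    for i in range(len(msg_in)):
        coef = msg_out[i]
        if coef != 0:
            coef_log = gf_log[coef]   # looked up once per message symbol
            for j in range(1, len(gen)):
                if gen_log[j] is not None:
                    # inlined gf_mul(gen[j], coef)
                    msg_out[i + j] ^= gf_exp[coef_log + gen_log[j]]
    msg_out[:len(msg_in)] = msg_in   # restore the message over the quotient
    return msg_out
```

Whether this is actually faster depends on the interpreter, so it is worth measuring before adopting it.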

This example shows the encode function applied to the message in the sample QR code introduced earlier. The calculated error correction symbols (on the second line) match the values decoded from the QR code.

>>> msg_in = [ 0x40, 0xd2, 0x75, 0x47, 0x76, 0x17, 0x32, 0x06,
...            0x27, 0x26, 0x96, 0xc6, 0xc6, 0x96, 0x70, 0xec ]
>>> msg = rs_encode_msg(msg_in, 10)
>>> for i in range(0,len(msg)):
...    print(hex(msg[i]), end=' ')
0x40 0xd2 0x75 0x47 0x76 0x17 0x32 0x6 0x27 0x26 0x96 0xc6 0xc6 0x96 0x70 0xec
0xbc 0x2a 0x90 0x13 0x6b 0xaf 0xef 0xfd 0x4b 0xe0

Python version note: The syntax for the print function has changed, and this example uses the Python 3.0+ version. In previous versions of Python (particularly Python 2.x), replace the print line with print hex(msg[i]), (including the final comma) and range by xrange.

Syndrome calculation

Decoding a Reed–Solomon message involves several steps. The first step is to calculate the "syndrome" of the message. Treat the message as a polynomial and evaluate it at α^0, α^1, α^2, ..., α^(nsym-1), where nsym is the number of error correction symbols. Since these are the zeros of the generator polynomial, the result should be zero if the scanned message is undamaged (this can be used to check whether the message is corrupted and, after correcting a corrupted message, whether it was completely repaired). If not, the syndromes contain all the information necessary to determine the correction that should be made. It is simple to write a function to calculate the syndromes.

def rs_calc_syndromes(msg, nsym):
   synd = [0] * nsym
   for i in range(0, nsym):
      synd[i] = gf_poly_eval(msg, gf_exp[i])
   return synd

Continuing the example, we see that the syndromes of the original codeword without any corruption are indeed zero. Introducing a corruption of at least one character into the message or its RS code gives nonzero syndromes.

>>> synd = rs_calc_syndromes(msg, 10)
>>> print(synd)
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
>>> msg[0] = 0  # deliberately damage the message
>>> synd = rs_calc_syndromes(msg, 10)
>>> print(synd)
[64, 192, 93, 231, 52, 92, 228, 49, 83, 245]
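To see why the syndromes carry the information needed for the repair, it helps to look at the single-error case: if one symbol with coefficient degree d is off by a value e, then syndrome number i is simply e·α^(i·d). The sketch below checks this against the numbers above. The codeword is copied from the encoding example, the helper functions are restated so the snippet runs on its own, and the field tables assume GF(2^8) with primitive polynomial 0x11d.

```python
# Assumed field setup (restated so the snippet is self-contained):
# GF(2^8) with primitive polynomial 0x11d and alpha = 2.
gf_exp = [0] * 512
gf_log = [0] * 256
value = 1
for power in range(255):
    gf_exp[power] = value
    gf_log[value] = power
    value <<= 1
    if value & 0x100:
        value ^= 0x11d
for power in range(255, 512):
    gf_exp[power] = gf_exp[power - 255]

def gf_mul(a, b):
    if a == 0 or b == 0:
        return 0
    return gf_exp[gf_log[a] + gf_log[b]]

def gf_poly_eval(p, x):
    y = p[0]
    for c in p[1:]:
        y = gf_mul(y, x) ^ c   # Horner's method
    return y

def rs_calc_syndromes(msg, nsym):
    return [gf_poly_eval(msg, gf_exp[i]) for i in range(nsym)]

CODEWORD = [0x40, 0xd2, 0x75, 0x47, 0x76, 0x17, 0x32, 0x06, 0x27, 0x26,
            0x96, 0xc6, 0xc6, 0x96, 0x70, 0xec,
            0xbc, 0x2a, 0x90, 0x13, 0x6b, 0xaf, 0xef, 0xfd, 0x4b, 0xe0]

damaged = list(CODEWORD)
damaged[0] = 0                       # same damage as in the example above
error_value = CODEWORD[0] ^ damaged[0]   # 0x40
degree = len(damaged) - 1 - 0        # position 0 is the x^25 coefficient

predicted = [gf_mul(error_value, gf_exp[(i * degree) % 255]) for i in range(10)]
print(predicted == rs_calc_syndromes(damaged, 10))   # True
```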

Erasure correction

It is simplest to correct mistakes in the code if the locations of the mistakes are already known. This is known as erasure correction. It is possible to correct one erased symbol (i.e., character) for each error-correction symbol added to the code. If the error locations are not known, two EC symbols are needed for each symbol error (so you can correct half as many errors as erasures). This makes erasure correction useful in practice if part of the QR code being scanned is covered or physically torn away. It may be difficult for a scanner to determine that this has happened, though, so not all QR code scanners can perform erasure correction.

The most common way to correct erasures (or errors) with Reed–Solomon is to proceed in 4 stages:

  1. Compute the syndromes (done above). They summarize how the received message deviates from a valid codeword.
  2. Compute the erasure/error locator polynomial, which identifies the characters that are in error. For erasures, the positions are already known, so it is built directly from them; for errors, it is computed from the syndromes by Berlekamp-Massey.
  3. Compute the erasure/error evaluator polynomial (from the syndromes and the erasure/error locator polynomial). It is needed to evaluate by how much each character was altered.
  4. Compute the erasure/error magnitude polynomial (from all 3 polynomials above), which is in fact the values that need to be subtracted from the received message to get the original, correct message (i.e., with correct values for erased characters).

Now that we already have the syndromes, we need to compute the locator polynomial. This is easy:

def rs_find_errata_locator(e_pos):
    '''Compute the erasures/errors/errata locator polynomial from the erasures/errors/errata positions
       (the positions must be relative to the x coefficient, eg: "hello worldxxxxxxxxx" is tampered to "h_ll_ worldxxxxxxxxx"
       with xxxxxxxxx being the ecc of length n-k=9, here the string positions are [1, 4], but the coefficients are reversed
       since the ecc characters are placed as the first coefficients of the polynomial, thus the coefficients of the
       erased characters are n-1 - [1, 4] = [18, 15] = erasures_loc to be specified as an argument).'''

    e_loc = [1] # just to init because we will multiply, so it must be 1 so that the multiplication starts correctly without nulling any term
    # erasures_loc = product(1 - x*alpha**i) for i in erasures_pos and where alpha is the alpha chosen to evaluate polynomials.
    for i in e_pos:
        e_loc = gf_poly_mul( e_loc, gf_poly_add([1], [gf_exp[i], 0]) )
    return e_loc
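The defining property of the locator polynomial is that it evaluates to zero exactly at the inverses α^(-i) of the errata coefficient positions i. The following sketch checks this for the [18, 15] positions mentioned in the docstring. The function errata_locator is a compact stand-in for rs_find_errata_locator, and the helpers and field tables (GF(2^8), primitive polynomial 0x11d) are restated as assumptions so that the snippet runs on its own.

```python
# Assumed field setup (restated so the snippet is self-contained):
# GF(2^8) with primitive polynomial 0x11d and alpha = 2.
gf_exp = [0] * 512
gf_log = [0] * 256
value = 1
for power in range(255):
    gf_exp[power] = value
    gf_log[value] = power
    value <<= 1
    if value & 0x100:
        value ^= 0x11d
for power in range(255, 512):
    gf_exp[power] = gf_exp[power - 255]

def gf_mul(a, b):
    if a == 0 or b == 0:
        return 0
    return gf_exp[gf_log[a] + gf_log[b]]

def gf_poly_mul(p, q):
    r = [0] * (len(p) + len(q) - 1)
    for j in range(len(q)):
        for i in range(len(p)):
            r[i + j] ^= gf_mul(p[i], q[j])
    return r

def gf_poly_eval(p, x):
    y = p[0]
    for c in p[1:]:
        y = gf_mul(y, x) ^ c
    return y

def errata_locator(coef_pos):
    '''Stand-in for rs_find_errata_locator: product of (alpha^i * x + 1).'''
    loc = [1]
    for i in coef_pos:
        loc = gf_poly_mul(loc, [gf_exp[i], 1])
    return loc

loc = errata_locator([18, 15])
# Try every alpha^(-d) and keep the exponents d where the locator vanishes
roots = [d for d in range(255) if gf_poly_eval(loc, gf_exp[255 - d]) == 0]
print(roots)   # [15, 18]
```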

Next, computing the erasure/error evaluator polynomial from the locator polynomial is easy: it's simply a polynomial multiplication followed by a polynomial division (which you can replace by a list slicing because that's the effect we want in the end):

def rs_find_error_evaluator(synd, err_loc, nsym):
    '''Compute the error (or erasures if you supply sigma=erasures locator polynomial, or errata) evaluator polynomial Omega
       from the syndrome and the error/erasures/errata locator Sigma.'''

    # Omega(x) = [ Synd(x) * Error_loc(x) ] mod x^(n-k+1)
    _, remainder = gf_poly_div( gf_poly_mul(synd, err_loc), ([1] + [0]*(nsym+1)) ) # first multiply syndromes * errata_locator, then do a
                                                                                   # polynomial division to truncate the polynomial to the
                                                                                   # required length

    # Faster way that is equivalent
    #remainder = gf_poly_mul(synd, err_loc) # first multiply the syndromes with the errata locator polynomial
    #remainder = remainder[len(remainder)-(nsym+1):] # then slice the list to truncate it (which represents the polynomial), which
                                                     # is equivalent to dividing by a polynomial of the length we want

    return remainder

Finally, the Forney algorithm is used to calculate the correction values (also called the error magnitude polynomial). It is implemented in the function below.

def rs_correct_errata(msg, synd, pos): # pos is the positions of the errors/erasures/errata
    '''Forney algorithm, computes the values (error magnitude) to correct the input message.'''
    # calculate errata locator polynomial to correct both errors and erasures
    # (by combining the positions given by the error locator polynomial found by BM with the erasures positions)
    coef_pos = [len(msg) - 1 - p for p in pos] # need to convert the positions to coefficients degrees for the errata locator algo to work
                                               # (eg: instead of [0, 1, 2] it will become [len(msg)-1, len(msg)-2, len(msg) -3])
    loc = rs_find_errata_locator(coef_pos)
    # calculate errata evaluator polynomial (also called Omega in academic papers)
    eval = rs_find_error_evaluator(synd[0:len(pos)][::-1], loc, len(pos)-1)
    # computing the formal derivative of the errata locator, which is simple: we just eliminate the even-degree terms
    # (because in a field of characteristic 2 such as GF(2^8), the derivative of an even-degree term is zero)
    # the formal derivative of the errata locator is used as the denominator of the Forney Algorithm, which simply says that
    # the ith error value is given by error_evaluator(gf_inverse(Xi)) / error_locator_derivative(gf_inverse(Xi)).
    # See Blahut, Algebraic codes for data transmission, pp 196-197.
    locprime = loc[len(loc)&1:len(loc):2]
    # compute corrections using Forney algorithm
    # The Forney algorithm computes the errata magnitude, meaning that we calculate the value that needs to be subtracted/added
    # to each errata character to repair it
    for i in range(len(pos)):
        x = gf_exp[pos[i] + 256 - len(msg)] # value to evaluate the polynomials at
        y = gf_poly_eval(eval, x) # numerator of the Forney algorithm (errata evaluator evaluated)
        z = gf_poly_eval(locprime, gf_mul(x,x)) # denominator of the Forney algorithm (errata locator derivative)
        magnitude = gf_div(y, gf_mul(x, z)) # Forney algorithm: dividing the errata evaluator with the errata locator derivative
                                            # gives us the errata magnitude (ie, value to repair) the ith symbol
        # Apply on the message, same as gf_poly_add(msg, magnitude) (this isn't the Forney algorithm, we just apply the result here)
        msg[pos[i]] ^= magnitude # equivalent to Ci = Ri - Ei where Ci is the correct message, Ri the received (senseword) message,
                                 # and Ei the errata magnitudes. So in fact here we subtract from the received message the errors magnitude,
                                 # which logically corrects the value to what it should be.
    return msg

Math note: The denominator of the expression for the error value is the formal derivative of the error locator polynomial loc. This is calculated by the usual procedure of replacing each term c_n x^n with n c_n x^(n-1). Since we're working in a field of characteristic two, n c_n is equal to c_n when n is odd, and 0 when n is even. Thus, we can simply remove the even coefficients (resulting in the polynomial locprime) and evaluate locprime(x^2).
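The slicing shortcut can be checked against a literal implementation of the derivative rule. In this sketch, formal_derivative is a hypothetical helper that applies the rule term by term, the sample polynomials are arbitrary, and the field tables assume GF(2^8) with primitive polynomial 0x11d.

```python
# Assumed field setup (restated so the snippet is self-contained):
# GF(2^8) with primitive polynomial 0x11d and alpha = 2.
gf_exp = [0] * 512
gf_log = [0] * 256
value = 1
for power in range(255):
    gf_exp[power] = value
    gf_log[value] = power
    value <<= 1
    if value & 0x100:
        value ^= 0x11d
for power in range(255, 512):
    gf_exp[power] = gf_exp[power - 255]

def gf_mul(a, b):
    if a == 0 or b == 0:
        return 0
    return gf_exp[gf_log[a] + gf_log[b]]

def gf_poly_eval(p, x):
    y = p[0]
    for c in p[1:]:
        y = gf_mul(y, x) ^ c
    return y

def formal_derivative(p):
    '''Term-by-term derivative in characteristic 2 (p is highest degree first).'''
    top = len(p) - 1
    out = []
    for k, c in enumerate(p[:-1]):        # the constant term disappears
        degree = top - k
        out.append(c if degree % 2 == 1 else 0)   # n*c_n is c_n for odd n, else 0
    return out

def same_everywhere(loc):
    '''Compare the literal derivative with the slicing trick at every field element.'''
    locprime = loc[len(loc) & 1:len(loc):2]       # keep the odd-degree coefficients
    deriv = formal_derivative(loc)
    return all(gf_poly_eval(deriv, x) == gf_poly_eval(locprime, gf_mul(x, x))
               for x in range(256))

print(same_everywhere([0x1d, 0x3a, 0x07, 0x01]))   # True
print(same_everywhere([0x05, 0x0c, 0x01]))         # True
```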

Python note: This function uses array slicing to extract parts of an array. The expression synd[0:len(pos)] returns the first few elements of synd, and the trailing [::-1] reverses their order. The more complicated expression loc[len(loc)&1:len(loc):2] returns every second element of loc, skipping the first element if the length of loc is odd.

Continuing the example, here we use rs_correct_errata to restore the first byte of the message.

>>> msg[0] = 0
>>> synd = rs_calc_syndromes(msg, 10)
>>> msg = rs_correct_errata(msg, synd, [0])
>>> print(hex(msg[0]))
0x40

Error correction

In the more likely situation where the error locations are unknown, the first step is to find them. The Berlekamp–Massey algorithm is used to calculate the error locator polynomial.

import itertools

def rs_gen_error_poly(synd):
    '''find error locator polynomial with Berlekamp-Massey algorithm'''
    # The idea is that BM will iteratively estimate the error locator polynomial.
    # To do this, it will compute a Discrepancy term called Delta, which will tell us if the error locator polynomial needs an update or not
    # (hence why it's called discrepancy: it tells us when we are getting off board from the correct value).

    # Init the polynomials
    err_loc = [1] # This is the main variable we want to fill, also called sigma in other notations
    old_loc = [1] # BM is an iterative algorithm, and we need the error locator polynomial of the previous iteration in order to update
    for i in range(len(synd)):
        # Shift the previous polynomial by one degree (multiply it by x) to compute the next degree
        old_loc = old_loc + [0]

        # Compute the discrepancy Delta
        # In fact it's a polynomial multiplication gf_poly_mul(err_loc, synd), but we need only one item at the position i.
        # Thus to optimize, we compute the polymul only at the item we need, skipping the rest (avoiding a nested loop, thus we are linear time).
        delta = synd[i]
        for j in range(1, len(err_loc)):
            delta ^= gf_mul(err_loc[len(err_loc) - 1 - j], synd[i - j])

        # Iteratively estimate the error locator polynomial
        if delta != 0: # Rule B (rule A is implicitly defined because rule A just says that we skip any modification for this iteration)
            if len(old_loc) > len(err_loc):
                # Computing error locator polynomial
                new_loc = gf_poly_scale(old_loc, delta)
                old_loc = gf_poly_scale(err_loc, gf_inverse(delta)) # effectively we are doing err_loc * 1/delta = err_loc // delta
                err_loc = new_loc
            # Update with the discrepancy
            err_loc = gf_poly_add(err_loc, gf_poly_scale(old_loc, delta))

    # Check if the result is correct, that there's not too many errors to correct
    err_loc = list(itertools.dropwhile(lambda x: x == 0, err_loc)) # drop leading 0s, else errs will not be of the correct size
    errs = len(err_loc) - 1
    if errs * 2 > len(synd):
        return None    # too many errors to correct

    return err_loc

Then, using the error locator polynomial, we simply use a brute-force approach called trial substitution to find the zeros of this polynomial, which identifies the error locations (ie, the index of the characters that need to be corrected). A more efficient algorithm called Chien search exists, which avoids recomputing the whole evaluation at each iteration step, but this algorithm is left as an exercise to the reader.

def rs_find_errors(err_loc, nmess): # nmess is len(msg_in)
    '''Find the roots (ie, where evaluation = zero) of error polynomial by brute-force trial, this is a sort of Chien's search
    (but less efficient, Chien's search is a way to evaluate the polynomial such that each evaluation only takes constant time).'''
    errs = len(err_loc) - 1
    err_pos = []
    for i in range(nmess):
        if gf_poly_eval(err_loc, gf_exp[255 - i]) == 0: # It's a 0? Bingo, it's a root of the error locator polynomial,
                                                        # in other terms this is the location of an error
            err_pos.append(nmess - 1 - i)
    if len(err_pos) != errs:
        return None    # couldn't find error locations
    return err_pos
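For readers who want a head start on the exercise, here is one possible sketch of a Chien-style search. Instead of re-evaluating the whole polynomial at each candidate, it keeps one running term per coefficient and advances each term with a single multiplication by a fixed constant. The name chien_search and the sample locator are illustrative assumptions, and the field tables assume GF(2^8) with primitive polynomial 0x11d.

```python
# Assumed field setup (restated so the snippet is self-contained):
# GF(2^8) with primitive polynomial 0x11d and alpha = 2.
gf_exp = [0] * 512
gf_log = [0] * 256
value = 1
for power in range(255):
    gf_exp[power] = value
    gf_log[value] = power
    value <<= 1
    if value & 0x100:
        value ^= 0x11d
for power in range(255, 512):
    gf_exp[power] = gf_exp[power - 255]

def gf_mul(a, b):
    if a == 0 or b == 0:
        return 0
    return gf_exp[gf_log[a] + gf_log[b]]

def chien_search(err_loc, nmess):
    '''Sketch of a Chien search; returns positions in the same order as rs_find_errors.'''
    # terms[j] holds c_j * alpha^(-j*i), where c_j is the coefficient of x^j
    terms = list(reversed(err_loc))
    steps = [gf_exp[255 - j] for j in range(len(terms))]   # alpha^(-j)
    err_pos = []
    for i in range(nmess):
        total = 0
        for t in terms:
            total ^= t              # value of err_loc at alpha^(-i)
        if total == 0:
            err_pos.append(nmess - 1 - i)
        # advance every term from alpha^(-j*i) to alpha^(-j*(i+1))
        terms = [gf_mul(t, s) for t, s in zip(terms, steps)]
    return err_pos

# Sample locator with roots at alpha^-5, alpha^-15 and alpha^-25, which
# corresponds to errors at positions 20, 10 and 0 of a 26-symbol message.
err_loc = [1]
for degree in (5, 15, 25):
    factor = [gf_exp[degree], 1]
    product = [0] * (len(err_loc) + 1)
    for i, a in enumerate(err_loc):
        for j, b in enumerate(factor):
            product[i + j] ^= gf_mul(a, b)
    err_loc = product

print(chien_search(err_loc, 26))   # [20, 10, 0]
```

In software the gain over the brute-force loop is modest; the approach is mostly attractive in hardware, where multiplying by a fixed constant is cheap.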

Math note: When the error locator polynomial is linear (err_loc has length 2), it can be solved easily without resorting to a brute-force approach. The function presented above does not take advantage of this fact, but the interested reader may wish to implement the more efficient solution. Similarly, when the error locator is quadratic, it can be solved by using a generalization of the quadratic formula. A more ambitious reader may wish to implement this procedure as well.
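For the linear case, the root can be read directly from the logarithm table. Here is a small sketch under the same conventions as rs_find_errors (coefficients highest degree first, positions counted from the start of the message). The name single_error_position is hypothetical, and the field tables assume GF(2^8) with primitive polynomial 0x11d.

```python
# Assumed field setup (restated so the snippet is self-contained):
# GF(2^8) with primitive polynomial 0x11d and alpha = 2.
gf_exp = [0] * 512
gf_log = [0] * 256
value = 1
for power in range(255):
    gf_exp[power] = value
    gf_log[value] = power
    value <<= 1
    if value & 0x100:
        value ^= 0x11d
for power in range(255, 512):
    gf_exp[power] = gf_exp[power - 255]

def single_error_position(err_loc, nmess):
    '''Solve a linear locator c1*x + c0 directly; returns None if no valid position.'''
    c1, c0 = err_loc
    if c1 == 0 or c0 == 0:
        return None
    # The root is x = c0/c1 = alpha^(-degree), so alpha^degree = c1/c0
    degree = (gf_log[c1] - gf_log[c0]) % 255
    if degree >= nmess:
        return None          # the root points outside the message
    return nmess - 1 - degree

# The locator 3*x + 1 corresponds to a single error at the x^25 coefficient,
# i.e. at position 0 of a 26-symbol message.
print(single_error_position([3, 1], 26))   # 0
```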

Here is an example where three errors in the message are corrected:

>>> print(hex(msg[10]))
0x96
>>> msg[0] = 6
>>> msg[10] = 7
>>> msg[20] = 8
>>> synd = rs_calc_syndromes(msg, 10)
>>> err_loc = rs_gen_error_poly(synd)
>>> pos = rs_find_errors(err_loc, len(msg))
>>> print(pos)
[20, 10, 0]
>>> msg = rs_correct_errata(msg, synd, pos)
>>> print(hex(msg[10]))
0x96

Error and erasure correction

It is possible for a Reed–Solomon decoder to decode both erasures and errors at the same time, up to a limit (called the Singleton bound) of 2*e+v <= (n-k), where e is the number of errors, v the number of erasures and (n-k) the number of RS code characters (called nsym in the code). Basically, it means that for every erasure you need just one RS code character to repair it, while for every error you need two RS code characters (because you need to find the position in addition to the value/magnitude to correct). Such a decoder is called an errors-and-erasures decoder, or an errata decoder.
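The bound is easy to turn into a quick feasibility check. The helper below is a hypothetical illustration (the name within_capacity does not appear elsewhere in this essay), applied to the nsym = 10 used in the QR example.

```python
def within_capacity(errors, erasures, nsym):
    '''True if the errata count satisfies 2*e + v <= n-k (with nsym = n-k).'''
    return 2 * errors + erasures <= nsym

nsym = 10
for errors, erasures in [(5, 0), (0, 10), (3, 4), (4, 3)]:
    verdict = 'ok' if within_capacity(errors, erasures, nsym) else 'too many'
    print('%d errors + %d erasures: %s' % (errors, erasures, verdict))
```

With 10 error correction symbols, 5 errors, 10 erasures, or 3 errors plus 4 erasures all fit, while 4 errors plus 3 erasures would need 11 symbols.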

In order to correct both errors and erasures, we must prevent the erasures from interfering with the error location process. This can be done by calculating the Forney syndromes, as follows.

def rs_forney_syndromes(synd, pos, nmess):
   fsynd = list(synd)      # make a copy
   for i in range(0, len(pos)):
      x = gf_exp[nmess-1-pos[i]]
      for j in range(0,len(fsynd)-1):
         fsynd[j] = gf_mul(fsynd[j], x) ^ fsynd[j+1]
      fsynd.pop()          # the last entry has no successor to combine with, so it is dropped
   return fsynd

The Forney syndromes can then be used in place of the regular syndromes in the error location process.
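A quick way to convince yourself that the Forney syndromes really hide the erasures: when a message contains only erasures at the declared positions, every Forney syndrome is zero, and as soon as an undeclared error is added they become nonzero. The sketch below demonstrates this on the QR example codeword. It uses a variant of rs_forney_syndromes that drops one entry per erasure (an assumption of this sketch, since the last entry has no successor to combine with), and the field tables assume GF(2^8) with primitive polynomial 0x11d.

```python
# Assumed field setup (restated so the snippet is self-contained):
# GF(2^8) with primitive polynomial 0x11d and alpha = 2.
gf_exp = [0] * 512
gf_log = [0] * 256
value = 1
for power in range(255):
    gf_exp[power] = value
    gf_log[value] = power
    value <<= 1
    if value & 0x100:
        value ^= 0x11d
for power in range(255, 512):
    gf_exp[power] = gf_exp[power - 255]

def gf_mul(a, b):
    if a == 0 or b == 0:
        return 0
    return gf_exp[gf_log[a] + gf_log[b]]

def gf_poly_eval(p, x):
    y = p[0]
    for c in p[1:]:
        y = gf_mul(y, x) ^ c
    return y

def rs_calc_syndromes(msg, nsym):
    return [gf_poly_eval(msg, gf_exp[i]) for i in range(nsym)]

def rs_forney_syndromes(synd, pos, nmess):
    fsynd = list(synd)
    for p in pos:
        x = gf_exp[nmess - 1 - p]
        for j in range(len(fsynd) - 1):
            fsynd[j] = gf_mul(fsynd[j], x) ^ fsynd[j + 1]
        fsynd.pop()   # one fewer usable syndrome per erasure
    return fsynd

CODEWORD = [0x40, 0xd2, 0x75, 0x47, 0x76, 0x17, 0x32, 0x06, 0x27, 0x26,
            0x96, 0xc6, 0xc6, 0x96, 0x70, 0xec,
            0xbc, 0x2a, 0x90, 0x13, 0x6b, 0xaf, 0xef, 0xfd, 0x4b, 0xe0]

erase_pos = [0, 3, 25]
received = list(CODEWORD)
for p in erase_pos:
    received[p] = 0                      # erased symbols are zeroed out

synd = rs_calc_syndromes(received, 10)
print(rs_forney_syndromes(synd, erase_pos, len(received)))   # seven zeros

received[10] ^= 0x55                     # add an error the decoder is not told about
synd = rs_calc_syndromes(received, 10)
fsynd_with_error = rs_forney_syndromes(synd, erase_pos, len(received))
print(any(fsynd_with_error))             # True
```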

The function rs_correct_msg below brings the complete procedure together. Erasures are indicated by negative values in msg_in.

def rs_correct_msg(msg_in, nsym):
   msg_out = list(msg_in)     # copy of message
   # find erasures and set them to null bytes for easier decoding (it's up to you to signal them by assigning these
   # characters in msg_in with the special value -1, which is not a valid string character nor a galois field value).
   erase_pos = []
   for i in range(0, len(msg_out)):
      if msg_out[i] < 0:
         msg_out[i] = 0
         erase_pos.append(i)
   # check if there are too many erasures to correct (beyond the Singleton bound)
   if len(erase_pos) > nsym: return None
   # prepare the syndrome polynomial using only errors (ie: errors = characters that were either replaced by null byte
   # or changed to another character, but we don't know their positions)
   synd = rs_calc_syndromes(msg_out, nsym)
   # check if there's any error/erasure in the input codeword. If not (all syndromes coefficients are 0), then just return the message as-is.
   if max(synd) == 0:
      return msg_out  # no errors
   # compute the Forney syndromes, which hide the erasures from the original syndrome (so that BM will just have to deal with errors, not erasures)
   fsynd = rs_forney_syndromes(synd, erase_pos, len(msg_out))
   # compute the error locator polynomial using Berlekamp-Massey
   err_poly = rs_gen_error_poly(fsynd)
   if err_poly is None:
      return None    # too many errors to correct
   # locate the message errors using Chien search (or brute-force search)
   err_pos = rs_find_errors(err_poly, len(msg_out))
   if err_pos is None:
      return None    # error location failed
   # compute errata evaluator and errata magnitude polynomials, then correct errors and erasures
   rs_correct_errata(msg_out, synd, erase_pos + err_pos) # note that we here use the original syndrome, not the forney syndrome
                                                         # (because we will correct both errors and erasures, so we need the full syndrome)
   # check if the final message is fully repaired
   synd = rs_calc_syndromes(msg_out, nsym)
   if max(synd) > 0:
      return None     # message could not be repaired
   # return the successfully decoded message
   return msg_out

Python note: The lists erase_pos and err_pos are concatenated with the + operator.

This is the last piece needed for a fully operational error-and-erasure correcting Reed–Solomon decoder. If you want to delve more into the inner workings of errata (errors-and-erasures) decoders, you can read the excellent book "Algebraic Codes for Data Transmission" (2003) by Richard E. Blahut.

Math note: in some software implementations, particularly the ones using a language optimized for linear algebra and matrix operations, the algorithms (encoding, Berlekamp-Massey, etc.) will look totally different and use the Fourier transform. This is because the two formulations are equivalent: when stated in the jargon of spectral estimation, decoding Reed–Solomon consists of a Fourier transform (syndrome computation), followed by a spectral analysis (Berlekamp-Massey or Euclidean algorithm), followed by an inverse Fourier transform (Chien search). See the Blahut book for more info[1]. Indeed, if you are using a programming language optimized for linear algebra, or if you want to use fast linear algebra libraries, it can be a very good idea to use the Fourier transform since it's very fast nowadays (particularly the Fast Fourier Transform).

Conclusion and going further

The basic principles of Reed–Solomon codes have been presented in this essay. Working Python code for a particular implementation (QR codes) has been included. The code presented here is quite generic and can be used for any purpose beyond QR codes where you need to correct errors/erasures, such as file protection, networking, etc. Many variations and refinements of these ideas are possible, since coding theory is a very rich field of study.

One immediate issue that you may have noticed is that a codeword (the message plus its error correction symbols) can hold at most 255 characters, because the field GF(2^8) has only 255 nonzero elements. This limit can be circumvented in several ways, the three most common being:

  • using a higher Galois Field, for example 2^16, which would allow for 65535 characters, or 2^32, 2^64, 2^128, etc. The issue here is that the polynomial computations required to encode and decode Reed–Solomon become very costly with big polynomials (most algorithms being in quadratic time, the most efficient being in n log n).
  • "chunking", which means that you simply split your big data stream into chunks that each fit within the 255-character limit and encode each chunk separately.
  • using a variant algorithm that includes a packet size, such as Cauchy Reed–Solomon (see below).
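The chunking approach from the list above can be sketched in a few lines. The function encode_in_chunks is a hypothetical helper, and it takes the encoder as a parameter. The usage example passes a placeholder encoder that merely appends zero bytes, just to show the sizes involved; in practice you would pass a real encoder such as rs_encode_msg.

```python
def encode_in_chunks(data, nsym, encode, max_codeword=255):
    '''Split data so that each chunk plus its nsym code symbols fits in one codeword.'''
    chunk_size = max_codeword - nsym
    if chunk_size <= 0:
        raise ValueError('nsym leaves no room for message symbols')
    codewords = []
    for start in range(0, len(data), chunk_size):
        codewords.append(encode(data[start:start + chunk_size], nsym))
    return codewords

def placeholder_encode(chunk, nsym):
    # Stand-in only: appends zeros where a real encoder would append the RS code
    return list(chunk) + [0] * nsym

data = list(range(256)) * 2              # 512 symbols, too long for one codeword
codewords = encode_in_chunks(data, 10, placeholder_encode)
print([len(c) for c in codewords])       # [255, 255, 32]
```

Each chunk is protected independently, so a burst of damage concentrated in one chunk can exceed its capacity even if the other chunks are intact.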

If you want to go further, there are a lot of books and scientific articles on Reed–Solomon codes; a good starting point is the author Richard Blahut, who is notable in the domain. Also, there are a lot of different ways that Reed–Solomon codes can be encoded and decoded, and thus you will find many different algorithms, in particular for decoding (Berlekamp-Massey, Berlekamp-Welch, Euclidean algorithm, etc.). There are also variants that are better suited to certain use cases, such as Cauchy Reed–Solomon, which adds a packet-size parameter that controls the number of symbols put in each ECC block (so that you are no longer limited to about 2^8 characters but to about 2^8 × packet size), and which is a lot faster and more resilient to burst errors than standard Reed–Solomon.

Even if near-optimal forward error correction algorithms are all the rage nowadays (such as LDPC codes, Turbo codes, etc.) because of their great speed, Reed–Solomon is an optimal FEC, which means that it can attain the theoretical limit known as the Singleton bound. In practice, this means that RS can correct any combination of errors and erasures satisfying 2*e+v <= (n-k), where e is the number of errors, v the number of erasures, k the message size, n the message+code size and (n-k) the number of RS code characters (the minimum distance being n-k+1). This is not to say that near-optimal FEC are useless: they are much faster than Reed–Solomon, and they may suffer less from the cliff effect (which means they may still partially decode parts of the message even if there are too many errors to correct them all), but they can't correct as many errors as Reed–Solomon. Choosing between a near-optimal and an optimal FEC is mainly a concern of speed.

Lately, the research field on Reed–Solomon has regained some vitality since the discovery of list decoding (not to be confused with soft decoding), which allows decoding/repairing more symbols than the theoretical optimal limit. The core idea is this: standard Reed–Solomon only does unique decoding (meaning that it always results in a single solution; if it cannot, because the damage is above the theoretical limit, the decoder returns an error or a wrong result). Reed–Solomon with list decoding will still try to decode beyond the limit and get several possible results, and by a clever examination of the different results, it's often possible to single out the one polynomial that is probably the correct one.

A few list decoding algorithms are already available that allow repairing up to n - sqrt(n*k) errors[2] instead of the 2*e+v <= (n-k) limit, and other list decoding algorithms (more efficient or decoding more symbols) are currently being investigated.
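To get a feeling for the difference, the two limits can be compared numerically. This is only back-of-the-envelope arithmetic that takes the n - sqrt(n*k) figure quoted above at face value; the function names are made up for the illustration.

```python
import math

def unique_decoding_radius(n, k):
    '''Errors correctable by a classic decoder: 2*e <= n-k.'''
    return (n - k) // 2

def list_decoding_radius(n, k):
    '''Errors within reach of list decoding, using the n - sqrt(n*k) figure.'''
    return math.floor(n - math.sqrt(n * k))

for n, k in [(255, 223), (255, 127), (255, 63)]:
    print('n=%d k=%d: unique %d, list %d'
          % (n, k, unique_decoding_radius(n, k), list_decoding_radius(n, k)))
```

The gap is negligible for high-rate codes such as (255, 223) and grows as the proportion of redundancy increases.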

Third-party implementations

Here are a few implementations of Reed–Solomon if you want to see practical examples:

External links


  1. Richard E. Blahut, "Algebraic Codes for Data Transmission", 2003, chapter 7.6 "Decoding in Time Domain"
  2. "Reed-Solomon Error-correcting Codes - The Deep Hole Problem", by Matt Keti, Nov 2012