# Python Concepts/List Comprehension

## Objective

To understand and use list comprehensions, including nested list comprehensions.

## Lesson

To create a list of squares of the numbers in `range(1,6)` we can do something like:

```
>>> a = []
>>> for x in range(1,6) : a += [x*x]
...
>>> a
[1, 4, 9, 16, 25]
```

This approach has the perhaps undesirable side effect of creating or re-assigning both `a` and `x`. It would be convenient if we could do something like:

```
[for x in range(1,6) : [x*x]] # This produces SyntaxError: invalid syntax.
```

If we change the syntax slightly, list comprehensions come to the rescue.

## List Comprehensions

List comprehensions provide a concise way to create lists. Common applications are to make new lists where each element is the result of some operations applied to each member of another sequence or iterable, or to create a subsequence of those elements that satisfy a certain condition.
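As a small illustration of those two common uses, the sketch below builds one list by transforming every element and another by keeping only the elements that pass a test. The list `words` and the other names are made up for this example.

```python
# Hypothetical input data, chosen only for illustration.
words = ['list', 'comprehension', 'for', 'if', 'iterable']

# Transformation: apply an operation to every element.
lengths = [len(word) for word in words]

# Filtering: keep only the elements that satisfy a condition.
long_words = [word for word in words if len(word) > 4]

print(lengths)      # [4, 13, 3, 2, 8]
print(long_words)   # ['comprehension', 'iterable']
```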

Consider the task mentioned above, to create a list of squares of the numbers in `range(1,6)`. Modify the syntax slightly, and put it into the form of a list comprehension or "listcomp":

```
>>> [x*x for x in range(1,6)]
[1, 4, 9, 16, 25]
>>>
```

In the listcomp above, `x` is local to the listcomp, and the closing bracket `']'` tells Python that it should expect no more input.
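The scoping of the loop variable can be checked directly. The following minimal sketch assumes Python 3 and uses a made-up outer variable `x` to show that the listcomp leaves it untouched.

```python
x = 'unchanged'          # A name that already exists before the listcomp runs.

squares = [x * x for x in range(1, 6)]   # This x exists only inside the listcomp.

print(squares)           # [1, 4, 9, 16, 25]
print(x)                 # unchanged
```

By contrast, the `for` statement shown in the lesson introduction would leave `x` bound to `5` after the loop ends.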

A list comprehension consists of brackets containing an expression followed by a `for` clause, then zero or more `for` or `if` clauses. The result will be a new list. Consider some examples:

Given:

```
>>> a = list(range(-5,7)) ; a
[-5, -4, -3, -2, -1, 0, 1, 2, 3, 4, 5, 6]
>>>
```

### Create a shallow copy of list `a`:

```
>>> b = [p for p in a] ; b
[-5, -4, -3, -2, -1, 0, 1, 2, 3, 4, 5, 6]
>>> a is b
False # a and b are different entities.
>>>
```
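A listcomp of this form copies only the outer list; the elements themselves are shared between the two lists. For a list of integers such as `a` this makes no practical difference, but it matters when the elements are mutable. The sketch below uses a hypothetical nested list and `copy.deepcopy` from the standard library to show the contrast.

```python
import copy

nested = [[1, 2], [3, 4]]            # Hypothetical list whose elements are lists.

shallow = [p for p in nested]        # New outer list, same inner lists.
deep = copy.deepcopy(nested)         # New outer list and new inner lists.

nested[0].append(99)                 # Mutate an inner list of the original.

print(shallow[0])    # [1, 2, 99]  (shared with nested)
print(deep[0])       # [1, 2]      (independent copy)
```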

### Display the position and value of all odd numbers in list `a`:

```
>>> [(i,a[i]) for i in range(len(a)) if a[i] % 2]
[(0, -5), (2, -3), (4, -1), (6, 1), (8, 3), (10, 5)]
>>>
```

The listcomp above is equivalent to:

```
b = [
( 0,  a[ 0]),
#( 1,  a[ 1]), not included. a[1] is even.
( 2,  a[ 2]),
#( 3,  a[ 3]), not included. a[3] is even.
( 4,  a[ 4]),
#( 5,  a[ 5]), not included. a[5] is even.
( 6,  a[ 6]),
#( 7,  a[ 7]), not included. a[7] is even.
( 8,  a[ 8]),
#( 9,  a[ 9]), not included. a[9] is even.
(10,  a[10]),
#(11,  a[11]) not included. a[11] is even.
]

print ('b =', b)
```
```
b = [(0, -5), (2, -3), (4, -1), (6, 1), (8, 3), (10, 5)]
```
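The same result can be obtained without explicit indexing. This is an alternative sketch, not the form used above: the built-in `enumerate()` supplies each position together with its value.

```python
a = list(range(-5, 7))

# enumerate() yields (position, value) pairs, so a[i] never has to be looked up.
odd_positions = [(i, value) for i, value in enumerate(a) if value % 2]

print(odd_positions)
# [(0, -5), (2, -3), (4, -1), (6, 1), (8, 3), (10, 5)]
```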

### Display all numbers in list `a` that are an exact integer power of `2`:

```
>>> [num for num in a for exponent in range(5) if num == 2 ** exponent ]
[1, 2, 4]
>>>
>>> [ # Same listcomp again with some formatting.
... num
...     for num in a
...         for exponent in range(5)
...   if num == 2 ** exponent ] # Between the brackets white space is almost irrelevant.
[1, 2, 4]
>>>
```

The above listcomp is equivalent to:

```
>>> b = []
>>> a = list(range(-5,7)) ; a
[-5, -4, -3, -2, -1, 0, 1, 2, 3, 4, 5, 6]
>>> for num in a :
...     for exponent in range(5) :
...          if num == 2 ** exponent : b += [num]
...
>>> b
[1, 2, 4]
>>>
```
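The inner loop over `exponent` can also be replaced by a membership test. The sketch below is an alternative formulation, not the one shown above: it first collects the same five powers of `2` in a set (the name `powers` is chosen for this example) and then filters `a` against it.

```python
a = list(range(-5, 7))

# Set comprehension: the same powers that range(5) produces above.
powers = {2 ** exponent for exponent in range(5)}    # {1, 2, 4, 8, 16}

b = [num for num in a if num in powers]

print(b)   # [1, 2, 4]
```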

### Apply a math function to all elements:

```
>>> [(2*x*x + 3*x - 2) for x in a]
[33, 18, 7, 0, -3, -2, 3, 12, 25, 42, 63, 88]
>>>
```

### Create a list of random non-negative integers:

```
import random

b = [
random.getrandbits(p+q-r)
for p in range(10,20,4)
for q in range(3,17,5)
for r in range(4,16,7)
if ((p+q-r) > 12) and ((p+q-r) % 3)
]

print ('b =', b)
```

Successive invocations of the above produce:

```
b = [1028, 193714, 7439, 6543953, 54358, 116875, 3156193, 644588]
b = [1665, 267591, 813, 6463934, 35700, 112421, 2943558, 640640]
b = [4370, 330632, 8045, 5037055, 5526, 44136, 2528805, 612774]
b = [14764, 21409, 8140, 3436253, 55785, 96832, 1355388, 162655]
```

### Call a method on each element:

```
>>> fruits = ['  apples  ', '  pears   ', ' grapes   ', '  WaterMELons   ', ' peaches  ' ]
>>> [fruit.strip().lower() for fruit in fruits ]
['apples', 'pears', 'grapes', 'watermelons', 'peaches']
>>>
```

### To flatten a list:

```
>>> L1 = [32, 97, [192, 128], 98, 99, 32, [192, 128], 32]
>>>
>>> [num for elem in L1 for num in elem]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 1, in <listcomp>
TypeError: 'int' object is not iterable
>>>
>>> L1a = [ ([p], p)[isinstance(p, list)] for p in L1 ] ; L1a
[[32], [97], [192, 128], [98], [99], [32], [192, 128], [32]]
>>>
>>> [num for elem in L1a for num in elem]
[32, 97, 192, 128, 98, 99, 32, 192, 128, 32]
```
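The two steps above can be folded into a single listcomp. The following is a sketch of one way to do it, using a conditional expression to wrap every non-list element in a one-element list on the fly. Like the version above, it flattens only one level of nesting.

```python
L1 = [32, 97, [192, 128], 98, 99, 32, [192, 128], 32]

flat = [
    num
    for elem in L1
    # Iterate over elem itself if it is a list, else over a one-element list.
    for num in (elem if isinstance(elem, list) else [elem])
]

print(flat)   # [32, 97, 192, 128, 98, 99, 32, 192, 128, 32]
```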

## Nested List Comprehensions

Create a useful list:

```
matrix = [p for p in range (3,23)] # A listcomp does this nicely.

print ('len(matrix) =', len(matrix))
print ('matrix =', matrix)
```
```
len(matrix) = 20
matrix = [3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22]
```

### New list of 3 rows

Create a list containing three rows in which each row contains every third value of `matrix`.

```
b = [ [matrix[p] for p in range(q,len(matrix),3)] for q in range(3) ]
print ('''
b =
{}, length={}
{}, length={}
{}, length={}
'''.format( b[0],len(b[0]),  b[1],len(b[1]),  b[2],len(b[2]) ))
```
```
b =
[3, 6, 9, 12, 15, 18, 21], length=7
[4, 7, 10, 13, 16, 19, 22], length=7
[5, 8, 11, 14, 17, 20], length=6
```

The nested listcomp above is equivalent to:

```
b = [ [matrix[p] for p in range(0,len(matrix),3)] ,
      [matrix[p] for p in range(1,len(matrix),3)] ,
      [matrix[p] for p in range(2,len(matrix),3)] ]
```
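Because each row takes every third element from a given starting position, extended slicing gives the same rows. This is an alternative sketch rather than the form used above.

```python
matrix = list(range(3, 23))

# matrix[q::3] starts at position q and takes every third element.
b = [matrix[q::3] for q in range(3)]

for row in b:
    print(row, len(row))
```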

Fill each row as necessary so that all rows have same length:

```
c = [ b[p]+[None]*q for p in range(len(b)) for q in [ len(b[0]) - len(b[p]) ] ]
```

The syntax of a listcomp offers only `for` and `if` clauses, so there is no assignment statement. The second `for` clause above iterates over a one-element list and thereby assigns a single value to `q`, either `0` or `1`.

```
print ('''
c =
{}, length={}
{}, length={}
{}, length={}
'''.format( c[0],len(c[0]),  c[1],len(c[1]),  c[2],len(c[2]) ))
```
```
c =
[3, 6, 9, 12, 15, 18, 21], length=7
[4, 7, 10, 13, 16, 19, 22], length=7
[5, 8, 11, 14, 17, 20, None], length=7
```
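If the longest row is not known in advance to be the first one, the target width can be computed before padding. The sketch below is a variation on the listcomp above; the name `width` is introduced here for illustration.

```python
b = [[3, 6, 9, 12, 15, 18, 21],
     [4, 7, 10, 13, 16, 19, 22],
     [5, 8, 11, 14, 17, 20]]

width = max(len(row) for row in b)        # Length of the longest row.

# Multiplying a list by 0 gives an empty list, so full rows are left unchanged.
c = [row + [None] * (width - len(row)) for row in b]

for row in c:
    print(row, len(row))
```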

### Create a dictionary

Create a dictionary from a list in which each value at an even position is a key and each value at an odd position is the associated value:

```
data = [p%q for p in range (3,10) for q in range (2,7)]
print ('data =', data)
```
```
data = [1, 0, 3, 3, 3, 0, 1, 0, 4, 4, 1, 2, 1, 0, 5, 0, 0, 2, 1, 0, 1, 1, 3, 2, 1, 0, 2, 0, 3, 2, 1, 0, 1, 4, 3]
```

Create a list containing two rows in which the first row contains keys and the second contains values.

```
b = [ [data[p] for p in range(q,len(data),2)] for q in range(2) ]
print ('''
b =
{}, length={}
{}, length={}
'''.format( b[0],len(b[0]),  b[1],len(b[1]) ))
```
```
b =
[1, 3, 3, 1, 4, 1, 1, 5, 0, 1, 1, 3, 1, 2, 3, 1, 1, 3], length=18
[0, 3, 0, 0, 4, 2, 0, 0, 2, 0, 1, 2, 0, 0, 2, 0, 4], length=17
```

Fill as necessary:

```
c = [ b[p]+[None]*q for p in range(len(b)) for q in range(len(b[0]) - len(b[p]), 1+len(b[0]) - len(b[p])) ]
print ('''
c =
{}, length={}
{}, length={}
'''.format( c[0],len(c[0]),  c[1],len(c[1]) ))
```
```
c =
[1, 3, 3, 1, 4, 1, 1, 5, 0, 1, 1, 3, 1, 2, 3, 1, 1, 3], length=18
[0, 3, 0, 0, 4, 2, 0, 0, 2, 0, 1, 2, 0, 0, 2, 0, 4, None], length=18
```

Transpose rows and columns and create input to dictionary:

```
d = [[row[i] for row in c] for i in range(len(c[0]))]

print (' d =', d)
```
```
 d = [[1, 0], [3, 3], [3, 0], [1, 0], [4, 4], [1, 2], [1, 0], [5, 0], [0, 2], [1, 0], [1, 1], [3, 2], [1, 0], [2, 0], [3, 2], [1, 0], [1, 4], [3, None]]
```

List d is equivalent to:

```
d1 = [ [ c[0][i], c[1][i] ] for i in range(18) ]

print ('d1 =', d1)
```
```
d1 = [[1, 0], [3, 3], [3, 0], [1, 0], [4, 4], [1, 2], [1, 0], [5, 0], [0, 2], [1, 0], [1, 1], [3, 2], [1, 0], [2, 0], [3, 2], [1, 0], [1, 4], [3, None]]
```

List d1 is equivalent to:

```
d2 = [
[ c[0][ 0], c[1][ 0] ] ,
[ c[0][ 1], c[1][ 1] ] ,
[ c[0][ 2], c[1][ 2] ] ,
........................
[ c[0][15], c[1][15] ] ,
[ c[0][16], c[1][16] ] ,
[ c[0][17], c[1][17] ]
]

print ('d2 =', d2)
```
```
d2 = [[1, 0], [3, 3], [3, 0], [1, 0], [4, 4], [1, 2], [1, 0], [5, 0], [0, 2], [1, 0], [1, 1], [3, 2], [1, 0], [2, 0], [3, 2], [1, 0], [1, 4], [3, None]]
```
```
e = dict(d)

print ('e =', e)
```
```
e = {1: 4, 3: None, 4: 4, 5: 0, 0: 2, 2: 0}
```
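The splitting, padding and transposing steps above can also be sketched with `itertools.zip_longest` from the standard library, which pairs two sequences element by element and fills the shorter one with `None` by default. This is an alternative to the listcomps above; the names `keys`, `values` and `pairs` are chosen for this example.

```python
from itertools import zip_longest

data = [p % q for p in range(3, 10) for q in range(2, 7)]

keys = data[0::2]      # Values at even positions.
values = data[1::2]    # Values at odd positions.

# zip_longest() pairs the two rows column by column, padding with None.
pairs = [list(pair) for pair in zip_longest(keys, values)]
e = dict(pairs)

print(pairs[-1])   # [3, None]
print(e)           # {1: 4, 3: None, 4: 4, 5: 0, 0: 2, 2: 0}
```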

#### Listcomps simplified

In practice this code will do the job:

```
data = [p%q for p in range (3,10) for q in range (2,7)]
print ('data =', data)
```
```
data = [1, 0, 3, 3, 3, 0, 1, 0, 4, 4, 1, 2, 1, 0, 5, 0, 0, 2, 1, 0, 1, 1, 3, 2, 1, 0, 2, 0, 3, 2, 1, 0, 1, 4, 3]
```

#### Check the input:

```
if isinstance(data,list) and len(data) >= 1: pass # Input must contain at least one key.
else : exit(99)

status = {
((isinstance(data[p], int)) or (isinstance(data[p], float)) or (isinstance(data[p], str)))
for p in range(0, len(data), 2)
} # A set comprehension. In this dictionary each key must be int or float or str.

print ('status =', status)
```
```
status = {True}
```
```
if False in status :
    print ("'data' contains unrecognized key.")
    exit (98)
```
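The same check can be written with the built-in `all()` and a generator expression. This is an alternative sketch, not the set comprehension used above; the helper name `keys_are_valid` and the sample lists are hypothetical.

```python
def keys_are_valid(data):
    """Return True if every value at an even position is an int, float or str."""
    # isinstance() accepts a tuple of types, so one call covers all three.
    return all(isinstance(key, (int, float, str)) for key in data[0::2])

good = [1, 0, 'x', 3, 2.5, 0]     # Keys: 1, 'x', 2.5
bad = [1, 0, [3], 3]              # Keys: 1, [3]  (a list is not accepted)

print(keys_are_valid(good))   # True
print(keys_are_valid(bad))    # False
```

Unlike the set comprehension, `all()` stops at the first key that fails the test.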

#### Create the dictionary

```
b = dict(
[
(data+[None])[p:p+2] for p in range(0, len(data), 2)
]
)

print (
'\nlen(data) =',  len(data),
'\nInput to dict()\n   =', [ (data+[None])[p:p+2] for p in range(0, len(data), 2) ],
'\nDictionary =', b)
```
```
len(data) = 35
Input to dict()
   = [[1, 0], [3, 3], [3, 0], [1, 0], [4, 4], [1, 2], [1, 0], [5, 0], [0, 2], [1, 0], [1, 1], [3, 2], [1, 0], [2, 0], [3, 2], [1, 0], [1, 4], [3, None]]
Dictionary = {1: 4, 3: None, 4: 4, 5: 0, 0: 2, 2: 0}
```
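A dictionary comprehension can build the same dictionary without creating the intermediate list of pairs. The following is a sketch of that alternative; the name `padded` is introduced here for illustration.

```python
data = [p % q for p in range(3, 10) for q in range(2, 7)]

padded = data + [None]     # Guarantees that the last key has a value.

# Dictionary comprehension: key at position p, value at position p+1.
b = {padded[p]: padded[p + 1] for p in range(0, len(data), 2)}

print(b)   # {1: 4, 3: None, 4: 4, 5: 0, 0: 2, 2: 0}
```

As with `dict()`, a key that occurs more than once keeps the value from its last occurrence.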

Although a listcomp recognizes only clauses beginning with `for` or `if`, with a little dexterity the equivalent of assignments and `else` statements can be contained within a listcomp. The advantage is that a listcomp accepts free-format Python.

A "Unix date" has the format:

```
$ date
Wed Feb 14 08:24:24 CST 2018
```

The code in this section uses list comprehensions to recognize valid dates. All of the following are considered valid dates:

```
Wed Feb 14 08:24:24 CST 2018
Wednes Feb 14 08:24:24 CST 2018   # More than 3 letters in name of day.
Wed Febru 14 08:24:24 CST 2018    # More than 3 letters in name of month.
Wed Feb 14 8:24 : 24 CST 2018     # White space in hh:mm:ss.
wed FeB 14 8:24 : 24 cSt 2018     # Mixed upper and lower case.
```

Build dictionary `months`:

```
mo = '''January February March April May June
July August September October November December
'''

L1 = [
    [ month[:3], {month[:p] for p in range (3,len(month)+1)} ]
        for month in mo.title().split()
]

months = dict(L1)
```

Display dictionary `months`:

```
L1 = [
'''months['{}'] = {}'''.format(key, months[key])
    for key in months
]
print ( '\n'.join(L1) )
```
```
months['Jan'] = {'Januar', 'Janua', 'Janu', 'Jan', 'January'}
months['Feb'] = {'Februa', 'Febru', 'Februar', 'Feb', 'Febr', 'February'}
months['Mar'] = {'Mar', 'March', 'Marc'}
months['Apr'] = {'April', 'Apri', 'Apr'}
months['May'] = {'May'}
months['Jun'] = {'June', 'Jun'}
months['Jul'] = {'Jul', 'July'}
months['Aug'] = {'Augus', 'Augu', 'Aug', 'August'}
months['Sep'] = {'Sep', 'Septemb', 'September', 'Septem', 'Septe', 'Septembe', 'Sept'}
months['Oct'] = {'Octo', 'Octobe', 'Oct', 'Octob', 'October'}
months['Nov'] = {'Nove', 'November', 'Novemb', 'Novem', 'Nov', 'Novembe'}
months['Dec'] = {'Decembe', 'Decemb', 'Dece', 'Decem', 'December', 'Dec'}
```

Build dictionary `days`:

```
da='''
Sunday Monday Tuesday Wednesday
Thursday Friday Saturday
'''

L1 = [
    [ day[:3], {day[:p] for p in range (3,len(day)+1)} ]
        for day in da.title().split()
]

days = dict(L1)
```

Display dictionary `days`:

```
L1 = [
'''days['{}'] = {}'''.format(key, days[key])
    for key in days
]
print ( '\n'.join(L1) )
```
```
days['Sun'] = {'Sund', 'Sunda', 'Sun', 'Sunday'}
days['Mon'] = {'Monday', 'Mon', 'Mond', 'Monda'}
days['Tue'] = {'Tuesda', 'Tues', 'Tuesday', 'Tue', 'Tuesd'}
days['Wed'] = {'Wednesda', 'Wednesd', 'Wedn', 'Wednes', 'Wed', 'Wedne', 'Wednesday'}
days['Thu'] = {'Thursday', 'Thur', 'Thu', 'Thursd', 'Thurs', 'Thursda'}
days['Fri'] = {'Friday', 'Fri', 'Frida', 'Frid'}
days['Sat'] = {'Saturday', 'Satu', 'Saturda', 'Sat', 'Saturd', 'Satur'}
```

The regular expression:

```
reg2 = (
r'''\b                  # Word boundary.
(?P<day>\w{3,})         # At least 3 word characters.
\s+
(?P<month>\w{3,})       # At least 3 word characters.
\s+
(?P<date>\d{1,2})       # 1 or 2 numbers.
\s+
(?P<hours>\d{1,2})      # 1 or 2 numbers.
\s*:\s*
(?P<minutes>\d{1,2})    # 1 or 2 numbers.
\s*:\s*
(?P<seconds>\d{1,2})    # 1 or 2 numbers.
\s+
(?P<time_zone>\w{3})    # 3 word characters
\s+
(?P<year>\d{4})         # 4 numbers
\b                      # Word boundary.'''
)
```

Dictionary that contains the number of days per month:

```
d1 = dict ((
    ('Jan', 31), ('May', 31), ('Sep', 30),
    ('Feb', 28), ('Jun', 30), ('Oct', 31),
    ('Mar', 31), ('Jul', 31), ('Nov', 30),
    ('Apr', 30), ('Aug', 31), ('Dec', 31),
))
```

List all valid dates in string `dates` below.

```
dates = '''
MON Februar 12 0:30 : 19 CST 2018
Tue Feb 33 00:30:19 CST 2018 # Invalid.
Wed Feb 29 00:30:19 CST 1900 # Invalid.
Thursda feb 29 00:30:19 CST 1944
'''
```

The list comprehension that does it all:

```
import re

L1 = [
'\n'.join(( str(m), m[0], str(m.groupdict()) ))

for m in re.finditer(reg2, dates, re.IGNORECASE|re.VERBOSE|re.ASCII)

for day in (m['day'].title(),) # Equivalent to assignment: day = m['day'].title()
if day[:3] in days
if day in days[day[:3]]

for month in ( m['month'].title() ,)
if month[:3] in months
if month in months[month[:3]]

for date in ( int(m['date']) ,)
if date >= 1

for hours in ( int(m['hours']) ,)
if hours <= 23

for minutes in ( int(m['minutes']) ,)
if minutes <= 59

for seconds in ( int(m['seconds']) ,)
if seconds <= 59

for zone in (m['time_zone'] ,)
if zone.upper() in ('EST', 'EDT', 'CST', 'CDT', 'MST', 'MDT', 'PST', 'PDT' )

for year in ( int(m['year']) ,)
if year >= 1900 and year <= 2020

for leap_year in (              # 'else' in a listcomp
    (                           # equivalent to:
        year % 4 == 0,          # if year % 100 == 0:
        year % 400 == 0         #     leap_year = year % 400 == 0
    )[year % 100 == 0]          # else :
,)                              #     leap_year = year % 4 == 0

for max_date in (               # if (month[:3] == 'Feb') and leap_year :
    (                           #     max_date = 29
        d1[month[:3]],          # else :
        29                      #     max_date = d1[month[:3]]
    )[(month[:3] == 'Feb') and leap_year] #
,)
if date <= max_date
]
print ( '\n\n'.join(L1) )
```
```
<_sre.SRE_Match object; span=(2, 35), match='MON Februar 12 0:30 : 19 CST 2018'>
MON Februar 12 0:30 : 19 CST 2018
{'day': 'MON', 'month': 'Februar', 'date': '12', 'hours': '0', 'minutes': '30', 'seconds': '19', 'time_zone': 'CST', 'year': '2018'}

<_sre.SRE_Match object; span=(159, 255), match='Thursda feb 29 >
Thursda feb 29 00:30:19 CST 1944
{'day': 'Thursda', 'month': 'feb', 'date': '29', 'hours': '00', 'minutes': '30', 'seconds': '19', 'time_zone': 'CST', 'year': '1944'}
```
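The two idioms used in that listcomp can be tried in isolation. The sketch below uses a made-up list `years`: a `for` clause over a one-element tuple plays the role of an assignment, and indexing a two-element tuple with a boolean plays the role of `if`/`else`. A conditional expression gives the same choice, and `calendar.isleap` from the standard library serves as a cross-check.

```python
import calendar

years = [1900, 1944, 2000, 2018, 2020]    # Hypothetical sample years.

leap_years = [
    year
    for year in years
    # 'Assignment': bind is_leap to a single value via a one-element tuple.
    for is_leap in (
        # 'else': index a two-element tuple with a bool (False -> 0, True -> 1).
        (year % 4 == 0, year % 400 == 0)[year % 100 == 0]
    ,)
    if is_leap
]

# The same choice written with a conditional expression.
same = [
    year for year in years
    if (year % 400 == 0 if year % 100 == 0 else year % 4 == 0)
]

print(leap_years)                                        # [1944, 2000, 2020]
print(same == leap_years)                                # True
print(leap_years == [y for y in years if calendar.isleap(y)])   # True
```

One difference worth keeping in mind: the tuple form evaluates both alternatives before choosing one, whereas a conditional expression evaluates only the branch it selects.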