Python Concepts/Bytes objects and Bytearrays

From Wikiversity

Objective

  • What is a bytes object?
  • Why is a bytes object important?
  • What is the difference between a bytes object and a bytearray?
  • How is a bytes object created/used?
  • How to convert from a bytes object to other sequences based on bytes?
  • How to avoid errors when using bytes objects or bytearrays?

Lesson

One byte is a memory location with a size of 8 bits. A bytes object is an immutable sequence of bytes, conceptually similar to a string.

Because each byte must fit into 8 bits, each member of a bytes object is an unsigned int x that satisfies 0 <= x <= 255.

The bytes object is important because data written to disk is written as a stream of bytes, and because integers and strings are sequences of bytes. How the sequence of bytes is interpreted or displayed makes it an integer or a string.
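
As a minimal sketch of that last point, the same two bytes can be read either as an integer or as text. The variable names are illustrative only:

```python
raw = b'\x41\x42'                       # two bytes: 0x41 and 0x42

as_int = int.from_bytes(raw, 'big')     # read the bytes as a big-endian integer
as_text = raw.decode('ascii')           # read the same bytes as ASCII text

print(hex(as_int))                      # 0x4142
print(as_text)                          # AB
```

Nothing in the bytes themselves says which reading is intended; the program chooses the interpretation.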

bytes objects

bytes object displayed

A bytes object is displayed as a sequence of bytes between quotes and preceded by 'b' or 'B':

>>> bytes(3) # This initialization produces a sequence of 3 null bytes.
b'\x00\x00\x00'
>>> 
>>> bytes([3]) # Initialized with a list containing 1 member.
b'\x03'
>>> 
>>> B'\x00\x00\x00'
b'\x00\x00\x00'
>>> 
>>> isinstance(b'\x00\x00\x00', bytes)
True
>>> len(b'\x00\x00\x00')
3 # bytes
>>>

The representation '\x00' is not read literally. This representation means a byte with value 0x00.

If a member of a bytes object can be displayed as a printable ASCII character, then it is so displayed.

>>> B"123ABC"
b'123ABC'
>>> 
>>> B'''\x03 1 2 x y \xE7'''
b'\x03 1 2 x y \xe7'
>>> 
>>> b"""\x41\x42\x43"""
b'ABC'
>>>

When you look at the contents of a bytes object, it is easy to overlook embedded ASCII characters:

>>> B'\x05\x97\xa3q\xf9\x17\x83' == b'\x05\x97\xa3' + b'q' + B'\xf9\x17\x83'
True
>>>
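
One way to avoid overlooking such characters is to view every member numerically. The following sketch (variable names are illustrative) shows the same object as a list of ints and as space-separated pairs of hex digits:

```python
data = b'\x05\x97\xa3q\xf9\x17\x83'

as_ints = list(data)                            # each member as an int
hex_view = ' '.join(f'{x:02x}' for x in data)   # each member as two hex digits

print(as_ints)
print(hex_view)
```

In the hex view the embedded 'q' appears as 71, on the same footing as the other six bytes.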

Parts of a bytes object:

>>> b'0123456789'[5]
53 # Individual member is returned as int.
>>> chr(53)
'5'
>>> 
>>> b'0123456789'[5:8]
b'567' # Sequence of members is returned as slice.
>>> 
>>> b'0123456789'[5:6]
b'5' # A slice containing 1 byte.
>>> 
>>> b'0123456789'[2::3] # A slice containing the byte in position 2, then every 3rd until end.
b'258'
>>> b'0123456789'[2::3] == b'0123456789'[2:3] + b'0123456789'[5:6] + b'0123456789'[8:9] 
True
>>> 
>>> b1 = b'0123456789'
>>> b1[7:3:-1]
b'7654' # slice of b1 reversed
>>> b1[-1: -len(b1)-1: -1]
b'9876543210' # b1 reversed
>>> 
>>> (    b1[-1: -len(b1)-1: -1] 
...   == b1[-1:: -1] 
...   == b1[len(b1)-1:: -1] 
...   == b1[:: -1] # A simple way to reverse the bytes object.
...   == b1[len(b1)-1: -len(b1)-1: -1] )
True # Slicing works as expected.
>>>

Some control characters are recognized as such but not displayed as such:

>>> b' \v \f \n \t '
b' \x0b \x0c \n \t '
>>>

bytes object initialized

>>> B'\x20\x19\x61\x62\x39\x40' # hex values
b' \x19ab9@'
>>> 
>>> b'\101\102\103\104' # octal values
b'ABCD'
>>>
>>> b'The quick, brown fox ...' # ASCII values
b'The quick, brown fox ...'
>>> 
>>> b'\x54\x68\x65 \161\165\151\143\153, brown \x2E\x2e\056\056' # Mixture.
b'The quick, brown ....'
>>> 
>>> b'\056\056\366\367'
b'..\xf6\xf7' # If ASCII cannot be displayed, hex is displayed.
>>>

The bytes object can contain recognized control characters:

>>> a = b'''line1
...     line2    
...   line3                     '''
>>> a
b'line1\n    line2    \n  line3  \t\t\t'
>>>

As with strings, the prefix 'r' or 'R' may be used to create a raw literal in which backslashes are not treated as escapes:

>>> b1 = B'\x41\x42' ; b1 ; len(b1)
b'AB'
2
>>> b2 = rB'\x41\x42' ; b2 ; len(b2)
b'\\x41\\x42'
8
>>>
>>> b2 == b'\x5C' + b'x41' + B"""\134""" + b'x42'
True
>>>

A suitable sequence can be converted to bytes:

>>> b1 = bytes([16, 0x20, ord('a'), 98, 0x63, 0o144, ord(' '), 0xEe]) # Input to bytes() is list.
>>> b1
b'\x10 abcd \xee'
>>> 
>>> b2 = bytes( tuple(b1) ) ; b2 # Input to bytes() is tuple.
b'\x10 abcd \xee'
>>> b2 == b1
True
>>> b2 is b1
False # A separate object with equal contents.
>>> 
>>> b2 = bytes( [ ord(p) for p in '123 abc DEF' ] ) ; b2 # Converting string to bytes.
b'123 abc DEF'
>>> 
>>> bytes( [ ord(p) for p in 'xyz £ § « ® ½ Ø ñ' ] )
b'xyz \xa3 \xa7 \xab \xae \xbd \xd8 \xf1' # Characters chr(0x80) through chr(0xFF) are not displayable ASCII.
>>> 
>>> bytes([0x123])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: bytes must be in range(0, 256)
>>>

Like a string, the bytes object doesn't support item assignment:

>>> a
b'line1\n    line2    \n  line3  \t\t\t'
>>> a[3]=6
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'bytes' object does not support item assignment
>>> 

Like a string, the bytes object can be repeated:

>>> B"1,2," * 3
b'1,2,1,2,1,2,'
>>>

Behavior of str and behavior of bytes can be significantly different:

>>> c1 = '\u0041' ; c1 ; len(c1)
'A'
1
>>> b1 = b'\u0041' ; b1 ; len(b1)
b'\\u0041'
6
>>>

The concatenation of 2 or more bytes objects:

>>> b'abc' + B''' ''' + b"""\x44\x45\x46""" + b"\040" + B'\06123'
b'abc DEF 123'
>>>

bytes object as iterable

The bytes object accepts the usual operations over iterables:

>>> b1 = b'\t\n abcd'
>>> 
>>> if 'a' in b1 : print ('found')
... 
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: a bytes-like object is required, not 'str'
>>> if b'a' in b1 : print ('found')
... 
found
>>> 
>>> if b'\n ab' in b1 : print ('found')
... 
found
>>> 
>>> ord('c')
99
>>> if 99 in b1 : print ('found')
... 
found
>>> 
>>> for p in b1 : print (p)
... 
9    # p is returned and displayed as int.
10
32
97
98
99
100
>>>

Conversion to bytes object

from int

The following code illustrates the process for a positive integer:

def int_to_bytes (input_int) :
    isinstance(input_int, int) or exit (99)
    (input_int >= 0) or exit (98)
    if (input_int == 0) : return bytes([0])
    L1 = []

    num_bits = input_int.bit_length()

    while input_int :
        L1[0:0] = [(input_int & 0xFF)]
        input_int >>= 8

    if (num_bits % 8) == 0 :
        L1[0:0] = [0]

    return bytes(L1)

for i1 in [0, 0x8_9a_bc_de, 0xa8_9a_bc_de] :
    b1 = int_to_bytes (i1)
    print ('''i1 = {}, b1 = {}'''.format(hex(i1), b1))
i1 = 0x0, b1 = b'\x00'
i1 = 0x89abcde, b1 = b'\x08\x9a\xbc\xde'
i1 = 0xa89abcde, b1 = b'\x00\xa8\x9a\xbc\xde'
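
For comparison, here is a hedged sketch of the same conversion built on the method int.to_bytes(). The helper name int_to_bytes_builtin is hypothetical. It assumes a non-negative input and, like the loop above, adds a leading zero byte whenever the number of bits is a multiple of 8:

```python
def int_to_bytes_builtin(value):
    """Sketch: big-endian bytes for value >= 0, top bit of first byte clear."""
    if not isinstance(value, int) or value < 0:
        raise ValueError('value must be a non-negative int')
    length = value.bit_length() // 8 + 1   # one extra byte at each multiple of 8 bits
    return value.to_bytes(length, 'big')

for i1 in [0, 0x8_9a_bc_de, 0xa8_9a_bc_de]:
    print(hex(i1), int_to_bytes_builtin(i1))
```

Raising ValueError instead of calling exit() is a design choice of this sketch; it lets the caller decide how to handle bad input.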

Method int.to_bytes(length, ....)

Method int.to_bytes(length, byteorder, *, signed=False) returns a bytes object representing an integer.

>>> 0x12_84.to_bytes(2, 'big', signed=True)
b'\x12\x84'
>>> (-0xE2_04).to_bytes(3, 'big', signed=True) # Note the parentheses: (-0xE2_04).
b'\xff\x1d\xfc' # 3 bytes for a signed, negative int.
>>>
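
The examples above use byteorder 'big' only. The sketch below (variable names are illustrative) shows the effect of 'little', and what happens when length is too small for the value:

```python
value = 0x1284

big = value.to_bytes(2, 'big')          # most significant byte first
little = value.to_bytes(2, 'little')    # least significant byte first

try:
    (256).to_bytes(1, 'big')            # 256 does not fit into one byte
    overflowed = False
except OverflowError:
    overflowed = True

print(big, little, overflowed)
```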

from str

If each character of the string fits into one byte, the process is simple:

>>> s1 = '\011\012\015\016 123 abc \345\346\347' ; s1
'\t\n\r\x0e 123 abc åæç'
>>> 
>>> L1 = []
>>> for p in s1 : L1 += [ord(p)] 
... 
>>> L1
[9, 10, 13, 14, 32, 49, 50, 51, 32, 97, 98, 99, 32, 229, 230, 231]
>>> 
>>> bytes( L1 )
b'\t\n\r\x0e 123 abc \xe5\xe6\xe7'
>>>

A listcomp simplifies the process:

>>> b1 = bytes( [ ord(p) for p in s1 ] ) ; b1
b'\t\n\r\x0e 123 abc \xe5\xe6\xe7'
>>>

The above implements encoding 'Latin-1':

>>> b1a = s1.encode('Latin-1') ; b1a
b'\t\n\r\x0e 123 abc \xe5\xe6\xe7'
>>> b1 == b1a
True
>>>

Method str.encode() returns a bytes object containing the string encoded with the given encoding:

>>> s1 = '\t\n\r\x0e 123 ሴ abc åæç' ; s1
'\t\n\r\x0e 123 ሴ abc åæç'
>>> s1.encode() # Each of 'å', 'æ', 'ç' has a code point below 0x100 but is encoded as 2 bytes.
b'\t\n\r\x0e 123 \xe1\x88\xb4 abc \xc3\xa5\xc3\xa6\xc3\xa7' # Character 'ሴ' is encoded as 3 bytes.
>>> 
>>> s1.encode() == s1.encode('utf-8') # Encoding 'utf-8' is default.
True
>>> 
>>> len(s1.encode('utf-8'))
23
>>> len(s1.encode('utf-16')) # 'utf-16' and 'utf-32' are alternative encodings; each result begins with a byte order mark.
38
>>> len(s1.encode('utf-32'))
76
>>>
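
str.encode() also accepts an errors argument that controls what happens when a character cannot be represented in the chosen encoding. A small sketch, with illustrative variable names:

```python
text = 'abc \xe5'                       # 'abc ' followed by a-with-ring (0xE5)

try:
    text.encode('ascii')                # default errors='strict' raises
    failed = False
except UnicodeEncodeError:
    failed = True

replaced = text.encode('ascii', errors='replace')            # '?' substituted
escaped = text.encode('ascii', errors='xmlcharrefreplace')   # XML character reference

print(failed, replaced, escaped)
```

Both lenient handlers lose or change information, so the strict default is usually the safer choice when the bytes must decode back to the original string.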

from str containing international text

>>> s1 = 'Γ γ Δ δ Ζ ζ Ξ ξ' # Greek
>>> s1.encode()
b'\xce\x93 \xce\xb3 \xce\x94 \xce\xb4 \xce\x96 \xce\xb6 \xce\x9e \xce\xbe' # Greek encoded
>>> len(s1)
15 # 8 Greek characters + 7 spaces
>>> len(s1.encode())
23 # 8*2 + 7
>>>

Each Greek character has a code point above 0xFF and is encoded as 2 bytes in 'utf-8'. Note for example:

>>> hex(ord('ξ'))
'0x3be'
>>> chr(0x3be)
'ξ'
>>> chr(0x3be).encode()
b'\xce\xbe'
>>>
>>> s1 = 'А а Б б В в Г г Щ щ Я я' # Cyrillic
>>> s1.encode()
b'\xd0\x90 \xd0\xb0 \xd0\x91 \xd0\xb1 \xd0\x92 \xd0\xb2 \xd0\x93 \xd0\xb3 \xd0\xa9 \xd1\x89 \xd0\xaf \xd1\x8f' # Cyrillic encoded.
>>>
>>> s1 = 'A Α А' # English 'A', Greek 'Α', Cyrillic 'А'
>>> s1.encode()
b'A \xce\x91 \xd0\x90'
>>>
>>> s1 = 'ウ ィ キ ペ デ ィ ア' # Japanese
>>> s1.encode()
b'\xe3\x82\xa6 \xe3\x82\xa3 \xe3\x82\xad \xe3\x83\x9a \xe3\x83\x87 \xe3\x82\xa3 \xe3\x82\xa2' # Japanese encoded.
>>> len(s1)
13 # 7 Japanese characters + 6 spaces.
>>> len(s1.encode())
27 # 7*3 + 6
>>>
>>> s1 = 'Ξ ξ ウ ィ Щ щ' ; s1 # Mixture.
'Ξ ξ ウ ィ Щ щ'
>>> s1.encode()
b'\xce\x9e \xce\xbe \xe3\x82\xa6 \xe3\x82\xa3 \xd0\xa9 \xd1\x89'
>>> len(s1)
11 # characters
>>> 
>>> len(s1.encode())
19 # bytes
>>>

from str containing hexadecimal digits

Class method bytes.fromhex(string) returns a bytes object decoded from the given string of hexadecimal digits:

>>> bytes.fromhex( '  12  04  E6  d5  ' )
b'\x12\x04\xe6\xd5' # ASCII whitespace in string is ignored provided that hex digits are grouped as bytes.
>>> bytes.fromhex( '  1204  E6  d5  ' )
b'\x12\x04\xe6\xd5'
>>> bytes.fromhex( '  12  04E6d5  ' )
b'\x12\x04\xe6\xd5'
>>> bytes.fromhex( '  41     42        432044     4546  ' ) # To include a space, add '20' meaning b'\x20' or b' '.
b'ABC DEF'
>>> 
>>> bytes.fromhex( '  12  20 04  20 E6  20 d5  ' )
b'\x12 \x04 \xe6 \xd5'
>>> 
>>> bytes.fromhex( '  12  20 04  20 E6  20 d  ' ) # The string must contain exactly two hexadecimal digits per byte.
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: non-hexadecimal number found in fromhex() arg at position 24
>>>

Class method bytes.fromhex(string) can be used to convert from positive int to bytes object:

>>> i1 = 0xF12B4 ; i1
987828
>>> h1 = hex(i1)[2:] ; h1
'f12b4'
>>> h2 = ('0' * ((len(h1))%2)) + h1 ; h2 # Prepend '0' if necessary.
'0f12b4' # Length is even.
>>> b1 = bytes.fromhex( h2 ) ; b1
b'\x0f\x12\xb4'
>>>

Some technical information about encoding standard 'utf-8'

Strings encoded according to encoding standard 'utf-8' conform to the following table:

Range                     | Encoding                            | Description
U-00000000 ... U-0000007F | 0xxxxxxx                            | Marker bit '0' and 7 payload bits (suitable for ASCII)
U-00000080 ... U-000007FF | 110xxxxx 10xxxxxx                   | Marker bits '110', '10' and 11 payload bits
U-00000800 ... U-0000FFFF | 1110xxxx 10xxxxxx 10xxxxxx          | Marker bits '1110', '10', '10' and 16 payload bits
U-00010000 ... U-0010FFFF | 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx | Marker bits '11110', '10', '10', '10' and 21 payload bits


Encoding standard 'utf-8' is a good choice for default encoding because:

  • 2, 3 or 4 bytes are used only if necessary,
  • it doesn't depend on byte ordering, big or little, and
  • arbitrary binary data is not likely to conform to the above specification.
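
The ranges in the table can be sketched as a small function and checked against Python's own encoder. The name utf8_length is hypothetical, and the function ignores the surrogate range, which cannot be encoded:

```python
def utf8_length(code_point):
    """Sketch: number of bytes 'utf-8' uses for one code point."""
    if code_point < 0:
        raise ValueError('code point must be non-negative')
    if code_point < 0x80:
        return 1
    if code_point < 0x800:
        return 2
    if code_point < 0x10000:
        return 3
    if code_point < 0x110000:
        return 4
    raise ValueError('code point out of range')

# Compare with the real encoder for one character from each range.
for cp in (0x51, 0x39E, 0x5148, 0x10006):
    assert utf8_length(cp) == len(chr(cp).encode('utf-8'))
    print(hex(cp), utf8_length(cp))
```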

Examples of characters encoded with 'utf-8'

>>> bin(ord('Q'))
'0b1010001' # 7 bits: 101_0001
>>> 
>>> 'Q'.encode()
b'Q' # ASCII fits into 1 byte.
>>> 
>>> 
>>> bin(ord('Ξ')) # Greek
'0b1110011110' # 11_1001_1110
>>> ord('Ξ').bit_length()
10
>>> 'Ξ'.encode()
b'\xce\x9e' 
# 1100_1110_1001_1110 0xCE_9E
# 110_01110,10_011110, markers '110' and '10', payload bits 01110_011110 or 011_1001_1110 0x39E
>>> 
>>> 
>>> c1 = '先' # Chinese
>>> len(c1)
1
>>> bin(ord(c1))
'0b101000101001000' # 101_0001_0100_1000
>>> ord(c1).bit_length()
15
>>> c1.encode()
b'\xe5\x85\x88' 
# 1110_0101_1000_0101_1000_1000 0xE5_85_88
# 1110_0101,10_000101,10_001000, 3 markers and payload bits 0101_000101_001000 or 0101_0001_0100_1000 0x5148
>>>

The following code examines chr(0x10006), encoded in 4 bytes:

c1 = chr(0x10006)

print ('c1 = ', c1, '\nord(c1) = ', hex(ord(c1)), sep='')

c1_encoded = c1.encode()

print ('c1_encoded =', c1_encoded)

print ([hex(p) for p in c1_encoded], '# each byte of c1_encoded')

print (
'''
The marker bits:
c1_encoded[0] & 0b11111_000 == 0b11110_000 : {}
c1_encoded[1] & 0b11_000000 == 0b10_000000 : {}
c1_encoded[2] & 0b11_000000 == 0b10_000000 : {}
c1_encoded[3] & 0b11_000000 == 0b10_000000 : {}
'''.format(
c1_encoded[0] & 0b11111_000 == 0b11110_000 ,
c1_encoded[1] & 0b11_000000 == 0b10_000000 ,
c1_encoded[2] & 0b11_000000 == 0b10_000000 ,
c1_encoded[3] & 0b11_000000 == 0b10_000000
)
)

# Produce the payload bits:
mask0 = 0b111; mask123 = 0b11_1111
payload = [c1_encoded[0] & mask0]
for p in range (1,4) : payload += [c1_encoded[p] & mask123]

print (
'''
The payload bits:
payload[0] = c1_encoded[0] & 0x07 = {} & 0x07 = {}
payload[1] = c1_encoded[1] & 0x3F = {} & 0x3F = {}
payload[2] = c1_encoded[2] & 0x3F = {} & 0x3F = {}
payload[3] = c1_encoded[3] & 0x3F = {} & 0x3F = {}
'''.format(
hex(c1_encoded[0]), hex(payload[0]),
hex(c1_encoded[1]), hex(payload[1]),
hex(c1_encoded[2]), hex(payload[2]),
hex(c1_encoded[3]), hex(payload[3])
)
)

s1 = 'payload[3] + (payload[2] << 6) + (payload[1] << 12) + (payload[0] << 18)'
i1 = eval(s1)

print (
'''
Building c1:
i1 = {} = {}
i1 == ord(c1) : {}
'''.format(
s1, hex(i1),
i1 == ord(c1)
)
)
c1 = 𐀆
ord(c1) = 0x10006
c1_encoded = b'\xf0\x90\x80\x86'
['0xf0', '0x90', '0x80', '0x86'] # each byte of c1_encoded

The marker bits:
c1_encoded[0] & 0b11111_000 == 0b11110_000 : True
c1_encoded[1] & 0b11_000000 == 0b10_000000 : True
c1_encoded[2] & 0b11_000000 == 0b10_000000 : True
c1_encoded[3] & 0b11_000000 == 0b10_000000 : True

The payload bits:
payload[0] = c1_encoded[0] & 0x07 = 0xf0 & 0x07 = 0x0
payload[1] = c1_encoded[1] & 0x3F = 0x90 & 0x3F = 0x10
payload[2] = c1_encoded[2] & 0x3F = 0x80 & 0x3F = 0x0
payload[3] = c1_encoded[3] & 0x3F = 0x86 & 0x3F = 0x6

Building c1:
i1 = payload[3] + (payload[2] << 6) + (payload[1] << 12) + (payload[0] << 18) = 0x10006
i1 == ord(c1) : True
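
The same marker-and-payload logic can be generalised to sequences of 1 to 4 bytes. The function below is a sketch only; the name decode_one is hypothetical, and unlike a real decoder it does not reject overlong encodings or surrogates:

```python
def decode_one(encoded):
    """Sketch: code point of a single character encoded with 'utf-8'."""
    first = encoded[0]
    if (first & 0b1000_0000) == 0b0000_0000:      # 0xxxxxxx
        count, value = 0, first
    elif (first & 0b1110_0000) == 0b1100_0000:    # 110xxxxx
        count, value = 1, first & 0b0001_1111
    elif (first & 0b1111_0000) == 0b1110_0000:    # 1110xxxx
        count, value = 2, first & 0b0000_1111
    elif (first & 0b1111_1000) == 0b1111_0000:    # 11110xxx
        count, value = 3, first & 0b0000_0111
    else:
        raise ValueError('invalid start byte')
    if len(encoded) != count + 1:
        raise ValueError('unexpected number of bytes')
    for byte in encoded[1:]:
        if (byte & 0b1100_0000) != 0b1000_0000:   # each must be 10xxxxxx
            raise ValueError('invalid continuation byte')
        value = (value << 6) | (byte & 0b0011_1111)
    return value

for cp in (0x51, 0x39E, 0x5148, 0x10006):
    print(hex(cp), hex(decode_one(chr(cp).encode('utf-8'))))
```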

Theoretically 21 payload bits can contain '\U001FFFFF' but the standard stops at '\U0010FFFF':

>>> chr(0x110006)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: chr() arg not in range(0x110000)
>>>

A disadvantage of 'utf-8'

A bytes object produced with encoding 'utf-8' contains the null byte b'\x00' whenever the string contains the character '\x00'. This could cause a problem if you are sending a stream of bytes through a filter that interprets b'\x00' as end of data. Standard 'utf-8' never produces b'\xFF'. If your bytes object must not contain b'\x00' after encoding, you could convert each null byte to b'\xFF', then convert each b'\xFF' back to b'\x00' before decoding:

>>> s1 = 'Ξ ξ a\000bc \000 Я я 建 页' ; s1
'Ξ ξ a\x00bc \x00 Я я 建 页'
>>> b1 = s1.encode() ;b1
b'\xce\x9e \xce\xbe a\x00bc \x00 \xd0\xaf \xd1\x8f \xe5\xbb\xba \xe9\xa1\xb5'
>>> {'Found 0' for p in b1 if p == 0}
{'Found 0'}
>>> 
>>> # Convert b'\x00' to b'\xFF'
>>> b2 = bytes([ (p,0xFF)[p == 0] for p in b1 ]) ; b2
b'\xce\x9e \xce\xbe a\xffbc \xff \xd0\xaf \xd1\x8f \xe5\xbb\xba \xe9\xa1\xb5'
>>> {'Found 0' for p in b2 if p == 0}
set()
>>> 
>>> # Check conversion from b1 to b2:
>>> {(p == (0,0xFF)) for p in zip(b1,b2) if p[0] != p[1]} # Difference between b1 and b2.
{True}
>>> 
>>> b2.decode() # b2 is not standard 'utf-8'.
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 7: invalid start byte
>>>
>>> # Before decoding convert b'\xFF' to b'\x00'.
>>> b3 = bytes([ (p,0)[p == 0xFF] for p in b2 ]) ; b3
b'\xce\x9e \xce\xbe a\x00bc \x00 \xd0\xaf \xd1\x8f \xe5\xbb\xba \xe9\xa1\xb5'
>>> s3 = b3.decode() ; s3
'Ξ ξ a\x00bc \x00 Я я 建 页'
>>> s3 == s1
True
>>>
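
The same substitution can also be written with the method bytes.replace(), which avoids building a list of ints. A minimal sketch with illustrative variable names:

```python
text = 'a\x00bc \x00 \u039e'                     # contains two null characters
original = text.encode('utf-8')

masked = original.replace(b'\x00', b'\xff')      # hide the null bytes
restored = masked.replace(b'\xff', b'\x00')      # undo before decoding

print(masked)
print(restored == original)
```

This works for the reason given above: b'\xFF' never occurs in standard 'utf-8', so every b'\xFF' in the masked data stands for a null byte.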

Conversion from bytes object

to int

The following code illustrates the process for a positive integer:

def bytes_to_int (input_bytes) :
    isinstance(input_bytes, bytes) or exit (99)
    if (len(input_bytes) == 0) : return 0
    (input_bytes[0] < 0x80) or exit (98)

    shift = i1 = 0
    for p in range(1, len(input_bytes)+1) :
        i1 += (input_bytes[-p] << shift)
        shift += 8

    return i1

for b1 in [b'', B"\x00\x00\x00", b'''\x13\xd8''', b"""\x00\xf7\x14"""] :
    i1 = bytes_to_int (b1)
    print ('''b1 = {}, i1 = {}'''.format(b1, hex(i1)))
b1 = b'', i1 = 0x0
b1 = b'\x00\x00\x00', i1 = 0x0
b1 = b'\x13\xd8', i1 = 0x13d8
b1 = b'\x00\xf7\x14', i1 = 0xf714

Class method int.from_bytes(bytes, ....)

Class method int.from_bytes(bytes, byteorder, *, signed=False) simplifies the conversion from bytes to int:

>>> hex( int.from_bytes(b'\x13\xf8', 'big', signed=True) )
'0x13f8'
>>> hex( int.from_bytes(b'\xd3\xf8', 'big', signed=True) )
'-0x2c08'
>>> hex( int.from_bytes(b'\x00\xd3\xf8', 'big', signed=True) )
'0xd3f8'
>>>
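
The examples above all pass signed=True and byteorder 'big'. As a brief sketch (variable names are illustrative), the same two bytes give three different integers depending on those arguments:

```python
data = b'\xd3\xf8'

unsigned_big = int.from_bytes(data, 'big')               # top bit is part of the value
signed_big = int.from_bytes(data, 'big', signed=True)    # top bit is the sign
unsigned_little = int.from_bytes(data, 'little')         # bytes read in reverse order

print(hex(unsigned_big), hex(signed_big), hex(unsigned_little))
```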

The following code ensures that the integer produced after encoding and decoding is the same as the original int:

def int_to_bytes (input) : # input is int.
    num_bits = input.bit_length()	
    num_bytes = (num_bits + 7) // 8
    if ((num_bits % 8) == 0) : num_bytes += 1

    return input.to_bytes(num_bytes, byteorder='big', signed=True)

def int_from_bytes (input) : # input is bytes.
    return int.from_bytes(input, byteorder='big', signed=True)
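
A quick check of that claim, as a sketch: the two helpers are repeated here so that the block runs on its own (the parameter names value and data are an illustrative choice), and a handful of sample integers, including negative ones, are sent through both:

```python
def int_to_bytes(value):
    num_bits = value.bit_length()
    num_bytes = (num_bits + 7) // 8
    if num_bits % 8 == 0:          # leave room for the sign bit
        num_bytes += 1
    return value.to_bytes(num_bytes, byteorder='big', signed=True)

def int_from_bytes(data):
    return int.from_bytes(data, byteorder='big', signed=True)

samples = [0, 1, -1, 127, 128, -128, -129, 255, 256, 0xa89abcde, -0xa89abcde]
round_trip_ok = all(int_from_bytes(int_to_bytes(n)) == n for n in samples)

print(round_trip_ok)
```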

to str

If the bytes object contains only characters that fit into one byte:

>>> b2
b'\t\n\r\x0e 123 abc \xe5\xe6\xe7'
>>> [chr(p) for p in b2]
['\t', '\n', '\r', '\x0e', ' ', '1', '2', '3', ' ', 'a', 'b', 'c', ' ', 'å', 'æ', 'ç']
>>> s2 = ''.join([chr(p) for p in b2]) ; s2
'\t\n\r\x0e 123 abc åæç'
>>> 
>>> s2a = b2.decode('Latin-1') ; s2a
'\t\n\r\x0e 123 abc åæç'
>>> s2 == s2a
True
>>>

Method bytes.decode() returns a string decoded from the bytes object:

>>> b1
b'\t\n\r\x0e 123 \xe1\x88\xb4 abc \xc3\xa5\xc3\xa6\xc3\xa7'
>>> s1a = b1.decode() ; s1a
'\t\n\r\x0e 123 ሴ abc åæç'
>>>

It is important to use the correct decoding:

>>> s1b = b1.decode('utf-16') ; s1b
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'utf-16-le' codec can't decode byte 0xa7 in position 22: truncated data
>>>
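
When the encoding of incoming data is uncertain, the errors argument of bytes.decode() offers an alternative to an exception. The sketch below uses illustrative variable names and a deliberately invalid input:

```python
data = b'abc \xff'                      # 0xff can never occur in 'utf-8'

try:
    data.decode('utf-8')                # default errors='strict' raises
    failed = False
except UnicodeDecodeError:
    failed = True

lenient = data.decode('utf-8', errors='replace')   # bad byte becomes U+FFFD
latin = data.decode('latin-1')                      # every byte maps to one character

print(failed, ascii(lenient), ascii(latin))
```

Decoding with 'latin-1' never fails, but that does not make it correct; it only yields the intended text if the data really was encoded that way.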

to str containing international text

>>> b1 = b'\xce\x93 \xce\xb3 \xce\x94 \xce\xb4 \xce\x96 \xce\xb6 \xce\x9e \xce\xbe'
>>> s1 = b1.decode() ; s1
'Γ γ Δ δ Ζ ζ Ξ ξ' # Greek decoded
>>>
>>> b1 = b'\xd0\x90 \xd0\xb0 \xd0\x91 \xd0\xb1 \xd0\x92 \xd0\xb2 \xd0\x93 \xd0\xb3 \xd0\xa9 \xd1\x89 \xd0\xaf \xd1\x8f'
>>> s1 = b1.decode() ; s1
'А а Б б В в Г г Щ щ Я я' # Cyrillic decoded
>>>
>>> b1 = b'\xe3\x82\xa6 \xe3\x82\xa3 \xe3\x82\xad \xe3\x83\x9a \xe3\x83\x87 \xe3\x82\xa3 \xe3\x82\xa2' 
>>> s1 = b1.decode() ; s1
'ウ ィ キ ペ デ ィ ア' # Japanese decoded.
>>> 
>>> len(s1) ; len(b1)
13 # Length of s1 in characters.
27 # Length of s1 in bytes.
>>>

The same bytes produce different strings depending on the codec used to decode them. Here a string of Latin-1 characters is encoded with 'Latin-1', and the resulting bytes are then decoded as 'utf-8':

>>> s1 = 'Ð± Ð² Ðµ Ð¶ Ð¿ Ð¢ Ð£ Ð¥ Ð© Ð®'
>>> b1 = s1.encode('Latin-1') ; b1
b'\xd0\xb1 \xd0\xb2 \xd0\xb5 \xd0\xb6 \xd0\xbf \xd0\xa2 \xd0\xa3 \xd0\xa5 \xd0\xa9 \xd0\xae'
>>> s1a = b1.decode() ; s1a
'б в е ж п Т У Х Щ Ю' # Cyrillic.
>>> 
>>> s3
'ç»´ å¸® å»º é¡µ ä»» è®¡ ä¸ª'
>>> b3 = s3.encode('Latin-1') ; b3
b'\xe7\xbb\xb4 \xe5\xb8\xae \xe5\xbb\xba \xe9\xa1\xb5 \xe4\xbb\xbb \xe8\xae\xa1 \xe4\xb8\xaa'
>>> s3a = b3.decode() ; s3a
'维 帮 建 页 任 计 个' # Chinese
>>> 
>>> s3 == s3a
False # As strings s3 and s3a are not equal. However, s3.encode('Latin-1') == s3a.encode('utf-8').
>>>

to str containing hexadecimal digits

Method bytes.hex() returns a string object containing two hexadecimal digits for each byte in the instance.

>>> b'\xf0\xf1\xf2'.hex()
'f0f1f2'
>>> b'\xf0\xf1\xf2'*3.hex()
  File "<stdin>", line 1
    b'\xf0\xf1\xf2'*3.hex()
                        ^
SyntaxError: invalid syntax
>>> (b'\xf0\xf1\xf2'*3).hex() # Use parentheses to enforce correct syntax.
'f0f1f2f0f1f2f0f1f2'
>>> 
>>> (b'\xf0\xf1\xf2'*3)[3].hex() # Individual member is returned as int.
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'int' object has no attribute 'hex'
>>> (b'\xf0\xf1\xf2'*3)[3:4].hex() # A slice containing 1 byte.
'f0'
>>>

Method bytes.hex() can be used to convert from bytes object to positive int:

>>> b1 = b'\xF0\x23\x95' ; b1
b'\xf0#\x95'
>>> h1 = b1.hex() ; h1
'f02395'
>>> i1 = int(h1, 16) ; i1
15737749
>>> hex(i1)
'0xf02395'
>>>
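
bytes.hex() and bytes.fromhex() are inverses of each other, and the integer obtained through the hex string agrees with int.from_bytes(). A short sketch with illustrative variable names:

```python
data = b'\xf0\x23\x95'

hex_text = data.hex()                   # two hex digits per byte
back = bytes.fromhex(hex_text)          # the original bytes again
value = int(hex_text, 16)               # positive int via the hex string

print(hex_text, back == data, value == int.from_bytes(data, 'big'))
```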

Operations with methods on bytes objects

Operations on strings usually require str arguments. Similarly, operations on bytes objects usually require bytes arguments. Occasionally, a suitable int may be substituted.

The following methods on bytes are representative of methods described in the reference. All can be used with arbitrary binary data.

bytes.count(sub[, start[, end]])


>>> b'abcd abcd abcd'.count(b' ')
2
>>> b'abcd abcd abcd'.count(b'bcd')
3
>>> b'abcd abcd abcd'[3:].count(b'bcd')
2
>>> b'abcd abcd abcd'[3:10].count(b'bcd')
1
>>> ord('c')
99
>>> b'abcd abcd abcd'.count(99) # chr(99) = 'c'
3
>>> 
>>> b'\x0a\x0a\x0A'.count(b'\n')
3
>>> b'\x0a\012\n'.count(10)
3
>>>
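
The optional start and end arguments in the signature give the same counts as slicing first, without creating a new bytes object. A minimal sketch:

```python
data = b'abcd abcd abcd'

from_3 = data.count(b'bcd', 3)          # search data[3:]
from_3_to_10 = data.count(b'bcd', 3, 10)  # search data[3:10]

print(from_3, from_3_to_10)
```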

Creating and using a translation table

Static method bytes.maketrans(from, to) returns a translation table that maps each byte in from to the byte at the same position in to. Both arguments must be bytes-like objects of the same length.

>>> tt1 = bytes.maketrans(b'', bytes(0)) # A translation table without mapping.
>>> tt2 = bytes([p for p in range(256)]) # Same again.
>>> tt1 == tt2
True
>>> tt1 is tt2
False
>>> 
>>> bytes.maketrans(b'aaaaa', b'1234A') == bytes.maketrans(b'a', b'A') # Duplicates are processed without error.
True                                                                   # Last in sequence wins.
>>>

bytes.translate(table, delete=bytes(0)) returns a copy of the bytes object where all bytes occurring in the optional argument delete are removed, and the remaining bytes have been mapped through the given translation table, which must be a bytes object of length 256.

To invert the case of all alphabetic characters:

>>> UPPER = b'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
>>> lower = b'abcdefghijklmnopqrstuvwxyz'
>>> tt1 = bytes.maketrans(UPPER+lower, lower+UPPER)
>>> 
>>> b'The Quick, Brown FOX jumps ....'.translate(tt1)
b'tHE qUICK, bROWN fox JUMPS ....'
>>>

To delete specified bytes:

>>> b'The Quick, Brown FOX jumps ....'.translate(None, b'., oO')
b'TheQuickBrwnFXjumps'
>>>

Deletion is completed before translation:

>>> b'The Quick, Brown FOX jumps ....'.translate(tt1, b'., o')
b'tHEqUICKbRWNfoxJUMPS'
>>>
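
As one more illustration of a translation table, the sketch below builds a ROT13 mapping (each ASCII letter is rotated 13 places). The variable names are illustrative; the technique is the same maketrans/translate pair shown above:

```python
UPPER = b'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
lower = b'abcdefghijklmnopqrstuvwxyz'

rot13 = bytes.maketrans(
    UPPER + lower,
    UPPER[13:] + UPPER[:13] + lower[13:] + lower[:13],
)

scrambled = b'Hello, World!'.translate(rot13)
restored = scrambled.translate(rot13)   # applying ROT13 twice restores the input

print(scrambled, restored)
```

Bytes that do not appear in the first argument of maketrans, such as the comma, space and exclamation mark here, pass through unchanged.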

bytes objects and disk files

Data is written to disk as a stream of bytes. Therefore the bytes object is ideal for this purpose.

The following code writes a stream of bytes to disk and then reads the data on disk as text.

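When a file is opened in text mode, Python decodes its contents as it reads. The encoding is taken from the encoding argument of open(); if that argument is omitted, the platform's locale encoding is used, which is often (but not always) 'utf-8'.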

$ cat test.py
b1 = b'English (cur | prev)'
b2 = b'Chinese \xef\xbc\x88\xe5\xbd\x93\xe5\x89\x8d | \xe5\x85\x88\xe5\x89\x8d\xef\xbc\x89'
b3 = b'Japanese (\xe6\x9c\x80\xe6\x96\xb0 | \xe5\x89\x8d)'
b4 = b'Greek (\xcf\x80\xce\xb1\xcf\x81\xcf\x8c\xce\xbd | \xcf\x80\xcf\x81\xce\xbf\xce\xb7\xce\xb3.)'
b5 = b'Russian (\xd1\x82\xd0\xb5\xd0\xba\xd1\x83\xd1\x89. | \xd0\xbf\xd1\x80\xd0\xb5\xd0\xb4.)'

number_written = 0

try:
    with open("test.bin", "wb") as ofile: # Write bytes to disk.
        for bx in b1,b2,b3,b4,b5 :
            nw = ofile.write(bx + b'\n') # in bytes
            number_written += nw
        end_of_file = ofile.tell() # in bytes
except OSError:
    print ('Error1 detected in "with" statement.')
    exit (1) # end_of_file is undefined, so stop here.

(number_written == end_of_file) or exit (99)

try:
    with open("test.bin", "rt", encoding="utf-8") as ifile: # Read characters from disk, decoding as 'utf-8'.
        for line in ifile :
            print (len(line), line, end='')
except (OSError, UnicodeError):
    print ('Error2 detected in "with" statement.')

exit (0)
$ python3.6 test.py >test.sout 2>test.serr 
$
$ od -t x1 test.bin # The contents of disk file test.bin (edited for clarity):
0000000     E   n   g   l   i   s   h ' '   (   c   u   r ' '   | ' '   p # English
0000016     r   e   v   )'\n'

                                C   h   i   n   e   s   e ' '  ef  bc  88 # Chinese
0000032    e5  bd  93  e5  89  8d ' '   |  20  e5  85  88  e5  89  8d  ef
0000048    bc  89'\n'

                        J   a   p   a   n   e   s   e ' '   (  e6  9c  80 # Japanese
0000064    e6  96  b0 ' '   | ' '  e5  89  8d   )'\n'

                                                        G   r   e   e   k # Greek
0000080   ' '   (  cf  80  ce  b1  cf  81  cf  8c  ce  bd ' '   | ' '  cf
0000096    80  cf  81  ce  bf  ce  b7  ce  b3   .   )'\n'

                                                            R   u   s   s # Russian
0000112     i   a   n ' '   (  d1  82  d0  b5  d0  ba  d1  83  d1  89   .
0000128   ' '   | ' '  d0  bf  d1  80  d0  b5  d0  b4   .   )'\n'
0000142 # Values in left hand column are decimal.
$
$ ls -la test.bin
-rw-r--r--  1 user  staff  142 Nov 12 08:46 test.bin
$ 
$ cat test.bin
English (cur | prev)
Chinese （当前 | 先前）
Japanese (最新 | 前)
Greek (παρόν | προηγ.)
Russian (текущ. | пред.)
$
$ cat test.sout
21 English (cur | prev)
18 Chinese （当前 | 先前） # 18 characters including '\n'
18 Japanese (最新 | 前)
23 Greek (παρόν | προηγ.)
25 Russian (текущ. | пред.)
$
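
The difference between reading the file as bytes and reading it as text can also be seen in a smaller, self-contained sketch. It writes to a temporary directory rather than to test.bin, and the variable names are illustrative:

```python
import os
import tempfile

text = 'Greek (παρόν | προηγ.)\n'
data = text.encode('utf-8')

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'sample.bin')        # hypothetical file name
    with open(path, 'wb') as ofile:                  # write bytes
        written = ofile.write(data)
    with open(path, 'rb') as ifile:                  # read bytes back unchanged
        raw = ifile.read()
    with open(path, 'r', encoding='utf-8', newline='') as ifile:
        decoded = ifile.read()                       # read as decoded text

print(written, len(raw), len(decoded))
```

The byte count is larger than the character count because each Greek letter occupies 2 bytes on disk.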

bytearrays

The bytearray is a mutable sequence of bytes, similar to the bytes object in that each member of the bytearray fits into one byte, and similar to a list in that the bytearray or any slice of it may be changed dynamically.

bytearray displayed

The bytearray is displayed as a bytes object within parentheses prepended by the word bytearray:

>>> ba1 = bytearray() ; ba1 ; len(ba1) ; isinstance(ba1, bytearray)
bytearray(b'') # An empty bytearray.
0
True
>>> ba1 = bytearray(3) ; ba1 ; len(ba1) ; isinstance(ba1, bytearray)
bytearray(b'\x00\x00\x00')
3
True
>>> ba1 = bytearray([3]) ; ba1 ; len(ba1) ; isinstance(ba1, bytearray)
bytearray(b'\x03')
1
True
>>> ba1 = bytearray([3,4,5,6]) ; ba1 ; len(ba1) ; isinstance(ba1, bytearray)
bytearray(b'\x03\x04\x05\x06')
4
True
>>> ba1 = bytearray(B'''\x00\001\x02\003''') ; ba1 ; len(ba1) ; isinstance(ba1, bytearray)
bytearray(b'\x00\x01\x02\x03')
4
True
>>> 
>>> ba1 = bytearray(b"""\040abc \x33\x34\x35 \130\131\132\x20""") ; ba1 ; len(ba1) ; isinstance(ba1, bytearray)
bytearray(b' abc 345 XYZ ')
13
True
>>>

Individual member is returned as int:

>>> ba1[3] ; chr(ba1[3])
99
'c'
>>>

Slices of bytearray ba1:

>>> ba1[1:2]
bytearray(b'a') # A slice containing 1 byte.
>>> ba1[4:8]
bytearray(b' 345') # A slice of 4 bytes.
>>> ba1[7:3:-1]
bytearray(b'543 ') # Above slice reversed.
>>> ba1[::-1]
bytearray(b' ZYX 543 cba ') # ba1 reversed.
>>> 
>>> ba1[4:8] == (ba1[7:3:-1])[::-1]
True
>>>

bytearray initialized

Any bytes object may be converted to a bytearray:

>>> ba1 = bytearray(b'\x54\x68\x65 \161\165\151\143\153, brown \x2E\x2e\056\056')
>>> ba1; len(ba1) ; isinstance(ba1, bytearray)
bytearray(b'The quick, brown ....')
21
True
>>> ba1 = bytearray ( B""" line1
... \v \f \n \t line2
...                     line3""")
>>> ba1
bytearray(b' line1\n\x0b \x0c \n \t line2\n\t\t\tline3')
>>> 
>>> ba2 = bytearray(rB'\x41\x42') ; ba2 ; len(ba2) # Note the rB'..'.
bytearray(b'\\x41\\x42')
8
>>> isinstance(ba2, bytearray)
True
>>> b1 = b'\x5C' + b'x41' + B"""\134""" + b'x42' ; b1
b'\\x41\\x42'
>>> isinstance(b1, bytes)
True
>>> b1 == ba2 # Comparing bytes object and bytearray.
True
>>>
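
Although a bytes object and a bytearray with the same contents compare equal, only the immutable bytes object is hashable. The following sketch (illustrative names) shows the consequence for dictionary keys:

```python
key = b'\x41\x42'
table = {key: 'found'}                  # a bytes object can be a dict key

try:
    hash(bytearray(key))                # a bytearray cannot be hashed
    hashable = True
except TypeError:
    hashable = False

frozen = bytes(bytearray(key))          # convert back to bytes for the lookup

print(hashable, table[frozen])
```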

A suitable sequence can be converted to bytearray:

>>> ba1 = bytearray([16, 0x20, ord('a'), 98, 0x63, 0o144, ord(' '), 0xEe]) # Input to bytearray() is list.
>>> ba1; len(ba1) ; isinstance(ba1, bytearray)
bytearray(b'\x10 abcd \xee')
8
True
>>> 
>>> bytearray( [ ord(p) for p in 'xyz £ § « ® ½ Ø ñ' ] )
bytearray(b'xyz \xa3 \xa7 \xab \xae \xbd \xd8 \xf1')
>>>

Concatenation of bytearray and bytes object:

>>> bytearray(b'abc') + B''' ''' + b"""\x44\x45\x46""" + b"\040" + B'\06123'
bytearray(b'abc DEF 123')

Because the bytearray is a mutable sequence, the bytearray accepts assignment:

>>> ba1 = bytearray(b'The quick, brown ') ; ba1
bytearray(b'The quick, brown ')
>>> 
>>> ba1 += b'fox' ; ba1
bytearray(b'The quick, brown fox')
>>> 
>>> ba1[4:9] = b'lazy' ; ba1
bytearray(b'The lazy, brown fox')
>>> 
>>> ba1[4:8] = ba1[4:8].upper() ; ba1
bytearray(b'The LAZY, brown fox')
>>> 
>>> ba1[4:10] = b'' ; ba1
bytearray(b'The brown fox')
>>>
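
Besides slice assignment, a bytearray supports the in-place methods of a mutable sequence. A brief sketch, with illustrative values:

```python
ba = bytearray(b'The brown fox')

ba.append(0x21)         # add one byte, given as an int: '!'
ba.extend(b' ran')      # add several bytes at once
ba[0] = ord('t')        # item assignment takes an int in range(256)
del ba[3:9]             # delete the slice b' brown'
last = ba.pop()         # remove the last byte and return it as an int

print(ba, last)
```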

bytearray as iterable

>>> ba1 = bytearray(b'The quick, brown ')
>>> if b'quick' in ba1 : print ('found')
... 
found
>>> if bytearray(b'quick') in ba1 : print ('found')
... 
found
>>> if 32 in ba1 : print ('found') # chr(32) = ' '.
... 
found
>>> 
>>> for p in ba1 :
...     print (p,' ',end='')
... 
84  104  101  32  113  117  105  99  107  44  32  98  114  111  119  110  32  >>> 
>>> 
>>> for p in ba1.upper() :
...     print (chr(p),end='')
... 
THE QUICK, BROWN >>> 
>>>

Conversion to bytearray

from int

The following code illustrates the process for a positive integer:

>>> I1 = 0x18b9e4
>>> bytearray ( [ (I1 >> 16) & 255, (I1 >> 8) & 255, I1 & 255 ] )
bytearray(b'\x18\xb9\xe4')
>>>

Method int.to_bytes(length, ....)

Method int.to_bytes(length, byteorder, *, signed=False) returns a bytes object representing an integer. If a bytearray is required, convert the bytes object to bytearray.

>>> bytearray ( 0x12_84.to_bytes(2, 'big', signed=True) )
bytearray(b'\x12\x84')
>>> 
>>> bytearray ( (-0xE2_04).to_bytes(3, 'big', signed=True) ) # Note the parentheses: (-0xE2_04).
bytearray(b'\xff\x1d\xfc') # 3 bytes for a signed, negative int.
>>>

from str

If each character of the string fits into one byte, the process is simple:

>>> s1 = '\011\012\015\016 123 abc \345\346\347' ; s1
'\t\n\r\x0e 123 abc åæç'
>>>
>>> ba1 = bytearray( [ ord(p) for p in s1 ] ) ; ba1
bytearray(b'\t\n\r\x0e 123 abc \xe5\xe6\xe7')
>>> 
>>> s1.encode('Latin-1') == ba1 # Comparing bytes object and bytearray.
True
>>>

Method str.encode() creates a bytes object containing the string str encoded. If a bytearray is required, convert the bytes object to bytearray.

>>> s1 = '\t\n\r\x0e 123 ሴ abc åæç' ; s1
'\t\n\r\x0e 123 ሴ abc åæç'
>>> 
>>> bytearray ( s1.encode() ) # Each of 'å', 'æ', 'ç' has a code point below 0x100 but is encoded as 2 bytes.
bytearray(b'\t\n\r\x0e 123 \xe1\x88\xb4 abc \xc3\xa5\xc3\xa6\xc3\xa7') # Character 'ሴ' is encoded as 3 bytes.
>>> 
>>> bytearray ( s1.encode() ) == s1.encode('utf-8')
True
>>>

from str containing international text

>>> s1
'Ξ ξ ウ ィ Щ щ' # Mixture of Greek, Japanese, Russian.
>>> 
>>> ba1 = bytearray ( s1.encode() ) ; ba1
bytearray(b'\xce\x9e \xce\xbe \xe3\x82\xa6 \xe3\x82\xa3 \xd0\xa9 \xd1\x89')
>>> 
>>> len(s1)
11 # characters
>>> 
>>> len(ba1)
19 # bytes
>>> 
>>> ba1 == s1.encode('utf-8')
True
>>>

from str containing hexadecimal digits

Class method bytearray.fromhex(string) returns a bytearray decoded from the given string of hexadecimal digits:

>>> bytearray.fromhex( '  12  04  E6  d5  ' )
bytearray(b'\x12\x04\xe6\xd5') # ASCII whitespace in string is ignored provided that hex digits are grouped as bytes.
>>> 
>>> bytearray.fromhex( '  12  20 04  20 E6  20 d5  ' )
bytearray(b'\x12 \x04 \xe6 \xd5')
>>> 
>>> bytearray.fromhex( '  12  20 04  20 E6  20 d  ' ) # The string must contain exactly two hexadecimal digits per byte.
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: non-hexadecimal number found in fromhex() arg at position 24
>>>

Class method bytearray.fromhex(string) can be used to convert from positive int to bytearray:

>>> i1 = 0xF12B4 ; i1
987828
>>> h1 = hex(i1)[2:] ; h1
'f12b4'
>>> h2 = ('0' * ((len(h1)) & 1)) + h1 ; h2 # Prepend '0' if necessary.
'0f12b4' # Length is even.
>>> ba1 = bytearray.fromhex( h2 ) ; ba1
bytearray(b'\x0f\x12\xb4')
>>>

Conversion from bytearray

to int

The following code illustrates the process for a positive integer:

>>> ba1
bytearray(b'\x0f\x12\xb4')
>>> i1 = (ba1[0] << 16) + (ba1[1] << 8) +  ba1[2] ; hex(i1)
'0xf12b4'
>>>

Class method int.from_bytes(bytes, ....)

Class method int.from_bytes(bytes, byteorder, *, signed=False) simplifies the conversion from bytearray to int:

>>> hex( int.from_bytes(bytearray(b'\xd3\xf8'), 'big', signed=True) )
'-0x2c08'
>>> hex( int.from_bytes(bytearray(b'\x00\xd3\xf8'), 'big', signed=True) )
'0xd3f8'
>>>

to str

If the bytearray contains only characters that fit into one byte:

>>> ba2 = bytearray(b'\t\n\r\x0e 123 abc \xe5\xe6\xe7') ; ba2
bytearray(b'\t\n\r\x0e 123 abc \xe5\xe6\xe7')
>>> s2 = ''.join([chr(p) for p in ba2]) ; s2
'\t\n\r\x0e 123 abc åæç'
>>> s2a = ba2.decode('Latin-1') ; s2a
'\t\n\r\x0e 123 abc åæç'
>>> s2 == s2a
True
>>>

Method bytearray.decode(encoding='utf-8', errors='strict') returns a string decoded from the bytearray. The default encoding is UTF-8:

>>> ba1 = bytearray(b'\t\n\r\x0e 123 \xe1\x88\xb4 abc \xc3\xa5\xc3\xa6\xc3\xa7') ; ba1
bytearray(b'\t\n\r\x0e 123 \xe1\x88\xb4 abc \xc3\xa5\xc3\xa6\xc3\xa7')
>>> s1 = ba1.decode() ; s1
'\t\n\r\x0e 123 ሴ abc åæç'
>>>

It is important to decode with the same encoding that was used to produce the bytes:

>>> s1b = ba1.decode('utf-16') ; s1b
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'utf-16-le' codec can't decode byte 0xa7 in position 22: truncated data
>>>
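
A wrong encoding does not always raise an exception. The following sketch, using sample data chosen for this illustration, shows one decoding that fails silently and one that raises an error, which is then handled with the errors argument of decode():

```python
# UTF-8 encoding of 'abc åæç' (sample data chosen for this sketch).
ba = bytearray(b'abc \xc3\xa5\xc3\xa6\xc3\xa7')

# Latin-1 maps every possible byte to some character, so decoding never
# fails, but the result is silently garbled.
garbled = ba.decode('latin-1')
print(garbled)            # abc Ã¥Ã¦Ã§

# ASCII rejects bytes above 0x7f and raises UnicodeDecodeError.
try:
    text = ba.decode('ascii')
except UnicodeDecodeError:
    # errors='replace' substitutes U+FFFD for each undecodable byte.
    text = ba.decode('ascii', errors='replace')

print(len(text))          # 10
```

Replacing undecodable bytes loses information, so it is usually better suited to diagnostics than to data that must be written back unchanged.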

to str containing international text

>>> ba1 = bytearray(b'\xce\x93 \xce\xb3 \xce\x94 \xce\xb4 \xce\x96 \xce\xb6')
>>> s1 = ba1.decode() ; s1
'Γ γ Δ δ Ζ ζ' # Greek decoded.
>>>
>>> ba2 = bytearray(b'\xd0\x90 \xd0\xb0 \xd0\x91 \xd0\xb1 \xd0\x92 \xd0\xb2 \xd0\x93 \xd0\xb3 \xd0\xa9 \xd1\x89')
>>> s2 = ba2.decode() ; s2
'А а Б б В в Г г Щ щ' # Cyrillic decoded.
>>> 
>>> ba3 = bytearray(b'\xe3\x82\xa6 \xe3\x82\xa3 \xe3\x82\xad \xe3\x83\x9a \xe3\x83\x87 \xe3\x82\xa3')
>>> s3 = ba3.decode() ; s3
'ウ ィ キ ペ デ ィ' # Japanese decoded.
>>> len(s3)
11 # Length of s3 in characters.
>>> len(ba3)
23 # Length of ba3 (the UTF-8 encoding of s3) in bytes.
>>>
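
The difference between the two lengths comes from characters that need more than one byte in UTF-8. A short sketch, with sample text chosen for this illustration, lists the number of bytes used by each character:

```python
s = 'ウ ィ Щ ξ a'   # sample text chosen for this sketch

# Pair each character with the number of bytes in its UTF-8 encoding.
sizes = [(ch, len(ch.encode('utf-8'))) for ch in s]
print(sizes)

# The per-character sizes add up to the length of the encoded bytearray.
total = sum(n for _, n in sizes)
assert total == len(bytearray(s.encode('utf-8')))
```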

to str containing hexadecimal digits

Method bytearray.hex() returns a string object containing two hexadecimal digits for each byte in the instance.

>>> bytearray(b'\xf0\xf1\xf2').hex()
'f0f1f2'
>>> (bytearray(b'\xf0\xf1\xf2') * 2).hex()
'f0f1f2f0f1f2'
>>> (bytearray(b'\xf0\xf1\xf2') * 2)[1:4].hex()
'f1f2f0'
>>>

Method bytearray.hex() can be used to convert a bytearray to a non-negative int:

>>> ba1 = bytearray(b'\xF0\x23\x95') ; ba1
bytearray(b'\xf0#\x95')
>>> h1 = ba1.hex() ; h1
'f02395'
>>> i1 = int(h1, 16) ; i1 ; hex(i1)
15737749
'0xf02395'
>>>
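
In Python 3.8 and later, hex() also accepts an optional separator, which makes long values easier to read. The sketch below assumes such a version and shows that bytearray.fromhex() reverses the conversion:

```python
ba = bytearray(b'\xf0\x23\x95')

spaced = ba.hex(' ')      # separator argument requires Python 3.8 or later
print(spaced)             # f0 23 95

# fromhex() ignores the spaces, so the original bytes are recovered.
restored = bytearray.fromhex(spaced)
assert restored == ba
```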

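Operations with methods on bytearrays

The following methods on bytearrays are representative of the methods described in Python's documentation (see References). All can also be used with bytes objects, in which case any result shown here as a bytearray is a bytes object instead.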

bytearray.find(sub[, start[, end]])

>>> bytearray([5,6,7,8,9]).find(8)
3
>>> bytearray([5,6,7,8,9]).find(b'\x07\x08')
2
>>> bytearray([5,6,7,8,9]).find(23)
-1
>>>
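
The optional start argument makes it possible to continue a search after a match. The following sketch collects every non-overlapping occurrence; the helper name find_all is hypothetical, and sub is assumed to be a non-empty bytes-like object:

```python
def find_all(data, sub):
    """Return the start index of every non-overlapping occurrence of sub
    in data (hypothetical helper; sub must be a non-empty bytes-like object)."""
    if len(sub) == 0:
        raise ValueError("sub must not be empty")
    positions = []
    start = data.find(sub)
    while start != -1:
        positions.append(start)
        # Resume the search just past the match that was found.
        start = data.find(sub, start + len(sub))
    return positions

hits = find_all(bytearray(b'abcabcab'), b'ab')
print(hits)   # [0, 3, 6]
```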

bytearray.join(iterable)

>>> bytearray(b' ').join([b'123', bytes([55,56,57]), bytearray(b'XYZ')])
bytearray(b'123 789 XYZ')
>>>

bytearray.partition(sep)

>>> bytearray([8,9,10,11,12]).partition(b'\x0a')
(bytearray(b'\x08\t'), bytearray(b'\n'), bytearray(b'\x0b\x0c'))
>>> 
>>> bytearray(b'abcdefg').partition(b'c')
(bytearray(b'ab'), bytearray(b'c'), bytearray(b'defg'))
>>> 
>>> bytearray(b'abcdefg').partition(b'X')
(bytearray(b'abcdefg'), bytearray(b''), bytearray(b''))
>>>
>>> bytearray(b'abcdefg').partition(b'cde')
(bytearray(b'ab'), bytearray(b'cde'), bytearray(b'fg'))
>>>

bytearray.center(width[, fillbyte])

>>> bytearray(b'abcd').center(10)
bytearray(b'   abcd   ')
>>> 
>>> bytearray(b'abcd').center(10, b'.')
bytearray(b'...abcd...')
>>>

bytearray.split(sep=None, maxsplit=-1)

>>> bytearray(b'1 2  3   4     5').split()
[bytearray(b'1'), bytearray(b'2'), bytearray(b'3'), bytearray(b'4'), bytearray(b'5')]
>>> 
>>> bytearray(b'1 2  3   4     5').split(b' ')
[bytearray(b'1'), bytearray(b'2'), bytearray(b''), bytearray(b'3'), bytearray(b''), bytearray(b''), bytearray(b'4'), bytearray(b''), bytearray(b''), bytearray(b''), bytearray(b''), bytearray(b'5')]
>>>
>>> bytearray(b'1 2  3   4     5').split(b'4')
[bytearray(b'1 2  3   '), bytearray(b'     5')]
>>>
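
The maxsplit argument, which the examples above leave at its default, limits the number of splits. A small sketch with a made-up "key=value" line shows why that can matter when the separator may also occur inside the value:

```python
line = bytearray(b'path=/usr/bin=x')   # sample data chosen for this sketch

# Without maxsplit, every b'=' is a split point.
all_parts = line.split(b'=')

# With maxsplit=1, only the first b'=' splits; the rest stays in the value.
key, value = line.split(b'=', 1)

print(all_parts)
print(key, value)
```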

bytearray.strip([chars])

>>> bytearray(b' \v  \f   \n  \t  ').strip()
bytearray(b'')
>>> 
>>> bytearray(b'    123  ').strip()
bytearray(b'123')
>>> 
>>> bytearray(b'www.example.com').strip(b'cmwz.')
bytearray(b'example.co')
>>>

bytearray.isspace()

>>> bytearray(b'').isspace()
False
>>> 
>>> bytearray(b'  \v  \f  \n  \t      ').isspace()
True
>>> 
>>> bytearray(b'  \v  \f  \n  \t   x   ').isspace()
False
>>>

Assignments

Further Reading or Review

References

1. Python's documentation:

"2.4.1. String and Bytes literals," "4.8. Binary Sequence Types — bytes, bytearray, ....," "4.8.3. Bytes and Bytearray Operations," "4.4.2. Additional Methods on Integer Types," "Unicode HOWTO," "7.2.2. Encodings and Unicode"


2. Python's methods:

"bytes.decode()," "str.encode()"


3. Python's built-in functions:

"bytes()," "bytearray()," "ord()," "chr()," "open(file, ....)"