Python Concepts/Files

From Wikiversity

Objective

  • Learn a little bit about files.
  • Learn about the built-in function open().
  • Learn how to read, write to, and seek in a file.
  • Learn about abstractions that act like files.
  • Learn about the optional parameters for open().
  • Learn how to rename, remove, move, and create files.

Lesson

What's A File

Files are part of most everyday work on a computer. You may spend your days writing word-processor documents for a news company, or you may like to listen to your MP3 files in your free time. You most likely already have an abstract idea of what a file is: a piece of information that is stored on a disk.

So what exactly is a file? A file is just a group of 1's and 0's that are stored on the disk. Since the operating system takes care of managing them, you don't have to worry about their technical details.

The data within a file may be as simple as a few words of text, or nothing at all (/dev/null). The data may be the audio and video of one of your favorite movies. A file may contain enormous quantities of numbers used to predict the path of a hurricane, to predict the existence of extrasolar planets, or to monitor the national debt.

Whatever the size and shape of a file, the usual operations on files are open, read and/or write, and close.


To open a single file in Python, use the built-in function open(). This code illustrates opening, reading and closing a file:

ifile = open('test.txt') # ifile (input file) is a file object
                         # Same as ifile = open('test.txt', 'rt') where 'rt' (default) means 'read text'.
for line in ifile : # ifile behaves like an iterable
    print (line, end='')

ifile.close()

The content of file 'test.txt':

Hello, world!
Hola, mundo!
Goodbye, world!

Technical definition of a file object

>>> import io
>>>
>>> f = open('test1.txt') ; f
<_io.TextIOWrapper name='test1.txt' mode='r' encoding='UTF-8'>
>>> isinstance(f, io.TextIOWrapper)
True
>>> f.close()
>>> f
<_io.TextIOWrapper name='test1.txt' mode='r' encoding='UTF-8'>
>>> isinstance(f, io.TextIOWrapper)
True # Still valid, even though f is closed.
>>> f.readable()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: I/O operation on closed file
>>> 
>>> # Open file in binary mode:
>>>
>>> f = open('test1.txt', 'r+b') ; f
<_io.BufferedRandom name='test1.txt'>
>>> isinstance(f, io.BufferedRandom)
True
>>> f.close()
>>> f
<_io.BufferedRandom name='test1.txt'>
>>> isinstance(f, io.BufferedRandom)
True # Still valid, even though f is closed.
>>>

Opening A File

The concept of opening a file seems simple enough, but it leads to significant questions:

Does the file exist? If not, why not?

If the file exists, is it where you expect it to be? If so, do you have permission to open it? You may be able to open it for reading, but what about writing or truncating the file?

If you open the file, is it OK if somebody else opens it while you have it open? You might decide to lock all others out of the file while you have it open. If so, normal etiquette requires that you do what you have to do quickly and then close it so that others may access it.

Computer scientists prepare for errors and handle them gracefully. Therefore, the above code is rewritten to handle errors:

import sys

inputFile = 'test.txt'

# Open the file

status = 0
try:
    ifile = open(inputFile)
except OSError:
    print ("Error detected when opening '{}'.".format(inputFile))
    status = 99

if status : sys.exit (status)

# Read the file

status = 0
try:
    for line in ifile :
        print (line, end='')
except (OSError, UnicodeError):
    print ("Error detected when reading '{}'.".format(inputFile))
    status = 98

if status : sys.exit (status)

# Close the file

status = 0
try:
    ifile.close()
except OSError:
    print ("Error detected when closing '{}'.".format(inputFile))
    status = 97

if status : sys.exit (status)

sys.exit (0)

Handling Errors

In the example immediately above there is more error-handling code than operational code. If you think this is unrealistic, remember that software engineers are notorious for overestimating their ability and underestimating the time to complete a given project. Simple mistakes can lead to disastrous and expensive consequences.


Milstar: Military Strategic and Tactical Relay


The third launch in the series, 30 April 1999, failed because an engineer entered one parameter as -0.1992476 instead of the correct -1.992476. More than one billion dollars (that's billion with a 'b') was wasted.

"Milstar satellite overview" This page doesn't mention the failed third launch.

"History of Milstar" From Wikipedia.

"A single error can kill a mission"

"Examples from the Launch World"


Close to home


The following code is copied directly from a famous instructional book for Perl:

opendir(ETC, "/etc") || die "no etc?";
foreach $name (sort readdir(ETC)) {
    print "$name\n";
}
close(ETC);

Can you spot the error? Rewrite the code to catch potential errors:

opendir(ETC, "/etc") || die "no etc?";
foreach $name (sort readdir(ETC)) {
    print "$name\n";
}
close(ETC) || die "";

Execution of this piece of code fails at close(ETC) || die ""; because a directory handle opened with opendir() must be closed with closedir(), not close(). The code should be:

opendir(ETC, "/etc") || die "no etc?";
foreach $name (sort readdir(ETC)) {
    print "$name\n";
}
closedir(ETC) || die '';

Your attitude may be: "It doesn't matter about the closedir() because the operating system closes the directory when the application exits." If so, the code should be:

opendir(ETC, "/etc") || die "no etc?";
foreach $name (sort readdir(ETC)) {
    print "$name\n";
}
exit (0);

Catching errors often reveals simple mistakes in software that can go undetected for a long time.


Python's with statement

Python's with statement simplifies handling errors during operations on files:

with open("test.txt", "r") as file:
    print ( 'status1 =', file.closed)

print ( 'status2 =', file.closed)

try:
    with open("test.txt", "r") as file:
        print ( 'status3 =', file.closed)
        raise NameError
except:
    print ('Error detected in "with" statement.')

print ( 'status4 =', file.closed)

On exiting the body of the with statement (for any reason) Python closes an open file:

status1 = False
status2 = True
status3 = False
Error detected in "with" statement.
status4 = True
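
The with statement combines naturally with exception classes that are more specific than a bare except. The following is a minimal sketch, not part of the original lesson: the helper name read_lines and the temporary file are assumptions made only so that the example runs on its own.

```python
import os
import tempfile

def read_lines(path):
    """Return the lines of a text file, or None if it cannot be read."""
    try:
        with open(path, encoding='utf-8') as f:
            return f.readlines()
    except FileNotFoundError:
        print("'{}' does not exist.".format(path))
    except PermissionError:
        print("No permission to read '{}'.".format(path))
    except OSError as e:          # Any other operating-system level failure.
        print("Error reading '{}': {}".format(path, e))
    return None

# Demonstration with a temporary file so that the sketch is self-contained.
with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, 'test.txt')
    with open(path, 'w', encoding='utf-8') as f:
        f.write('Hello, world!\nHola, mundo!\n')
    lines = read_lines(path)
    missing = read_lines(os.path.join(d, 'no_such_file.txt'))

print(lines)
print(missing)
```

Because the file is opened inside with, it is closed whether the read succeeds or raises, and the except clauses only have to report what went wrong.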

Information about the file

Information available after opening

>>> f = open ('test.txt')
>>> f
<_io.TextIOWrapper name='test.txt' mode='r' encoding='UTF-8'> # file object
>>> 
>>> f = open ('test.txt') # File is silently reopened without error.
>>> 
>>> f.closed
False # File is open
>>> 
>>> f.fileno()
3 # The underlying file descriptor, an int.
>>> 
>>> f.isatty()
False # Input is not coming from a terminal.
>>> 
>>> f.readable()
True # We expect the file to be readable. It was opened without error.
>>> 
>>> f.seekable()
True # We can cause the file to seek to any desired position within the file.
>>> 
>>> f.tell()
0 # Internal pointer is at beginning of file. This is expected after opening file.
>>> 
>>> f.writable()
False # Not writable, opened for reading only. An attempt to write or truncate raises io.UnsupportedOperation (a subclass of OSError).
>>> 
>>> # Size of file:
>>> f.seek(0,2) # Change stream position to 0 bytes from end of file.
43 # At end of file. Therefore, file contains 43 bytes.
>>> f.tell()
43 # Current position of stream.
>>>

Function os.stat()

Function os.stat(path, *, dir_fd=None, follow_symlinks=True) provides more information about the file. It returns a stat_result object.

>>> import os
>>> 
>>> info = os.stat('test1.txt')
>>> isinstance(info, os.stat_result)
True
>>> info
os.stat_result(st_mode=33188, st_ino=24051862, st_dev=16777218, st_nlink=1, st_uid=501, st_gid=20, st_size=249, st_atime=1505769137, st_mtime=1505476400, st_ctime=1505476401)
>>> 
>>> info[1]
24051862
>>> info.st_ino
24051862
>>> info[6]
249
>>> info.st_size
249
>>> # The following additional info is platform dependent:
>>>
>>> info.st_blocks
8
>>> info.st_blksize
4096
>>> info.st_birthtime
1505476400.0

On the Unix command line:

$ ls -laid test1.txt
24051862 -rw-r--r--  1 user  staff  249 Sep 15 06:53 test1.txt
$ date -r 1505476401
Fri Sep 15 06:53:21 CDT 2017
$

info.st_ino is the inode displayed by the Unix command ls.

info.st_size is the size displayed by the Unix command ls.

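info.st_mtime is the time of last modification displayed by the Unix command ls -l. On Unix, info.st_ctime is the time of the most recent metadata change, not the creation time; in this example the two differ by one second.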
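
The numeric fields of a stat_result can be turned into the human-readable form that ls prints. The sketch below is one way to do that with the standard stat and datetime modules; the file name example.txt and its 249-byte content are made up for the illustration.

```python
import os
import stat
import tempfile
from datetime import datetime

with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, 'example.txt')
    with open(path, 'wb') as f:
        f.write(b'x' * 249)

    info = os.stat(path)
    size = info.st_size
    is_regular = stat.S_ISREG(info.st_mode)       # True for an ordinary file.
    permissions = stat.filemode(info.st_mode)     # A string in the style of ls -l.
    modified = datetime.fromtimestamp(info.st_mtime)

print('size =', size)
print('regular file:', is_regular)
print('permissions =', permissions)
print('modified =', modified.isoformat(' ', 'seconds'))
```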

Seeking within a text file

Within text files the method f.seek() has limited functionality:

>>> f = open ('test.txt', mode='rt')
>>> 
>>> f.seek( 16, 0 ) # Put beginning of stream at position 16 relative to beginning of file.
16
>>> f.seek( 13 ) # Same as f.seek( 13, 0 )
13
>>> 
>>> f.seek( -5, 0 ) # Try to position stream before beginning of file.
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: negative seek position -5
>>> 
>>> f.tell()
13 # Position of stream pointer is unchanged.
>>> 
>>> f.seek( -5, 1 ) # Try to position stream -5 bytes relative to current position:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
io.UnsupportedOperation: can't do nonzero cur-relative seeks
>>> 
>>> f.tell()
13 # Position of stream pointer is unchanged.
>>> 
>>> f.seek( f.tell()+4 ) # Equivalent of f.seek( 4, 1 ).
17
>>> 
>>> f.seek( 0,2 ) # Position stream at 0 bytes relative to end-of-file.
43
>>> 
>>> f.seek( -5, 2 ) # Try to position stream -5 bytes relative to end-of-file:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
io.UnsupportedOperation: can't do nonzero end-relative seeks
>>> 
>>> f.seek( f.seek(0,2) - 7 ) # Equivalent of f.seek( -7, 2 ).
36
>>> 
>>> f.seek( 1234567 )
1234567
>>> f.tell()
1234567 # Yes, you can put beginning of stream after end-of-file without error.
>>>

Reading from a single file

f.read()

>>> f = open ('test.txt')
>>> f
<_io.TextIOWrapper name='test.txt' mode='r' encoding='UTF-8'> # file object
>>> 
>>> f.read() # reads the whole file.
'Hello, world!\nHola, mundo!\nGoodbye, world!\n'
>>> f.read()
'' # At end-of-file.
>>> 
>>> f.seek(0) # To read file again, put beginning of stream at beginning of file.
0
>>> f.read()
'Hello, world!\nHola, mundo!\nGoodbye, world!\n'
>>> f.read()
'' # At end-of-file again.
>>>

f.read() may contain an optional argument size: f.read(size) where size is numeric, in which case at most size characters (in text mode) or size bytes (in binary mode) are read and returned. The reference states: When size is omitted or negative, the entire contents of the file will be read and returned; it’s your problem if the file is twice as large as your machine’s memory.
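
When a file may be too large to read in one piece, f.read(size) can be called repeatedly. The following sketch assumes a hypothetical helper named read_in_chunks and a deliberately small chunk size of 16 so that the effect is visible on a short file.

```python
import os
import tempfile

def read_in_chunks(f, size=16):
    """Yield successive pieces of at most `size` characters (text mode)
    or bytes (binary mode) until end-of-file."""
    while True:
        chunk = f.read(size)
        if not chunk:       # '' or b'' signals end-of-file.
            break
        yield chunk

text = 'Hello, world!\nHola, mundo!\nGoodbye, world!\n'

with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, 'test.txt')
    with open(path, 'w', encoding='utf-8', newline='') as f:
        f.write(text)
    with open(path, encoding='utf-8', newline='') as f:
        chunks = list(read_in_chunks(f))

print(chunks)
```

Only one chunk is held in memory at a time inside the generator; the list() call here is just for display.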

f.readline()

>>> f = open ('test.txt', mode='rt')
>>> f.readline() # Read one line of the file.
'Hello, world!\n' # '\n' at end has not been removed.
>>> f.readline() # Read next line of the file.
'Hola, mundo!\n'
>>> f.readline() # Read next line of the file.
'Goodbye, world!\n'
>>> f.readline() # Read next line of the file.
'' # At end-of-file.
>>>

To iterate using f.readline():

f = open ('test.txt', mode='rt')

while True :
    s = f.readline()
    if s == '' : break
    print (s, end='')
Hello, world!
Hola, mundo!
Goodbye, world!

f.readline() may contain an optional argument size: f.readline(size) where size is numeric, in which case at most size characters (in text mode) or size bytes (in binary mode) are read and returned.

f = open ('test.txt', mode='rt')

while True :
    posn = f.tell()
    s = f.readline(8)
    if s == '' : break
    print (" %2d %s" % (posn , s), end='')
  0 Hello, w  8 orld!
 14 Hola, mu 22 ndo!
 27 Goodbye, 35  world!

File object as iterable

>>> f = open ('test.txt', mode='rt')
>>>
>>> lines = list(f) ; lines
['Hello, world!\n', 'Hola, mundo!\n', 'Goodbye, world!\n']
>>> lines = list(f) ; lines
[] # At end-of-file
>>> f.seek(0) # Put beginning of stream at beginning of file.
0
>>> lines = list(f) ; lines
['Hello, world!\n', 'Hola, mundo!\n', 'Goodbye, world!\n']
>>> lines = list(f) ; lines
[]
>>> 
>>> f.seek(20) # Put beginning of stream at desired position.
20
>>> list(f)
['mundo!\n', 'Goodbye, world!\n']
>>> list(f)
[]
>>>
f = open('test.txt')

for line in f :            # This code is memory efficient,
    print (line, end='')   # fast and simple.

f.close()
Hello, world!
Hola, mundo!
Goodbye, world!

Display all lines in the file that contain the word 'world':

>>> f = open ('test.txt', mode='rt')
>>>
>>> [p for p in f if 'world' in p]
['Hello, world!\n', 'Goodbye, world!\n']
>>> 
>>> f.close()

Reading international text

File test1.txt contains:

η ρωμαϊκή μυθολογία (Roman mythology)
Всероссийская перепись населения 2010 года (2010 All-Russia Population Census)
..../...//..../...//..../...//..../...//..../...//..../...//..../...//..../...//

The last line is included to facilitate counting characters. Without knowing anything about Greek or Russian we see immediately that the Greek characters for iota (ι), ϊ and ί, are different, as are the Russian characters и and й.

The next thing to notice is that the file contains 249 bytes, while the three lines contain 38, 79 and 81 characters, or 198 characters for the whole file. Not to worry. In text mode Python performs the appropriate encoding and decoding:

f = open('test1.txt')

for line in f :
    print (len(line), line, end='')

f.close()
38 η ρωμαϊκή μυθολογία (Roman mythology)
79 Всероссийская перепись населения 2010 года (2010 All-Russia Population Census)
81 ..../...//..../...//..../...//..../...//..../...//..../...//..../...//..../...//

The same again with different detail:

f = open ('test1.txt', mode='rt')

while True :
    posn = f.tell()
    s = f.readline(30) # In text files read 30 characters.
    print (" %3d %s" % (posn , s), end='')
    if s == '' : break

f.close()
   0 η ρωμαϊκή μυθολογία (Roman myt  47 hology)
  55 Всероссийская перепись населен 113 ия 2010 года (2010 All-Russia  149 Population Census)
 168 ..../...//..../...//..../...// 198 ..../...//..../...//..../...// 228 ..../...//..../...//
 249

Length of first line in bytes = 55. Length of first line in characters = 38.

Length of second line in bytes = 168-55 = 113. Length of second line in characters = 79.

Length of third line in bytes = 249-168 = 81. Length of third line in characters = 81.


The first invocation of f.readline(30) read 30 characters (17 Greek and 13 English) in 47 (17*2 + 13) bytes, 47-30 = 17 extra bytes for 17 Greek characters. Similarly, the fourth invocation of f.readline(30) read 30 characters (6 Russian and 24 English) in 36 (149-113) or (6*2 + 24) bytes, 36-30 = 6 extra bytes for 6 Russian characters. When you're reading international text encoded as UTF-8, the number of bytes read is never less than the number of characters read, and is usually more.

Take care if you reposition the stream into the middle of international text:

f = open ('test1.txt', mode='rt')

f.seek(4)

posn = f.tell()
s = f.readline(30)
print (" %3d %s" % (posn , s))
Traceback (most recent call last):
  File "t3.py", line 11, in <module>
    s = f.readline(30)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/codecs.py", line 321, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x81 in position 0: invalid start byte
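
One way to avoid landing in the middle of a multi-byte character is to seek only to positions that f.tell() returned earlier. The sketch below records the position of the start of each line and later returns to one of them; the file name intl.txt and its contents are assumptions made for the illustration.

```python
import os
import tempfile

text = 'η ρωμαϊκή μυθολογία\nВсероссийская перепись\nplain ASCII\n'

with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, 'intl.txt')
    with open(path, 'w', encoding='utf-8', newline='') as f:
        f.write(text)

    with open(path, encoding='utf-8', newline='') as f:
        # Record the position at the start of every line.
        # readline() is used because tell() is disabled
        # while iterating with "for line in f".
        starts = []
        while True:
            posn = f.tell()
            if f.readline() == '':
                break
            starts.append(posn)

        f.seek(starts[1])          # A value returned by tell() is safe to seek to.
        second_line = f.readline()

print(starts)
print(second_line, end='')
```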

Reading from multiple input streams

The fileinput module treats several files as one continuous input stream. The function fileinput.input(....) is the primary interface of this module:

>>> import fileinput
>>>
>>> fileinput.input( files=('test.txt', 'test1.txt') ) # The 2 named files provide the input stream.
<fileinput.FileInput object at 0x101a96208>
>>> 
>>> list ( fileinput.input( files=('test.txt', 'test1.txt') ) )
['Hello, world!\n', 
'Hola, mundo!\n', 
'Goodbye, world!\n', 
'η ρωμαϊκή μυθολογία (Roman mythology)\n', 
'Всероссийская перепись населения 2010 года (2010 All-Russia Population Census)\n', 
'..../...//..../...//..../...//..../...//..../...//..../...//..../...//..../...//\n']
# Output above was edited for clarity.

Each call to fileinput.input(....) creates a new stream that starts at the first file, so successive invocations do not require reopening or resetting anything:

>>> len(list ( fileinput.input( files=('test.txt', 'test1.txt') ) ))
6
>>> len(list ( fileinput.input( files=('test.txt', 'test1.txt') ) ))
6
>>> (list ( fileinput.input( files=('test.txt', 'test1.txt') ) ))[3]
'η ρωμαϊκή μυθολογία (Roman mythology)\n'
>>> (list ( fileinput.input( files=('test.txt', 'test1.txt') ) ))[4]
'Всероссийская перепись населения 2010 года (2010 All-Russia Population Census)\n'
>>> (list ( fileinput.input( files=('test.txt', 'test1.txt') ) ))[1]
'Hola, mundo!\n'
>>> (list ( fileinput.input( files=('test.txt', 'test1.txt') ) ))[1]
'Hola, mundo!\n'
>>> 
>>> (list ( fileinput.input( files=('test.txt', 'test1.txt') ) ))[1:4]
['Hola, mundo!\n', 'Goodbye, world!\n', 'η ρωμαϊκή μυθολογία (Roman mythology)\n']
>>>

The functions fileinput.filename(), fileinput.lineno(), fileinput.filelineno() provide information about the stream:

for line in fileinput.input( files=('test.txt', '/dev/null', 'test1.txt') ) :
    print ('filename =', fileinput.filename())
    print ('   ', line, end='')                         
    print ('    lineno = {}, filelineno = {}.'.format( fileinput.lineno(), fileinput.filelineno() ))

exit (0)
filename = test.txt
    Hello, world!
    lineno = 1, filelineno = 1.
filename = test.txt
    Hola, mundo!
    lineno = 2, filelineno = 2.
filename = test.txt
    Goodbye, world!
    lineno = 3, filelineno = 3.
filename = test1.txt
    η ρωμαϊκή μυθολογία (Roman mythology)
    lineno = 4, filelineno = 1.
filename = test1.txt
    Всероссийская перепись населения 2010 года (2010 All-Russia Population Census)
    lineno = 5, filelineno = 2.
filename = test1.txt
    ..../...//..../...//..../...//..../...//..../...//..../...//..../...//..../...//
    lineno = 6, filelineno = 3.

The null file /dev/null was silently ignored.
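
The object returned by fileinput.input() can also be used as a context manager, and its own methods filename(), lineno() and filelineno() mirror the module-level functions. The following sketch counts lines per file; the file names a.txt and b.txt are made up, and fileinput.hook_encoded('utf-8') is used here only to make the encoding explicit.

```python
import fileinput
import os
import tempfile

with tempfile.TemporaryDirectory() as d:
    first = os.path.join(d, 'a.txt')
    second = os.path.join(d, 'b.txt')
    with open(first, 'w', encoding='utf-8') as f:
        f.write('Hello, world!\nHola, mundo!\n')
    with open(second, 'w', encoding='utf-8') as f:
        f.write('η ρωμαϊκή μυθολογία\n')

    counts = {}
    with fileinput.input(files=(first, second),
                         openhook=fileinput.hook_encoded('utf-8')) as stream:
        for line in stream:
            name = os.path.basename(stream.filename())
            counts[name] = stream.filelineno()   # Line number within the current file.
        total = stream.lineno()                  # Cumulative line number.

print(counts, total)
```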

Writing text to a disk file

The method f.write() may be used to write to a file in text mode or binary mode. In text mode it accepts a str and returns the number of characters written; in binary mode it accepts a bytes-like object and returns the number of bytes written.

s = '''η ρωμαϊκή μυθολογία (Greek characters)
Всероссийская перепись (Russian or Cyrillic)
The Quick BROWN foX (English characters)
'''

length_of_s = len(s) # in characters

ofile = open('test1a.txt', 'w')

number_written = ofile.write(s) # in characters
end_of_file = ofile.tell() # in bytes

ofile.close()

print (
'''
length_of_s = {}
number_written = {}
end_of_file = {}
'''.format(length_of_s, number_written, end_of_file)
)

length_of_s = 125
number_written = 125
end_of_file = 163

On the UNIX command line:

$ wc  test1a.txt
       3      17     163 test1a.txt
$ 

The UNIX executable wc (word count) shows that the output file contains 3 lines, 17 words and 163 bytes. The size in bytes agrees with end_of_file above.

The file actually contains 16 words. wc sees 'μυθολογία' as 2 words, but Python performs the output without error.

The difference between 163 and 125 (38) is due to the 17 Greek letters and 21 Russian letters. Each of these characters requires 2 bytes in UTF-8.
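
The relationship between characters and bytes can be checked without looking at the disk, by encoding the string explicitly. This is a small sketch under the assumption that the file is written as UTF-8; the file name out.txt is made up.

```python
import os
import tempfile

s = 'η ρωμαϊκή μυθολογία (Greek characters)\n'

encoded = s.encode('utf-8')      # The bytes that are stored on disk.
# Count the characters that need 2 bytes each in UTF-8.
two_byte = sum(1 for c in s if len(c.encode('utf-8')) == 2)

with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, 'out.txt')
    # newline='' prevents translation of '\n' on platforms that use '\r\n'.
    with open(path, 'w', encoding='utf-8', newline='') as ofile:
        number_written = ofile.write(s)      # in characters
    size_on_disk = os.path.getsize(path)     # in bytes

print(len(s), number_written, len(encoded), size_on_disk, two_byte)
```

Passing encoding='utf-8' to open() keeps the result independent of the platform's default encoding.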

Binary operations on disk files

Brief review of binary conversion

An int data type is conceptually a sequence of bytes. However, an int cannot be written to disk directly. Before an int can be written to disk in binary format, it must be converted to, e.g., a bytes object or bytearray. This section illustrates the conversion in both directions by means of examples.

>>> b = 123456789 
>>> b1 = hex(b) ; b1
'0x75bcd15'
>>> b2 = b1[2:] ; b2
'75bcd15'
>>> b3 = '0'*(len(b2)%2) + b2 ; b3 # Ensure that the string contains an even number of hex digits. 
'075bcd15'
>>> len(b3) % 2
0 # Length of b3 is even.
>>> 
>>> b4 = bytes.fromhex(b3) ; b4 
b'\x07[\xcd\x15' # b4 is a bytes object containing integer b in binary format.
>>> isinstance(b4, bytes)
True
>>> 
>>> list(b4)
[7, 91, 205, 21]
>>> 7 == b4[0] and 91 == b4[1] == ord('[') and 205 == b4[2] == 0xCD and 21 == b4[3] == 0x15
True
>>> 
>>> (7<<24) + (91<<16) + (205<<8) + 21 == b # Check the conversion.
True
>>> # Convert from bytes object b4 to int:
>>>
>>> b5 = b4.hex() ; b5
'075bcd15'
>>> b6 = int(b5,16) ; b6
123456789
>>> 
>>> b6 == b
True
>>>

Methods int.to_bytes(length, ....) and int.from_bytes(bytes, ....)

Method int.to_bytes(length, byteorder, *, signed=False) simplifies conversion from int to bytes.

Class method int.from_bytes(bytes, byteorder, *, signed=False) simplifies the reverse.

The following code ensures that the integer produced after encoding and decoding is the same as the original int:

def int_to_bytes (value) : # value is int.
    num_bits = value.bit_length()
    num_bytes = (num_bits + 7) // 8
    if ((num_bits % 8) == 0) : num_bytes += 1 # Extra byte so that the sign bit fits.

    return value.to_bytes(num_bytes, byteorder='big', signed=True)

def int_from_bytes (data) : # data is bytes.
    return int.from_bytes(data, byteorder='big', signed=True)
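
A quick round-trip check shows why the extra byte matters for values such as 128 and 255, whose bit length is a multiple of 8. The sketch repeats the two helpers so that it runs on its own; the list of sample values is an arbitrary choice.

```python
def int_to_bytes(value):
    num_bits = value.bit_length()
    num_bytes = (num_bits + 7) // 8
    if num_bits % 8 == 0:
        num_bytes += 1          # Leave room for the sign bit.
    return value.to_bytes(num_bytes, byteorder='big', signed=True)

def int_from_bytes(data):
    return int.from_bytes(data, byteorder='big', signed=True)

samples = [0, 1, -1, 127, 128, -128, -129, 255, 256,
           123456789, -123456789, 2**64]

# Every sample must survive conversion to bytes and back.
round_trip_ok = all(int_from_bytes(int_to_bytes(n)) == n for n in samples)

print(round_trip_ok)
print(int_to_bytes(128))        # Two bytes: a leading zero byte keeps the value positive.
```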

Writing to disk in binary mode

Integer i1 contains binary data. Convert i1 to bytes and write to disk.

i1 = 123456789012345678901234567890123456789012345678901234567890

b1 = int_to_bytes (i1)

b1a = r"""b'\x13\xaa\xf5\x04\xe4\xbc\x1e\x62\x17\x3f\x87\xa4\x37\x8c\x37\xb4\x9c\x8c\xcf\xf1\x96\xce\x3f\x0a\xd2'"""

print (
r'''
i1 = {} # original int
b1 = {} # original int as bytes
b1a is b1 expanded so that each byte is expressed as '\xHH':
b1a = {}
b1 == b1a : {} # b1a matches the hex representation of i1.
'''.format( hex(i1), b1, b1a, b1 == eval(b1a) ) # b1a is str.
)

try:
    with open("test.bin", "wb") as ofile:
        number_written = ofile.write(b1) # in bytes
        end_of_file = ofile.tell() # in bytes
except:
    print ('Error detected in "with" statement.')

print (
'''
len(b1) = {}
number_written = {}
end_of_file = {}
'''
.format(len(b1), number_written, end_of_file)
)
i1 = 0x13aaf504e4bc1e62173f87a4378c37b49c8ccff196ce3f0ad2 # original int
b1 = b'\x13\xaa\xf5\x04\xe4\xbc\x1eb\x17?\x87\xa47\x8c7\xb4\x9c\x8c\xcf\xf1\x96\xce?\n\xd2' # original int as bytes
b1a is b1 expanded so that each byte is expressed as '\xHH':
b1a = b'\x13\xaa\xf5\x04\xe4\xbc\x1e\x62\x17\x3f\x87\xa4\x37\x8c\x37\xb4\x9c\x8c\xcf\xf1\x96\xce\x3f\x0a\xd2'
b1 == b1a : True # b1a matches the hex representation of i1.

len(b1) = 25
number_written = 25
end_of_file = 25

$ od -t x1 test.bin
0000000    13  aa  f5  04  e4  bc  1e  62  17  3f  87  a4  37  8c  37  b4
0000020    9c  8c  cf  f1  96  ce  3f  0a  d2                            
0000031
$

Reading from disk in binary mode

The file on disk contains a large int in bytes format. Read the file from disk and convert to int.

try:
    with open("test.bin", "rb") as ifile:
        b2 = ifile.read()
        end_of_file = ifile.tell() # in bytes
except:
    print ('Error detected in "with" statement.')

i2 = int_from_bytes (b2)

print (
'''
isinstance(b2, bytes): {}
len(b2) = {}
end_of_file = {}
i2 == i1: {}
'''.format( isinstance(b2, bytes),  len(b2), end_of_file, i2 == i1)
)
isinstance(b2, bytes): True
len(b2) = 25
end_of_file = 25
i2 == i1: True

Reading text in binary mode

Data may be read from a text file in binary mode. The disadvantage is that Python does not automatically perform the necessary decoding.

try:
    with open("test1a.txt", "rb") as ifile:
        b5 = ifile.read()
        end_of_file = ifile.tell() # in bytes
except:
    print ('Error detected in "with" statement.')

b6 = b5.decode(encoding='utf-8')

print (
'''
isinstance(b5, bytes): {}
len(b5) = {} bytes
end_of_file = {}

b5 = (
  {}
+ {}
+ {}
+ {}
+ {}
)

b5 decoded =
{}
length of b5 decoded = {} characters
'''.format( isinstance(b5, bytes),  len(b5), end_of_file,
b5[:18] , b5[18:56] , b5[56:83] , b5[83:122] , b5[122:] ,
b6, len(b6) )
)
isinstance(b5, bytes): True
len(b5) = 163 bytes
end_of_file = 163

b5 = (
  b'\xce\xb7 \xcf\x81\xcf\x89\xce\xbc\xce\xb1\xcf\x8a\xce\xba\xce\xae '
+ b'\xce\xbc\xcf\x85\xce\xb8\xce\xbf\xce\xbb\xce\xbf\xce\xb3\xce\xaf\xce\xb1 (Greek characters)\n'
+ b'\xd0\x92\xd1\x81\xd0\xb5\xd1\x80\xd0\xbe\xd1\x81\xd1\x81\xd0\xb8\xd0\xb9\xd1\x81\xd0\xba\xd0\xb0\xd1\x8f '
+ b'\xd0\xbf\xd0\xb5\xd1\x80\xd0\xb5\xd0\xbf\xd0\xb8\xd1\x81\xd1\x8c (Russian or Cyrillic)\n'
+ b'The Quick BROWN foX (English characters)\n'
)

b5 decoded =
η ρωμαϊκή μυθολογία (Greek characters)
Всероссийская перепись (Russian or Cyrillic)
The Quick BROWN foX (English characters)

length of b5 decoded = 125 characters

Seeking in binary mode

f.seek() works as expected in binary mode:

>>> f = open("test.bin", "rb") 
>>> f.seek(0,2) # seek to end-of-file.
25 # File contains 25 bytes.
>>> f.tell()
25
>>> f.seek(-3,0) # You cannot seek to a position before beginning of file.
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
OSError: [Errno 22] Invalid argument
>>> f.tell()
25 # Still at end-of-file.
>>> f.seek(3,0) # relative to beginning of file.
3
>>> f.seek(4,1) # relative to current position.
7
>>> f.seek(-5,2) # relative to end-of-file.
20
>>> f.seek(123456)
123456 # Yes, you can position the internal pointer to a position after end-of-file.
>>> f.close()
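
Because end-relative seeks work in binary mode, the tail of a file can be read without reading what comes before it. The following is a minimal sketch; the helper name read_tail and the 25-byte test pattern are assumptions made for the illustration.

```python
import os
import tempfile

def read_tail(path, count):
    """Return the last `count` bytes of a file (or the whole file if it is shorter)."""
    with open(path, 'rb') as f:
        size = f.seek(0, os.SEEK_END)              # Same as f.seek(0, 2).
        f.seek(max(size - count, 0), os.SEEK_SET)  # Never seek before the beginning.
        return f.read()

with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, 'test.bin')
    with open(path, 'wb') as f:
        f.write(bytes(range(25)))                  # 25 bytes: 0x00 .. 0x18
    tail = read_tail(path, 5)
    everything = read_tail(path, 1000)

print(tail)
```

The constants os.SEEK_SET, os.SEEK_CUR and os.SEEK_END are the named equivalents of 0, 1 and 2.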

Truncating a file in binary mode

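Function os.ftruncate(fd, length) truncates the file that corresponds to file descriptor fd so that it is at most length bytes in size. The descriptor is obtained with f.fileno().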

import os

try:
    with open("test.bin", "r+b") as f:
        print ('File is writable:', f.writable())
        print ('File is readable:', f.readable())
        b = f.read()
        if len(b) != f.tell() : raise OSError
        print ('b =', b, '# Original contents.')

        print ('Truncate the file to current size - 10')
        os.ftruncate(f.fileno(), f.seek(0,2) - 10)
        f.seek(0)
        b = f.read()
        if len(b) != f.tell() : raise OSError
        print ('b =', b, '# Last 10 bytes removed.')

        print ('Truncate the file to current size + 10')
        os.ftruncate(f.fileno(), f.seek(0,2) + 10)
        f.seek(0)
        b = f.read()
        if len(b) != f.tell() : raise OSError
        print ('b =', b, '# 10 null bytes added at end.')

        print ('Truncate the file to 4 bytes')
        os.ftruncate(f.fileno(), 4)
        f.seek(0)
        b = f.read()
        if len(b) != f.tell() : raise OSError
        print ('b =', b, '# Truncated to 4 bytes.')
except FileNotFoundError:
    print ('FileNotFoundError detected in "with" statement.')
except PermissionError:
    print ('PermissionError detected in "with" statement.')
except OSError: # Must follow its subclasses FileNotFoundError and PermissionError.
    print ('OSError detected in "with" statement.')
except:
    print ('Unknown error detected in "with" statement.')
File is writable: True
File is readable: True
b = b'\x13\xaa\xf5\x04\xe4\xbc\x1eb\x17?\x87\xa47\x8c7\xb4\x9c\x8c\xcf\xf1\x96\xce?\n\xd2' # Original contents.
Truncate the file to current size - 10
b = b'\x13\xaa\xf5\x04\xe4\xbc\x1eb\x17?\x87\xa47\x8c7' # Last 10 bytes removed.
Truncate the file to current size + 10
b = b'\x13\xaa\xf5\x04\xe4\xbc\x1eb\x17?\x87\xa47\x8c7\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00' # 10 null bytes added at end.
Truncate the file to 4 bytes
b = b'\x13\xaa\xf5\x04' # Truncated to 4 bytes.
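
File objects opened for writing also have their own truncate() method, which avoids handling the file descriptor directly. The sketch below is an alternative to the os.ftruncate() approach above, not a replacement for it; the 25-byte test pattern is made up, and the bytes added when a file grows are usually, but not on every platform, zero-filled.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, 'test.bin')
    with open(path, 'wb') as f:
        f.write(bytes(range(25)))

    with open(path, 'r+b') as f:
        f.truncate(15)          # Shrink to 15 bytes; the stream position stays at 0.
        shrunk = f.read()

        f.truncate(20)          # Grow to 20 bytes.
        f.seek(0)
        grown = f.read()

        f.seek(4)
        f.truncate()            # With no argument, truncate at the current position.
        f.seek(0)
        final = f.read()

    size = os.path.getsize(path)

print(shrunk)
print(len(grown))
print(final, size)
```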

External operations on files

External operations are similar to those executed on the Unix command line. They do not depend on having a file open before execution.

Creating a file

On the Unix command line:

$ ls -la test1.txt test5.txt
ls: test5.txt: No such file or directory
-rw-r--r--  1 user  staff  249 Sep 15 06:53 test1.txt
$ 
>>> f = open ('test1.txt', 'x') # 'x' for exclusive creation.
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
FileExistsError: [Errno 17] File exists: 'test1.txt'
>>> f
<_io.BufferedRandom name='test1.bin'> # Unchanged. f refers to previous open.
>>> 
>>> f = open ('test5.txt', 'x')
>>> f
<_io.TextIOWrapper name='test5.txt' mode='x' encoding='UTF-8'>
>>> f.closed
False
>>> f.readable()
False
>>> f.writable()
True
>>> f.close()
>>>

On the Unix command line:

$ ls -la test1.txt test5.txt
-rw-r--r--  1 user  staff  249 Sep 15 06:53 test1.txt
-rw-r--r--  1 user  staff    0 Oct  4 08:28 test5.txt # File was created.
$ 
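
In a script, the FileExistsError raised by mode 'x' is normally caught rather than left to produce a traceback. The following sketch shows one possible arrangement; the helper name create_new is hypothetical.

```python
import os
import tempfile

def create_new(path, text):
    """Create `path` and write `text`; return False if the file already exists."""
    try:
        with open(path, 'x', encoding='utf-8') as f:
            f.write(text)
    except FileExistsError:
        return False
    return True

with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, 'test5.txt')
    first = create_new(path, 'first\n')
    second = create_new(path, 'second\n')   # Must not overwrite the existing file.
    with open(path, encoding='utf-8') as f:
        content = f.read()

print(first, second, repr(content))
```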

Truncating a file

>>> os.stat ('test1.txt').st_size
249
>>> os.truncate('test1.txt', 200)
>>> os.stat ('test1.txt').st_size
200
>>> os.stat ('test5.txt').st_size
0
>>> os.truncate('test5.txt', 150)
>>> os.stat ('test5.txt').st_size
150 # 150 bytes were added to file.
>>>

On the Unix command line:

$ ls -la test1.txt test5.txt
-rw-r--r--  1 user  staff  200 Oct  4 08:37 test1.txt
-rw-r--r--  1 user  staff  150 Oct  4 08:38 test5.txt
$ od -h test5.txt
0000000      0000    0000    0000    0000    0000    0000    0000    0000
*
0000220      0000    0000    0000                                        
0000226
$ # test5.txt contains 150 null bytes after truncation.
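
Shrinking and extending can also be checked from Python without the od command. The sketch below assumes a Unix-like system, where extending a file pads it with null bytes; the file name demo.bin is hypothetical.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'demo.bin')     # Hypothetical scratch file.
    with open(path, 'wb') as f:
        f.write(b'0123456789')                  # 10 bytes.

    os.truncate(path, 4)                        # Cut the file to 4 bytes.
    with open(path, 'rb') as f:
        shortened = f.read()

    os.truncate(path, 8)                        # Extend the file to 8 bytes.
    with open(path, 'rb') as f:
        extended = f.read()

print('shortened =', shortened)
print('extended  =', extended)
```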

Accessing a file


os.access(path, mode, ....) may be used to determine the existence, readability, writability and executability of path.

On the Unix command line:

$ ls -la t*n
--w-r-----  1 user  staff  25 Sep 29 13:12 test.bin
-rw-r--r--  1 user  staff  25 Sep 29 08:03 test1.bin
$ 

On the python command line:

>>> os.access('test0.txt', os.F_OK)
False # The file does not exist.
>>>
>>> os.access('test.bin', os.F_OK)
True # The file exists.
>>>
>>> os.access('test.bin', os.R_OK)
False # It is not readable.
>>>
>>> os.access('test.bin', os.W_OK)
True # It is writable.
>>>
>>> os.access('test.bin', os.X_OK)
False # It is not executable.
>>>
>>> os.access('test1.bin', os.F_OK | os.R_OK | os.W_OK) # Modes are bit flags and may be combined with bitwise OR.
True # The file exists and it's both readable and writable.
>>>
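
A result from os.access() can be out of date by the time the file is actually opened, because another process may change or remove the file in between. A common alternative is to attempt the operation and handle the failure. The sketch below illustrates that approach; the helper name read_text_or_none is an assumption made for this example.

```python
import os
import tempfile


def read_text_or_none(path):
    """Return the text in path, or None if it is missing or unreadable."""
    try:
        with open(path) as f:
            return f.read()
    except (FileNotFoundError, PermissionError):
        return None


with tempfile.TemporaryDirectory() as tmpdir:
    present = os.path.join(tmpdir, 'test.txt')
    missing = os.path.join(tmpdir, 'test0.txt')   # Never created.
    with open(present, 'w') as f:
        f.write('Hello, world!\n')

    missing_exists = os.access(missing, os.F_OK)  # Existence check only.
    missing_text = read_text_or_none(missing)
    present_text = read_text_or_none(present)

print(missing_exists, missing_text, repr(present_text))
```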

Changing a file's u,g,o permissions


os.chmod(path, mode, *, dir_fd=None, follow_symlinks=True) may be used to change the permissions for user, group and other.


On the Unix command line:

$ ls -la t*n
--w-r-----  1 user  staff  25 Sep 29 13:12 test.bin
-rw-r--r--  1 user  staff  25 Sep 29 08:03 test1.bin
$ 

On the python command line:

>>> import os
>>> import stat
>>> os.chmod('test.bin', stat.S_IRWXG) # read, write, execute for group.
>>>

On the Unix command line:

$ ls -la t*n
----rwx---  1 user  staff  25 Sep 29 13:12 test.bin # Permissions changed to r,w,x for group.
-rw-r--r--  1 user  staff  25 Sep 29 08:03 test1.bin
$ 
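
Permission constants from module stat can be combined with bitwise OR, and the result can be inspected without leaving Python. The following sketch assumes a Unix-like system and uses a scratch file; stat.S_IMODE() and stat.filemode() are used only to display the result.

```python
import os
import stat
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'test.bin')      # Scratch file for the demonstration.
    with open(path, 'wb') as f:
        f.write(b'\x00' * 25)

    # Read and write for user, read for group, nothing for other.
    os.chmod(path, stat.S_IRUSR | stat.S_IWUSR | stat.S_IRGRP)

    mode = os.stat(path).st_mode
    permission_bits = stat.S_IMODE(mode)         # Only the permission bits.
    listing = stat.filemode(mode)                # Same format as 'ls -l'.

print(oct(permission_bits), listing)
```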

Renaming a file


os.rename(src, dst, *, src_dir_fd=None, dst_dir_fd=None) may be used to rename a file.


On the Unix command line:

$ ls -la test1*t
-rw-r--r--  1 user  staff  200 Oct  4 08:37 test1.txt
-rw-r--r--  1 user  staff  163 Sep 18 18:07 test1a.txt
$

On the python command line:

>>> os.rename('test1.txt', 'test1a.txt')
>>>

On the Unix command line:

$ ls -la test1*t
-rw-r--r--  1 user  staff  200 Oct  4 08:37 test1a.txt
$ 

The existing file test1a.txt was silently replaced.

On Unix, os.rename(src, dst, ....) silently replaces an existing file dst. On Windows it raises FileExistsError if dst exists. os.replace(src, dst, ....) replaces an existing file dst on every platform.
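
When overwriting the destination is intended, os.replace() states that intent explicitly. A minimal sketch, using scratch files whose names and contents are chosen only for illustration:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    src = os.path.join(tmpdir, 'test1.txt')
    dst = os.path.join(tmpdir, 'test1a.txt')
    with open(src, 'w') as f:
        f.write('new contents\n')
    with open(dst, 'w') as f:
        f.write('old contents\n')

    os.replace(src, dst)                 # dst is overwritten, src disappears.

    src_exists = os.path.exists(src)
    with open(dst) as f:
        dst_text = f.read()

print(src_exists, repr(dst_text))
```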

Removing a file


os.remove(path, *, dir_fd=None) may be used to remove (delete) a file.

On the Unix command line:

$ ls -la test1*t
-rw-r--r--  1 user  staff  200 Oct  4 08:37 test1a.txt
$ 

On the python command line:

>>> os.remove('test1a.txt')
>>>

On the Unix command line:

$ ls -la test1*t
ls: test1*t: No such file or directory
$ 

The file test1a.txt was deleted.

os.remove() and os.unlink() are semantically identical.
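
os.remove() raises FileNotFoundError if the file does not exist. Where a missing file is acceptable, the exception can be caught, as in this sketch (the helper name remove_if_exists is an assumption made for the example):

```python
import os
import tempfile


def remove_if_exists(path):
    """Remove path. Return True if a file was removed, False if none existed."""
    try:
        os.remove(path)
    except FileNotFoundError:
        return False
    return True


with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'test1a.txt')
    with open(path, 'w') as f:
        f.write('Hello, world!\n')

    removed_first_time = remove_if_exists(path)    # The file is there.
    removed_second_time = remove_if_exists(path)   # The file is already gone.

print(removed_first_time, removed_second_time)
```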

Objects that behave like files


The terminal


On Unix each terminal window has its own device name, which looks like a file name, e.g., /dev/ttys003. Communication with the terminal may be achieved by treating it like a file:

$ cat t5.py
import os

print ('Name of my terminal is:', os.ttyname(0))

f = open(os.ttyname(0), 'wt')

print ('File object opened for writing to my terminal is:', f)

print ('Enter your date-of-birth [mm/dd/yyyy]: ', end='', flush=True, file=f) # Writing to terminal.

f.close()

f = open(os.ttyname(0))

dob = f.read() # Reading from terminal.

print ('File object opened for reading from my terminal is:', f)

f.close()

print ('You entered:', dob, end='')
$ python3.6 t5.py
Name of my terminal is: /dev/ttys003
File object opened for writing to my terminal is: <_io.TextIOWrapper name='/dev/ttys003' mode='wt' encoding='UTF-8'>
Enter your date-of-birth [mm/dd/yyyy]: 12/31/1999 # Enter dob followed by new-line and ^D for end-of-file.
File object opened for reading from my terminal is: <_io.TextIOWrapper name='/dev/ttys003' mode='r' encoding='UTF-8'>
You entered: 12/31/1999
$
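
os.ttyname(fd) raises OSError when fd is not connected to a terminal, for example when standard input is redirected from a file or a pipe. A script can test with os.isatty() first. This is a sketch for Unix-like systems; the helper name terminal_name is an assumption made for the example.

```python
import os


def terminal_name(fd=0):
    """Return the terminal device name for fd, or None if fd is not a terminal."""
    if os.isatty(fd):
        return os.ttyname(fd)
    return None


name = terminal_name()               # A device name, or None if stdin is redirected.

# A pipe is never a terminal, so the helper returns None for it.
fdr, fdw = os.pipe()
pipe_name = terminal_name(fdr)
os.close(fdr)
os.close(fdw)

print('stdin:', name)
print('pipe: ', pipe_name)
```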

Pipes


os.pipe() creates a pipe:

>>> fdr,fdw = os.pipe()
>>> fdr
3 # The file descriptor for reading from the pipe.
>>> fdw
4 # The file descriptor for writing to the pipe.
>>> 
>>> os.write(fdw, b'Hello, world!') # Write to the file descriptor associated with the pipe.
13 # Number of bytes written.
>>> os.read(fdr, 7) # Read a max of 7 bytes from the file descriptor associated with the pipe.
b'Hello, '
>>> os.read(fdr, 99) # Read a max of 99 bytes from fdr.
b'world!'
>>> os.close(fdr)
>>> os.close(fdw)
>>>

Standard input is usually file descriptor 0, standard output is 1, and standard error is 2. Further files opened by a process will then be assigned 3, 4, 5, and so forth. Hence file descriptors 3 and 4 above.

The pipe implements a FIFO (first-in, first-out) queue. Data written to the pipe is appended to the data already in the pipe. Data read from the pipe is taken from the beginning of the data in the pipe.

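Before trying to read from a pipe, ensure that it contains data, because os.read() on an empty pipe blocks until data arrives. The script below uses os.stat(fdr).st_size for this check. That value reflects the number of bytes waiting in the pipe on the system used here, but it is not portable (on Linux, for example, it is typically 0 for a pipe):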

import os

fdr,fdw = os.pipe()

number_written = os.write(fdw, b'9876543210')

number_of_bytes_in_pipe = os.stat(fdr).st_size

if number_written == number_of_bytes_in_pipe == 10 :
    pass # Integrity of data looks good.                                                                                                        
else :
    print ('Internal error: Input data corrupted.')
    exit (99)

print ('number_of_bytes_in_pipe =', number_of_bytes_in_pipe)

number_of_bytes_read_from_pipe = 0

while os.stat(fdr).st_size > 0 :
    data = os.read(fdr,1)
    number_of_bytes_read_from_pipe += len(data)
    print ('data =', data)

os.close(fdr)
os.close(fdw)

print ('number_of_bytes_read_from_pipe =', number_of_bytes_read_from_pipe)

exit (0)

Output on the system used here:

number_of_bytes_in_pipe = 10
data = b'9'
data = b'8'
data = b'7'
data = b'6'
data = b'5'
data = b'4'
data = b'3'
data = b'2'
data = b'1'
data = b'0'
number_of_bytes_read_from_pipe = 10

The functions os.lseek(fd, ....) and os.pread(fd, ....) do not work with pipes, because a pipe is not seekable. Both raise OSError ("Illegal seek") when given a pipe's file descriptor.
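
Whether os.stat(fd).st_size reports the bytes waiting in a pipe depends on the operating system. A check that may be more portable across Unix-like systems is to ask select.select() whether the descriptor is ready for reading. The sketch below takes that approach; the helper name pipe_has_data is an assumption made for the example.

```python
import os
import select


def pipe_has_data(fd):
    """Return True if a read from fd would not block.

    A timeout of 0 makes select() poll and return immediately.
    """
    readable, _, _ = select.select([fd], [], [], 0)
    return bool(readable)


fdr, fdw = os.pipe()

empty_before_write = not pipe_has_data(fdr)

os.write(fdw, b'9876543210')

# The write end stays open during the loop. After the write end is closed,
# select() also reports the pipe as readable and os.read() returns b''.
chunks = []
while pipe_has_data(fdr):
    chunks.append(os.read(fdr, 4))       # Read at most 4 bytes at a time.

os.close(fdr)
os.close(fdw)

received = b''.join(chunks)
print('received =', received)
```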

File objects and file descriptors


On the Unix command line:

$ ls -l test.txt ; cat test.txt
-rw-r--r--  1 user  staff  43 Sep  9 17:26 test.txt
Hello, world!
Hola, mundo!
Goodbye, world!
$

The function open() returns a file object and silently creates a file descriptor:

>>> f = open('test.txt')
>>> f
<_io.TextIOWrapper name='test.txt' mode='r' encoding='UTF-8'>
>>>
>>> fd_f = f.fileno() # fd_f (an int) is the file descriptor associated with f.
>>> fd_f
5
>>> 
>>> os.stat(fd_f).st_size # The file descriptor can provide info about the file.
43 # Size of file 'test.txt' in bytes
>>> 
>>> os.lseek(fd_f,10,0) # Seek to beginning-of-file + 10
10
>>> f.tell()
10
>>> os.lseek(fd_f,-3,1) # Seek -3 relative to current position.
7 # Using fd_f it works.
>>> f.seek(-3,1) # Using f it doesn't work in text mode.
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
io.UnsupportedOperation: can't do nonzero cur-relative seeks
>>> f.tell()
7
>>> 
>>> f.seek(14)
14
>>> os.lseek(fd_f,0,1) # Report current position.
14
>>> f.read(13)
'Hola, mundo!\n'
>>> f.tell()
27
>>> os.lseek(fd_f,0,1)
43 # f read ahead into its buffer, so the internal pointers of f and fd_f are NOT always the same.
>>> 
>>> os.lseek(fd_f,-16,2) # Seek -16 relative to end-of-file.
27
>>> os.read(fd_f,99)
b'Goodbye, world!\n'
>>> 
>>> f.closed
False
>>> f.close() # This also closes fd_f.
>>> f.closed
True
>>> os.stat(fd_f).st_size
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
OSError: [Errno 9] Bad file descriptor: 5
>>>
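
The difference between the two positions comes from buffering: the file object reads a large block through the descriptor and then hands out characters from its own buffer. The sketch below reproduces this with a scratch copy of test.txt and uses the named constant os.SEEK_CUR in place of the number 1. It assumes the default buffer is larger than the 43-byte file.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'test.txt')
    with open(path, 'wb') as f:
        # 14 + 13 + 16 = 43 bytes, as in the lesson's test.txt.
        f.write(b'Hello, world!\nHola, mundo!\nGoodbye, world!\n')

    with open(path) as f:
        fd = f.fileno()
        size = os.stat(fd).st_size

        first_line = f.read(14)          # Ask the file object for 14 characters.
        object_position = f.tell()       # Position as seen by the file object.

        # Position as seen by the descriptor: the whole file was already
        # read into the file object's buffer.
        descriptor_position = os.lseek(fd, 0, os.SEEK_CUR)

print(size, object_position, descriptor_position)
```

Mixing calls on the file object with calls on its descriptor is therefore best avoided unless the buffering is taken into account.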

Multiple file objects with same file descriptor


Two or more file objects may have the same file descriptor:

>>> f = open('test.txt')
>>> f1 = open(f.fileno(), 'w', closefd=False)
>>> f
<_io.TextIOWrapper name='test.txt' mode='r' encoding='UTF-8'>
>>> f1
<_io.TextIOWrapper name=5 mode='w' encoding='UTF-8'>
>>> f.fileno()
5
>>> f1.fileno()
5
>>> f.closed
False
>>> f1.closed
False
>>> f.close()
>>> f.closed
True
>>> f1.closed
False # f1 remains open
>>> f1.fileno()
5
>>> f1.readable()
False
>>> f1.writable()
True # Deceptive: the descriptor was opened read-only and was closed by f.close().
>>>

If two file objects share a file descriptor and you close one of them, the behavior of the other may be unpredictable, because closing a file object normally closes its descriptor too. Unless you really know what you're doing, close every file object associated with a descriptor at the same time. Also, don't close the file descriptor (with os.close(fd)) before closing the file objects that use it. A file object left holding a closed descriptor fails or, worse, operates on whatever file next receives the same descriptor number.
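
One orderly arrangement is to let a single owner hold the descriptor and to wrap it with closefd=False, so that closing the file object leaves the descriptor open until its owner closes it. This is a sketch of that arrangement under the stated assumption that nothing else closes the descriptor in between; the scratch file is hypothetical.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'test.txt')
    with open(path, 'w') as f:
        f.write('Hello, world!\n')

    fd = os.open(path, os.O_RDONLY)      # This code owns the descriptor.

    # closefd=False: closing the file object must not close fd.
    with open(fd, closefd=False) as f:
        text = f.read()

    # The file object is closed, but the descriptor is still usable.
    size_via_fd = os.fstat(fd).st_size

    os.close(fd)                         # The owner closes the descriptor last.

    try:
        os.fstat(fd)
        fd_still_open = True
    except OSError:                      # Bad file descriptor.
        fd_still_open = False

print(repr(text), size_via_fd, fd_still_open)
```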

Temporary files


Python's tempfile module contains functions that can be used to generate temporary files. Depending on the function and parameters used, the file created may or may not be visible on the file system, it may or may not be deleted when the file is closed, and it may or may not be opened in binary mode.


Function tempfile.NamedTemporaryFile(....) is representative of the functions available for file creation in module tempfile:

tempfile.NamedTemporaryFile(
mode='w+b',     # For consistent behavior across platforms.
buffering=None, 
encoding=None,  # Buffering, encoding and newline are as for open().
newline=None, 
suffix='',      # With 'suffix' and 'prefix' you have some control over the file name.
prefix='tmp', 
dir=None,       # You can specify the directory where the file will be created.
delete=True     # You can choose to keep the file after closure.
)

Opening a temporary file in text mode for deletion on closing:

>>> import tempfile
>>>
>>> f1 = tempfile.NamedTemporaryFile(mode='r+t', suffix='.txt')
>>> f1
<tempfile._TemporaryFileWrapper object at 0x101b7ab38>
>>> isinstance(f1, tempfile._TemporaryFileWrapper)
True
>>> f1.name
'/var/folders/8s/dctgn1y57fgc9_h2mzckbqs80000gn/T/tmpew4jippj.txt'
>>> f1.readable()
True
>>> f1.writable()
True
>>> f1.seekable()
True
>>> f1.encoding
'UTF-8'
>>> f1.write('Hello, world!\n')
14
>>> f1.write('Καλώς ήρθατε στο Βικιεπιστήμιο\n')
31 # 27 Greek + 4 ASCII characters (3 spaces and a new-line) = 27*2 + 4 = 58 bytes.
>>> f1.tell()
72 # 14 + 58
>>> f1.seek(0)
0
>>> f1.readline()
'Hello, world!\n'
>>> f1.readline()
'Καλώς ήρθατε στο Βικιεπιστήμιο\n'
>>> f1.readline()
''
>>> os.stat( f1.name )
os.stat_result(st_mode=33152, st_ino=24460790, st_dev=16777218, st_nlink=1, st_uid=501, st_gid=20, st_size=72, st_atime=1508154874, st_mtime=1508154545, st_ctime=1508154545)
>>> f1.close()
>>> f1.name
'/var/folders/8s/dctgn1y57fgc9_h2mzckbqs80000gn/T/tmpew4jippj.txt'
>>> os.stat( f1.name )
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
FileNotFoundError: [Errno 2] No such file or directory: 
'/var/folders/8s/dctgn1y57fgc9_h2mzckbqs80000gn/T/tmpew4jippj.txt'
>>>

Opening a temporary file in binary mode for retention on closing:

>>> f1 = tempfile.NamedTemporaryFile(suffix='.bin', delete=False)
>>> f1
<tempfile._TemporaryFileWrapper object at 0x101a8c518>
>>> 
>>> f1.write( b'When in the course of ....' )
26
>>> f1.name
'/var/folders/8s/dctgn1y57fgc9_h2mzckbqs80000gn/T/tmpdz14i44i.bin'
>>> f1.tell()
26
>>> f1.seek(0)
0
>>> f1.read()
b'When in the course of ....'
>>> f1.close()
>>>

The temporary file exists after closing:

$ ls -la /var/folders/8s/dctgn1y57fgc9_h2mzckbqs80000gn/T/tmpdz14i44i.bin
-rw-------  1 user  staff  26 Oct 16 07:22 /var/folders/8s/dctgn1y57fgc9_h2mzckbqs80000gn/T/tmpdz14i44i.bin
$
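
A temporary file can also be used in a with statement, which closes it (and, with delete=True, removes it) at the end of the block. A minimal sketch, with the encoding given explicitly as an assumption for reproducibility:

```python
import os
import tempfile

with tempfile.NamedTemporaryFile(mode='w+t', suffix='.txt', encoding='utf-8') as f:
    name = f.name
    f.write('Hello, world!\n')
    f.seek(0)                                    # Back to the beginning to read.
    text = f.read()
    existed_while_open = os.path.exists(name)

# Leaving the with block closed the file, and delete=True (default) removed it.
exists_after_close = os.path.exists(name)

print(repr(text), existed_while_open, exists_after_close)
```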

Assignments


Further Reading or Review


References


1. Python's documentation:

"7.2. Reading and Writing Files," "11.3. fileinput — Iterate over lines from multiple input streams," "4.8. Binary Sequence Types," "16.1.5. Files and Directories," "11.6. tempfile — Generate temporary files and directories"


2. Python's methods:

"7.2.1. Methods of File Objects," "16.2.3.1. I/O Base Classes," "16.2.3.4. Text I/O"


3. Python's built-in functions and functions of module os:

"open(file, mode=.....)," "bytes()," "bytearray()," "os.ftruncate(fd, length)," "os.stat(path, *, ...)," "os.truncate(path, length)," "os.access(path, mode, ....)," "os.chmod(path, mode, ....)," "os.rename(src, dst, ....)," "os.remove(path, ....)," "os.pipe()," "os.read(fd, n)," "os.write(fd, str)," "print(*objects, ....)"