Getting the hang of Python's file pointers
I've been using Python for a while now, and I really enjoy the elegance and simplicity it offers. Developing with it feels completely fluid. It's fast (when combined with Theano, which I'll be discussing later) and easy to understand.
Unlike others, I was already in my senior year by the time I was introduced to this "geeky stuff". That's when I actually started creating something that mattered. It's a common misconception that C/C++ are old and outdated. Maybe they aren't. In fact, I am grateful that I started out with C, even though others criticized me for doing it. I don't completely disagree with them, but C has its own perks that Python doesn't. As days progressed, complicated stuff became even more complicated for me.
I was petrified and looking for a way out. I was desperately seeking alternatives. The magic word "Python" appeared out of nowhere, and that kicked off the Python mania. I've faced countless errors which would usually drive me crazy. These obstacles hung on me like a dead weight until one day I realized that maybe I should start writing blog posts about the errors and the methods to circumvent them. Welcome aboard, dear Python'ers. Let's fix our errors and write code that will shape something (it definitely will :P).
[Image: "That pretty much sums up my life!!"]
This blog post revolves around Python file pointers, the dumb mistakes I've made with them, and how to overcome those mistakes. Let's get started...
When file pointers have a mind of their own...
A while back, when I was working on one of my side projects, one semantic error proved to be a real pain. The horrible part was that I had written the code without understanding the intricate details of file pointers.
I began writing code for displaying the contents of a .csv (comma separated value) file, which I expected to work perfectly. It did work as expected. Just once!! I spent a good deal of time tracing what could possibly have gone wrong. After a while, I was finally able to figure it out. Thanks to C, my old friend.
The error can be attributed to a characteristic of file pointers: once a file pointer reaches EOF (end of file), it stays put, so any further read returns nothing. I was clueless about what had happened, since my code was syntactically correct. File pointers in C and Python share this characteristic. Back in my C days, I had to reposition the file pointer to the desired position whenever it reached EOF. It's actually pretty satisfying that Python has something in common with C (well, that's how much I've explored Python. Naive me :P)
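To see the "stays put at EOF" behaviour in isolation, here is a minimal sketch in Python 3. It is not part of the original project: the throwaway file, its contents and the helper name read_twice are all made up for illustration.

```python
import os
import tempfile


def read_twice(path, rewind):
    """Read the whole file twice through one handle, optionally rewinding in between."""
    with open(path, "r") as handle:
        first = handle.read()   # moves the file pointer to EOF
        if rewind:
            handle.seek(0)      # put the pointer back at the start
        second = handle.read()  # reads from wherever the pointer is now
    return first, second


# Throwaway sample file, invented for this demo.
fd, demo_path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as tmp:
    tmp.write("alpha\nbeta\n")

without_seek = read_twice(demo_path, rewind=False)
with_seek = read_twice(demo_path, rewind=True)
os.remove(demo_path)

print(without_seek)  # the second element is an empty string
print(with_seek)     # both elements hold the full contents
```

Without the seek(0), the second read starts at EOF and comes back empty, which is exactly the "works just once" symptom.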
I'm using a made-up .csv file as the source, accompanied by a Python script. Get your code and other supporting files from here. The custom-generated .csv (comma separated value) file contains three records spread across 7 fields (columns). It holds the partition details of a virtual hard disk which I used previously during an Arch Linux installation. The Python file contains a main routine and a subroutine.
The main function houses the file object declarations along with the function call statements, while the subroutine iterates over the reader object and displays its contents.
dude, err... can you explain what happens in a nutshell??
 1  import csv
 2
 3  def print_file(source_handle):
 4      for row in source_handle:
 5          for item in row:
 6              print ("%s") %item
 7
 8  def main():
 9      source_file = open("test.csv", "rb")
10      source_handle = csv.reader(source_file)
11      for i in range(2):
12          print_file(source_handle)
13          source_file.seek(0)
14
15  if __name__ == "__main__":
16      main()
The code snippet (which probably is junk) contains a subroutine along with the main routine. It uses a few built-in and standard-library functions, which we will be analyzing in this post.
To begin with, let's decipher what's happening in main():
Line 9 source_file = open("test.csv", "rb")
On execution, line 9 opens test.csv in binary read mode ("rb") and assigns the resulting file object to the variable source_file. (This is Python 2; in Python 3 the csv module expects a file opened in text mode with newline="".) The source_file object can itself be iterated line by line, but I wouldn't recommend that for CSV data. The built-in function open(), used for opening a file, is commonly invoked with two arguments: file name and open mode.
open(filename, open_mode)
Line 10 source_handle = csv.reader(source_file)
Line 10 also returns an object, but not a file object: csv.reader wraps the file object and returns a reader object. Both can be iterated, but they behave differently. The file object yields each line as a plain string, while the reader object yields each line as a list of fields. csv.reader, a function from the standard-library csv module, is commonly invoked with the source file object plus optional formatting parameters such as delimiter and quotechar (shown here with their default values).
csv.reader(file_object [, delimiter=','] [, quotechar='"'])
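As a rough illustration of those optional parameters, here is a small Python 3 sketch. The sample data is hypothetical (semicolon-delimited with single-quoted fields) and is fed in through io.StringIO instead of a real file, just to keep the example self-contained.

```python
import csv
import io

# Hypothetical sample data: semicolon-delimited, single-quoted fields.
sample = io.StringIO("name;size;note\n'/dev/sda1';'512M';'boot; EFI'\n")

# Override the default delimiter (',') and quotechar ('"').
reader = csv.reader(sample, delimiter=";", quotechar="'")
rows = list(reader)

for row in rows:
    print(row)
```

Note how the semicolon inside the quoted field 'boot; EFI' is kept as part of the field instead of being treated as a separator.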
What happens when csv.reader is ignored?
Skipping csv.reader() leads to some unexpected output for this particular test case, since it involves a CSV file. I tried two test cases, one with only open() and one with open() plus csv.reader(), and got the following results. They give a good insight into the finer details of handling files.
Case 1: Using only open() without csv.reader()
Fig 1. Output obtained using the file object created by open()
As mentioned before, the CSV file contains three rows of data spread across seven columns, separated by commas. A file object created by open() yields each line as a plain string, so the inner loop walks over that string one character at a time and prints every character on its own line, commas included.
Case 2: Using both open() and csv.reader():
Fig 2. Output obtained using the reader object created by csv.reader()
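The difference between the two cases can be reproduced with a short Python 3 sketch. The single record below is made up and merely stands in for one line of test.csv; io.StringIO plays the role of the opened file.

```python
import csv
import io

# Made-up single record standing in for one line of test.csv.
text = "sda1,ext4,20G\n"

# Case 1: each line from a plain file object is a string,
# so the inner loop yields single characters (commas and newline included).
raw_items = [item for line in io.StringIO(text) for item in line]

# Case 2: csv.reader splits each line into fields first,
# so the inner loop yields whole fields.
csv_items = [item for row in csv.reader(io.StringIO(text)) for item in row]

print(raw_items)
print(csv_items)
```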
Line 11 for i in range(2):
Line 12 print_file(source_handle)
Line 13 source_file.seek(0)
The for loop in line 11 initializes i with 0 and keeps iterating until i exceeds 1, so its body runs twice. In line 12, the user-defined print_file subroutine gets invoked with source_handle as its parameter. Line 13 is where the magic happens... the file pointer repositioning. source_file.seek(0) moves the file pointer's current position to the given offset, here the beginning of the file.
The commonly used form of seek() is as follows:
fileobject.seek(offset [, whence])
Here whence is an optional parameter that says what the offset is relative to: 0 (the default) means the beginning of the file, 1 the current position and 2 the end of the file.
The overall purpose of lines 11, 12 and 13 is to call the print_file function twice, repositioning the file pointer at the beginning of the file each time control returns from the subroutine.
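Here is a small Python 3 sketch of the three whence values, using tell() to check where the pointer ends up. It uses io.BytesIO as an in-memory stand-in for a file opened in binary mode, and the ten-byte content is made up. Binary mode is a deliberate choice: for files opened in text mode, Python 3 only allows seeking relative to the beginning of the file (apart from jumping straight to the end with seek(0, 2)).

```python
import io
import os

# In-memory stand-in for a file opened in binary mode ("rb").
handle = io.BytesIO(b"0123456789")

handle.seek(3)                 # whence defaults to os.SEEK_SET (start of file)
from_start = handle.read(2)    # reads bytes 3 and 4, pointer moves to 5

handle.seek(2, os.SEEK_CUR)    # 2 bytes forward from the current position
from_current = handle.read(2)  # reads bytes 7 and 8

handle.seek(-3, os.SEEK_END)   # 3 bytes back from the end of the file
from_end = handle.read()       # reads the last 3 bytes

handle.seek(0)                 # rewind, as line 13 of the script does
position = handle.tell()       # current pointer position

print(from_start, from_current, from_end, position)
```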
Figuring out what's in store in the print_file subroutine:
The print_file subroutine takes the reader object (source_handle) as a parameter. Its main objective is to print the individual fields of each row in the file by iterating over them.
Line 4 for row in source_handle:
Line 5 for item in row:
Line 6 print ("%s") %item
Line 4 is the outer loop, which picks up one entire row of the CSV file at a time. Its loop variable, row, holds the current row as a list of fields, and the inner for loop on line 5 iterates over that list to extract the individual fields. Each field lands in the inner loop variable, item, and is printed on line 6.
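For anyone on Python 3, the whole script could be ported roughly as follows. This is only a sketch under a few assumptions: the sample file is generated on the fly with placeholder data (it is not the original test.csv), main() takes the path as a parameter, and print_file returns a field count purely so the two passes can be compared.

```python
import csv
import os
import tempfile


def print_file(source_handle):
    """Print every field of every row and return how many fields were seen."""
    count = 0
    for row in source_handle:
        for item in row:
            print(item)
            count += 1
    return count


def main(path):
    counts = []
    # Python 3: text mode with newline="" is what the csv module expects.
    with open(path, newline="") as source_file:
        source_handle = csv.reader(source_file)
        for _ in range(2):
            counts.append(print_file(source_handle))
            source_file.seek(0)  # rewind so the next pass starts at the top
    return counts


# Placeholder data, not the original test.csv.
fd, sample_path = tempfile.mkstemp(suffix=".csv")
with os.fdopen(fd, "w", newline="") as tmp:
    csv.writer(tmp).writerows([["a", "b", "c"], ["d", "e", "f"]])

pass_counts = main(sample_path)
os.remove(sample_path)
print(pass_counts)
```

If the seek(0) line is removed, the second pass finds the file pointer already at EOF and prints nothing.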
Now that all the main routines and subroutines are untangled, we are left with these two eccentric lines... What do they do??
Line 15 if __name__ == "__main__":
Line 16 main()
if __name__ == "__main__":
is used to execute some code only if the file is run directly, and not when it is imported as a module. In our case, it simply calls main()... Even though it isn't strictly necessary for a small script like this, I stick to using it.
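A bare-bones sketch of the idea, assuming the code is saved in a hypothetical file called demo.py (the file name and the message are made up):

```python
def main():
    message = "running as a script"
    print(message)
    return message


# Executed directly (python demo.py): __name__ is "__main__", so main() runs.
# Imported (import demo): __name__ is "demo", so main() is not called
# automatically, but demo.main() can still be called by the importer.
if __name__ == "__main__":
    main()
```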
If you made it till here, this is for you...

And this too...
