Sunday, June 23, 2013

Exploring "regular" functions versus generator functions in Python


By Vasudev Ram

This post is about "regular" functions versus generator functions in Python. I'm using the term "regular" for lack of a better one; by it I simply mean non-generator functions.

Consider this text file, test1.txt:
this is a line with a foo and another foo and one more foo.
the foo brown foo jumped over the lazy foo
foo are you. you are foo.
Here is a program, with a "regular" function, to process all the lines in that text file:
# regular_text_proc.py

# Replace instances of the string old_pat with new_pat in line.
def process_line(line, old_pat, new_pat):
    return line.replace(old_pat, new_pat)

# Process a text file, calling process_line on each line.
def regular_text_proc(filename, old_pat, new_pat):

    new_lines = []
    with open(filename) as fp:
        for line in fp:
            new_line = process_line(line, old_pat, new_pat)
            new_lines.append(new_line)
    return new_lines

def main():

    newlines = regular_text_proc("test1.txt", "foo", "bar")

    print "new file:"
    for line in newlines:
        print line,

main()
This command:
python regular_text_proc.py
gives this output:
new file:
this is a line with a bar and another bar and one more bar.
the bar brown bar jumped over the lazy bar
bar are you. you are bar.
Here is a program, with a generator function, to do the same kind of processing of the same file:
# lazy_text_proc.py
# Lazy text processing with Python generators.

# Replace instances of the string old_pat with new_pat in line.
def process_line(line, old_pat, new_pat):
    return line.replace(old_pat, new_pat)

# Process a text file lazily, calling process_line on each line.
def lazy_text_proc(filename, old_pat, new_pat):
    with open(filename) as fp:
        for line in fp:
            new_line = process_line(line, old_pat, new_pat)
            yield new_line

def main():
    newlines = lazy_text_proc("test1.txt", "foo", "bar")
    print "type(newlines) =", type(newlines)
    # Uncommenting the line below raises a TypeError, because
    # newlines is a generator, not a list, and generators have no len().
    #print "len(newlines) =", len(newlines)
    print "new file:"
    for lin in newlines:
        print lin,

main()
This command:
python lazy_text_proc.py
gives the same output as the regular_text_proc.py program, plus one extra line at the top: type(newlines) = <type 'generator'>. I added that line to show that the variable called 'newlines' in this program is not a list but a generator. (It is a list in the regular_text_proc.py program.)
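To make the list-versus-generator distinction concrete, here is a small sketch of my own. It is written for Python 3, unlike the Python 2 listings above. The names SAMPLE_LINES, regular_proc and lazy_proc are hypothetical, and an in-memory list of lines stands in for test1.txt so that no file is needed. It only illustrates three observable differences: the type of the result, whether len() works, and what happens when you iterate a second time.

```python
import types

# Sample lines standing in for test1.txt (an assumption, so that the
# sketch does not need a file on disk).
SAMPLE_LINES = [
    "this is a line with a foo and another foo and one more foo.\n",
    "the foo brown foo jumped over the lazy foo\n",
    "foo are you. you are foo.\n",
]

def regular_proc(lines, old_pat, new_pat):
    # Hypothetical "regular" function: builds the whole list first.
    return [line.replace(old_pat, new_pat) for line in lines]

def lazy_proc(lines, old_pat, new_pat):
    # Hypothetical generator function: yields one line per request.
    for line in lines:
        yield line.replace(old_pat, new_pat)

as_list = regular_proc(SAMPLE_LINES, "foo", "bar")
as_gen = lazy_proc(SAMPLE_LINES, "foo", "bar")

is_list = isinstance(as_list, list)
is_generator = isinstance(as_gen, types.GeneratorType)

# len() works on the list but raises TypeError on the generator.
list_len = len(as_list)
try:
    len(as_gen)
    len_failed = False
except TypeError:
    len_failed = True

# A generator can be consumed only once: the first list() call drains
# it, so the second one gets nothing.
first_pass = list(as_gen)
second_pass = list(as_gen)

print(is_list, is_generator, list_len, len_failed)  # True True 3 True
print(first_pass == as_list, second_pass)           # True []
```

So both functions produce the same lines, but only the list can be measured with len() or looped over again; the generator has to be re-created (or converted with list()) for that.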

I found the difference between these two programs, one with a regular function and the other with a generator function, to be interesting in a few ways. I'll discuss that in my next blog post.

The Wikipedia article on generators is of interest.

- Vasudev Ram - Dancing Bison Enterprises
