Friday, August 31, 2012

PipeController v0.1 released - simulating UNIX-style pipes in Python


By Vasudev Ram


Some time ago, I wrote a post about ways of doing UNIX-style pipes in Python. It had links to several different tools that enable you to do that.

More recently, I wrote another post about one more such tool, Plumbum:

Plumbum, UNIX shell-like library and tool in Python

Recently, I worked on implementing something along the same lines myself. I've tentatively named it PipeController, for lack of a better name. I did google for shorter and better-sounding names, mostly variations on the word "pipe", but most of them were already taken by other software products or other things. So I've settled on this name for now.

PipeController is a tool to experiment with a simple, sequential, synchronous simulation of UNIX-style pipes in Python. It's the first release, and has only a little functionality as of now.

The main source file is pipes.py. Apart from class PipeController, it has a main function that does a simple test of setting up and running a pipe.

How to use PipeController:

Each component of the "pipe" is to be implemented by the calling program as a Python function. Each function should take one input and return one output, which should be the result of processing the input.

The tool (basically, the class PipeController) takes care of setting up the pipe and running it.

You have to create an instance of PipeController and call its methods to make that happen. The main methods to be used are:

- PipeController.add_processor(func_name)
- PipeController.run_pipe()

The add_processor() method just adds the given function (which should be passed as a bare name, without trailing parentheses, so that the function itself is passed rather than the result of calling it) to a list in the instance.
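To illustrate the "bare name" point: in Python, a function name without parentheses refers to the function object, while adding parentheses calls it. Here is a small sketch in Python 3, using a plain list as a stand-in for the list kept in the instance. The list and the body of upcase are assumptions for illustration, not the code from pipes.py:

```python
def upcase(line):
    """Hypothetical processor: return the line in upper case."""
    return line.upper()


processors = []              # stand-in for the list kept by the instance

processors.append(upcase)    # correct: stores the function object itself

try:
    processors.append(upcase())   # wrong: calls upcase with no argument
except TypeError as exc:
    error_message = str(exc)      # the call fails before anything is stored

# The stored function can be called later, once per input line.
result = processors[0]("abc\n")
```

Only the first append succeeds, so the list ends up holding exactly one callable.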

The run_pipe() method actually runs the pipe. It loops over the lines of input (text, for now) in sys.stdin, and passes them to each of the functions in turn, using an inner loop:
def run_pipe(self):
    item = self._pipe_input.readline()
    while item != '': # while not EOF
        result = item
        for processor in self._processors:
            result = processor(result)
        self._pipe_output.write(result)
        item = self._pipe_input.readline()
    self._pipe_output.close()
You can call add_processor() any number of times. Each call will append one function to the pipeline. The functions will be called on the input in the order they are added to the instance.

The default input for the whole pipeline is sys.stdin and the default output is sys.stdout. As of now there is no support or usage example for changing the default input source or output destination programmatically, although the PipeController.__init__() method's signature indicates that you can. I put that in so I can work on it later.
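As a rough idea of what redirecting the input and output could look like, here is a minimal sketch in Python 3. It is not the released code: the class name SketchPipeController and the parameter names pipe_input and pipe_output are assumptions, with only the attribute names taken from the run_pipe() excerpt above. It uses in-memory io.StringIO objects instead of sys.stdin and sys.stdout, and it deliberately does not close the output, so that the buffer can still be read afterwards:

```python
import io
import sys


class SketchPipeController:
    """Illustrative stand-in for PipeController, not the released code."""

    def __init__(self, pipe_input=sys.stdin, pipe_output=sys.stdout):
        # Parameter names are assumptions; the attribute names follow
        # the run_pipe() excerpt above.
        self._pipe_input = pipe_input
        self._pipe_output = pipe_output
        self._processors = []

    def add_processor(self, processor):
        # Store the function object; it is called later, once per line.
        self._processors.append(processor)

    def run_pipe(self):
        # Same per-line loop as the excerpt, but the output is left open
        # so the caller can still read an in-memory buffer afterwards.
        for item in iter(self._pipe_input.readline, ''):
            result = item
            for processor in self._processors:
                result = processor(result)
            self._pipe_output.write(result)


source = io.StringIO("one\ntwo\n")
sink = io.StringIO()

pc = SketchPipeController(source, sink)
pc.add_processor(str.upper)
pc.add_processor(lambda line: "> " + line)
pc.run_pipe()

print(sink.getvalue(), end="")
```

Any object with a readline() method can serve as the input here, and any object with a write() method as the output, which is also what makes this kind of class easy to test without real files.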

The run_pipe() method loops over the lines in the input, passes each line to the chain of functions, one by one, with each function's output becoming the next function's input (as in the case of UNIX pipes, except there we have commands instead of functions), and writes the final result for each line to the output.

Sample usage (some code omitted):
pc = PipeController()
pc.add_processor(oto0)
pc.add_processor(eto3)
pc.add_processor(upcase)
pc.add_processor(delspace)
pc.run_pipe()
where the Python functions oto0, eto3, upcase and delspace convert occurrences of "o" to "0", "e" to "3", letters to uppercase, and delete spaces (from their input), respectively. (See the code in the PipeController source zip file linked to below, for those function definitions and the rest of the code).
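For readers who do not have the zip file at hand, the four functions could plausibly look like the sketch below (Python 3). These bodies are my guess at a minimal implementation based on the description above, not necessarily the definitions shipped in pipes.py:

```python
def oto0(line):
    """Replace each lowercase 'o' with the digit '0'."""
    return line.replace("o", "0")


def eto3(line):
    """Replace each lowercase 'e' with the digit '3'."""
    return line.replace("e", "3")


def upcase(line):
    """Convert letters to upper case."""
    return line.upper()


def delspace(line):
    """Delete space characters (the trailing newline is kept)."""
    return line.replace(" ", "")


# Apply the functions in the same order as the add_processor() calls.
line = "some lowercase text\n"
for func in (oto0, eto3, upcase, delspace):
    line = func(line)

print(line, end="")
```

Note that the order matters in this sketch: oto0 and eto3 only match lowercase letters, so they have to run before upcase.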

If that was all that PipeController provided, it could be replaced by just nesting / composing function calls, with the innermost call taking the input string (line). For example, instead of the for loop in the run_pipe() method, we could use:
f(g(h(item)))
or, to use the example functions in the code:
delspace(upcase(eto3(oto0(item))))
But while experimenting with the code after first writing it, I discovered that it has a few interesting properties (and hence potential uses), some of which may not be as convenient to achieve using the functional composition method. I'll write about that in an upcoming post or two.
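For comparison, the functional composition approach mentioned above can be written generically, so that the functions are listed in pipeline order instead of being nested inside out. This is a small sketch in Python 3 using functools.reduce; the helper name compose_in_order is made up for this example and is not part of PipeController:

```python
from functools import reduce


def compose_in_order(*funcs):
    """Return a function that applies funcs from left to right."""
    def composed(item):
        # Feed the running value through each function in turn.
        return reduce(lambda value, func: func(value), funcs, item)
    return composed


strip_then_shout = compose_in_order(str.strip, str.upper, lambda s: s + "!")
print(strip_then_shout("  hello  "))  # prints HELLO!
```

The reduce call plays the same role as the inner for loop in run_pipe(): each function's output becomes the next function's input.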

You can download PipeController v0.1 here.

It has a few input files that are meant to be used with the existing main function that tests the PipeController class.

Example usage (where it1 is one of the input text files):

On UNIX/Linux:
$ cat it1 | python pipes.py
or
$ python pipes.py < it1
On Windows:
C:> type it1 | python pipes.py
or
C:> python pipes.py < it1
The input file it1 contains:
some lowercase text
more lowercase text
even more lowercase text
yet more lowercase text
After running any one of the above commands, the output is:
S0M3L0W3RCAS3T3XT
M0R3L0W3RCAS3T3XT
3V3NM0R3L0W3RCAS3T3XT
Y3TM0R3L0W3RCAS3T3XT
which is the result of processing the input using the pipeline.

To summarize, again:

it1 is an input text file containing a few lines of text. The above command transforms the contents in the following ways, via the pipeline: converts occurrences of 'o' to '0', then occurrences of 'e' to '3', then upper-cases letters, and finally deletes spaces.

Each of these conversions is done by a corresponding Python function in the program (the functions named above). The whole pipeline is set up and run by an instance of the PipeController class.

You can define any functions of your own (and any number of them) and use them to create a pipeline for your own purposes. The only requirement is that each function should take a string (representing a line of text) as input (i.e. its argument), and return the processed string.
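As an example of writing your own functions, here is a sketch in Python 3 of two hypothetical processors (neither is in the PipeController zip file). The first shows one way to get a parameterized function, using a closure. The second shows a detail worth remembering: lines read with readline() keep their trailing newline, so a function that rearranges characters should take care to leave the newline at the end:

```python
def make_replacer(old, new):
    """Build a processor that replaces old with new in each line."""
    def replacer(line):
        return line.replace(old, new)
    return replacer


def reverse_line(line):
    """Reverse the text of a line, keeping its newline at the end."""
    if line.endswith("\n"):
        return line[:-1][::-1] + "\n"
    return line[::-1]


tabs_to_spaces = make_replacer("\t", "    ")

# Both functions take one string and return one string, so they fit
# the requirement described above.
processed = reverse_line(tabs_to_spaces("a\tb\n"))
print(processed, end="")
```

Functions like these could then be passed to add_processor() in the same way as the sample functions.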

Enjoy, and please give your feedback, if any.

- Vasudev Ram - Dancing Bison Enterprises
