Monday, November 5, 2012

PDFBuilder can now take multiple input files from command line


By Vasudev Ram

PDFBuilder, which I blogged about recently, can now build a composite PDF from an arbitrary number [1] of input files (CSV and TDV) [2] specified on the command line. (I've removed the hard-coding in the first version.)

I've also cleaned up and refactored the PDFBuilder code somewhat, though more remains to be done.

UPDATE: I've pasted a few code snippets from PDFBuilder.py at the end of this post.

This version of PDFBuilder can be downloaded here, as a part of xtopdf v1.4, from the Bitbucket repository.

[1] Arbitrary number, that is, subject to the limitations of the length of the command line supported by your OS, of course - whether Unix / Linux, Mac OS X or Windows. However, there is a solution for that.

[2] The design of PDFBuilder allows for easily adding support for other input file formats that are row-oriented. See the method next_row() in the file CSVReader.py in the source package, for an example of how to add support for other compatible input formats. You just have to write a reader class (analogous to CSVReader) for that other format, called, say, FooReader, and provide an open() method and a next_row() method as in the CSVReader class, but adapted to handle Foo data.
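As a rough illustration of footnote [2], here is a minimal sketch of such a reader for a made-up pipe-separated format. The class name PSVReader and the ".psv" format are hypothetical and are not part of xtopdf. The method names (open, next_row, close, get_description) and the use of StopIteration to signal end of input are inferred from how build_pdf() uses a reader in the snippets in this post; the real CSVReader may differ in its details.

```python
class PSVReader:
    """Hypothetical reader for pipe-separated values, e.g. "a|b|c"."""

    def __init__(self, input_filename):
        self._input_filename = input_filename
        self._file = None

    def get_description(self):
        # Used by the caller to label this reader's section of the PDF.
        return "PSV file: " + self._input_filename

    def open(self):
        # Inside a method body, open() still refers to the builtin.
        self._file = open(self._input_filename, "r")

    def next_row(self):
        """Return the next row as a list of strings.

        Raises StopIteration at end of input, which is the signal
        the row loop in build_pdf() waits for.
        """
        line = self._file.readline()
        if line == "":
            raise StopIteration
        return line.rstrip("\r\n").split("|")

    def close(self):
        if self._file is not None:
            self._file.close()
            self._file = None
```

To use a reader like this, build_pdf() would also need a branch that maps the new filename extension to the new class.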

Some code snippets from PDFBuilder.py:

The PDFBuilder class:
class PDFBuilder:
 """
 Class to build a composite PDF out of multiple input sources.
 """

 def __init__(self, pdf_filename, font, font_size, 
    header, footer, input_filenames):
  """
  PDFBuilder __init__ method.
  """
  self._pdf_filename = pdf_filename
  self._input_filenames = input_filenames

  # Create a PDFWriter instance.
  self._pw = PDFWriter(pdf_filename)

  # Set its font.
  self._pw.setFont(font, font_size)

  # Set its header and footer.
  self._pw.setHeader(header)
  self._pw.setFooter(footer)
  
 def build_pdf(self, input_filenames):
  """
  PDFBuilder.build_pdf method.
  Builds the PDF using contents of the given input_filenames.
  """

  # Loop over all names in input_filenames.
  # Instantiate the appropriate reader for each filename, 
  # based on the filename extension.

  # For each reader, get each row, and for each row,
  # combine all the columns into a string separated by a space,
  # and write that string to the PDF file.

  # Start a new PDF page after each reader's content is written
  # to the PDF file.

  for input_filename in input_filenames:
   # Check if name ends in ".csv", ignoring upper/lower case
   if input_filename[-4:].lower() == ".csv":
    reader = CSVReader(input_filename)
   # Check if name ends in ".tdv", ignoring upper/lower case
   elif input_filename[-4:].lower() == ".tdv":
    reader = TDVReader(input_filename)
   else:
    sys.stderr.write("Error: Invalid input file: " + \
     input_filename + ". Exiting\n")
    # Exit with a non-zero status to signal failure.
    sys.exit(1)

   hdr_str = "Data from reader: " + \
    reader.get_description()
   self._pw.writeLine(hdr_str)
   self._pw.writeLine('-' * len(hdr_str))

   reader.open()
   try:
    while True:
     row = reader.next_row()
     # Join the columns with single spaces.
     self._pw.writeLine(" ".join(row))
   except StopIteration:
    # No more rows from this reader.
    pass
   finally:
    # Always close the reader, even if an error occurred.
    reader.close()
   # Save this PDF page and start a new one for the next reader.
   self._pw.savePage()

 def close(self):
  self._pw.close()
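
The if/elif chain in build_pdf() needs a new branch for every new input format. One possible alternative, shown here only as a sketch and not as what PDFBuilder.py does, is a lookup table from filename extension to reader class. The names READERS and make_reader are hypothetical, and the two reader classes below are empty stand-ins so that the example runs on its own.

```python
import os


class CSVReader:
    """Stand-in for the real class in CSVReader.py."""
    def __init__(self, input_filename):
        self.input_filename = input_filename


class TDVReader:
    """Stand-in for the real TDV reader class."""
    def __init__(self, input_filename):
        self.input_filename = input_filename


# Hypothetical table: lower-case extension -> reader class.
READERS = {
    ".csv": CSVReader,
    ".tdv": TDVReader,
}


def make_reader(input_filename):
    """Return a reader instance chosen by filename extension."""
    ext = os.path.splitext(input_filename)[1].lower()
    try:
        reader_class = READERS[ext]
    except KeyError:
        raise ValueError("Unsupported input file type: %r" % input_filename)
    return reader_class(input_filename)
```

With this approach, supporting a new format means adding one entry to the table, and the caller decides how to report a ValueError.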
The main() function that uses the PDFBuilder class to create a composite PDF:
def main():

 # global variables

 # program name for error messages
 global prog_name
 # debug flag - if true, print debug messages, else don't
 global DEBUGGING
 
 # Set the debug flag based on environment variable DEBUG, 
 # if it exists.
 debug_env_var = os.getenv("DEBUG")
 if debug_env_var == "1":
  DEBUGGING = True

 # Save program filename for error messages
 prog_name = sys.argv[0]

 # Check for the right args: the output PDF filename
 # plus at least one input filename.
 if len(sys.argv) < 3:
  usage()
  sys.exit(1)

 # Get output PDF filename from the command line.
 pdf_filename = sys.argv[1]

 # Get the input filenames from the command line.
 input_filenames = sys.argv[2:]

 # Create a PDFBuilder instance.
 pdf_builder = PDFBuilder(pdf_filename, "Courier", 10, 
       "Composite PDF", "Composite PDF", 
       input_filenames)

 # Build the PDF using the inputs.
 pdf_builder.build_pdf(input_filenames)

 pdf_builder.close()

 sys.exit(0)
And a batch file, run.bat, calls the program with input filename arguments:
@echo off
python PDFBuilder.py %1 file1.csv file1.tdv file2.csv file2.tdv file1-repeats5.csv
Run the batch file like this:
C:> run composite.pdf
which will create a PDF file, composite.pdf, from the input CSV and TDV files given as command-line arguments.
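
Footnote [1] mentions the command-line length limit. One possible workaround, which is not necessarily the solution referred to there, is to keep the input filenames in a text file, one per line, and have the program read that file. The sketch below uses the standard argparse module and its fromfile_prefix_chars option instead of the direct sys.argv handling in main() above. The function name parse_args and the file name inputs.txt are made up for illustration.

```python
import argparse


def parse_args(argv):
    """Parse the output PDF name and one or more input filenames.

    An argument such as "@inputs.txt" is replaced by the lines of
    inputs.txt, one argument per line.
    """
    parser = argparse.ArgumentParser(
        description="Build a composite PDF from CSV and TDV files.",
        fromfile_prefix_chars="@")
    parser.add_argument("pdf_filename")
    parser.add_argument("input_filenames", nargs="+")
    return parser.parse_args(argv)
```

Assuming main() were changed to call parse_args(sys.argv[1:]), the program could then be run as:
python PDFBuilder.py composite.pdf @inputs.txt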

Enjoy.

- Vasudev Ram - Dancing Bison Enterprises
