Thursday, January 23, 2014

Publish Berkeley DB data to PDF with xtopdf

By Vasudev Ram


Berkeley DB (sometimes called BDB or BSD DB) is an embedded (*) key-value database with a long history and a huge user base. It is fast and can handle very large amounts of data. Berkeley DB was developed by Sleepycat Software, which was acquired by Oracle in 2006.

(*) "Embedded" in the sense of not client-server: it is a database library that gets linked into your application. Not "embedded" in the sense of software embedded in hardware devices, although Berkeley DB, being small, can be embedded in that sense too.

Excerpt from the Wikipedia article about Berkeley DB:

[ Berkeley DB (BDB) is a software library that provides a high-performance embedded database for key/value data. Berkeley DB is written in C with API bindings for C++, C#, PHP, Java, Perl, Python, Ruby, Tcl, Smalltalk, and many other programming languages. BDB stores arbitrary key/data pairs as byte arrays, and supports multiple data items for a single key. Berkeley DB is not a relational database.[1]
BDB can support thousands of simultaneous threads of control or concurrent processes manipulating databases as large as 256 terabytes,[2] on a wide variety of operating systems including most Unix-like and Windows systems, and real-time operating systems. ]
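The excerpt's point that BDB stores key/data pairs as byte arrays, and runs inside your own process rather than as a server, is easy to see from Python. This is only an illustration, assuming Python 3: there the bsddb module used later in this post no longer exists, so the sketch swaps in the standard library's dbm.dumb module. That is a much simpler key-value store than Berkeley DB, but it is likewise a library opened in-process and likewise hands keys and values back as bytes. The helper name roundtrip is hypothetical.

```python
import dbm.dumb
import os
import tempfile


def roundtrip(path):
    """Store two records, reopen the store, and return what it hands back."""
    # Open (creating if needed) an in-process key-value store; no server involved.
    with dbm.dumb.open(path, 'c') as db:
        db['apple'] = 'The apple is a red fruit.'          # str is encoded to UTF-8 bytes
        db[b'banana'] = b'The banana is a yellow fruit.'   # bytes are stored as-is
    # Reopen read-only: keys and values come back as bytes, not str.
    with dbm.dumb.open(path, 'r') as db:
        return {key: db[key] for key in db.keys()}


if __name__ == '__main__':
    with tempfile.TemporaryDirectory() as tmp:
        data = roundtrip(os.path.join(tmp, 'fruits'))
        for key in sorted(data):
            print(key, data[key])
```

Run directly, it prints each key and value as bytes objects (for example b'apple' b'The apple is a red fruit.'), even though the first pair was stored as str.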

[ Incidentally, Mike Olson, the former CEO of Sleepycat Software, is now the Chief Strategy Officer of Cloudera, which I blogged about here today:

Cloudera's Impala engine - SQL querying of Hadoop data. ]


I've used Berkeley DB off and on since before Oracle acquired Sleepycat Software, from at least C, Python and Ruby.

Today I thought of writing a program that enables a user to publish the data in a Berkeley DB database to PDF, using my xtopdf toolkit for PDF creation. Here is the program, BSDDBToPDF.py:

# BSDDBToPDF.py

# Program to convert Berkeley DB (BSD DB) data to PDF.
# Uses Python's bsddb module (deprecated since Python 2.6, removed in Python 3),
# and xtopdf.
# Author: Vasudev Ram - http://www.dancingbison.com

import sys
import bsddb
from PDFWriter import PDFWriter

try:
    # Flag 'c' opens the DB read/write, creating it if it doesn't exist;
    # any existing data in it is kept.
    fruits_db = bsddb.btopen('fruits.db', 'c')
    fruits = [
            ('apple', 'The apple is a red fruit.'),
            ('banana', 'The banana is a yellow fruit.'),
            ('cherry', 'The cherry is a red fruit.'),
            ('durian', 'The durian is a yellow fruit.')
            ]
    # Add the key/value fruit records to the DB.
    for name, description in fruits:
        fruits_db[name] = description
    fruits_db.close()

    # Read the key/value fruit records from the DB and write them to PDF.
    with PDFWriter("fruits.pdf") as pw:
        pw.setFont("Courier", 12)
        pw.setHeader("BSDDBToPDF demo: fruits.db to fruits.pdf")
        pw.setFooter("Generated by xtopdf")
        # Reopen the DB read-only ('r'), since this step only reads records.
        fruits_db = bsddb.btopen('fruits.db', 'r')
        print "FRUITS"
        print
        pw.writeLine("FRUITS")
        pw.writeLine(" ")
        for key in fruits_db.keys():
            print key
            print fruits_db[key]
            print
            pw.writeLine(key)
            pw.writeLine(fruits_db[key])
            pw.writeLine(" ")
        fruits_db.close()

except Exception as e:
    sys.stderr.write("ERROR: Caught exception: " + repr(e) + "\n")
    sys.exit(1)
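The program above needs Python 2, because the bsddb module was removed in Python 3 (third-party bindings such as bsddb3 fill that gap). As a rough Python 3 sketch of the same publishing loop, the code below swaps in the standard library's dbm.dumb module and takes the line-writing step as a callable, so it runs without xtopdf. The function name publish_db and its write_line parameter are hypothetical; with xtopdf you would presumably pass pw.writeLine from inside the with PDFWriter(...) block. One difference to note: a B-tree database opened with bsddb.btopen returns its keys in sorted order, while dbm.dumb makes no ordering promise, so the sketch sorts the keys itself.

```python
import dbm.dumb
import os
import tempfile


def publish_db(db_path, write_line, title="FRUITS"):
    """Send every key/value record in a dbm.dumb database to write_line.

    write_line is any callable that takes one str: print, list.append,
    or (assumed) xtopdf's PDFWriter.writeLine.
    """
    write_line(title)
    write_line(" ")
    # Open read-only: publishing should never modify the database.
    with dbm.dumb.open(db_path, 'r') as db:
        # Sort keys explicitly to mimic the B-tree ordering of bsddb.btopen.
        for key in sorted(db.keys()):
            # dbm returns bytes; decode back to str for the writer.
            write_line(key.decode('utf-8'))
            write_line(db[key].decode('utf-8'))
            write_line(" ")


if __name__ == '__main__':
    # Demo: build a small database in a temporary directory, then print it.
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, 'fruits')
        with dbm.dumb.open(path, 'c') as db:
            db['durian'] = 'The durian is a yellow fruit.'
            db['apple'] = 'The apple is a red fruit.'
        publish_db(path, print)
```

Keeping the writer as a parameter means the same loop can feed a PDF, the console, or a list in a test, without changing the database-reading code.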


And here is a screenshot of the PDF output of the program:


- Vasudev Ram - Dancing Bison Enterprises

