Thursday, April 28, 2016

Exploring sizes of data types in Python

By Vasudev Ram

I was doing some experiments in Python to see how much of various data types could fit into the memory of my machine. For example, I created successively larger lists of integers (ints), to see at what point Python ran out of memory.

At one point, I got a MemoryError while trying to create a list of ints that I thought should fit into memory. Sample code:
>>> lis = range(10 ** 9)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
MemoryError
After thinking a bit, I realized that the error was to be expected. In Python 2, range() builds the whole list in memory, and values in dynamic languages such as Python tend to take more space than they do in static languages such as C, due to per-object metadata, pre-allocation (for some types) and interpreter book-keeping overhead.
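A rough estimate shows why. With the 32-bit Python 2 sizes shown later in this post (12 bytes per int object, plus a 4-byte pointer per list slot), 10**9 ints need about 16 GB, far more than the at most 4 GB a 32-bit process can address. The sketch below, written for Python 3 (where range() is lazy and does not build a list), makes the same estimate on whatever interpreter runs it. The function name estimate_int_list_bytes is hypothetical, and the estimate deliberately ignores CPython's cache of small ints (-5 to 256) and list over-allocation.

```python
import struct
import sys

def estimate_int_list_bytes(n, typical_value=10**6):
    """Rough estimate of the bytes needed for a list of n distinct ints.

    Assumes each element is a separate int object about the size of
    typical_value; ignores the small-int cache and list over-allocation.
    """
    pointer_size = struct.calcsize('P')        # one list slot (a pointer)
    int_size = sys.getsizeof(typical_value)    # one int object
    list_header = sys.getsizeof([])            # the empty list object itself
    return list_header + n * (pointer_size + int_size)

if __name__ == '__main__':
    needed = estimate_int_list_bytes(10 ** 9)
    print("About {:.1f} GiB for 10**9 ints".format(needed / 2.0 ** 30))
```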

And I remembered the sys.getsizeof() function, which returns the size of an object in bytes (counting only the object itself, not the objects it refers to). So I wrote this code to display the types and sizes of values of some commonly used Python types:
from __future__ import print_function
import sys

# data_type_sizes_w_list_comp.py
# A program to show the sizes in bytes of values of various
# Python data types. Runs on Python 2 only, since it uses long().

# Author: Vasudev Ram
# Copyright 2016 Vasudev Ram - https://vasudevram.github.io

#class Foo:
class Foo(object):
    pass

def gen_func():
    yield 1

def setup_data():
    a_bool = bool(0)
    an_int = 0
    a_long = long(0)
    a_float = float(0)
    a_complex = complex(0, 0)
    a_str = ''
    a_tuple = ()
    a_list = []
    a_dict = {}
    a_set = set()
    an_iterator = iter([1, 2, 3])
    a_function = gen_func
    a_generator = gen_func()
    an_instance = Foo()

    data = (a_bool, an_int, a_long, a_float, a_complex,
        a_str, a_tuple, a_list, a_dict, a_set,
        an_iterator, a_function, a_generator, an_instance)
    return data

data = setup_data()

print("\nPython data type sizes:\n")

header = "{} {} {}".format(
    "Data".center(10), "Type".center(15), "Length".center(10))
print(header)
print('-' * 40)

# First table: the simple values (all but the last 4 items).
rows = ["{} {} {}".format(
    repr(item).center(10), str(type(item)).center(15),
    str(sys.getsizeof(item)).center(10)) for item in data[:-4]]
print('\n'.join(rows))
print('-' * 70)

# Second table: iterator, function, generator and instance,
# whose repr()s are wider than 10 characters.
rows = ["{} {} {}".format(
    repr(item).center(10), str(type(item)).center(15),
    str(sys.getsizeof(item)).center(10)) for item in data[-4:]]
print('\n'.join(rows))
print('-' * 70)
(I broke out the last 4 objects above into a separate section/table, since the output for them is wider than for the ones above them.)

Although iterators, functions, generators and instances (of classes) are not traditionally considered data types, I included them as well, since they are all objects (see: almost everything in Python is an object), so they are data too, at least in the sense that programs can manipulate them. And while one is not likely to create tens of thousands of objects of these types (except maybe class instances [1]), it's interesting to know how much memory each one takes.

[1] As an aside, if you have to create thousands of class instances, the flyweight design pattern might be of help.

Here is the output of running the program (on a 32-bit build of Python 2; sizes differ on 64-bit builds and on Python 3) with:
$ python data_type_sizes_w_list_comp.py

Python data type sizes:
----------------------------------------
   Data          Type        Length  
----------------------------------------
  False     <type 'bool'>      12    
    0        <type 'int'>      12    
    0L      <type 'long'>      12    
   0.0      <type 'float'>     16    
    0j     <type 'complex'>     24    
    ''       <type 'str'>      21    
    ()      <type 'tuple'>     28    
    []      <type 'list'>      36    
    {}      <type 'dict'>     140    
 set([])     <type 'set'>     116    
----------------------------------------------------------------------

----------------------------------------------------------------------
<listiterator object at 0x021F0FF0> <type 'listiterator'>     32    
<function gen_func at 0x021EBF30> <type 'function'>     60    
<generator object gen_func at 0x021F6C60> <type 'generator'>     40    
<__main__.Foo object at 0x022E6290> <class '__main__.Foo'>     32
----------------------------------------------------------------------

[ When I used the old-style Python class definition for Foo (see the comment near the class keyword in the code), the output for an_instance was this instead:
<__main__.Foo instance at 0x021F6C88> <type 'instance'> 36
So, as measured by sys.getsizeof(), an old-style class instance takes 36 bytes vs. 32 for a new-style one.
]

We can draw a few deductions from the above output.

- bool is a subclass of int, so a bool takes the same space as an int: 12 bytes.
- A float takes a bit more space than a long of value 0 (a long's size grows with its magnitude).
- complex takes even more.
- Strings and the data types below them in the first table above have a fair amount of overhead, even when empty.
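Two points behind these numbers can be checked directly. First, bool really is a subclass of int. Second, sys.getsizeof() is shallow: for a container it counts the container's own header and slots, not the objects it refers to, and it grows as elements are added. The sketch below is for Python 3; exact byte counts vary by version and platform, so it prints them rather than claiming specific values. The deep_getsizeof helper is hypothetical and simplified: it follows only lists, tuples, sets, frozensets and dicts, and counts each object once.

```python
import sys

def deep_getsizeof(obj, seen=None):
    """Approximate total size of obj plus the objects it contains.

    Follows only list, tuple, set, frozenset and dict contents, and
    counts each object once (tracked by id()).
    """
    if seen is None:
        seen = set()
    if id(obj) in seen:
        return 0
    seen.add(id(obj))
    size = sys.getsizeof(obj)
    if isinstance(obj, dict):
        for key, value in obj.items():
            size += deep_getsizeof(key, seen) + deep_getsizeof(value, seen)
    elif isinstance(obj, (list, tuple, set, frozenset)):
        for item in obj:
            size += deep_getsizeof(item, seen)
    return size

print(issubclass(bool, int))                         # True
print(sys.getsizeof([]), sys.getsizeof([1, 2, 3]))   # a filled list is bigger
nested = [[1, 2, 3], 'abc', {'k': 2.5}]
print(sys.getsizeof(nested), deep_getsizeof(nested)) # shallow vs. deep size
```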

Finally, I first wrote the program with two for loops, then changed (and slightly shortened) it by using the two list comprehensions that you see above - hence the file name data_type_sizes_w_list_comp.py :)
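The program above runs only on Python 2, since it calls long(). For readers on Python 3, here is a minimal sketch of the same table, with the row building moved into one helper instead of two duplicated list comprehensions. The names sample_objects and size_rows are hypothetical, and the byte counts it prints will differ from the 32-bit Python 2 numbers above.

```python
import sys

class Foo(object):
    pass

def gen_func():
    yield 1

def sample_objects():
    """One sample value per type; Python 3 has no separate long type."""
    return [False, 0, 0.0, 0j, '', (), [], {}, set(),
            iter([1, 2, 3]), gen_func, gen_func(), Foo()]

def size_rows(objects):
    """Return (repr, type name, size in bytes) for each object."""
    return [(repr(obj), type(obj).__name__, sys.getsizeof(obj))
            for obj in objects]

def main():
    print("{:<45} {:<15} {:>6}".format("Data", "Type", "Bytes"))
    print('-' * 68)
    for data, type_name, size in size_rows(sample_objects()):
        print("{:<45} {:<15} {:>6}".format(data, type_name, size))

if __name__ == '__main__':
    main()
```

Left-aligning wide columns with str.format avoids having to split the output into two tables.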

- Enjoy.

- Vasudev Ram - Online Python training and consulting

Signup to hear about my new courses and products.

My Python posts     Subscribe to my blog by email

My ActiveState recipes

Sunday, April 24, 2016

Blistering Barnacles! [1] A lobste.rs-like site for bootstrappers

By Vasudev Ram

[1] That was a reference to Captain Haddock from the Tintin comics, for those who didn't know.

http://barnacl.es is a new discussion site for bootstrappers. It uses the http://lobste.rs codebase (which is open source and written in Rails) with some modifications. I signed up for the site after seeing it mentioned on lobste.rs. I've been on lobste.rs for a while but had not used it much; I've been checking it out more often nowadays, and will likely do the same with barnacl.es, since the signal-to-noise ratio on both is high (to be precise [2], it is high on lobste.rs and likely to be high on barnacl.es, as Thomson and Thompson might say).

Both lobste.rs and barnacl.es have some differences from sites like HN. Instead of describing those, I'll let readers check them out - it'll be more fun and complete that way :)

[2] From Tintin comics again.

Here are a couple of posts from barnacl.es, with some comments:

Ask BN: Which chat widget provider do you use/recommend for online visitors?

List of tools for bootstrappers


Wednesday, April 20, 2016

Make the witch speak!

By Vasudev Ram



This was fun and popular, so I'm blogging it again (I rarely do this; it's the first time, in fact):

Make the TTS witch speak:

1. Here is how.

2. Tremble with fear and awe.



Sunday, April 17, 2016

A simple joke bot in Python

By Vasudev Ram


Image attribution: Vasudev Ram

Following a chain of thoughts, including about bots and jokes, I had the idea of writing a simple joke bot in Python. Here it is, in file joke_bot.py:
from __future__ import print_function
import os
from random import randint

'''
A joke bot in Python.
v1.0.
Author: Vasudev Ram
Copyright 2016 Vasudev Ram - http://jugad2.blogspot.com 
'''

jokes = [
    '''
    Q: Why did the chicken cross the road?
    A: To get to the other side.
    ''',
    '''
    Q: What is black, white and red all over?
    A: A newspaper.
    ''',
    '''
    Q: What time is it when an elephant sits on your fence?
    A: Time to build a new fence.
    ''',
    '''
    Q: How many elephants will fit into a Mini?
    A: Four: Two in the front, two in the back.
    Q: How many giraffes will fit into a Mini?
    A: None. It's full of elephants.
    ''',
    '''
    Q: What do elephants have that nothing else has?
    A: Baby elephants.
    ''',
    '''
    Knock Knock.
    Who's there?
    Apple.
    Apple Who?
    Apple-y ever after.
    ''',
    '''
    Knock Knock.
    Who's there?
    Amos.
    Amos Who?
    A mosquito bit me.
    Knock Knock.
    Who's there?
    Andy.
    Andy Who?
    Andy's still biting me!
    ''',
    '''
    Knock Knock.
    Who's there?
    Orange.
    Orange Who?
    Orange you going to the party?
    ''',
]

lj = len(jokes)

def tell_a_joke():
    i = randint(0, lj - 1)
    print("Here is a joke for you:")
    print(jokes[i])

def clear_screen():
    # 'cls' clears the screen on Windows (os.name == 'nt');
    # 'clear' does it on Unix/Linux and macOS.
    os.system('cls' if os.name == 'nt' else 'clear')

def main():
    clear_screen()
    print('\nPython Joke Bot v1.0 activated.\n')
    ans = ''
    while ans != 'n':
        tell_a_joke()
        ans = raw_input('Tell another one? [YyNn]: ')
        ans = ans.strip().lower()
        clear_screen()
    print('See you next Funday.')

if __name__ == '__main__':
    main()
Run it with:
python joke_bot.py
Sample output:
Python Joke Bot v1.0 activated.

Here is a joke for you:

    Knock Knock.
    Who's there?
    Apple.
    Apple Who?
    Apple-y ever after.

Tell another one? [YyNn]: y

Here is a joke for you:

    Q: How many elephants will fit into a Mini?
    A: Four: Two in the front, two in the back.
    Q: How many giraffes will fit into a Mini?
    A: None. It's full of elephants.

Tell another one? [YyNn]:

Here is a joke for you:

    Knock Knock.
    Who's there?
    Amos.
    Amos Who?
    A mosquito bit me.
    Knock Knock.
    Who's there?
    Andy.
    Andy Who?
    Andy's still biting me!

Tell another one? [YyNn]:

Here is a joke for you:

    Q: What time is it when an elephant sits on your fence?
    A: Time to build a new fence.

Tell another one? [YyNn]:

Here is a joke for you:

    Knock Knock.
    Who's there?
    Apple.
    Apple Who?
    Apple-y ever after.

Tell another one? [YyNn]: n

See you next Funday.

A point about the program: it's obviously quite simple. I almost didn't post about it because of that, but then realized that this program, like some other small programs I've written in the past and plan to write in the future, can still illustrate a few points about programming (at least for beginners, including me, a perpetual beginner :).

And, more importantly, it can be built upon incrementally over time, in multiple versions, to illustrate other programming language and library features. For example, I could modify it to read the jokes from a flat or structured file (such as JSON or XML), a key-value store like BSD DB (the bsddb module in the Python 2 stdlib), SQLite (the sqlite3 module, ditto), etc.
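As a sketch of the first of those enhancements, the jokes could live in a JSON file holding a list of strings, loaded with the standard json module. The file name jokes.json and the functions load_jokes and pick_joke are hypothetical; this is one possible layout, not a later version of the bot.

```python
import json
import random

def load_jokes(path='jokes.json'):
    """Load a list of joke strings from a JSON file (hypothetical format)."""
    with open(path) as f:
        jokes = json.load(f)
    if not isinstance(jokes, list) or not jokes:
        raise ValueError("expected a non-empty JSON list of jokes")
    return jokes

def pick_joke(jokes, rng=random):
    """Return one joke chosen at random; pass a seeded random.Random to test."""
    return rng.choice(jokes)
```

With that in place, tell_a_joke() could print pick_joke(load_jokes()) instead of indexing the hard-coded list, and new jokes could be added without touching the code.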

See you next Funday :)



Wednesday, April 13, 2016

A quick console ruler in Python

By Vasudev Ram

I've done this ruler program a few times before, in various languages.

Here is an earlier version: Rule the command-line with ruler.py!

This one is a simplified and also slightly enhanced version of the one above.

It generates a simple text-based ruler on the console.

It can be useful for data processing tasks involving fixed-length or variable-length records, CSV files, etc.

With REPS set to 8, it works just right for a console of 80 columns.

Here is the code:
# ruler.py
"""
Program to display a ruler on the console.
Author: Vasudev Ram
Copyright 2016 Vasudev Ram - http://jugad2.blogspot.com
It prints a line of tens markers above the string 0123456789,
concatenated reps times.
Purpose: By running this program, you can use its output as a ruler,
to find the position of your own program's output on the line, or to 
find the positions and lengths of fields in fixed- or variable-length 
records in a text file, fields in CSV files, etc.
"""

REPS = 8

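def ruler(sep=' ', reps=REPS):
    # Top line: a digit marking each block of ten columns, with the
    # divider in the middle of each block. The trailing comma suppresses
    # the newline, and Python 2 inserts a space between the items.
    for i in range(reps):
        print str(i) + ' ' * 4 + sep + ' ' * 3,
    # End the top line explicitly, so the ruler does not depend on the
    # console wrapping at exactly 80 columns.
    print
    # Bottom line: the column digits.
    print '0123456789' * reps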

def main():

    # Without divider.
    ruler()

    # With various dividers.
    for sep in '|+!':
        ruler(sep)

if __name__ == '__main__':
    main()
And the output:
$ python ruler.py
0         1         2         3         4         5         6         7         
01234567890123456789012345678901234567890123456789012345678901234567890123456789
0    |    1    |    2    |    3    |    4    |    5    |    6    |    7    |    
01234567890123456789012345678901234567890123456789012345678901234567890123456789
0    +    1    +    2    +    3    +    4    +    5    +    6    +    7    +    
01234567890123456789012345678901234567890123456789012345678901234567890123456789
0    !    1    !    2    !    3    !    4    !    5    !    6    !    7    !    
01234567890123456789012345678901234567890123456789012345678901234567890123456789
You can also import it as a module in your own program:
# test_ruler.py
from ruler import ruler
ruler()
# Code that outputs the data you want to measure 
# lengths or positions of, goes here ...
print 'NAME      AGE  CITY'
ruler()
# ... or here.
print 'SOME ONE   20  LON '
print 'ANOTHER    30  NYC '
$ python test_ruler.py
Output:
0         1         2         3         4         5         6         7         
01234567890123456789012345678901234567890123456789012345678901234567890123456789
NAME      AGE  CITY
0         1         2         3         4         5         6         7         
01234567890123456789012345678901234567890123456789012345678901234567890123456789
SOME ONE   20  LON 
ANOTHER    30  NYC 
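For Python 3, where print is a function and the trailing-comma trick does not exist, here is a minimal sketch that builds each ruler line as a string instead. ruler_lines is a hypothetical helper; returning strings also makes the ruler easy to test or write to a file. Each block is ten characters wide (a digit, four spaces, the divider, four spaces), which is why eight repetitions fill an 80-column console. One deliberate difference from the program above: it prints i % 10, so blocks stay aligned when reps is larger than 10.

```python
def ruler_lines(sep=' ', reps=8):
    """Return the two lines of a console ruler as strings.

    Each block is 10 columns wide, so reps=8 gives an 80-column ruler.
    """
    top = ''.join('{}    {}    '.format(i % 10, sep) for i in range(reps))
    bottom = '0123456789' * reps
    return top, bottom

def ruler(sep=' ', reps=8):
    """Print the ruler (Python 3 sketch of the program above)."""
    for line in ruler_lines(sep, reps):
        print(line)

if __name__ == '__main__':
    ruler()
    for sep in '|+!':
        ruler(sep)
```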

- Enjoy.
