Thom Nichols

Thom

Technology is evolution outside the gene pool

Lightweight Data Types in Python

I've been working on a Python project where system resources are very limited; as a result, I'm concerned about object overhead.  Additionally, the nature of the code requires a lot of data types (e.g. objects that have no methods, just a lot of properties.)  Python's namedtuple class lends itself to this use case, but it's not quite as flexible as I'd like. 

Inspired by this post and the source code for namedtuple, I set out to see how I might improve it.  

The first and most obvious limitation of namedtuple is in its construction.  When creating a new instance of a namedtuple-derived class, you need to specify all arguments as positional parameters.  There's no way to specify defaults without creating the namedtuple class and then immediately subclassing it like so:

class Cat(namedtuple('Cat', 'fur claws tail')):
    def __init__(self, fur, claws='sharp', tail='long'):
        super(Cat,self,).__init__(fur,claws,tail)

So suffice to say, it's not easy to simplify by allowing the constructor to take default arguments.  This example is trivial with only three arguments, but still more cumbersome than it should be.  So I decided to make my own 'record' class  based on the tuple type which allows default parameters and some other convenience methods:

import sys
from copy import deepcopy
from collections import namedtuple

def Record( name, *required, **defaults ):
 '''
 Similar to `collections.namedtuple` except default properties may be
 specified. For example:

 Person = Record('Person', 'name', 'address', age=10)
 bob = Person(name='bob', address='123 someplace')

 Note that all constructor parameters must be keyword args. Positional
 arguments are not supported at this time.
 '''
 # all properties is the union of required and defaulted properties
 all_props = required + tuple(defaults.keys())
 # We're starting off with a 'stock' namedtuple and modifying it slightly
 cls = namedtuple(name,all_props)

 # here we are intercepting the normal __new__ call in order to assign 
 # defaults before all properties are passed to the namedtuple constructor. 
 # if extra properties are given, or any are missing, the old_new will make that 
 # determination and throw the proper exception
 old_new = cls.__new__
 def _new(cls,_ignore,**props):
  args = deepcopy(defaults)
  args.update(props)
  return old_new(cls, **(args))
 cls.__new__ = _new.__get__(cls)
 # alternate method for assigning a new method
# from types import MethodType 
# cls.__new__ = MethodType(_new,cls)

 def _eq(self,other):
  '''Equality should depend on type as well as values'''
  return tuple.__eq__(self,other) and self.__class__ == other.__class__
 cls.__eq__ = _eq

 # TODO override __hash__ - two records of different 'type' that have the same
 # properties will have the same hash (I think). 

 # since this is wrapped, the proper module name is up another level:
 if hasattr(sys, '_getframe'):
  cls.__module__ = sys._getframe(1).f_globals.get('__name__', '__main__')

 return cls

You can see  by the above example the dynamic creation of class and class methods.  It looks a little foreign at first, but it's actually very straightforward and follows standard Python idioms.

Here are some examples of usage:

Person = Record('Person', 'first','last',age=10)
p = Person(first='Bob',last='fish')
print p
p2 = Person(last='fish',first='hi')
print p2
p2 = Person(last='fish',first='Bob')
print "Instances are equal of the same type and values: %s" % (p2 == p)
print "Hash values are equal too: %s" % (hash(p2) == hash(p))
Alien = Record('Alien', 'first','last',age=1200)
impostor = Alien(first='Bob',last='fish',age=10)
print "Alien impostor is not equal even with the same values: %s" % (impostor == p)

from cPickle import dumps, loads
print p == loads(dumps(p))

try:
  badPerson = Person(first='bill',warts=10)
  raise Exception("badPerson should not be created! %s" % badPerson)
except TypeError as e: print "Got expected exception: %s" % e

try:
  missingPerson = Person(first='bill')
  raise Exception("missingPerson should not be created! %s" % missingPerson)
except TypeError as e: print "Got expected exception: %s" % 

Next week I'll write a post about MutableRecord, a class similar in its efficiency to Record, but, well, mutable.

Category: Python