What happens when a class is created?

This post will dig into what happens when a new class is created, at a bytecode level. This isn’t too interesting on its own, but one of my other posts that’s in the works has ended up being far too long, and so I’m breaking this out as a chunk that I can refer to.

As you may already know, class definitions in Python are executed like any other code. This means that you can execute arbitrary code in your definition if you so wish:

class Dog:
    print("In the process of creating class")
 
    def bark(self):
        print("woof")

This prints the message "In the process of creating class" once and only once, when the module that defines the class is first imported.

You can also conditionally define a class:

import random
 
if random.choice([True, False]):
    class Dog:
        def bark(self):
            print("woof")

If you do this, then 50% of the time it will give you a working class, and 50% of the time you’ll get a NameError when you try to use the class. This is a silly example, but the same trick can be genuinely useful for OS-dependent classes that shouldn’t exist at all if the underlying OS doesn’t support them, as sketched below.
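
To illustrate, here’s a hedged sketch of that pattern; the class name and the platform check are mine, purely for illustration:

import sys
 
# Only define the class on platforms where it can actually work.
if sys.platform != "win32":
    class UnixSignalHandler:
        def install(self):
            import signal
            # SIGUSR1 doesn't exist on Windows, which is why the class is guarded.
            signal.signal(signal.SIGUSR1, lambda signum, frame: None)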

So what’s actually going on in the bytecode? Let’s disassemble it and find out. I’m going to assume that you already know the basics of how Python bytecode works. If not, you can see my article on it here.

First of all, we compile some code:

source = """
class Dog:
    def bark(self):
        print("woof")
"""
 
code = compile(source, '<string>', 'exec')
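
As a quick sanity check, executing this code object really does define the class; a minimal sketch:

namespace = {}
exec(code, namespace)        # run the compiled module-level bytecode
namespace["Dog"]().bark()    # prints "woof"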

We can then disassemble it to find out what’s going on. If you’re using Python 3.6, you’ll get something like this:

import dis
 
dis.dis(code)

will print to STDOUT:

  2           0 LOAD_BUILD_CLASS
              2 LOAD_CONST               0 (<code object Dog at 0x7f42d20b6c00, file "<string>", line 2>)
              4 LOAD_CONST               1 ('Dog')
              6 MAKE_FUNCTION            0
              8 LOAD_CONST               1 ('Dog')
             10 CALL_FUNCTION            2
             12 STORE_NAME               0 (Dog)
             14 LOAD_CONST               2 (None)
             16 RETURN_VALUE

The basic form of this is reasonably familiar, but what’s that MAKE_FUNCTION doing there? This is code that makes a class, and yet the bytecode is making a function. And where’s the function bark? bark really is a function, but it’s nowhere in sight when MAKE_FUNCTION is invoked.

Let’s break it down. We can see this as:

  2           0 LOAD_BUILD_CLASS

... some other stuff ...

             10 CALL_FUNCTION            2
             12 STORE_NAME               0 (Dog)
             14 LOAD_CONST               2 (None)
             16 RETURN_VALUE

What this does is load a special built-in function called __build_class__, set up some arguments (which we skip over for now), call the function with two arguments on the stack, assign the result to the name Dog and then return None. So the interesting things to consider are:

  • What goes on inside __build_class__?
  • What are the arguments that we pass to it?
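
Before digging in, note that __build_class__ is a real object we can poke at from Python (in CPython, at least):

import builtins
 
print(builtins.__build_class__)   # <built-in function __build_class__>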

What does __build_class__ do?

__build_class__ isn’t documented anywhere, but it’s easy enough to find in the CPython source code. I won’t step through it line by line, but you can find it in Python/bltinmodule.c if you want to dig into the details.

The __build_class__ function takes at least two arguments (a function plus a name string), with optional arguments after that for base classes. Let’s ignore base classes for now.

The interesting part of the __build_class__ function is this:

    cell = PyEval_EvalCodeEx(PyFunction_GET_CODE(func), PyFunction_GET_GLOBALS(func), ns,
                             NULL, 0, NULL, 0, NULL, 0, NULL,
                             PyFunction_GET_CLOSURE(func));

Here func is the function that was passed in to __build_class__, which is the mystery function we haven’t explained yet. The only other variable is ns, which is an empty Python dict.

This call evaluates some code. Specifically, it executes the code of the function func, in the context of the globals that func has access to, and using ns as the local namespace. The return value is mostly ignored: what matters is the side effects the executed code has on the dict ns. Every name assigned in the class body, including each def, ends up as a key in ns.

After we’ve evaluated this mystery function, the ns dict is passed to the class’s metaclass. Metaclasses in Python get a bit confusing, so for now let’s ignore that detail and assume we’re using the default metaclass, type. What we’re effectively doing is calling:

   type("Dog", base_classes, ns)

You can think of this as a class instantiation: The class is type, and the instance we end up with is the Dog class. Dog is both a class and an instance: it’s a class for future instances like rex, rover and lassie, but is itself an instance of type.
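
You can do this dance by hand. Here’s a minimal sketch: build the namespace dict yourself and pass it to the three-argument form of type (the empty tuple means no explicit base classes):

def bark(self):
    print("woof")
 
ns = {"bark": bark}           # the class namespace, as a plain dict
Dog = type("Dog", (), ns)     # type(name, bases, namespace)
 
rex = Dog()
rex.bark()                    # woof
print(isinstance(Dog, type))  # True: Dog is itself an instance of type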

What are the arguments we pass in to __build_class__?

We’ve figured out that __build_class__ takes a mystery function, evaluates it for side effects, then creates an instance of type using the resultant namespace. But what is the mystery function?

Let’s look again at that disassembly:

  2           0 LOAD_BUILD_CLASS
              2 LOAD_CONST               0 (<code object Dog at 0x7f42d20b6c00, file "<string>", line 2>)
              4 LOAD_CONST               1 ('Dog')
              6 MAKE_FUNCTION            0
              8 LOAD_CONST               1 ('Dog')
             10 CALL_FUNCTION            2
             12 STORE_NAME               0 (Dog)
             14 LOAD_CONST               2 (None)
             16 RETURN_VALUE

Specifically, we’ll look at the bit we skipped over before:

              2 LOAD_CONST               0 (<code object Dog at 0x7f42d20b6c00, file "<string>", line 2>)
              4 LOAD_CONST               1 ('Dog')
              6 MAKE_FUNCTION            0

When MAKE_FUNCTION is called with an argument of 0, it’s the simplest case: it takes just two things off the stack, a code object and a name. So if we want to know about the function we’re creating (and ultimately calling inside __build_class__) we need to look inside this code object to see what it’s doing.

The code object is loaded with LOAD_CONST 0, which means that we can find it in the tuple of constants associated with this code block:

dis.dis(code.co_consts[0])

gives:

  2           0 LOAD_NAME                0 (__name__)
              2 STORE_NAME               1 (__module__)
              4 LOAD_CONST               0 ('Dog')
              6 STORE_NAME               2 (__qualname__)

  3           8 LOAD_CONST               1 (<code object bark at 0x7fb6c7b77930, file "<string>", line 3>)
             10 LOAD_CONST               2 ('Dog.bark')
             12 MAKE_FUNCTION            0
             14 STORE_NAME               3 (bark)
             16 LOAD_CONST               3 (None)
             18 RETURN_VALUE

Now we’re getting somewhere. Suddenly this looks a bit more like the inside of a class definition. We’re loading up our method object and giving it the name "bark". The actual code for the method isn’t visible here; it’s stored in a constant nested inside the code object:

dis.dis(code.co_consts[0].co_consts[1])

gives:

  4           0 LOAD_GLOBAL              0 (print)
              2 LOAD_CONST               1 ('woof')
              4 CALL_FUNCTION            1
              6 POP_TOP
              8 LOAD_CONST               0 (None)
             10 RETURN_VALUE

You should recognise this bit as the innards of this method:

    def bark(self):
        print("woof")

So what are we actually saying here?

This has all got a bit confusing: functions within functions within functions.

I think the key is to think of this class definition:

class Dog:
    def bark(self):
        print("woof")

as actually being a function dressed up:

def _Dog_namespace_creator():
    def bark(self):
        print("woof")

Creating the class works something like this:

  • Compile a function that creates the Dog namespace (I’m calling this _Dog_namespace_creator for clarity, though it isn’t really called that).
  • Execute this function, and keep hold of the resultant namespace. Remember that a namespace is just a dictionary. In our case, the namespace after executing this function contains the function bark, plus the __module__ and __qualname__ entries we saw in the disassembly.
  • Create an instance of type (or some other metaclass) using this namespace, as in the runnable sketch below. The class will therefore have a method called bark in its namespace.
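
Here’s a hedged sketch of that whole sequence, reusing the code object we compiled and disassembled above. The eval call stands in for what __build_class__ does internally; it works because class-body code objects use ordinary name operations (the STORE_NAME instructions we saw) rather than the fast locals of a regular function:

ns = {}                                  # the namespace-to-be of the class
eval(code.co_consts[0], globals(), ns)   # run the class body; it populates ns
print(sorted(ns))                        # ['__module__', '__qualname__', 'bark']
 
Dog = type("Dog", (), ns)                # hand the namespace to the metaclass
Dog().bark()                             # woof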

This is all done once, when the class is defined. None of this stuff needs to happen again when the class is instantiated.

Hmm. What’s the really short version?

The body of a class definition is a lot more like a function than you think. It actually is a function, one that is executed when the class is defined and builds the members of the class.

Why doesn’t import * work at function scope?

Python is a regular language, which means that function definitions, class definitions, import statements etc. mostly work at any scope. But there’s an exception for “from ... import *“, which can’t be used inside a function. The reason why turns out to reveal something interesting about the internals of Python.

Dictionaries preserve ordering

I came across something unexpected when preparing my post about the basics of dictionaries in Python. For as long as I’ve been using Python, it’s been a fundamental truth that dictionaries don’t have a reliable ordering. It’s mentioned all over StackOverflow. The standard library even has an OrderedDict container for cases where you want your dictionary to have order.

Turns out it isn’t so.

I was trying to create an example that shows the fundamental way that dictionaries return elements in an order that has nothing (apparently) to do with the keys or the order in which you inserted them. Something like this:

fruits = {}
fruits["apple"] = 100
fruits["banana"] = 200
fruits["cherry"] = 300
 
print(fruits)

I printed the dictionary. The fruits came back in alphabetical order. Dammit.

I guess I just had bad luck and the keys happened to come out in the naive order. With three keys there are 3! = 6 possible orderings, so it’s a 1 in 6 chance, right?

fruits["damson"] = 400

Still in insertion order. Maybe something is up.

This actually isn’t anything mysterious: it’s right there in the documentation:

Changed in version 3.7: Dictionary order is guaranteed to be insertion order. This behavior was an implementation detail of CPython from 3.6.

I’m on Ubuntu, so I’m getting Python 3.6 as my default Python 3. Mystery solved.

I don’t know about anyone else, but I was really surprised to see this change. I was assuming that making things more user-friendly like this would only come at the cost of making things slower, and there’s no way that Python would make things slower across the board.

Why wasn’t it always this way?

To recap some of the basics: dictionaries traditionally didn’t preserve ordering because they’re implemented using hash tables. A hash table is a data structure that gives very fast lookup regardless of the number of items in the container. It works like this:

Start with a so-called “hash function” that maps strings (or whatever key value you’re using—I’ll keep it simple here) into integers. To insert a value, apply the function to the key and then store the value in an array at that numbered position. Retrieving it again works the same way. The hash function is quick, and fetching an item from a specified position in an array is trivial. The great thing about it is that it’s equally fast no matter how many other elements are stored in the hash table (I’m skipping over a few details here, to keep things simple).
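
To make that concrete, here’s a toy hash table in Python, with collisions and resizing ignored entirely (so this is a sketch of the idea, not of CPython’s real implementation):

class ToyHashTable:
    def __init__(self, size=8):
        self.slots = [None] * size          # one block of memory, mostly empty
 
    def _position(self, key):
        return hash(key) % len(self.slots)  # hash the key, reduce to a slot number
 
    def put(self, key, value):
        self.slots[self._position(key)] = (key, value)  # NB: ignores collisions!
 
    def get(self, key):
        entry = self.slots[self._position(key)]
        if entry is not None and entry[0] == key:
            return entry[1]
        raise KeyError(key)
 
table = ToyHashTable()
table.put("apple", 100)
print(table.get("apple"))   # 100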

To be concrete, suppose we take our three fruits and hash their keys:

[diagram: the keys "apple", "banana" and "cherry" with their hash values]

We can store these entries in an 8-element array like this:

[diagram: the three key/value pairs stored at hash-determined positions in an 8-element array, with the other five slots empty]

We can now easily look up any of the values from its key: hash the key, look up the corresponding position in the array, check that the key matches what we expect and then return the value.

But note that there’s no way to produce the keys in alphabetical order without extracting all the keys and then sorting them (which is a costly operation). Nor can we get the keys in insertion order, because nothing about the structure records the order in which they were inserted; the diagram above would look the same whatever order you inserted the three values.

So why did it change?

One of the downsides of a hash table can be seen in the diagram above: generally there’s a bunch of empty space in it. This empty space isn’t zero cost: this is an array, meaning that it’s all one block of memory, and the empty slots are blank memory just taking up space. There are two ways to limit this overhead: pick a clever hash function that maps the keys as densely as possible into as small a space as possible, or limit the size of each element in the array. In Python 3.6, a change was made to achieve the latter.

Looking a little deeper at how Python actually uses hash tables: firstly, the hash function isn’t as simple as I suggested above. It produces a 64-bit integer, which is then used to identify an entry in the array (using 3 bits of the integer, if we’ve got an 8-entry array). This is done because if enough data is inserted into the hash table it will have to be resized, and it’s convenient to reuse the same hash values in the new, bigger table. For the same reason, the hash values are stored in the array rather than discarded as I suggested above. This means that our hash table ends up looking like this:

[diagram: an 8-entry array in which each filled entry holds a hash, a key and a value]

There are 3 items in each array entry, so our wasted space is actually 24 bytes for each empty array entry. Can we improve on this?

Yes we can. Instead of having the array hold the hash value, key and value, make it store an index into a second array:

[diagram: a sparse 8-entry array of small indices, each pointing into a dense array of hash/key/value entries]

It might look like we’ve just moved the data around, but note that this new array is dense: there is no wasted space in it between elements. Each newly inserted element gets added on to the end of this array. Now we only have 8 bytes of space used per empty entry in the hash table, rather than 24. On the other hand, we’ve added 8 bytes of space per filled entry in the hash table. But this is probably a win overall for size.
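
In Python terms, the new layout looks something like this sketch (the slot positions below are made up for illustration):

# Sparse table: one small index per slot, or None for an empty slot.
indices = [None, 2, None, None, 0, None, 1, None]
 
# Dense table: (hash, key, value) triples in insertion order, with no gaps.
entries = [
    (hash("apple"), "apple", 100),
    (hash("banana"), "banana", 200),
    (hash("cherry"), "cherry", 300),
]
 
# Lookup: hash the key to pick a slot in `indices`, then follow that small
# index into `entries` to reach the stored hash, key and value.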

We still have all the desirable hash table properties, and the only cost is that we have to follow another pointer lookup to find the value we want. But on a modern CPU following pointers is cheap provided they are stored in cache, and data is much more likely to be stored in cache if it is small. So making the data structure smaller at the cost of another pointer lookup is likely to make things faster as well as smaller.

Wait, what does this have to do with preserving order?

Usually when we make code faster we end up making it less useful, or at least less flexible. But in this case something nice falls out for free. Note that the new array the data is stored in will automatically be in insertion order. If we want to iterate over the elements of the dict, we don’t need to look at the first table, we can just walk over the insertion-order table and produce the keys in insertion order.
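
In user-visible terms (on CPython 3.6, or any Python from 3.7 on):

d = {}
d["zebra"] = 1
d["aardvark"] = 2
d["mongoose"] = 3
 
print(list(d))   # ['zebra', 'aardvark', 'mongoose']: insertion order, not sorted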

For more information see this thread from python-dev way back in 2012 when the change was first proposed.

Does this mean that collections.OrderedDict is now obsolete?

Good question. Actually, no. The thing about an OrderedDict is that its ordering is assumed to have meaning over and above the values stored. The most obvious place this shows up is in comparisons between dictionaries: two OrderedDict instances compare equal only if their keys are the same and their insertion order is the same. For a regular dict we don’t want this: two dicts with the same keys and values but different insertion order are the same dict. Even if we wanted to change this we couldn’t, because it would break existing code.
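
The difference is easy to demonstrate:

from collections import OrderedDict
 
d1 = {"a": 1, "b": 2}
d2 = {"b": 2, "a": 1}
print(d1 == d2)     # True: plain dicts ignore order when comparing
 
od1 = OrderedDict([("a", 1), ("b", 2)])
od2 = OrderedDict([("b", 2), ("a", 1)])
print(od1 == od2)   # False: OrderedDict treats order as significant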

It might be that dict and OrderedDict can share all their implementation internals and just have a flag set for “ordering is significant for equality test” or whatever. I’ll admit I haven’t looked into this.

Can I rely on this behaviour?

There’s a big difference between behaviour that happens on a certain version of Python and behaviour you can rely on in your code. As the docs quoted above say, as of Python 3.7 this is an official specification and not just an implementation detail. For more information about why this is specified, see this post from Guido.
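
If your code has to run on interpreters older than 3.7 (or on non-CPython implementations of 3.6), one defensive option, sketched here, is to fall back to OrderedDict (the alias name is mine, not anything standard):

import sys
from collections import OrderedDict
 
# Pick a mapping type whose ordering is guaranteed on this interpreter.
ReliablyOrderedDict = dict if sys.version_info >= (3, 7) else OrderedDict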