Pattern API

Most bytecode transformations are best expressed by identifying a pattern in the bytecode and emitting some replacement. codetransformer makes it easy to express and work on these patterns by defining a small dsl for use in CodeTransformer classes.

Matchables

A pattern is expressed by a sequence of matchables paired with the startcode. A matchable is anything that we can compare a sequence of bytecode to.

Instructions

The most atomic matchable is any Instruction class. These classes each can be used to define a pattern that matches instances of that instruction. For example, the pattern:

LOAD_CONST

will match a single LOAD_CONST instance.

All matchables support the following operations:

or

Matchables can be or’d together to create a new matchable that matches either the lhs or the rhs. For example:

LOAD_CONST | LOAD_FAST

will match a either a single LOAD_CONST or a LOAD_FAST.

not

Matchables may be negated to create a new matchable that matches anything the original did not match. For example:

~LOAD_CONST

will match any instruction except an instance of LOAD_CONST.

matchrange

It is possible to create a matchable from another such that it matches the same pattern repeated multiple times. For example:

LOAD_CONST[3]

will match exactly three LOAD_CONST instances in a row. This will not match on any less than three and will match on the first three if there are more than three LOAD_CONST instructions in a row.

This can be specified with an upper bound also like:

LOAD_CONST[3, 5]

This matches between three and five LOAD_CONST instructions. This is greedy meaning that if four or five LOAD_CONST instructions exist it will consume as many as possible up to five.

var

var is a modifier that matches zero or more instances of another matchable. For example:

LOAD_CONST[var]

will match as many LOAD_CONST instructions appear in a row or an empty instruction set.

plus

plus is a modifier that matches one or more instances of another matchable. For example:

LOAD_CONST[plus]

will match as many LOAD_CONST instructions appear in a row as long as there is at least one.

option

option is a modifier that matches zero or one instance of another matchable. For example:

LOAD_CONST[option]

will match either an empty instruction set or exactly one LOAD_CONST.

matchany

matchany is a special matchable that matches any single instruction. ... is an alias for matchany.

seq

seq is a matchable that matches a sequence of other matchables. For example:

seq(LOAD_CONST, ..., ~LOAD_CONST)

will match a single LOAD_CONST followed by any instruction followed by any instruction that is not a LOAD_CONST. This example show how we can compose all of our matchable together to build more complex matchables.

pattern

In order to use our DSL we need a way to register transformations to these matchables. To do this we may decorate methods of a CodeTransformer with pattern. This registers the function to the pattern. For example:

class MyTransformer(CodeTransformer):
    @pattern(LOAD_CONST, ..., ~LOAD_CONST)
    def _f(self, load_const, any, not_load_const):
        ...

The argument list of a pattern is implicitly made into a seq. When using MyTransformer to transform some bytecode _f will be called only when we see a LOAD_CONST followed by any instruction followed by any instruction that is not a LOAD_CONST. This function will be passed these three instruction objects positionally and should yield the instructions to replace them with.

Resolution Order

Patterns are checked in the order they are defined in the class body. This is because some patterns may overlap with eachother. For example, given the two classes:

class OrderOne(CodeTransformer):
    @pattern(LOAD_CONST)
    def _load_const(self, instr):
        print('LOAD_CONST')
        yield instr

    @pattern(...)
    def _any(self, instr):
        print('...')
        yield instr


class OrderTwo(CodeTransformer):
    @pattern(...)
    def _any(self, instr):
        print('...')
        yield instr

    @pattern(LOAD_CONST)
    def _load_const(self, instr):
        print('LOAD_CONST')
        yield instr

and the following bytecode sequence:

LOAD_CONST POP_TOP LOAD_CONST RETURN_VALUE

When running with OrderOne we would see:

LOAD_CONST
...
LOAD_CONST
...

but when running with OrderTwo:

...
...
...
...

This is because we will always match on the ... pattern where OrderOne will check against LOAD_CONST before falling back to the matchany.

Contextual Patterns

Sometimes a pattern should only be matched given that some condition has been met. An example of this is that you want to modify comprehensions. In order to be sure that you are only modifying the bodies of the comprehensions we must only match when we know we are in one. pattern accepts a keyword only argument startcodes which is a set of contexts where this pattern should apply. By default this is DEFAULT_STARTCODE which is the default state. A startcode may be anything hashable; however it is best to use strings or integer constants to make it easy to debug.

The begin() method enters a new startcode. For example:

class FindDictComprehensions(CodeTransformer):
    @pattern(BUILD_MAP, matchany[var], MAP_ADD)
    def _start_comprehension(self, *instrs):
        print('starting dict comprehension')
        self.begin('in_comprehension')
        yield from instrs

    @pattern(RETURN_VALUE, startcodes=('in_comprehension',))
    def _return_from_comprehension(self, instr):
        print('returning from comprehension')
        yield instr

    @pattern(RETURN_VALUE)
    def _return_default(self, instr):
        print('returning from non-comprehension')
        yield instr

This transformer will find dictionary comprehensions and enter a new startcode. Inside this startcode we will handle RETURN_VALUE instructions differently.

>>> @FindDictComprehensions()
... def f():
...     pass
...
returning from non-comprehension

>>> @FindDictComprehensions()
... def g():
...     {a: b for a, b in it}
...
starting dict comprehension
returning from comprehension
returning from non-comprehension

It is important to remember that when we recurse into a nested code object (like a comprehension) that we do not inherit the startcode from our parent. Instead it always starts at DEFAULT_STARTCODE.