Dynamically generated benchmarks with vbench

To construct a vbench benchmark you need a setup string and a code string. The constructor’s signature is:

Benchmark(self, code, setup, ncalls=None, repeat=3, cleanup=None, name=None, description=None, start_date=None, logy=False).

Why generate benchmarks dynamically?

For most scikit-learn purposes, the code string will be very close to "algorithm.fit(X, y)", "algorithm.transform(X)" or "algorithm.predict(X)". We can generate a lot of benchmarks by changing what the algorithm is, and changing what the data is or the way it is generated.

A possible idea would be to create a DSL in which to specify scikit-learn tests and create benchmarks from them. However, before engineering such a solution, I wanted to test out how to generate three related benchmarks using different arguments for the dataset generation function.

This is what I came up with:

[sourcecode language=”python”]
from vbench.benchmark import Benchmark

_setup = “”“
from deps import *

kwargs = %s
X, y = make_regression(random_state=0, **kwargs)
lr = LinearRegression()
 “”“

_configurations = [
(‘linear_regression_many_samples’,
{‘n_samples’: 10000, ‘n_features’: 100}),
(‘linear_regression_many_features’,
{‘n_samples’: 100, ‘n_features’: 10000}),
(‘linear_regression_many_targets’,
{‘n_samples’: 1000, ‘n_features’: 100, ‘n_targets’: 100})
 ]

_statement = “lr.fit(X, y)”

_globs = globals()
_globs.update({name: Benchmark(_statement, _setup % str(kwargs), name=name)
for name, kwargs in _configurations})

[/sourcecode]

It works perfectly, but I don’t like having to hack the globals to make the benchmarks detectable. This is because of the way the vbench suite gathers benchmarks. In __init__.py we have to do from linear_regression import *. With a small update to the detection method, we could replace the hacky part with a public lists of Benchmark objects.

Exposed issues

While working on this, after my first attempt, I was surprised to see that there were no results added to the database, and output plots were empty. It turns out that the generated benchmarks weren’t running, even though if I copied and pasted their source code from the generated html, it would run. Vbench was not issuing any sort of message to let me know that anything was wrong.

So what was the problem? My fault, of course, whitespace. But in all fairness, we should add better feedback.

This is what I was doing to generate the setup string:

[sourcecode lang=”python”]
def _make_setup(kwargs):
return “”“
from deps import *

kwargs = %s
X, y = make_regression(random_state=0, **kwargs)
lr = LinearRegression()
“”” % str(kwargs)
 [/sourcecode]

It’s clear as daylight now that I overzealously indented the multiline string. But man, was it hard to debug! Also, in this example, the bug led to a refactoring that made the whole thing nicer and more direct. Hopefully, my experience with vbench will lead to some improvements to this cool and highly useful piece of software.

Comments !