Little Big Bug

Introduction

The solution I currently oversee evolves quickly, having to meet the demands of growing scale on a weekly, sometimes daily, basis. We use it to fetch a lot of data from the web at high frequency. Most of it is written in Python.

For a long time, we were using urllib2, threads and a queue implementation to do raw data acquisition. Recently we reworked this part of the system to use Requests and Gevent.

The tale that follows reminded me of Stroustrup’s quote on C++, “… it’s harder to shoot yourself in the foot, but when you do, you blow off your whole leg.”

A Simple Fix

After ironing out a few quirks that showed up with Gevent’s monkey patching, we encountered an issue with SNI. Installing the dependencies as advised got us over the problem, but the sites from which we acquire data are many and varied, and other minor bugs began popping up.

In the end it was decided that our need for speed trumped the need for certificate validation, and we elected to pass verify=False to our requests in order to skip it. Unfortunately, our logs were then filled with warnings about this for every request, which we deemed a nuisance.
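
For illustration, this is roughly the shape of the change; the URL and timeout here are placeholders rather than our actual parameters.

import requests

# Skip certificate validation for speed. urllib3 responds by emitting
# an InsecureRequestWarning for every such request.
response = requests.get("https://example.com/data", verify=False, timeout=10)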

A cursory search yielded the solution - just disable the warning. We added this code to the initialiser of the class we use for URL retrieval.

requests.packages.urllib3.disable_warnings()
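
In context, the call sat roughly like this; the class and method names are illustrative, not our real ones.

import requests

class UrlFetcher(object):
    def __init__(self):
        # Runs on every instantiation of the class; this detail
        # turns out to matter later.
        requests.packages.urllib3.disable_warnings()

    def fetch(self, url):
        return requests.get(url, verify=False, timeout=10)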

Everything seemed dandy at this point.

Mediiiiiiic!

We noticed in New Relic that CPU usage climbed linearly, roughly doubling over the first six to eight hours after a restart. This was seriously throttling our performance and causing us headaches. Memory, on the other hand, was more or less stable.

We tried disabling Gevent’s monkey patching, which we can do with a flag, and reverting to our old vanilla threading technique, but the issue remained.
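
The toggle is along these lines; the flag name and the use of an environment variable are made up for the sake of the example.

import os

# Hypothetical switch: when unset we skip patching and fall back to
# plain threads.
if os.environ.get("USE_GEVENT") == "1":
    from gevent import monkey
    monkey.patch_all()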

After some head scratching (and some process restarting), I began to look closely at all of our most recent changes. This led me to drill into the call to disable_warnings. This is what it looks like.

def disable_warnings(category=exceptions.HTTPWarning):
    """
    Helper for quickly disabling all urllib3 warnings.
    """
    warnings.simplefilter('ignore', category)

OK, so now simplefilter.

def simplefilter(action, category=Warning, lineno=0, append=0):
    """Insert a simple entry into the list of warnings filters (at the front).

    A simple filter matches all modules and messages.
    'action' -- one of "error", "ignore", "always", "default", "module",
                  or "once"
    'category' -- a class that the warning must be a subclass of
    'lineno' -- an integer line number, 0 matches all warnings
    'append' -- if true, append to the list of filters
    """
    assert action in ("error", "ignore", "always", "default", "module",
                      "once"), "invalid action: %r" % (action,)
    assert isinstance(lineno, int) and lineno >= 0, \
           "lineno must be an int >= 0"
    item = (action, None, category, None, lineno)
    if append:
        filters.append(item)
    else:
        filters.insert(0, item)

Aha, so the filters are maintained in a global list, and our operation results in a call to insert at the front of it. The cost? O(n). Now it turns out that the class whose __init__ method makes this call is instantiated and torn down on each data retrieval run, causing our n to grow over the life of the process - imperceptible in the short term, bad news in the long run.
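
With the simplefilter implementation quoted above, the growth is easy to demonstrate in an interpreter; note that this assumes a warnings module that, like the one shown, does not deduplicate filter entries.

import warnings

# Mimic what repeated disable_warnings() calls do: each one prepends
# another ('ignore', ...) entry to the module-level filters list.
for _ in range(1000):
    warnings.simplefilter('ignore', Warning)

print(len(warnings.filters))  # keeps growing on an implementation like the one above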

Ensuring a single call to disable the warnings was all it took to fix our issue.
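
Concretely, one way to guarantee a single call is to make it at module import time rather than in the constructor; a minimal sketch:

import requests

# Called once when the module is imported, not per instantiation, so
# the warnings filter list no longer grows with every retrieval run.
requests.packages.urllib3.disable_warnings()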

Silly error? Undoubtedly, but I can’t overstate what a relief it was to catch it; the effects were far from trivial.