Devouring an API

Back in June, one of the news providers we use at Puls Biznesu changed their distribution channel to a shiny new REST API. We decided to take full advantage of this opportunity and create a new single-page app aggregating news channels for our editors. It consisted of the following layers:

  • front-end app using React.js and Flux,
  • back-end API using Flask-RESTful, running on nginx + uWSGI - pretty standard stuff,
  • a bunch of Celery tasks gathering data from news providers.

To make it all nice and cozy we created two generic tools abstracting away the common parts of the process (and we've been using them in production since June without any problems):

  • custom object-NoSQL mapper (later named Basilisk),
  • an easily customisable API client (later named Devourer).

Recently I got the green light to publish both of them under the MIT license, so they're available on GitHub - just follow the links above.

That finally gets me to the point of this article - presenting the thought process and use cases of Devourer, our generic API client for Python 2.7 and 3.3+.

So what is this generic API client all about?

Basically, we have several news channels accessible through various APIs, most of them HTTP-based (think REST APIs, only not necessarily serving JSON - sometimes XML or HTML). We didn't really want to integrate with every single one of them separately; on the contrary, we wanted to have as little per-provider logic as possible. So we reached the first conclusion:

class GenericAPI(object):  
    pass

Okay, with that major hurdle out of the way, what's next? After all, we won't only be dealing with REST calls returning JSON responses.

class GenericAPI(object):  
    def invoke(self, method, resource, params):
        return getattr(self.provider, method)(resource, auth=self.auth, params=params)

We went with the above as the connecting point between our GenericAPI object and the API provider. It covered most of our use cases and was easy to override in inheriting classes for anything that wasn't compliant. In the open-sourced version, which talks to a REST API by default, the last snippet became:

def invoke(self, http_method, url, params):
    """
    This method makes a request to the given API address, concatenating
    the base URL and the method path and passing along authentication data.

    :param http_method: HTTP method to be used for this call.
    :param url: exact address to be concatenated to the API address.
    :param params: query parameters to be passed along to requests.
    :returns: response object as in requests.
    """
    return getattr(requests, http_method)(self.url + url, auth=self.auth, params=params)

As you can see, it's exactly the same apart from the name changes: self.provider is replaced with requests directly, and the base URL is prepended to the call URL.
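If a provider isn't compliant, overriding invoke in a subclass is all it takes. Here's a minimal sketch - the TokenAPI name and the self.token attribute are made up for illustration:

import requests

from devourer import GenericAPI


class TokenAPI(GenericAPI):
    def invoke(self, http_method, url, params):
        # Hypothetical provider wanting a token header instead of basic auth;
        # self.token would be set in a custom __init__ (not shown).
        headers = {'X-Auth-Token': self.token}
        return getattr(requests, http_method)(self.url + url, headers=headers, params=params)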

Okay, that's cool - the programming-philosophy questions are out of the way. Where do we go from here? Well, since we like declarative syntax, that's exactly where we went. The metadata of every method an API exposes has to be kept somewhere, and it may as well be a well-defined class rather than a dict, hence:

class APIMethod(object):  
    pass

And now, obviously, for Python's favourite love/hate child - enter the metaclass:

class GenericAPICreator(type):
    def __new__(mcs, name, bases, attrs):
        methods = {}
        # Only transform subclasses; leave the base class (bases == (object, )) alone.
        if bases != (object, ):
            attrs['_methods'] = {}
            for key, item in attrs.items():
                if isinstance(item, APIMethod):
                    # Stash the APIMethod under _methods and tell it its name.
                    attrs['_methods'][key] = item
                    item.name = key
                    # Wire up prepare_*, the call itself and finalize_* for each
                    # method, preferring customised versions when provided.
                    methods['prepare_{}'.format(key)] = attrs['prepare'] if \
                        'prepare' in attrs else GenericAPI.prepare
                    methods['{}'.format(key)] = attrs['call_{}'.format(key)] if \
                        'call_{}'.format(key) in attrs else GenericAPI.outer_call(key)
                    methods['finalize_{}'.format(key)] = attrs['finalize'] if \
                        'finalize' in attrs else GenericAPI.finalize
            # Drop the raw APIMethod attributes - the wrappers replace them.
            for key in attrs['_methods']:
                del attrs[key]
                if 'call_{}'.format(key) in attrs:
                    del attrs['call_{}'.format(key)]
            methods.update(attrs)
            model = super(GenericAPICreator, mcs).__new__(mcs, name, bases, methods)
        else:
            model = super(GenericAPICreator, mcs).__new__(mcs, name, bases, attrs)
        return model

We don't really want our class attributes to remain regular instances of APIMethod - we'd have no control over how they work in the context of the whole API. That's why we stash them in the _methods property and replace each of them with the wrapper returned by the classmethod outer_call, which calls the proper prepare_ and finalize_ methods (the metaclass makes sure to use any customised versions the subclass may have provided) and, in between, makes the call to the actual APIMethod object:

class GenericAPI(object):

    @classmethod
    def outer_call(cls, name):
        # Wrap the bound call so the method name gets baked in.
        return lambda obj, *args, **kwargs: obj.call(name, *args, **kwargs)

    def call(self, name, *args, **kwargs):
        # prepare -> actual API call -> finalize, all resolved by name.
        prepared = getattr(self, 'prepare_{}'.format(name))(name, *args, **kwargs)
        return getattr(self, 'finalize_{}'.format(name))(name,
                                                         prepared.call(self, *prepared.args, **prepared.kwargs),
                                                         *prepared.args,
                                                         **prepared.kwargs)

    def prepare(self, name, *args, **kwargs):
        # Default preparation: pass everything through unchanged.
        return PrepareCallArgs(call=self._methods[name], args=args, kwargs=kwargs)

    def finalize(self, name, result, *args, **kwargs):
        # Default finalization: raise on HTTP errors, optionally decode JSON.
        if self.throw_on_error and result.status_code >= 400:
            error_msg = "Error when invoking {} with parameters {} {}: {}"
            raise APIError(error_msg.format(name, args, kwargs, result.__dict__))
        if self.load_json:
            content = result.content if isinstance(result.content, str) else result.content.decode('utf-8')
            return json.loads(content)
        return result.content

It's pretty straightforward: outer_call wraps the bound call in an anonymous function - that's required because we want to pass along the name argument. So when we invoke a method, we actually invoke the lambda created in outer_call, which already knows the method name and only needs the args and kwargs of the method call.
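To make it concrete, here's roughly what the dispatch looks like for a hypothetical method named posts (illustration only, not actual generated code):

# For posts = APIMethod('get', 'posts/') the metaclass installs the wrapper
# returned by GenericAPI.outer_call('posts'), so:
api.posts()
# is equivalent to:
api.call('posts')
# which runs api.prepare_posts(...), then the APIMethod itself,
# then api.finalize_posts(...) - in that order.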

We introduced a helper object along the way, PrepareCallArgs: it wraps up all the data required to invoke the actual API call. It's a great place for custom input transformations, e.g. from in-project conventions to provider conventions. By default it only keeps what was passed, filling the input with sane defaults when needed.

class PrepareCallArgs(object):  
    __slots__ = ['call', 'args', 'kwargs']

    def __init__(self, call=None, args=None, kwargs=None):
        self.call = call or (lambda *arguments, **keywords: None)
        self.args = args or []
        self.kwargs = kwargs or {}
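For example, a subclass could override prepare to translate in-project parameter names to the provider's convention. A hypothetical sketch - the post_id convention is made up, and I'm assuming PrepareCallArgs is importable from the package root like the other classes:

from devourer import GenericAPI, APIMethod, PrepareCallArgs


class NewsAPI(GenericAPI):
    post = APIMethod('get', 'posts/{id}/')

    def prepare(self, name, *args, **kwargs):
        # Our codebase says post_id; this provider expects id.
        if 'post_id' in kwargs:
            kwargs['id'] = kwargs.pop('post_id')
        return PrepareCallArgs(call=self._methods[name], args=args, kwargs=kwargs)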

Okay, so the only thing left is how the APIMethods themselves work. The logic is really simple: you instantiate them with the appropriate data, and when __call__ed they return the results obtained from the API provider. The following example is best suited for use with REST APIs:

from string import Formatter


class APIMethod(object):
    def __init__(self, http_method, schema):
        self.name = None
        self.http_method = http_method
        self._params = []
        self._schema = None
        self.schema = schema

    @property
    def schema(self):
        return self._schema

    @schema.setter
    def schema(self, schema):
        # Extract the format-string parameters, e.g. 'posts/{id}/' -> ['id'].
        self._schema = schema
        self._params = [a[1] for a in Formatter().parse(self.schema) if a[1]]

    @property
    def params(self):
        return self._params

    def __call__(self, api, **kwargs):
        # Whatever doesn't fill the schema becomes a query parameter.
        params = {key: value for key, value in kwargs.items() if key not in self.params}
        return api.invoke(self.http_method, self.schema.format(**kwargs), params=params)

During initialization we set some values (providing defaults). We also have two properties, one of them read-only, since it's derived from schema. When a new schema value is set, we extract all the format-string parameters from it and put them into the params property. The __call__ method prepares the API call params and falls back to the GenericAPI instance owning it to invoke the call (that's the very second thing we knew we wanted to do!). That's the whole roundtrip - we're done.
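A quick illustration of the schema/params split (results as comments):

method = APIMethod('get', 'posts/{id}/comments')
print(method.params)  # ['id']
# Called with id=1 and userId=2, the schema consumes id:
# the URL becomes 'posts/1/comments' and params == {'userId': 2}.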

Enter devourer

So that pretty much sums up how we arrived at the current state of the devourer GitHub repository. The package is also available on PyPI, so if you want to play with it you can just pip install devourer. It's stable (we run it in production, so we'd be the first to know if something broke) and has nice docs, passing tests with 100% coverage and a pylint score of 10.00, if you want to dive into it and/or contribute.

When it comes to an example, I obviously can't share exact use cases and code snippets from our production systems, but luckily there is a placeholder JSON API publicly available, so we can use that:

from devourer import GenericAPI, APIMethod, APIError

class TestApi(GenericAPI):  
    posts = APIMethod('get', 'posts/')
    comments = APIMethod('get', 'posts/{id}/comments')
    post = APIMethod('get', 'posts/{id}/')
    add_post = APIMethod('post', 'posts/')

    def __init__(self):
        super(TestApi, self).__init__('http://jsonplaceholder.typicode.com/',
                                      None,
                                      load_json=True,
                                      throw_on_error=True
                                     )

api = TestApi()  
posts = api.posts()  
post = api.post(id=posts[0]['id'])  
comments = api.comments(id=post['id'])  
new_post_id = api.add_post(userId=1,  
                           title='Breaking news',
                           body='I just got devoured.')
try:  
    post = api.post(id=new_post_id)
except APIError:  
    print('Oops, this API is not persistent!')
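And since not every provider speaks JSON, a custom finalize covers, say, an XML provider. A minimal sketch, assuming a made-up endpoint (note that overriding finalize also replaces the default error handling shown earlier):

import xml.etree.ElementTree as ET

from devourer import GenericAPI, APIMethod


class XmlApi(GenericAPI):
    items = APIMethod('get', 'items/')

    def __init__(self):
        super(XmlApi, self).__init__('http://example.com/api/',
                                     None,
                                     load_json=False,
                                     throw_on_error=False)

    def finalize(self, name, result, *args, **kwargs):
        # Parse the raw response as XML instead of JSON.
        return ET.fromstring(result.content)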

Lessons learned

I have always enforced a very strict coding standards policy - elegant code passing a slightly customised pylint, docstrings for every module, class, method and function, and enough tests (100% coverage is a must) for a new person to be able to figure the project out on their own - and it helped tremendously with open-sourcing the library. The only things I had to do were create a new example and set up Travis CI and readthedocs. I used to wonder whether the policy was too strict, but now I'm sure the benefits far outweigh the costs.

What's next?

Whenever I can spare several hours for writing, I'll explain the thought process behind our second open-source project, basilisk. In the meantime, if you have any feedback, questions and/or suggestions, want to talk Python or just say hi - feel free to tweet me!
