2018 Recap: our work with tools for Argentina’s open data community

A quick review of our year working with Argentina’s Modernization Secretariat on several open data tools.

Rafael Romero
devartis

--

As a new year starts, we’d like to take a step back and go over some of the work of this past year. 2018 saw improvements to existing products as well as a couple of entirely new projects.

In this article we’ll go quickly over some of the projects we’ve been working on: Andino, monitoreo-apertura, series-de-tiempo and API-gateway. Everything is open source, so you can check it out for yourself!

Andino

We’ll start with Andino, a customizable CKAN instance that allows organizations to publish and distribute their data. This year we improved the customization options available to the offices implementing the platform. Some of them are:

  • Cache options: The user can choose between an out-of-the-box default and a user-defined extended cache.
  • HTTPS: On install or update, the admin can define the SSL certificates as well as the ports to which HTTP and HTTPS requests are redirected.
  • Plugins: We now offer a plugin template for developers to build their own data-visualization plugins.

On top of that, we made improvements on the UI/UX and performance fronts. Plus, the central data-harvesting site is now an Andino instance. As they say, eat your own dog food.

Monitoring tools

On that topic, we’ve also developed new tools for managing metadata for the catalogs that collect the data published by the different public offices. We added new features to pydatajson (a Python tool for manipulating these catalogs’ metadata), making the harvesting process easier and more manageable. Plus, we’ve launched django_datajsonar, a Django app to model the different open data catalogs in a standard manner.
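
To give a feel for how a node’s metadata gets checked, here is a minimal sketch using pydatajson. The catalog URL is a placeholder, and the method names follow pydatajson’s documented reader, so double-check them against the version you install:

    # Minimal sketch: validating a catalog's metadata with pydatajson.
    # The catalog URL is a placeholder for a real node's data.json.
    from pydatajson import DataJson

    catalog_url = "https://example.gob.ar/data.json"  # hypothetical node catalog
    dj = DataJson()

    if dj.is_valid_catalog(catalog_url):
        print("Catalog metadata is valid")
    else:
        # validate_catalog returns a structured report of the errors found
        print(dj.validate_catalog(catalog_url))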

Using both of these tools we set up monitoreo-apertura, a Django application that monitors the state of the different nodes. It runs three major tasks:

  • Data harvesting: Keeping the central node of datos.gob.ar updated with the latest changes in the network.
  • Metadata validation: Analyzing the catalogs’ metadata to ensure it is correct and reporting possible errors.
  • Indicators calculation: Taking measurements of data quality across the network; in other words, how complete and well documented the published data is (see the sketch after this list). Here we can see if our efforts are paying dividends; we expect to see data quality trend upwards over time.
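
As an illustration of what such an indicator can look like (this is not the project’s actual implementation), here is a sketch of a completeness measure over a data.json catalog. The recommended field names loosely follow the DCAT-based profile, and the node URL is a placeholder:

    # Illustrative completeness indicator: the share of datasets in a
    # data.json catalog that document a few recommended fields.
    # Field names loosely follow the DCAT-based profile; not actual project code.
    import json
    from urllib.request import urlopen

    RECOMMENDED_FIELDS = ["description", "publisher", "accrualPeriodicity", "theme"]

    def completeness(catalog_url):
        with urlopen(catalog_url) as response:
            catalog = json.load(response)
        datasets = catalog.get("dataset", [])
        if not datasets:
            return 0.0
        documented = sum(
            all(dataset.get(field) for field in RECOMMENDED_FIELDS)
            for dataset in datasets
        )
        return documented / len(datasets)

    # print(completeness("https://example.gob.ar/data.json"))  # hypothetical node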

Time series

We released a public API to search, look up and aggregate series from the National Time Series database. The API allows users to query up to 40 series simultaneously, filtering by date or aggregating by different periodicities. At the time of writing, the API is averaging 20,000 daily hits. You can check the documentation (in Spanish) here.
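
As a quick example, a query for a couple of series aggregated by year could look like the sketch below. The series IDs are placeholders, and the parameter names (ids, collapse, start_date, format) are the ones described in the public documentation, so check it for the details:

    # Minimal sketch of querying the time series API.
    # The series IDs are placeholders; check the docs for real identifiers.
    import requests

    API_URL = "https://apis.datos.gob.ar/series/api/series"
    params = {
        "ids": "serie_id_1,serie_id_2",  # up to 40 comma-separated series
        "collapse": "year",              # aggregate values by year
        "start_date": "2010-01-01",      # filter by date
        "format": "json",
    }

    response = requests.get(API_URL, params=params)
    response.raise_for_status()
    for row in response.json()["data"]:
        print(row)  # [date, value for serie_id_1, value for serie_id_2]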

Time series explorer in action.

On top of that, you can download the entire database in different formats from here. Or you can explore the different time series using the time-series-explorer we released this year. With this tool you can chart, transform, compare, and share different series from the database. Go ahead, give it a try!

API gateway

API gateway is one of the projects we started this year. It provides a way to manage the different APIs from the National Public Administration, while offering other services as well. Among those, we count:

  • Routing for all APIs under a single domain, apis.datos.gob.ar.
  • Rate limiting for anonymous and registered users.
  • Authentication and authorization for users.

On top of that, it gathers analytics on unique users, devices, the frequency of different queries, and more. It is also integrated with Google Analytics to present this information. You can also download the raw data in CSV format to calculate your own metrics, and in the near future we are hoping to add Kibana into the mix to visualize them. Right now, API gateway is managing two services: the time-series API we discussed earlier and the Normalization Service for Argentina’s Geographical Data.
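
To make the single-domain routing concrete, here is a sketch of hitting both services through the gateway. The endpoint paths and parameters are assumptions based on each API’s public documentation, so verify them there before relying on this:

    # Both services are routed through the gateway under the same domain.
    # Endpoint paths and parameters are assumptions based on the public docs.
    import requests

    BASE = "https://apis.datos.gob.ar"

    # Time series API (discussed above); the series ID is a placeholder
    series = requests.get(f"{BASE}/series/api/series", params={"ids": "serie_id_1"})

    # Normalization service for geographical data
    provinces = requests.get(f"{BASE}/georef/api/provincias", params={"nombre": "cordoba"})

    print(series.status_code, provinces.status_code)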

Infrastructure

The stack supporting all of this is what ultimately keeps the projects running smoothly. In the case of Andino, we have a multi-repo structure keeping the core, logic, and services detached. This allows us to minimize build overhead and to quickly push changes and release new versions to Docker Hub via Travis CI.

As for the rest of the projects, we use Django with dockerized services. Here we also use Travis as our continuous integration server, running test suites for new changes and managing deploys with Ansible. Managing the projects this way turned a nightmare-inducing migration to vCloud into a relatively painless process. If you are interested in the technical side of things, I cannot recommend this article series enough.

Conclusion

It’s hard to review an entire year of work without going into too much detail. I could go on all day, but I’ll save you the tedium. All that is left for us to say is that we wouldn’t have accomplished half of the objectives we set out to without the effort and teamwork of everyone involved. Looking back, 2018 was a year of hard work and many accomplishments. Here’s to 2019 being even better!

If I’ve managed to pique your interest, you might want to read more about Andino here. Or, if you are more technically oriented, I recommend our approach to handling Elasticsearch data in Python. Both were written by team members working on this project. In any case, thanks for reading!

Visit us!
