
1. Datasets are often disparate and siloed, making it difficult for a user to combine multiple sources in a single analysis
2. While city officials are often highly capable problem solvers, they typically lack coding expertise and struggle to wrangle unwieldy datasets (e.g. the 2011 Census) into the formats needed for informative visualisation and downstream analysis
3. As a consequence of (2), the contemporary modelling approaches used to analyse city data, e.g. internal population migration and dynamics, are limited to linear methods and Excel-compatible file sizes, i.e. below the gigabyte level
4. Training in responsible data-use strategies and statistical best practices has seldom been provided to officials, or to most people for that matter, at secondary and tertiary schooling levels
The platform we envisaged would attempt to alleviate these challenges by means of the following built-in functionalities:
1. Multiple datasets would be made accessible within a single interface. These would include:
    - 102 of SACN’s city indicators that are published yearly in the printed State of South African Cities Report
    - The 2011 South African National Census, as accessed through the Wazimap data tool built by OpenUp for Media Monitoring Africa (MMA); the first sketch after this list shows one way such an API might be queried
    - Municipality budget data, as accessed through the Municipal Money data portal built by OpenUp for National Treasury
    - Custom datasets that city users themselves upload to the platform
2. A generalised visualisation framework that generates an interactive dashboard for any dataset passed to it, and that allows graphs and data tables to be exported in a variety of popular file formats (the second sketch after this list illustrates the core idea)
3. A modelling framework by which researchers can contribute and collaborate on novel, open-source approaches to city-level analysis. As a prototype for this infrastructure, we sought to develop an “open-system” demographics model that leverages Big Data sources to produce a continuously improving framework for studying migration within cities
4. A user-centred, easily accessible data-processing experience with data fidelity checks and balances built in, ensuring that all information combined in an analysis has passed a sanity check (the final sketch below shows the kind of checks we mean)
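
Since several of the upcoming posts revolve around data access, here is a rough idea of what querying the 2011 Census through a Wazimap-style API can look like. Treat this as a hedged sketch rather than our platform's integration code: the endpoint path follows the pattern used by Wazimap deployments, but the specific table and geography IDs below are illustrative guesses, so check the Wazimap documentation for real values.

```python
import requests

# NOTE: this endpoint pattern is typical of Wazimap deployments, but the
# table_ids and geo_ids used below are illustrative assumptions, not
# confirmed values. Consult the Wazimap API docs for the real identifiers.
WAZIMAP_API = "https://wazimap.co.za/api/1.0/data/show/latest"

def fetch_census_table(table_id, geo_id):
    """Fetch a single Census table for one geography from a Wazimap-style API."""
    response = requests.get(
        WAZIMAP_API,
        params={"table_ids": table_id, "geo_ids": geo_id},
        timeout=30,
    )
    response.raise_for_status()  # fail loudly on HTTP errors
    return response.json()

if __name__ == "__main__":
    # Hypothetical example: a population-group table for the City of Cape Town
    data = fetch_census_table("POPULATIONGROUP", "municipality-CPT")
    print(list(data.keys()))
```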
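The generalised visualisation framework (item 2) is easiest to explain with a toy version. The sketch below shows the core trick: inspect each column of an arbitrary tabular dataset and pick a sensible default chart. It assumes pandas and Plotly Express purely for illustration; the `auto_dashboard` name and the chart-selection rules are my own, not the platform's actual framework.

```python
import pandas as pd
import plotly.express as px

def auto_dashboard(df: pd.DataFrame):
    """Generate a simple chart for each column of an arbitrary tabular dataset.

    Numeric columns become histograms; low-cardinality text columns become
    bar charts of value counts. Returns a list of Plotly figures that a
    front end could lay out as a dashboard.
    """
    figures = []
    for col in df.columns:
        if pd.api.types.is_numeric_dtype(df[col]):
            figures.append(px.histogram(df, x=col, title=col))
        elif df[col].nunique() <= 20:
            counts = df[col].value_counts().reset_index()
            counts.columns = [col, "count"]
            figures.append(px.bar(counts, x=col, y="count", title=col))
    return figures
```

Exporting is similarly generic in this setup: every Plotly figure can be written out via `fig.write_html("chart.html")` or `fig.write_image("chart.png")` (the latter needs the kaleido package installed).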
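And for item 4, here is the flavour of fidelity check we have in mind: cheap, automatic tests that catch the most common upload problems before they poison an analysis. The `sanity_check` name and the thresholds are illustrative assumptions, not the platform's actual rules.

```python
import pandas as pd

def sanity_check(df: pd.DataFrame, max_missing_frac: float = 0.2) -> list[str]:
    """Run basic data-fidelity checks on an uploaded dataset.

    Returns a list of human-readable warnings; an empty list means the
    dataset passed. The thresholds here are illustrative defaults.
    """
    warnings = []
    if df.empty:
        warnings.append("Dataset is empty.")
        return warnings
    # Duplicate rows usually indicate a bad join or a double upload
    n_dupes = int(df.duplicated().sum())
    if n_dupes:
        warnings.append(f"{n_dupes} duplicate rows found.")
    # Columns that are mostly missing rarely survive an analysis intact
    for col in df.columns:
        frac_missing = df[col].isna().mean()
        if frac_missing > max_missing_frac:
            warnings.append(f"Column '{col}' is {frac_missing:.0%} missing.")
    # Mixed types within one column are a classic spreadsheet artefact
    for col in df.select_dtypes(include="object").columns:
        n_types = df[col].dropna().map(type).nunique()
        if n_types > 1:
            warnings.append(f"Column '{col}' mixes {n_types} Python types.")
    return warnings
```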
Over the course of the coming weeks I will discuss each of these key functionalities in a series of technically focused posts on how we built them into the platform, the cool things we discovered, and the lessons we learned from the challenges we faced. These articles will often delve into pieces of the code we used; however, I will attempt to ensure that the narrative allows anyone to understand the significance of each component. If you’d like to take a gander at the code and follow along, it can be found on our GitHub.


