Congrats to contest winner Martin Magdinier!

Congratulations to Toronto-based business analyst Martin Magdinier for winning our first local data storytelling contest! As the winner, Magdinier gets a prestigious and highly coveted $100 iTunes gift certificate ;) Don’t spend it all in one place, Martin (oh, wait …)

Magdinier’s submission won on the merits of its accuracy and rigorous methodology, which he explained at length at our meetup earlier this week. While not visually or aesthetically dazzling, his graphs, compared with less sophisticated crunching of the dataset (admittedly done by yours truly for beginner tutorial purposes), demonstrate how widely data mining results can vary depending on how you drill into the data.

Realizing that the best way to evaluate water consumption levels across wards would be to compare consumption against each ward’s population size rather than its number of accounts, Magdinier tracked down and manually scraped both 2001 and 2006 census ward population data off PDF files available on the City of Toronto ward profile web pages.

Rather ingeniously, to estimate population numbers for the years between the two censuses, he used compound annual growth rate (CAGR) calculations, a standard financial tool used to estimate the annual growth rate of an investment over a set period of time. He then applied the calculations across all age groups and published his full, cleaned-up population dataset on BuzzData to accompany his main project.
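For the curious, here is a minimal Python sketch of that kind of CAGR interpolation. It is not Magdinier’s actual workflow: the function names are hypothetical and the ward populations in the usage line are made up purely for illustration.

```python
def cagr(start_value, end_value, periods):
    """Compound annual growth rate between two values, `periods` years apart."""
    return (end_value / start_value) ** (1 / periods) - 1


def interpolate_population(pop_2001, pop_2006):
    """Estimate population for each year from 2001 to 2006 (inclusive),
    assuming steady compound growth between the two census counts."""
    rate = cagr(pop_2001, pop_2006, periods=5)
    return {
        year: pop_2001 * (1 + rate) ** (year - 2001)
        for year in range(2001, 2007)
    }


# Hypothetical ward: 50,000 residents in 2001 and 55,000 in 2006.
estimates = interpolate_population(50_000, 55_000)
```

The two census years come back unchanged (up to floating-point rounding), and the years in between follow a smooth growth curve rather than a straight line.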

Rather than presume that population size from one year could be applied to water consumption in other years, Magdinier also made the prudent decision to mine residential water consumption data only for the years for which he had population numbers (2001 to 2006), then calculated each ward’s average residential water consumption across those years.
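As a rough sketch of that filtering-then-averaging step, assuming each ward’s data is held in plain dictionaries keyed by year (the function name and the numbers below are hypothetical, not taken from the dataset):

```python
def average_consumption(consumption_by_year, population_by_year):
    """Average a ward's consumption over only those years
    that also have a population figure."""
    usable_years = [y for y in consumption_by_year if y in population_by_year]
    if not usable_years:
        raise ValueError("no years with both consumption and population data")
    total = sum(consumption_by_year[y] for y in usable_years)
    return total / len(usable_years)


# Made-up figures: 2000 has no population number, so it is ignored.
consumption = {2000: 900.0, 2001: 100.0, 2002: 300.0}
population = {2001: 50_000, 2002: 50_500}
ward_average = average_consumption(consumption, population)
```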

In his first visualization, he plotted each pair of geographically adjacent wards (he decided that visualizing pairs of wards as distinct points reduced noise on the page) as a scatterplot of ward population vs. average annual residential water consumption.

Scan horizontally across the graph and it’s startling to see how consumption rates can differ between wards with near-identical population sizes. For example, Scarborough Centre’s residential water consumption is about twice the level of Toronto Centre-Rosedale, and some wards, such as Etobicoke-Lakeshore, appear to consume more than four times as much water as others such as Don Valley East, despite having a smaller population size.

Moreover, these findings are in stark contrast to initial results from my own basic visualization tutorial blog post, which only used residential account and consumption data, resulting in Toronto Centre-Rosedale and Trinity-Spadina standing out, misleadingly, as the city’s highest water consumers.

Another interesting trend Magdinier uncovered in his second graph is that water consumption appears to decrease as the average number of people sharing an account per ward increases. This was calculated by dividing population size by number of accounts per ward and then averaging those values over the years 2001-2006.
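The people-per-account figure is simple enough to sketch in a few lines of Python. Again, this is only an illustration under assumed inputs (dictionaries keyed by year, with invented numbers), not Magdinier’s own code:

```python
def avg_people_per_account(population_by_year, accounts_by_year):
    """Divide population by number of accounts for each year,
    then average those yearly ratios."""
    years = sorted(set(population_by_year) & set(accounts_by_year))
    if not years:
        raise ValueError("no overlapping years")
    ratios = [population_by_year[y] / accounts_by_year[y] for y in years]
    return sum(ratios) / len(ratios)


# Invented ward: 2 people per account in 2001, 3 in 2002.
people_per_account = avg_people_per_account(
    {2001: 100, 2002: 120},
    {2001: 50, 2002: 40},
)
```

Note that this averages the yearly ratios, as described above, rather than dividing total population by total accounts; the two approaches can give slightly different answers.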

All in all, some bang-up data-mining, Mr. Magdinier! Would love to see a touch more aesthetic pizzazz next time to highlight the strongest trends (importing into Illustrator is great for this), but in general a wonderfully thorough effort. Thanks for submitting your work!

Also, for those interested in learning more about how to hack data, Magdinier has his own blog on how to make the best use of Google Refine, a powerful data cleaning tool. Check it out!

-Momoko Price

DATASETS FOR OUR NEXT CONTEST (DEADLINE LATE NOVEMBER) WILL BE PUBLISHED OCT. 31 (AND BELIEVE ME, THEY ARE JUICY AND GIANT. STAY TUNED)
