Mournbots, Poppy Files and Veterans’ Day-ta

Digital newsrooms at the Ottawa Citizen and OpenFile decided to use technology to help connect their readers to our war-torn past this Remembrance Day.

@Wearethedead is a bot created by Ottawa Citizen data journalist Glen McGregor that tweets the name of one fallen Canadian soldier at the 11th minute of every hour. Excluding any updates from today onward, it will take 13 years to tweet the entire database.
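
As a rough sanity check on that 13-year figure, here is a minimal Python sketch. It assumes exactly one tweet per hour with no outages and an average year of 365.25 days. The post doesn't state the size of the database, so the result is only what the schedule implies, not an official count.

```python
# Back-of-envelope estimate: how many names does a 13-year run imply?
# Assumptions (not from the Citizen): one tweet every hour, no gaps,
# and an average year length of 365.25 days.
TWEETS_PER_DAY = 24
DAYS_PER_YEAR = 365.25
YEARS = 13

implied_entries = round(TWEETS_PER_DAY * DAYS_PER_YEAR * YEARS)
print(f"Implied database size: about {implied_entries:,} names")
```

Under those assumptions the schedule works out to a little under 114,000 names.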

You can read more about how McGregor came up with the idea in the Ottawa Citizen blog post.

Each tweet includes the name, position, date and location of death, and age of the soldier who died. 

OpenFile’s Poppy File is a data-driven historical retrospective on Canadian veterans that has blossomed from a simple map only a year ago into a beautiful, popular interactive series that allows viewers to discover the identities of soldiers killed in war who once lived in their neighbourhoods, in addition to touching personal narratives and summary charts.

Lovely to see news media using technology creatively to help us connect to an increasingly distant past.

-Momoko Price

Our next data-journalism workshop is nigh …

The Guardian's data journalism workflow

(Part of the Guardian’s popular Prezi presentation of its data-journalism workflow) 

… and well, it’s going to be awesome. 

In case you missed my most recent email update, here’s the rundown of tomorrow afternoon:

WHERE: The Marketcrashers Hackernest, 231 Wallace Avenue, Toronto, ON

WHEN: Saturday, September 24, 1 pm - 4 pm

WHAT TO BRING: Laptop, power cord, and a determined, enthusiastic attitude

HOW TO RSVP: Please do so on this Meetup

TENTATIVE EVENT SYLLABUS: 

The workshop itself will have two “streams” happening simultaneously:

STREAM 1 (for hackers): 

  1. A hackathon-style brainstorm-fest for the more advanced hackers

  As at a hackathon, I’ll pick some specific datasets to hack on and offer a prize for the best project. (iTunes gift card? Value ~$100.)

  Considering this meetup is data-journalism-focused, the goal of the contest will be: Can you find and convey a story from this data? The finished project can be anything: an essay, a data-viz/infographic, an app, but it has to be web-publishable.

  The non-competing “hacks” of the group will act as the final jury in selecting the winner, since they have the narrative expertise and editorial sense to evaluate projects on their clarity, novelty and story cohesiveness.

  Attendees will have the meetup time to brainstorm ideas and/or pick collaborators, and then have the following week to code the h*** out of it.

  I will be sending out links to the candidate datasets and other rules of the contest to the RSVP list on Meetup, so be sure you’re on it if you plan to participate.

STREAM 2 (for hacking newbies): 

  A tutorial/skill-learning stream, with a planned step-by-step curriculum of exercises.

I was going to plan this around simple data-wrangling tricks I hear about through my job, but a data-hacker friend of mine made a really good point: journalists shouldn’t try to avoid coding if they really want to mine data.

  In his words: 

  “To my mind, all of these various sites and tools are great so long as you have a problem / data set small enough to use with them … and have a problem that the tool is actually suited to address. For all of the effort spent, why not learn a fully general programming language and, having obtained mastery, wield great power over mere mortals?”

Frankly, I agree, and I think most journalists who want to hack are actually eager to learn, as long as they have some kind of curriculum to follow and someone to coordinate. So that’s what I’m going to help with. 

So, the suggested general curriculum for newbies:

PART I: 

- install Python before workshop

- intro to Python syntax

- basic data types: primitives, tuples, lists, dictionaries
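
To give a flavour of the data types in Part I, here is a minimal sketch. The variable names and values are made up for illustration (only the 1 pm to 4 pm time slot comes from the details above), and it isn't the actual workshop material.

```python
# Primitives: single values such as integers, floats, strings and booleans.
year = 2011
ticket_price = 0.0  # illustrative value only
city = "Toronto"
bring_laptop = True

# Tuple: a fixed, ordered group of values that cannot be changed in place.
workshop_hours = (13, 16)  # start and end on a 24-hour clock

# List: an ordered collection you can grow, shrink and reorder.
topics = ["syntax", "data types"]
topics.append("JSON")

# Dictionary: a lookup table that maps keys to values.
workshop = {"city": city, "year": year, "topics": topics}

print(workshop["city"], len(topics), workshop_hours[0])
```

A tuple suits things that shouldn't change (like a start and end time), a list suits things you keep adding to, and a dictionary suits anything you want to look up by name.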

PART II: 

- fetching a JSON file via HTTP

- working with JSON (much nicer than CSV if you can get it)

- storing data in a SQLite database

- querying the database
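
The four Part II steps can be sketched with nothing but Python's standard library. This is an illustration under assumptions, not the workshop's actual exercise: the `events` table, its columns, the sample records and the example.com address are all hypothetical, and the sample JSON stands in for a real download so the sketch runs offline.

```python
import json
import sqlite3
from urllib.request import urlopen


def fetch_json(url):
    """Download a URL and parse the response body as JSON."""
    with urlopen(url, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


def store_records(conn, records):
    """Create the (hypothetical) events table and insert one row per record."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS events "
        "(city TEXT, year INTEGER, attendees INTEGER)"
    )
    conn.executemany(
        "INSERT INTO events (city, year, attendees) "
        "VALUES (:city, :year, :attendees)",
        records,
    )
    conn.commit()


def attendees_by_city(conn):
    """Return (city, total attendees) rows, largest total first."""
    query = (
        "SELECT city, SUM(attendees) AS total FROM events "
        "GROUP BY city ORDER BY total DESC"
    )
    return conn.execute(query).fetchall()


# Made-up stand-in for what fetch_json("https://example.com/events.json")
# might return; no real dataset is implied.
SAMPLE_JSON = """
[
  {"city": "Toronto", "year": 2011, "attendees": 20},
  {"city": "Hamilton", "year": 2011, "attendees": 12},
  {"city": "Toronto", "year": 2010, "attendees": 15}
]
"""

records = json.loads(SAMPLE_JSON)    # working with JSON
conn = sqlite3.connect(":memory:")   # a throwaway in-memory SQLite database
store_records(conn, records)         # storing data
totals = attendees_by_city(conn)     # querying the database
print(totals)
```

With a real feed you would swap the `json.loads(SAMPLE_JSON)` line for a call to `fetch_json` and pass a filename instead of `":memory:"` to keep the database on disk.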

Tackling these head-on, from the ground up, will empower you to:

  1. write and customize scripts - which is what data-scraping is all about.
  2. make full use of public APIs and actually know how to use the data once you get it
  3. explore data more flexibly without relying on outdated, proprietary software like Microsoft Access

I’m getting Part I ready for this workshop, and maybe we can set up a followup workshop in a week or so to tackle Part II. Python is actually a really fun, easy language to learn, by the way. You’re going to love it!

And of course, everyone is free to discuss other projects they want to tackle, etc.

Again, if this sounds like something you’d like to join, please RSVP! (And if you can’t make it this time, please un-RSVP, so those on the wait-list can get in on it.)

-Momoko Price

Data-journalism reunion, anyone?

Well, we said we’d try to do these every month, so we’re back! BuzzData’s putting on another data-journalism workshop next week — can you make it? Be there or well, suck! Just kidding. But you should come.

 

A lot of things have happened in the last month, most notably: BuzzData is now public (and pretty awesome, if we do say so ourselves). So in addition to having the chance to learn more data-wrangling tools, this time you’ll have the opportunity to start using BuzzData and get connected to great data journalists and hackers around the world who are already using it. 

The details:

WHAT: BuzzData’s Data-Journalism Fun-Times (Vol. 2)

WHO: You, silly. RSVP to momoko@buzzdata.com 

(Space limited to 20 attendees. Don’t worry, we’ll do another round the following week if need be)

WHEN: Wednesday, August 17, 6pm - 9pm

WHERE: The BuzzData office - find us at 174 Spadina Ave, Suite #204 (just north of Queen and Spadina)

DETAILS: This workshop will be decidedly less intense & more individual project-focused than last time. Needless to say, bring your laptop and power cord. We will teach some tools, e.g., more Google Refine wizardry (we barely got started on GRefine last time) and an introduction to ScraperWiki, and show you how to look for data and collaborate on it on BuzzData. A number of users on BuzzData have already started publishing weird, wild data worth mining, so it will give you an opportunity to practice viz/analytical skills with clean, machine-readable, interesting data, too.

We also highly recommend you bring in either a) a project you’re working on, b) the basic pitch of a project you want to start, or c) a story/topic you would like to tackle from a data/quantitative angle. After all, much of data journalism is problem-solving. So pick your problem and let’s get to solving it!

Anything else? No? Great! See you there!

Data-driven journalism, done faster

From the start, we went out of our way to enlist the participation of groups and businesses for the BuzzData beta — after all, BuzzData is all about improving group collaboration around data, right?

Having said that, bringing businesses on board at the beta stage, let alone post-commercial launch, is no small feat for a funky, outside-the-box app like BuzzData. The concept of open data is still relatively new, and simple workflow tools for data wrangling and sharing are rare. Finding organizations that were hip to the movement and up for trying a new, untested digital app was a fun challenge, needless to say.

Lucky for us, a small number of influential, forward-thinking organizations came forward to test the beta right from the start, including:

The Economist Intelligence Unit 

The Globe and Mail (Canada’s national newspaper)

Global News (Canadian broadcast and online news)

The City of Vancouver

And while the beta’s only been active for less than a week, we’ve already witnessed instances of unscripted cross-pollination between media, government and data-literate citizens. This is hugely exciting to us.

The Globe and Mail’s account in particular, hosted by Toronto Hacks/Hackers organizer and Globe mobile editor Mason Wright, has been off to a promising start, largely because Wright clearly gets the give/take aspect of social networking, posting Globe articles to other users’ data and making an effort to put the Globe’s data in context with accompanying articles and visualizations.

It’s fascinating to watch this happen in the context of data. We’re so used to static catalogues and repositories that appear to move at a glacial pace. In contrast, on BuzzData you tell a user something — whether it’s your best friend or a national newspaper — and they talk back to you as a visible, dynamic, listening entity, a single degree of separation away. Not a new phenomenon to social media, certainly, but a refreshing change of pace for data communication. 

As an example, last week the Globe uploaded food price indices data as an accompaniment to a recent Report on Business article. The article itself focused on short-term food prices, but New York-based beta tester David Joerg took the data and, by simply plotting the data over time in Excel, uncovered a startling spike in sugar prices no one had yet noticed: 

Even Wright was surprised to see this. So the question remains: what’s driving the price inflation of sugar? Perhaps Joerg’s cursory data-viz will trigger an entirely new business investigation by the Globe in the near future. That would be incredibly cool, and a truly unique example of collaborative data journalism — one that, in an instant, transcended national boundaries and professional disciplines.

Not bad for the first five days of a beta. 

25 great links for data-lovin’ journalists

Knowing how to avoid errors like this is just one reason to love being a data journalist:

In case you missed it — everything we worked on last weekend (and plenty more)!

WORKSHOP PART 1: Intro to ScraperWiki and ManyEyes w/ Momoko Price

For the first half we worked on visualizing data with ManyEyes. We used arms import/export data courtesy of our friend and my doppelganger at ScraperWiki, data journalist Nicola Hughes:

Scraped data (see the icon that says “Download the Spreadsheet (CSV)?” Yeah, do that.):
http://scraperwiki.com/scrapers/arms_imports_database/
http://scraperwiki.com/scrapers/arms_exports_database/
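
Once you have the CSV, a few lines of Python can total the values per country before you ever open a viz tool. This is a minimal sketch under assumptions: the column names `country` and `value` and the sample rows are hypothetical placeholders, and the real ScraperWiki export may use different headers.

```python
import csv
import io
from collections import defaultdict

# Made-up rows standing in for the downloaded spreadsheet.
SAMPLE_CSV = """country,year,value
Country A,2009,120
Country B,2009,80
Country A,2010,100
Country C,2010,50
"""


def totals_by_country(csv_file):
    """Sum the 'value' column per country, largest total first."""
    totals = defaultdict(float)
    for row in csv.DictReader(csv_file):
        totals[row["country"]] += float(row["value"])
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


ranking = totals_by_country(io.StringIO(SAMPLE_CSV))
for country, total in ranking:
    print(country, total)
```

To run it on a real download, open the file with `open("your_file.csv", newline="")` (the filename is yours to choose) and pass that to `totals_by_country` in place of the `io.StringIO` stand-in.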

If the source of the data isn’t apparent, check the scraper script (click on the tab that says “Edit”) and check for the source URL. Like so:

http://scraperwiki.com/scrapers/arms_imports_database/edit/

(Did you find the source? Good.)

You can check out a few of the visualizations we made in ManyEyes (for teaching purposes only. I don’t actually think these are great viz’s):

http://www-958.ibm.com/software/data/cognos/manyeyes/visualizations/total-arms-exporting-volume-per-na

http://www-958.ibm.com/software/data/cognos/manyeyes/visualizations/arms-importing-and-exporting-natio

http://www-958.ibm.com/software/data/cognos/manyeyes/visualizations/top-10-arms-exporting-nations-stac

We also used Google Refine to start cleaning up data taken from the new Canadian International Development Agency open data portal. Did you know they just launched one? Well they did:

CIDA Database Source:
http://www.acdi-cida.gc.ca/cidaweb/cpo.nsf/fWebprojDataEn?Readform

Google Refine:
[Data manipulation and cleaning tool]
http://code.google.com/p/google-refine/wiki/Downloads

(Keep in mind, GRefine keeps track of every single alteration you make to a dataset, so don’t ever worry about doing something “wrong.” You can always go back. Version control, what an amazing thing.)

WORKSHOP PART 2: Mapping, FusionTables and FusionTables Layers with Joey Coleman

All of Joey’s workshop materials can be found on his data page:

http://data.joeycoleman.ca/

I believe he’ll be posting slides of his presentation soon …

OTHER COOL DATA-JOURNALISM REFERENCES:

Paul Bradshaw’s online journalism blog (amazing resource):
http://onlinejournalismblog.com/

NICAR-L Discussion mailing list (National Institute for Computer-Assisted Reporting)
http://www.ire.org/membership/subscribe/nicar-l.html

Toronto’s open-data catalogue:
http://www1.toronto.ca/wps/portal/open_data/open_data_home?vgnextoid=b3886aa8cc819210VgnVCM10000067d60f89RCRD

Data Visualization Blogs:

Stephen Few’s Perceptual Edge

http://www.perceptualedge.com/examples.php

David McCandless’s Information is Beautiful

http://www.informationisbeautiful.net/

Doug McCune’s Adobe Flex- and ActionScript-focused blog: 

http://dougmccune.com/blog/


OTHER FUN STUFF (COURTESY OF DATA HACKER ROB MEDEIROS):

Google Public Data Explorer:
[Online data visualization tool]
http://www.google.com/publicdata/home

R Project
[statistics and visualization tool]
http://www.r-project.org/

SQLite
[Small, fast, embeddable SQL database]
http://sqlite.org

Matplotlib
[Python graphing and visualization]
http://matplotlib.sourceforge.net/

OpenDX
[Hard-core old skool data visualization tool]
http://www.opendx.org

Blender
[3-D modelling and rendering application; scriptable w/ Python; great for 3-D static or interactive visualizations]
http://www.blender.org/

NumPy
[Scientific computing package for Python; fun w/ numbers]
http://numpy.scipy.org/

GNU Octave
[MATLAB-compatible numerical computing language; great for numerical calculations, visualizations]
http://www.gnu.org/software/octave/

Linked Data
[Slightly esoteric vision of the future web in which data is much easier to get and work with]
http://linkeddata.org

Semantic Web
[Official home of the future, data-centric web]
http://www.w3.org/standards/semanticweb/

REQUIRED READING


Run, don’t walk, to the nearest bookstore and buy anything written by Edward Tufte, e.g.

* The Visual Display of Quantitative Information
http://www.amazon.com/Visual-Display-Quantitative-Information/dp/0961392142/

* Envisioning Information
http://www.amazon.com/Envisioning-Information-Edward-R-Tufte/dp/0961392118/

* Visual Explanations: Images and Quantities, Evidence and Narrative
http://www.amazon.com/Visual-Explanations-Quantities-Evidence-Narrative/dp/0961392126/

* Beautiful Evidence
http://www.amazon.com/Beautiful-Evidence-Edward-R-Tufte/dp/0961392177/

* Visual & Statistical Thinking: Displays of Evidence for Decision Making
http://www.amazon.com/Visual-Statistical-Thinking-Displays-Evidence/dp/0961392134/

BuzzData gets hands-on with Hacks/Hackers

Heyooo! We put on our first data journalism workshop last Saturday at the Centre for Social Innovation in Toronto and it went grrrrrrrrrrreat!

Still can’t believe we had a full house despite the incredible weekend weather. We attracted a horde of enthusiastic, geeky hacks and hackers ready to learn some new skills.

Special props go to mapping workshop presenter Joey Coleman and the Spectator’s open-data reporter Bill Dunphy, who commuted in from Hamilton. We were glad to see The Hammer represent; they have admirably strong, active digital journalism and civic engagement circles. 

Despite the fact that a number of people had to go home after the first half (understandable; six hours of full-on geekery is a tall order on a sunny Saturday) we still had a pretty full room going all the way to the end. Here, Joey Coleman (centre left) leads us even deeper down the rabbit hole of hacker journalism. 

Clearly in the zone. 

Nerd Alert! (Pete Forde would correct me: “Ahem, that’s ‘Geek Alert’ ” …)

The collaborative energy on Saturday was really exceptional. In light of the trouble Open File Toronto has had recently getting FOI data from Toronto Police Services in digital format (the TPS mailed hundreds of pages of data, refusing to hand over a digital version), we asked everyone to lend a hand with some manual data entry, grassroots style, and they all pitched in. 

[UPDATE: I’ve made a note to contact Carole Moore, chief librarian at U of T and colleague of Open Library/Internet visionary Brewster Kahle, for OCR software recommendations to circumvent bureaucratic blocks like this in the future. Stay tuned!]

One particular data enthusiast and freelance hacker present was especially helpful: Robert Medeiros took our original workshop reference list and expanded it to a veritable treasure trove of ddj resources (below).

We’re planning another workshop in early August — if you want in, by all means let me know at momoko@buzzdata.com and I’ll get you plugged in right-quick.