Another day, another iteration!

[The following post is a rundown of updates from our most recent newsletter. Enjoy!] 

Highlight of the week: BuzzData on Flowing Data 

In case you missed it, BuzzData recently caught the eye of Nathan Yau, author of the popular blog Flowing Data and the new hit data-viz book Visualize This! Yau wrote a nice review of our platform, curious about whether BuzzData will forge a new path distinct from existing or now-defunct data platforms. [Short answer: yes. Long answer: read this.] We were psyched about the coverage. Thanks, Nathan!

Development News

Over the last week, our team has really refined our dataset upload engine. Now when you upload your data, you’ll clearly see how much time is left to go in your upload and how far along it is. In addition, our ingest process has become far more flexible, so if you have some funky formatting hidden in your dataset, we’re better equipped than ever to accommodate it.

Adding links and formatting your datasets

We’ve had a couple of users recently ask about when they’ll be able to add links and headers to their datasets. Actually, you can already do this on BuzzData, using a simple web-writing syntax called Markdown.

If you’ve never tried Markdown, don’t worry, it’s easy. To add links to your dataset overview, just write them as shown:

[Image: example of Markdown link syntax]
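For readers who can't see the example, here is a rough sketch in Python. Markdown's inline link syntax puts the link text in square brackets, followed by the URL in parentheses. The helper name, link text and URL below are made up for illustration.

```python
def markdown_link(text: str, url: str) -> str:
    """Return an inline Markdown link of the form [text](url)."""
    return f"[{text}]({url})"


# Hypothetical link text and URL, for illustration only.
overview_line = "Original source: " + markdown_link("city open-data portal", "http://example.com/data")
print(overview_line)
```

Paste the printed line into a dataset overview and the bracketed text should render as a clickable link.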

Want to learn how to add headers, fonts, images and more? John Gruber has a great primer on Markdown, as well as a funky web app for testing it out. Give it a whirl. And if you need a hand, just tap us on the shoulder at support@buzzdata.com and we’ll help you out. 

Okay, that’s it for now. See you on the site! 

The BuzzData Team 

BuzzData’s Greatest Bits (Vol. 1)

Over the last two weeks, BuzzData users have begun populating the space with data — some datasets are clearly just for fun, but others are worthy of real investigation by data-literate folks. 

It’s not difficult to see which datasets on BuzzData are the most intriguing; follower totals tend to reflect what people think will be the most interesting. However, this doesn’t necessarily mean they’re the most significant or informative datasets. Determining which dataset is truly the most interesting requires actual investigation. 

Here’s a shortlist of some datasets on BuzzData that I think would be the most interesting to hack on, and why: 

Canada Revenue Agency Contracts

As soon as Toronto hacker Ilia Lobsanov uploaded this dataset from his website Disclosed.ca, all kinds of data journalists followed it, and for good reason: it documents every contract issued by the Canadian federal government worth more than $10,000 CAD. 

So which companies get the biggest contracts? What proportion of contracts are awarded through competitive vs. non-competitive bidding? What work is getting contracted? What proportion of the companies are Canadian vs. U.S./international?

These are just a few questions that could be investigated with this dataset. But following the dataset isn’t enough. Mash it, sort it, visualize it — that’s the only way to really know what’s going on. 

From an initial look, I’ve noticed that none of the largest contracts listed in this dataset are marked as competitively sourced, and they have almost all been “amended” in one way or another. Not sure what that means yet, but it’s an interesting trend to investigate further. Thankfully contract reference IDs have been listed in each row.
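If you want a starting point for the questions above, here is a minimal Python sketch using only the standard library. The column names (vendor, value_cad, competitive) and the sample rows are invented for illustration; the real dataset's headers and values will almost certainly differ, so adjust accordingly.

```python
import csv
import io
from collections import Counter

# Made-up sample rows; the real dataset's column names may differ.
SAMPLE_CSV = """vendor,value_cad,competitive
Acme Ltd,250000,no
Maple Systems,18000,yes
Acme Ltd,900000,no
Northern IT,42000,yes
"""


def competitive_share(rows):
    """Return the fraction of contracts flagged as competitively sourced."""
    counts = Counter(row["competitive"].strip().lower() for row in rows)
    total = sum(counts.values())
    return counts["yes"] / total if total else 0.0


def total_by_vendor(rows):
    """Sum contract values per vendor, largest total first."""
    totals = Counter()
    for row in rows:
        totals[row["vendor"]] += float(row["value_cad"])
    return totals.most_common()


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
print(competitive_share(rows))
print(total_by_vendor(rows))
```

To try it on a downloaded CSV, swap the io.StringIO(SAMPLE_CSV) wrapper for an open file handle and rename the columns to match the file's header row.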

Water Billing by Ward

It’s amazing to me that this dataset hasn’t been hacked on yet: 10 years’ worth of water usage data, per Toronto ward. Which ward wastes the most water? Which ward is the most water-efficient? Which ward’s water usage has changed the most over time? These are all great questions T.O.-based data journalists can start hacking away at. 

Having said this, I’ve cloned and begun wrangling this dataset on my own and have been coming up with some stratospheric water usage numbers per residential account in Toronto. Waiting to hear back from the City for clarification on this. Methinks the units might be in litres, not cubic metres? Li’l help — anyone?
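A quick sanity check on the units question, sketched in Python: one cubic metre is 1,000 litres, so reading the same number under each assumption gives daily figures that differ by a factor of a thousand. The reading below is a made-up placeholder, not a value from the dataset, and the function name is hypothetical.

```python
LITRES_PER_CUBIC_METRE = 1000


def daily_litres(annual_reading: float, unit: str) -> float:
    """Convert an annual per-account reading to litres per day."""
    if unit == "m3":
        litres = annual_reading * LITRES_PER_CUBIC_METRE
    elif unit == "litres":
        litres = annual_reading
    else:
        raise ValueError(f"unknown unit: {unit}")
    return litres / 365


# Made-up annual reading for one residential account.
reading = 250_000
print(f"if cubic metres: {daily_litres(reading, 'm3'):,.0f} litres/day")
print(f"if litres:       {daily_litres(reading, 'litres'):,.0f} litres/day")
```

Whichever interpretation gives a plausible household figure is the likelier unit, though only the City can confirm it.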

World Bank World Development Indicators

We are extremely excited that the World Bank’s open-data rep Tariq Khokhar has begun uploading their data to BuzzData. This mammoth dataset, their first, is described as follows: 

“The primary World Bank collection of development indicators, compiled from officially-recognized international sources. It presents the most current and accurate global development data available, and includes national, regional and global estimates.”

This dataset holds significant information about the economic status of countries around the world and over multiple decades. I only wish I knew how those indicators are measured and what exactly they reflect. What does an indicator value of 120 in Afghanistan in 1960 mean relative to a value of, say, 500 in 1968? I’m excited to know more.

Wanna take a shot at hacking on this data? Sign up for BuzzData and show what you can do. Got some good ideas? Email me at momoko@buzzdata.com and I’ll spread the word. 

-Momoko Price 

What BuzzData will (and won’t) be

“If you want to build a ship, don’t drum up people together to collect wood and don’t assign them tasks and work, but rather, teach them to long for the endless immensity of the sea.”

-Antoine de Saint-Exupery

The BuzzData beta has been public for a few weeks now. Its general reception so far has ranged from evangelistic enthusiasm for its early activity to tentative, thoughtful speculation about its future direction. 

BlogPulse co-creator Matthew Hurst earlier this month attempted, understandably, to position BuzzData in the data value-chain alongside pre-existing startup models: “It is going to be very interesting to see how the site grows and evolves,” he wrote. “Is it a commercial version of IBM’s Many Eyes? A twist on DataMarket or InfoChimps? A re-implementation of Swivels (the YouTube of data)?”

Social datasets: so what?

A few of our early beta users have probably mulled over similar questions since we launched. Many of our early adopters (usually hackers who use GitHub) get it: they can see where we’re headed and dive right in, while others, likely less familiar with collaborative workflow apps like GitHub, might upload a test dataset, follow a couple of other users, and then think: “okay, so now what?” 

BuzzData’s social features and easy-to-use UI are familiar value-adds in a post-Twitter world, but we as a team have come to realize that our True Big-Picture Mission is not nearly as easy to recognize for our early adopters. To be clear, with these social features (and many more to come), the BuzzData master plan is nothing less than to gradually infuse the data community (and beyond) with the same real-time, social, collaborative energy that revolutionized innovation for web developers a decade ago. 

What BuzzData will (and won’t) be

In answer to Hurst’s question above, BuzzData is not going to be anything like ManyEyes (or InfoChimps or DataMarket, for that matter). Sometimes I personally think BuzzData might be better described as “ManyHands” for data: as in, “many hands make light work.” 

The real vision of BD’s co-founders Pete Forde and Mark Opausky is an online hub whose purpose goes far beyond that of any static catalogue or “data marketplace,” as existing data-startups are now called. Our goal is to create a place where users — whether they’re individuals, news agencies, science labs, governments — have the power to publish, build, revise and expand existing data into information that’s more current, accurate, accessible and ultimately useful than any version of data they might create alone.

In general, data management is still a relatively isolated, esoteric process. If only someone (hint hint) were focused on connecting people more intuitively and efficiently to their data, their interests and each other, future innovation and knowledge discovery might move more quickly and reliably, while requiring less unpleasant gruntwork per person.

Wouldn’t that be nice?

Keeping our eyes on the prize

To improve the speed of data collaboration on BuzzData, one user recently suggested we add Google-Spreadsheet-like editing functionality. We agree that this seems like an intuitive move, but we actually have our own plans in mind. Google Spreadsheets is great for on-the-spot, one-off group editing; we’re really bent on creating a place where the best, most current, most accurate data floats to the top, as easily accessible to its audience as it is attributable to its publisher. 

That said, there are many ways to skin a cat, and problems can often be solved by multiple routes. We’re really looking forward to hearing what our users think of the route we’ve taken once it’s fully unveiled. 

Social functionality and easy dataset publishing are just Stage 1 of BuzzData’s ultimate vision. We really hope you’re enjoying it. Stay tuned, because there’s a lot more in store for you. 

-Momoko Price

Got some ideas about improving data workflow? Try out the site (it’s free) and tell us your ideas at support@buzzdata.com (or feel free to bug me directly at momoko@buzzdata.com).

BuzzData Site Superstars (Vol. 1)

BuzzData isn’t just a data platform, it’s a community. As such, we’ll be regularly highlighting users who show exceptional creativity and initiative on the site. Here’s the first pair from last week (more to come): 

David Joerg and Alexander Smith

Joerg, founder of The Data Collective, got active on BuzzData pretty much immediately, scrutinizing datasets and asking a host of intelligent questions. We love this kind of activity on BuzzData, not only because it gets people thinking, but because it prompts other users to maintain good “data etiquette,” e.g. sourcing your data, specifying header rows, explaining your data appropriately. This is something we take for granted on media sites like Vimeo, and it should be actively encouraged in the only-recently visible data community. 

Because Joerg knew the value of simple visualization, he graphed some of the Globe and Mail’s data, quickly finding an apparent yet-unreported spike in sugar prices, which he promptly notified the Globe about. Shortly after this, Alexander Smith, CEO of Graphient, added oil-price-per-barrel data to the graph to further highlight the trend. He’s since looked into possible leads for what’s behind it. You can read about the whole development in a recent post on open-data advocate David Eaves’s blog.

Perhaps one of the coolest things about this kind of activity, besides the data mashups and cross-disciplinary collaboration, is the very encouraging lift in constructive, informed dialogue with newsmedia. 

News website comment threads are often riddled with emotional and ideological blanket statements, etc. (as well as great contributor insights, let’s not forget). Data doesn’t just attract the data-literate (when was the last time you saw someone conceding in a newspaper comment thread that their convictions could benefit from a little regression analysis?). It also has the fantastic capacity to ground dialogue and keep the talk focused on numbers and reality, rather than people and beliefs. 

Here’s hoping we see more of this in the future! 

 

Data-journalism reunion, anyone?

Well, we said we’d try to do these every month, so we’re back! BuzzData’s putting on another data-journalism workshop next week — can you make it? Be there or well, suck! Just kidding. But you should come.

 

A lot of things have happened in the last month, most notably: BuzzData is now public (and pretty awesome, if we do say so ourselves). So in addition to having the chance to learn more data-wrangling tools, this time you’ll have the opportunity to start using BuzzData and get connected to great data journalists and hackers around the world who are already using it. 

The details:

WHAT: BuzzData’s Data-Journalism Fun-Times (Vol. 2)

WHO: You, silly. RSVP to momoko@buzzdata.com 

(Space limited to 20 attendees. Don’t worry, we’ll do another round the following week if need be)

WHEN: Wednesday, August 17, 6pm - 9pm

WHERE: The BuzzData office - find us at 174 Spadina Ave, Suite #204 (just north of Queen and Spadina)

DETAILS: This workshop will be decidedly less intense & more individual project-focused than last time. Needless to say, bring your laptop and power cord. We will teach some tools, e.g. more Google Refine wizardry (we barely got started on GRefine last time) and an introduction to ScraperWiki, and show you how to find data and collaborate on it on BuzzData. A number of users on BuzzData have already started publishing weird, wild data worth mining, so it will give you an opportunity to practice viz/analytical skills with clean, machine-readable, interesting data, too. 

We also highly recommend you bring in either a) a project you’re working on, b) the basic pitch of a project you want to start, or c) a story/topic you would like to tackle from a data/quantitative angle. After all, much of data journalism is problem-solving. So pick your problem and let’s get to solving it!

Anything else? No? Great! See you there!

BuzzData, live and uncensored

BuzzData has now been public for one whole week. Time for an update! First, the community snapshot. What groups stand out on BuzzData so far?

BuzzData’s community is bustling: we’ve got close to 1,000 users registered on the site, many of them developers (obviously) but also a surprising and exciting number of data-loving journalists hailing from Canada, the U.K., the U.S. and Europe.

We’ve been extremely impressed with the government open-data curators who jumped on test-driving us early. The City of Vancouver is already publishing data on BuzzData, while Toronto just signed up for an official City of Toronto account as well. We’re really hoping to see Ottawa and B.C. on BuzzData in the near future, too!

Most recently we were pleasantly surprised to discover a developer from Digital Science and the director of IT and Web from the Public Library of Science trying out the platform, each of them posting some fascinating datasets on impact metrics of science papers (here and here, respectively). 

[We recently had a deep talk with fellow Torontonian and quantum-computing superscientist/author Michael Nielsen about what kind of metrics might compel scientists to publish their data, and the challenge is a nuanced one, involving a host of technical, cultural and political considerations (blog post on this to come). So it was especially encouraging to know that news of our platform had already reached the science-data community.] 

All in all, a very promising first week. One thing I’d personally love to see: more avatar pics! Web 2.0 101: Without a pic, you don’t exist!

Next Up: The BD community visualized and BuzzData’s site superstars! Stay tuned …

BuzzData’s now live!

Well, this private party’s been fun, but it’s time to stop being so coy and show the world what we’re about. The BuzzData beta is officially public, open to data lovers (and the data-curious) everywhere!

In the last two weeks, we’ve gotten some incredibly engaged and knowledgeable feedback from our private-beta users. Some of the more memorable, warm fuzzy-inducing excerpts:

“I’m sure you hear the word ‘slick’ and/or ‘sleek’ all the time and are perhaps sick of it by now. But that’s what it is, darn it!”

— “I tried uploading my massive 1,315,816-row CSV today, and it worked! :-D”

— “I kind of wished the sign-up process were more arduous just so I could fill in some more forms.  O_o  That’s some magic fairy dust, that is.”

— “So far I’ve loved what I’ve seen on the site, I’m kicking myself for not getting on there sooner”

And perhaps the most validating one of all:

“Dude, I love using this!”

We hope to get plenty more feedback as we roll out bigger features — every bit helps us build a product that genuinely meets the needs of the expanding data community. Talk to us, we’re listening. 

Curious about our latest iteration? Check it out for yourself. Here’s one fascinating dataset currently on BuzzData: annual food price indices as published by the Globe and Mail:

Below — an overview (cross-indexed by topic and licensed appropriately):

Then of course, the data itself (feel free to clone or download):

Last but not least, the dataset’s followers:

(To date, two beta users have already graphed and mashed up the indices data, unveiling a yet unaccounted-for spike in sugar prices. You can read about the implications of this collaborative investigative effort on open-data advocate David Eaves’s blog today.)

Intrigued? You should be. And now that we’re live, you can invite your friends and colleagues to check out the site, too; no invite code required. What are you waiting for? 

A few small caveats to consider while we’re in public beta:

— This is still a beta, so there will be bugs here and there (let us know when you come across bugs, we’ll tackle them ASAP.)

— As a beta, we’re still fine-tuning site accommodation in different browsers. BuzzData works by far the best in Chrome, does well in Firefox 5 and Internet Explorer 9, and is functional in Firefox 3.6 and Internet Explorer 8.  

— We’re still sticking to tabular data (csv, tsv, and simple xls) for now. More to come, we promise.

Within the next day or two, we’ll also be rolling out new features that will let you reap the benefits of the platform and get more seamlessly connected to your existing social circle.

No more faceless emailing: BuzzData is giving data users the visibility and voice (and credit) they need (and deserve).  

Embracing the end of the ‘end user’

In step with our imminent public beta launch, BuzzData has recently been written about in VisionCloud, an EU-funded project that focuses on innovations “for the future Internet.” We met VisionCloud contributor and information architect Mirko Lorenz at the Open Knowledge Conference earlier this summer. Lorenz, a speaker at OKCon this year, has high hopes for BuzzData’s impact on data journalism. We hope we can deliver. 

Below is an excerpt from Lorenz’s interview with BuzzData CTO Pete Forde. You can read the whole piece on VisionCloud.

At OKCon, the big open data gathering in Berlin at the beginning of July, we presented our ideas and concepts related to future cloud storage, data handling and data-journalism in particular. 

This is how we met Pete Forde, co-founder/CTO of BuzzData.

“Do you want to see what we have been working on? I think we solved a few of the problems you were just talking about.”

The next minute, on the back of a Biergarten table, Forde briefly demo-ed BuzzData, a soon to be launched platform enabling collaborative data interrogation. The system takes the open-data approach further, overcomes limitations of platforms such as Google Docs and could spark interesting collaborations in communities around the world. 

An age without end-users

BuzzData is addressing a larger theme evolving around the web, open data and new uses of all the tools that are now available: Effectively it allows you to take a data set, publish it and then dig into the information concealed in the figures in public - alone or by sharing it with others. Datasets can be copied as a clone, thus opening many new ways to play with them. 

The service fills a need that is gaining importance. It is increasingly important to know how numbers affect our daily routines. This is not confined to a single area of life; many areas will be affected: business, government, health, your work, your community.

Journalists and media companies are among the first to feel the growing pressure to make use of such new possibilities. Jeff Jarvis, journalist, professor and book author, for example, says that in the future media world “the article will be luxury”. Instead we will see a process in which journalists and users work together to really find out about a problem or development affecting the community.  

Every user is the start of something new

On the IT side of things, an interesting article addressing another angle of this change says: “There are no ‘end users’ anymore. With good BI, and especially with newer business discovery or self-service tools, no user is at the ‘end’ of anything. Every user is the start of something new.” (Source: Information Management)

Interview with Pete Forde:

Can you briefly describe the benefits of BuzzData?

Forde: BuzzData treats datasets as destinations where a community of interest can form. People used to hunting for data in a vacuum will love being able to discuss and annotate datasets. They can attach articles, visualizations, apps and even source code.

Meanwhile, they see that they are getting timely, accurate, complete data with proper licensing straight from the publisher. There’s no scraped data on BuzzData. Publishers love it because they can finally see who is interested in their data, and what they are doing with it.

How did you get the idea?

Forde: I was writing a book about open data and how it should be for all people, and how we could use it to fix some of the world’s problems. I was working on a related project that got me really interested in the open data movement, right around the time startups like InfoChimps were being announced. To me, it seemed like the data marketplaces were missing an obvious opportunity. Sure, some people want to buy datasets from a cart. However, the biggest problem in data today isn’t finding it, but connecting the communities and educating the public. There’s a BuzzData-shaped piece of the data value chain missing that’s obvious, if you’re looking.

Was it difficult to get support or funding for this?

Forde: We raised an angel round from four exceptional Toronto investors. It was exceptionally difficult! There are not many active angels in Toronto, unfortunately. We scored a major coup when I recruited Mark Opausky to be our CEO. Mark built a $50M software company before we met, and so I get to learn from the best.

If you could make a wish: Which kind of users should use BuzzData?

Forde: I think that initially it’ll be very popular with journalists, bloggers and data hackers. However, we’re working very hard to make sure that we’re solving huge problems for scientists and academics. I have a crush on all librarians (it’s the glasses) so I’m making extra sure to think like an archivist when we design our features. Ultimately, I’d love to be responsible for seeing a dataset homepage manifest in Google search results right beside Wikipedia.

Then we’d have everyone using BuzzData, regardless of their tech ability.

That’s my dream.

Read more at VisionCloud.

Data-driven journalism, done faster

From the start, we went out of our way to enlist the participation of groups and businesses for the BuzzData beta — after all, BuzzData is all about improving group collaboration around data, right?

Having said that, bringing businesses on board at the beta stage, let alone post-commercial launch, is no small feat for a funky, outside-the-box app like BuzzData. The concept of open data is still relatively new, and simple workflow tools for data wrangling and sharing are rare. Finding organizations that were hip to the movement and up for trying a new, untested digital app was a fun challenge, needless to say.

Lucky for us, a small number of influential, forward-thinking organizations came forward to test the beta right from the start, including: 

The Economist Intelligence Unit 

The Globe and Mail (Canada’s national newspaper)

Global News (Canadian broadcast and online news)

The City of Vancouver

And while the beta’s only been active less than a week, we’ve already witnessed instances of unscripted cross-pollination between media, government and data-literate citizens. This is hugely exciting to us. 

The Globe and Mail’s account in particular, hosted by Toronto Hacks/Hackers organizer and Globe mobile editor Mason Wright, has been off to a promising start, largely because Wright clearly gets the give/take aspect of social networking, posting Globe articles to other users’ data and making an effort to put the Globe’s data in context with accompanying articles and visualizations.

It’s fascinating to watch this happen in the context of data. We’re so used to static catalogues and repositories that appear to move at a glacial pace. In contrast, on BuzzData you tell a user something — whether it’s your best friend or a national newspaper — and they talk back to you as a visible, dynamic, listening entity, a single degree of separation away. Not a new phenomenon to social media, certainly, but a refreshing change of pace for data communication. 

As an example, last week the Globe uploaded food price indices data as an accompaniment to a recent Report on Business article. The article itself focused on short-term food prices, but New York-based beta tester David Joerg took the data and, by simply plotting the data over time in Excel, uncovered a startling spike in sugar prices no one had yet noticed: 

Even Wright was surprised to see this. So the question remains: what’s driving the price inflation of sugar? Perhaps Joerg’s cursory data-viz will trigger an entirely new business investigation by the Globe in the near future. That would be incredibly cool, and a truly unique example of collaborative data journalism — one that, in an instant, transcended national boundaries and professional disciplines.
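Joerg did his plotting in Excel, but the same idea can be sketched in a few lines of Python: compute the year-over-year percentage change of an index and look for the largest jump. The index values below are invented for illustration and are not the Globe and Mail's figures; the function names are hypothetical too.

```python
def year_over_year_changes(series):
    """Return (year, percent change from the previous year) pairs."""
    years = sorted(series)
    return [
        (curr, (series[curr] - series[prev]) / series[prev] * 100)
        for prev, curr in zip(years, years[1:])
    ]


def biggest_jump(series):
    """Return the (year, percent change) pair with the largest increase."""
    return max(year_over_year_changes(series), key=lambda pair: pair[1])


# Made-up index values, not the Globe and Mail's data.
sugar_index = {2005: 100.0, 2006: 104.0, 2007: 101.0, 2008: 110.0, 2009: 165.0}

year, change = biggest_jump(sugar_index)
print(f"Largest year-over-year jump: {year} ({change:+.1f}%)")
```

A jump that dwarfs the other years is the kind of thing a simple line chart makes obvious at a glance, which is exactly why plotting over time is worth doing first.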

Not bad for the first five days of a beta. 

Got data? Get on the beta (like, right now!)

BuzzData’s beta is officially underway! We’re bringing a first wave of users on board today and more every few days from here on. We’re so excited (and exhausted)!

But first: We at BuzzData built this platform not just for people who are interested in data, but for people who have it. More and more people collect, use and wrangle data these days. They need a place to show off their work and collaborate with others in a way that’s truly efficient, dynamic and fun.

If you’re listening, please know: We built BuzzData for you.

We still have a lot of people to bring on board, but we’d like to let those with data onto BuzzData as soon as possible. You’ve waited long enough (and deserve better than Google Docs, for god’s sake!).

So if you’ve got data, let us know and we’ll hook you up with an account immediately. Email us directly at blog@buzzdata.com with the subject line “Got Data” and we’ll take care of you :)

Oh yeah! Wondering what BuzzData looks like? Check out the demo video for a taste: 

One last thing: if you haven’t signed up for the beta at all yet, seriously: what are you waiting for? Get on the bus, yo!