Thoughts on where next
I spent the summer break reflecting on what we accomplished and
what we might look at in 2010. I started to talk to an Economist
reporter and realised that I ought to get my thoughts straight first,
and came up with this (attached). In short, I think we should be
trying to build a community of users of open data, helping them help
each other. Please let me know what you think.
Cheers,
Nat
Rethinking Open Data
In the last year I've been involved in two open data projects, Open
New Zealand and data.govt.nz. I believe in learning from experience
and I've seen some signs recently that other projects might benefit
from my experience, so this post is a recap of what I've learned. It's
the byproduct of a summer reflection on my last nine months working in
open data.
Technologists like to focus on technology, and I'm as guilty of that
as the next person. When Open New Zealand started, we rushed straight
to the "catalogue". I was part of a smart group of top-notch web
hackers--we know what a catalogue is, it's a web-based database and
let's figure out the UI flow and which fields do we want and hey I can
hack one up in WordPress and I'll work on the hosting and so on. We
spent more time worrying about CSS than we did worrying about the users.
This is the exact analogue of an open source software failure mode:
often companies think they can get all the benefits of open source
simply by releasing their source code. The best dinner parties are
about the other people. Similarly, the best open source projects have
great people, attract great people, and the source is simply what
they're working on: necessary but not sufficient. You can build it but
they won't come. All successful open source projects build communities
of supportive engaged developers who identify with the project and
keep it productive and useful.
Data catalogues around the world have launched and then realised that
they now have to build a community of data users. There's value locked
up in government data, but you only realise that value when the
datasets are used. Once you finish the catalogue, you have to market
it so that people know it exists. Not just random Internet developers,
but everyone who can unlock that value. This category, "people who can
use open data in their jobs" includes researchers, startups,
established businesses, other government departments, and (yes) random
Internet hackers, but the category doesn't have a name and it doesn't
have a Facebook group, newsletter, AGM, or any other way for you to
reach them easily.
This matters because it costs money to make existing data open. That
sounds like an excuse, and it's often used as one, but underneath is a
very real problem: existing procedures and datasets aren't created,
managed, or distributed in an open fashion. This means that the data's
probably incomplete, the documentation's not great, the systems it lives on
are built for internal use only, and there's no formal process around
managing and distributing updates. It costs money and time to figure
out the new processes, build or buy the new systems, and train the
staff.
In particular, government and science are often funded as projects.
When the project ends, the funding stops. For almost all the datasets
we have today, ongoing maintenance and distribution of the data haven't
been budgeted for. This attitude has to change, and new projects give
us the chance to get it right, but most existing datasets are unfunded
for maintenance and release.
So while opening all data might be The Right Thing To Do from a
philosophical perspective, it's going to cost money. Governments would
rather identify the high-value datasets, where great public policy
comment, intra-government optimisation, citizen information, or
commercial value can be unlocked. Even if you don't buy into the cost
argument, there's definitely an order problem: which datasets should
we open first? It should be the ones that will give society the
greatest benefit soonest. But without a community of users to poll, a
well-known place for would-be data consumers to come to and demand
access to the data they need, the policy-making parts of governments
are largely blind to what data they have and what people want.
That's not to say that data catalogues aren't useful. We were
scratching an itch--we wanted easier access to government data, so we
built the tool that would provide it. The community of data users can
be built around the tool. As Arjuna was told by Krishna, "a man must
go forth from where he stands. He cannot jump to the Absolute, he must
evolve toward it". I'm just noting that, as with all creative
endeavours, we learned about the problem by starting to fix it.
Which brings me to the second big lesson: which problem are we trying
to solve? There's an Open Data movement emerging around governments
releasing data. However, there are at least five different types of
Open Data groupie: low-polling governments who want to see a PR win
from opening their data, transparency advocates who want a more
efficient and honest government, citizen advocates who want services
and information to make their lives better, open advocates who believe
that governments act for the people therefore government data should
be available for free to the people, and wonks who are hoping that
releasing datasets of public toilets will deliver the same economic
benefits to the country as did opening the TIGER geo/census dataset.
The one thing these groups don't share is an outcome. I can imagine an
honest government where the costs of transparency outweigh the costs
of corruption (think of the cost of removing every dirt particle from
your house). I can imagine PR wins that don't come from delivering
real benefits to citizens; in fact, I see this in a recent tweet by
Sunlight Labs's Ellen Miller:
Most of the raw data released by the OGD most likely isn't for you
to use.
She's grumbling, as does the Washington Post, about the results so far
from the Open Government Directive, which has prompted datasets of
questionable value to be added to data.gov. If this is the future,
where's my flying car? If this is open data, where's my damn
transparency?
There are some promising signs. The UK government data catalogue had a
long beta period where developers were working with the data. The UK
team built a community as well as a catalogue. That's not to say that
the UK effort is all gold--I saw plenty of frustration with RDF while
I was observing the developers--but it stands out simply for the
acknowledgement of users. Similarly, the UK's MySociety defined what
success is to them: they're all about building useful apps for
citizens, and open data is a means, not an end, to them.
So, after nearly a year in the Open Data trenches, I have some advice
for those starting or involved in open data projects. First, figure
out what you want the world to look like and why. It might be a lack
of corruption, it might be a better society for citizens, it might be
economic gain. Whatever your goal, you'll be better able to decide
what to work on and learn from your experiences if you know what
you're trying to accomplish. Second, build your project around users.
In my time working with the politicians and civil servants, I've
realised that success breeds success: the best way to convince them to
open data is to show an open data project that's useful to real
people. Not a catalogue or similar tool aimed at insiders, but
something that's making citizens, voters, constituents happy. Then
they'll get it.
My next project with Open New Zealand is to build a community of data
users. I want to see users supporting each other, to build a
tight feedback loop between those who want data and those who can
provide it, and to make it easier to assess the value created by
government-released open data. Henry Kissinger said, "each success
only buys admission to a more difficult problem". I look forward to
learning what the next problem is.