In Kreuzberg, Berlin, a team of 12 is working on Wikidata – a €1.3 million project, backed by Google, that will transform the way Wikipedia works.
The project aims to build a free-access, user-editable database to sit under Wikipedia, now the world’s fifth-largest website. It sounds simple but is a huge leap forward for Wikipedia and the so-called semantic web, the push to bring more overall structure to data available online. All going well, Wikidata and its free software could become the base for other new raw data-based technology and platforms.
Wikidata spokeswoman Lydia Pintscher says the team is already fielding queries from researchers interested in uploading huge amounts of scientific data, something editors will need to decide whether to allow or encourage as parallel projects.
Web giant Google is also interested and is providing initial project funding (€325,000) alongside the Allen Institute for Artificial Intelligence (€650,000) and the San Francisco-based Gordon and Betty Moore Foundation (€325,000).
Google’s director of open source Chris DiBona says Google hopes Wikidata will make large amounts of structured data available for everyone. TechCrunch speculates Google may also use Wikidata to add more facts and direct answers to search result pages.
Languages, lists and the population of Israel
The Wikidata project kicked off in late March this year and is expected to finish mid-2013. Denny Vrandečić, co-founder of Semantic MediaWiki and research associate at the Karlsruhe Institute of Technology, leads the project team from Berlin.
The first of the project’s three phases involves synching Wikipedia’s language links, currently kept in synch by editors or bots.
Phase two will allow the synching and sharing of infobox data, so that editors no longer need to update Wikipedia’s millions of pages every time a country’s president changes, say.
This makes total sense for efficiency but does create challenges. Wikipedia is built on the idea that different country or community groups of editors are free to decide on the data in the pages they manage – a simple task for the capital of Japan, more complicated for the correct population of Israel.
To deal with the risk of marginalising some groups in favour of Wikipedia’s core group of “largely young, white, male, and well-educated” editors (as the Oxford Internet Institute’s Mark Graham puts it), groups of editors will be able to opt in to Wikidata. Editors may also be able to choose which version of data they access.
“Wikidata will not be about The Truth,” Vrandečić wrote in his lengthy response to Graham’s article. “We do not expect the editors to agree on the population of Israel, but we do expect them to agree on what specific sources claim about the population of Israel.”
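For readers who think in code, the idea Vrandečić describes can be pictured roughly as follows. This is only an illustrative sketch, not Wikidata’s actual data model or software: the `Claim` class, the helper functions, the source names and the figures are all hypothetical placeholders. The point it tries to show is that the database stores what each source says, rather than one agreed value, and lets a community of editors pick the source it prefers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Claim:
    """One statement attributed to one source (hypothetical structure)."""
    item: str
    prop: str
    value: int
    source: str


# Placeholder figures and source names, for illustration only.
claims = [
    Claim("Israel", "population", 1_000, "Source A"),
    Claim("Israel", "population", 1_200, "Source B"),
]


def values_by_source(claims, item, prop):
    """Return every recorded value for an item's property, keyed by source."""
    return {c.source: c.value for c in claims if c.item == item and c.prop == prop}


def value_for(claims, item, prop, preferred_source):
    """Return the value a given source claims, or None if it makes no claim."""
    return values_by_source(claims, item, prop).get(preferred_source)


print(values_by_source(claims, "Israel", "population"))
print(value_for(claims, "Israel", "population", "Source B"))
```

In this sketch the two conflicting figures sit side by side, and nothing in the store declares either one correct; the choice is left to whoever reads the data.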
The project’s third phase will enable automatic list generation – again, rather than editors needing to compile and update thousands of lists by hand.
Want to get involved?
It’s still early days for the Wikidata team but, as users and technology grapple with an increasing bulk of online information, it’s definitely a project to watch.