Which datasets should be released first for open data?
Learning from the elders
The open source community learnt that choosing the right application to implement is much more important than initially estimated. The release of the first Linux kernel in 1991 launched an international movement that has since changed the global software market.
Although plenty of libre software had been released before the Linux kernel (e.g. GNU), it was not until a full operating system was available that the open source community became a relevant international movement.
One of the main reasons for this success is that the community provides value not only to end users but also to a set of companies that profit from the results. Some of them return resources (code, dissemination, products) to the community, which helps make it sustainable.
This is why communities made up exclusively of volunteers, or sponsored exclusively by local public bodies, have an uncertain future.
Now it is the turn of the open data community
The open data community, especially those who profit from public sector information (PSI), is lobbying public administrations to release as much information as possible. However, to sustain a long-term community, the choice of the first datasets to be released is a key factor.
It is well known that geographic and transport information, together with public tenders, currently provides most of the value for professional PSI reusers. However, this will not necessarily remain the case in the future.
I strongly doubt that geographic or transport information will keep providing most of the value, not because the value of these datasets is in question, but because of how they compare with some other datasets.
Where is the gold mine?
In Spain, where I live, fiscal fraud amounts to 65,000 million euros a year. On top of this, the informal economy accounts for 23% of GDP (245,000 M€).
It is hard to believe that releasing some datasets could not help reduce these amounts.
A 1% reduction in the informal economy amounts to about 4 times the estimated potential turnover of the information reuse sector in Spain, a sector that currently totals only 600 million euros. A 1% reduction in fiscal fraud amounts to a bit more than this whole sector.
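To make this arithmetic explicit, here is a small Python sketch. It uses only the figures quoted in this post (65,000 M€ of fiscal fraud, 245,000 M€ of informal economy, 600 M€ for the reuse sector). The function and variable names are my own and purely illustrative, and the calculation assumes that every euro of reduction counts at face value.

```python
# Figures quoted in this post, in millions of euros (M EUR).
FISCAL_FRAUD = 65_000       # yearly fiscal fraud in Spain
INFORMAL_ECONOMY = 245_000  # informal economy (23% of GDP)
REUSE_SECTOR = 600          # information reuse sector in Spain


def reduction_value(total, percent):
    """Amount (M EUR) represented by reducing `total` by `percent` percent."""
    return total * percent / 100


def ratio_to_sector(total, percent, sector=REUSE_SECTOR):
    """How many times the reuse sector the reduction represents."""
    return reduction_value(total, percent) / sector


# Illustrative assumption: a 1% reduction in each case.
informal_gain = reduction_value(INFORMAL_ECONOMY, 1)
fraud_gain = reduction_value(FISCAL_FRAUD, 1)
informal_ratio = ratio_to_sector(INFORMAL_ECONOMY, 1)
fraud_ratio = ratio_to_sector(FISCAL_FRAUD, 1)

print(f"1% of informal economy: {informal_gain:,.0f} M EUR "
      f"({informal_ratio:.2f}x the reuse sector)")
print(f"1% of fiscal fraud: {fraud_gain:,.0f} M EUR "
      f"({fraud_ratio:.2f}x the reuse sector)")
```

Changing the percentage passed to `reduction_value` shows how quickly even modest reductions outweigh the whole reuse sector.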
So, in terms of both profit and support for the open data community, the datasets that help reduce the informal economy and fiscal fraud should be among the first to be released.
There is always somebody looking at released data
So, ‘with a little help from my friends’, the release of datasets for detecting fiscal fraud or the informal economy is bound to succeed.
It is not even necessary to uncover the 2,000 M€ in potential fiscal fraud: the mere threat of being discovered (especially by your competitors) will dramatically reduce the number of people who fail to meet their fiscal obligations.
I do not mean releasing personal information, but I am pretty sure that the tax administrations could easily release to the public information that is relevant for detecting fiscal fraud.