Methodology for sampling public data sources

To contrast the economic value created (estimated from information consumption) with the reusability of the data released, an extensive sample of dataset sources was collected. Unfortunately, no sampled data is available on the creation of value classified by type or scope. Only information consumption is available, so this indicator is used as a proxy for economic impact.

First, four categories of data sources were defined according to the scope of the information published: Local, Regional, National and International. The International category was discarded because there are not enough international data sources to compare against. In addition, the classification of sources by type of information released follows the criteria of former reports (MEPSIR, 2006; Red.es, 2011; Red.es, 2012).

The sources were sampled according to the following grid, which gives the number of sources per scope and type of information.

Scope / type information	Business	Geograph	Legal	Meteo	Social	Transport	Other	TOTAL
National	10	7	10	5	6	6	6	50
Regional	14	14	12	3	8	13	14	78
Local	11	10	13	8	11	10	13	76
TOTAL	35	31	35	16	25	29	33	204
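
The marginal totals of the grid can be re-derived from the individual cells. The following is a minimal sketch in Python, not part of the original study: the cell counts are copied from the table above, while the names `TYPES`, `GRID`, `row_totals`, `column_totals` and `shares` are illustrative assumptions.

```python
# Sampling grid copied from the table above.
# Keys are scopes; each list follows the column order in TYPES.
TYPES = ["Business", "Geograph", "Legal", "Meteo", "Social", "Transport", "Other"]
GRID = {
    "National": [10, 7, 10, 5, 6, 6, 6],
    "Regional": [14, 14, 12, 3, 8, 13, 14],
    "Local": [11, 10, 13, 8, 11, 10, 13],
}


def row_totals(grid):
    """Number of sampled sources per scope (last column of the table)."""
    return {scope: sum(counts) for scope, counts in grid.items()}


def column_totals(grid, types):
    """Number of sampled sources per type of information (last row of the table)."""
    return {
        t: sum(counts[i] for counts in grid.values())
        for i, t in enumerate(types)
    }


def shares(totals):
    """Fraction of the whole sample that each category represents."""
    grand_total = sum(totals.values())
    return {key: value / grand_total for key, value in totals.items()}


if __name__ == "__main__":
    by_scope = row_totals(GRID)
    by_type = column_totals(GRID, TYPES)
    print("Sources per scope:", by_scope)
    print("Sources per type:", by_type)
    print("Grand total:", sum(by_scope.values()))
    for scope, fraction in shares(by_scope).items():
        print(f"{scope}: {fraction:.1%} of the sample")
```

Running the script reproduces the TOTAL row and column and also shows how the sample is distributed across scopes, which helps when judging how evenly each category is covered.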

Finding the sources was challenging because what has traditionally been the most complete catalogue of public data sources (the CTIC map) seems to be outdated and unmaintained. As a result, some of its entries have disappeared (e.g. Open Data Cordoba, Extremadura reutiliza) and some newer sources were not listed (e.g. Open Data Santander, Alcobendas INE, DGT).

Additionally, once the datasets had been identified for every source, in some cases the dataset quality was so low that the source was rejected as a valid sample and replaced with an equivalent source. Detailed data about which entities were sampled is included.

All the data for these data sources can be freely downloaded for non-commercial use.²
