Methodology for sampling public data sources

To contrast the economic value created (estimated from information consumption) with the reusability of the data released, an extensive sampling of dataset sources has been performed. Unfortunately, no sampled data is available on value creation classified by type or scope. Only information consumption is available, and this indicator will be used as a proxy for economic impact.

First, four categories of data sources were defined according to the scope of the information published: Local, Regional, National and International. The International category was discarded because there are not enough international data sources to compare against. Additionally, the classification of sources by type of information released follows the criteria of former reports (MEPSIR, 2006; Red.es, 2011; Red.es, 2012).

The sampling of sources follows the grid below.

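| Scope / type of information | Business | Geographic | Legal | Meteo | Social | Transport | Other | TOTAL |
|-----------------------------|----------|------------|-------|-------|--------|-----------|-------|-------|
| National                    | 10       | 7          | 10    | 5     | 6      | 6         | 6     | 50    |
| Regional                    | 14       | 14         | 12    | 3     | 8      | 13        | 14    | 78    |
| Local                       | 11       | 10         | 13    | 8     | 11     | 10        | 13    | 76    |
| TOTAL                       | 35       | 31         | 35    | 16    | 25     | 29        | 33    | 204   |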
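As a quick consistency check on the grid above, the row and column totals can be recomputed from the individual cells. The following is a minimal sketch in Python. The names `TYPES`, `GRID`, `row_totals` and `column_totals` are illustrative assumptions and are not part of the original study; only the cell values come from the table.

```python
# Sampling grid from the table above: number of sources per scope and type.
# Variable and function names are illustrative, not taken from the study.
TYPES = ["Business", "Geographic", "Legal", "Meteo", "Social", "Transport", "Other"]

GRID = {
    "National": [10, 7, 10, 5, 6, 6, 6],
    "Regional": [14, 14, 12, 3, 8, 13, 14],
    "Local": [11, 10, 13, 8, 11, 10, 13],
}


def row_totals(grid):
    """Total number of sampled sources for each scope."""
    return {scope: sum(counts) for scope, counts in grid.items()}


def column_totals(grid, types):
    """Total number of sampled sources for each type of information."""
    return {t: sum(counts[i] for counts in grid.values()) for i, t in enumerate(types)}


rows = row_totals(GRID)
cols = column_totals(GRID, TYPES)
grand_total = sum(rows.values())

print(rows)
print(cols)
print(grand_total)
```

Summing by scope and summing by type should give the same grand total, which is a simple way to catch transcription errors in a grid like this one.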

Finding the sources has been a challenging task because the CTIC map, traditionally the most complete catalogue of public data sources, seems to be outdated and unmaintained. As a result, some listed sources have disappeared (e.g. Open Data Cordoba, Extremadura reutiliza) and some new ones were not listed (e.g. Open Data Santander, Alcobendas, INE, DGT).

Additionally, once the datasets had been determined for every source, in some cases dataset quality was so low that the source was rejected as a valid sample and replaced with an equivalent source. Detailed data about which entities have been sampled is included.

All the data for these data sources can be freely downloaded for non-commercial uses [2].
