Lack of data standardization will help to destroy our world. Time matters.

December 19, 2022 / maestro / agile standardization

SHARING DATA IS REQUIRED FOR A BETTER WORLD

Few people dispute the relevance of data as the primary input for AI, impact analysis, analytics, and so on. Hopefully, these technologies will help us make this a better world. We are urged to make substantial progress on climate change in less than 10 years. Can you imagine the huge amounts of data required to agree on what to do, how to assess it, and how effective the different measures taken are?

THE HIDDEN VILLAIN

However, there is a villain in our data. It is not poor quality (that is another villain): even when the quality is excellent, the data rarely all come from the same source. This means that data from different sources have to be compatible or, in more technical words, the data sources have to share the same standard.
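To make the problem concrete, here is a minimal sketch in Python. It assumes two hypothetical sources that report the same temperature, each with perfect quality but its own coding. The field names (`temp_f`, `temperature`, `dateObserved`, and so on) and the shared model are illustrative assumptions, not taken from any real standard.

```python
# Hypothetical readings of the same temperature from two sources.
# All field names here are illustrative assumptions, not taken from any real standard.
source_a = {"station": "A-01", "temp_f": 68.0, "ts": "2022-12-19T10:00:00Z"}
source_b = {"id": "B-07", "temperature": 20.0, "dateObserved": "2022-12-19T10:00:00Z"}


def naive_mean(values):
    """Average raw numbers without checking what they mean."""
    return sum(values) / len(values)


def a_to_shared(record):
    """Map source A's private coding (Fahrenheit) to the assumed shared model (Celsius)."""
    return {
        "id": record["station"],
        "temperature": (record["temp_f"] - 32) * 5 / 9,
        "dateObserved": record["ts"],
    }


# Without a shared standard: both values are individually correct, the mix is not.
mixed = naive_mean([source_a["temp_f"], source_b["temperature"]])

# With a shared standard: convert first, then combine.
shared = [a_to_shared(source_a), source_b]
harmonized = naive_mean([r["temperature"] for r in shared])

print(f"mixed units: {mixed}, harmonized: {harmonized}")
```

Both inputs are correct, yet the mixed average (44.0) describes nothing real, while the harmonized one (20.0) does. Every pair of sources without a common standard needs a mapping like `a_to_shared`, and someone has to write and maintain it.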

Therefore, even perfect data from different sources could lead to incredible disasters when those sources do not share a standard. The problem is that in this digital world the volume of data grows at around 25% annually (the figure varies by source), and the number of different types of data could grow at the same rate. One indicator of this is the percentage of standardized datasets in open data portals. According to two reports (2019 and 2021), the standardization rate of the data made available through open data portals in Spain decreased from 25% to 19% while the amount of data increased by 62%. So it looks like there is more data, but it is less standardized.
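The figures above can be turned into absolute terms with a small calculation. This is only a sketch: it assumes that the standardization rate and the 62% increase are measured in the same unit (for instance, number of datasets), which the text does not state, and the function name is made up for the example.

```python
def dataset_growth(rate_before, rate_after, volume_growth):
    """Relative change in standardized and non-standardized volume.

    The initial total volume is normalized to 1, so only ratios matter.
    Assumes the rates and the growth refer to the same unit of measure.
    """
    total_after = 1 + volume_growth
    std_before, std_after = rate_before, rate_after * total_after
    non_before, non_after = 1 - rate_before, (1 - rate_after) * total_after
    return std_after / std_before - 1, non_after / non_before - 1


# Figures quoted in the text: 25% -> 19% standardized, +62% data.
std_change, non_std_change = dataset_growth(0.25, 0.19, 0.62)
print(f"standardized: {std_change:+.0%}, non-standardized: {non_std_change:+.0%}")
```

Under that assumption, standardized data grew by about 23% while non-standardized data grew by about 75%, roughly three times as fast.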

Could we speed up the work of the standardization bodies to keep up with this growth? I would not put my money on it. Currently, standardizing data commonly takes three years, and reviewing a standard could take more than one year. These time spans look like an insult in a sector where innovation is measured in weeks or, at most, months. Can a project wait that long? The answer is no and, as a consequence, the project creates its own way of coding the data, which results in a growing divergence between data sources.

AGILE STANDARDIZATION IS A COMPLEMENTARY APPROACH

Agile standardization is based on these seven principles:

0. Don't just standardize, be agile and standardize
1. Do not reinvent the wheel
2. Normalize real cases
3. Be open
4. Don't be overly specific
5. Flat not Deep
6. Sustainability is key

These seven principles have been documented in the MAS manifesto, which has been openly released.

The principles were extracted from 3 years of work and more than 1000 data models in the Smart Data Models initiative. Based on them, releasing a new data model can take less than a week once all input data are available. In this period, 7 human-readable translations are created (EN, SP, FR, DE, IT, JA, CHI), a dedicated search database is updated, linked data services are updated, examples are generated and made available, contributors and adopters of the data model are credited in a specific database, a technical validator is made available, and some other technical elements are released. This could be 100 times shorter than formal standardization (though, of course, the outcomes are not the same).

Agile standardization supports working projects in several ways:

  • Providing free and openly licensed data models that are also customizable and extensible.
  • Helping to map already open and adopted standards (making it feasible or just simpler).
  • Supporting them in generating new data models when nothing else is available.

But most important of all is doing it at a pace that matches the speed of the data market.

CONCLUSION

If we really want to make a better world and to fix serious problems where data play a relevant role, like climate change, inequality, and corruption, we will have to adopt agile standardization, the sooner the better.

 

 
