Data factory, or how to create new data from existing data

When it comes to generating valuable data for our business, we can distinguish two types of data:

  1. The data that has value for my business by itself
  2. The data that needs normalization treatment, and/or connection with other data, in order to be considered valuable information

This activation of the data, or its transformation so that the business areas can improve their strategies, defines the concept of “Factory”.

A factory is an establishment equipped with the machinery, tools and facilities necessary for manufacturing objects, obtaining certain products or transforming an energy source. To adapt this RAE definition to the digital world, we can draw the following parallel:

Machinery and facilities: Big data and treatment power

Tools: Processes, algorithms and artificial intelligence


The processes involved in giving value to the data are the following:

  1. Identification of the data to be processed
  2. Automation of data extraction
  3. Basic data processing (normalization, deduplication)
  4. Data labeling (or creation of dictionaries and taxonomies)
  5. Optional: Creation of a Knowledge Graph
  6. Simple or complex algorithms applied to data (AI/ML)
  7. Integration of results for consumption: Publications, enrichment of other systems (BI, CRM…)
  8. Verification of the integrity of the information
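As an illustration of the basic data processing step (normalization and deduplication), here is a minimal Python sketch. The field names `name` and `postcode`, the sample records and the choice of matching key are assumptions made for this example only; a real process would use whatever identifiers and matching rules fit the data.

```python
import unicodedata


def normalize_name(raw: str) -> str:
    """Lowercase, strip accents and collapse repeated whitespace."""
    decomposed = unicodedata.normalize("NFKD", raw)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(no_accents.lower().split())


def deduplicate(records: list[dict]) -> list[dict]:
    """Keep the first record seen for each normalized (name, postcode) key."""
    seen = set()
    unique = []
    for record in records:
        # Hypothetical matching key: normalized name plus trimmed postcode.
        key = (normalize_name(record["name"]), record["postcode"].strip())
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


# Invented sample data: the first two rows describe the same business.
records = [
    {"name": "Café  Central", "postcode": "28001"},
    {"name": "cafe central", "postcode": "28001 "},
    {"name": "Café Central", "postcode": "08001"},
]
clean = deduplicate(records)
```

Here the first two records collapse into one because they share the same normalized key, while the third is kept because its postcode differs.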

What everyone is looking for with the intelligent use of data is the information, sometimes hidden, behind that data.

Once the information is discovered, you have to know how to activate it at the business level. This translates into use cases, projects and/or services that consume this data: KYC & KYB (knowing your customers and potential customers, your competitors, your suppliers), market research, campaign preparation, impact measurement, fraud management, analysis of the geographical coverage of services, improvement of predictive mathematical models and so on. Next we will see some examples.

Energy Efficiency Certificate (EEC)

Due to the growing awareness of clients, banks and insurance companies regarding climate change, and the need to fulfill their corporate social responsibility obligations, there is an urgent need to have the Energy Efficiency Certificates (EEC) of their real estate portfolios.

Currently, less than 20% of the entire real estate stock in Spain has an official EEC. At Deyde DataCentric, a system has been developed that extracts real energy labels from the different sources that publish them. For properties without this official certification, mathematical models have been developed that, fed with data from real certifications and TINSA appraisals, estimate the letter ratings and numeric values for emissions and consumption.

Environmental Risks

The growing number of extreme natural phenomena resulting from climate change generates the need to control the risk of insured assets as much as possible.

For this, a series of cartographic layers has been generated with information on the existence of natural risks across the Spanish national territory, which will be incorporated at the record level. There are 3 different layers, corresponding to:

  • Flood risk (river and sea)
  • Desertification risk
  • Seismic risk

In each of these layers, in addition to the indicators associated with each type of risk, several additional indicators have been built, such as the frequency indicator, which provides information on the probability of the event, and the magnitude indicator, which provides information on the expected damage.
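To make this more concrete, the following Python sketch shows one possible way to attach the frequency and magnitude indicators of a layer to an individual asset, using a simple point-in-polygon test. The `RiskZone` structure, the coordinates and the indicator values are all hypothetical, and the ray-casting test is a generic technique, not necessarily the one used to build the layers described here.

```python
from dataclasses import dataclass


@dataclass
class RiskZone:
    risk_type: str                       # e.g. "flood", "seismic"
    polygon: list[tuple[float, float]]   # (x, y) vertices, invented coordinates
    frequency: float                     # probability-style indicator (assumed 0..1)
    magnitude: float                     # expected-damage-style indicator (assumed 0..1)


def point_in_polygon(x: float, y: float, polygon: list[tuple[float, float]]) -> bool:
    """Ray casting: toggle 'inside' each time a rightward ray crosses an edge."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):
            # x coordinate where the edge crosses the horizontal line through y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def enrich(record: dict, zones: list[RiskZone]) -> dict:
    """Return a copy of the record with the indicators of every zone containing it."""
    enriched = dict(record)
    for zone in zones:
        if point_in_polygon(record["x"], record["y"], zone.polygon):
            enriched[f"{zone.risk_type}_frequency"] = zone.frequency
            enriched[f"{zone.risk_type}_magnitude"] = zone.magnitude
    return enriched


zones = [
    RiskZone("flood", [(0, 0), (4, 0), (4, 4), (0, 4)], frequency=0.1, magnitude=0.6),
    RiskZone("seismic", [(10, 10), (12, 10), (12, 12), (10, 12)], frequency=0.02, magnitude=0.9),
]
asset = {"id": "A-1", "x": 2.0, "y": 1.0}
enriched_asset = enrich(asset, zones)
```

In this invented example the asset falls inside the flood zone only, so it receives the flood indicators and no seismic ones. In practice a geospatial library and real map projections would replace the hand-written geometry.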

Digital Maturity

The digital maturity of a company is not a data point that exists as such in any information source, but it can be an important variable when it comes to identifying potential clients for technological products.

In this case we start from the digital footprint of the companies, which corresponds to all the information that can be obtained from their domains and web pages.

After securely associating a company with its domains, at Deyde DataCentric we apply a series of processes based on NER (Named Entity Recognition) and NLP (Natural Language Processing) to extract the information from this raw data.

Through different indicators that we extract from this digital footprint, we have created an indicator of Digital Maturity of these companies and their evolution over time.
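As a rough illustration of how several footprint indicators could be combined into a single score and tracked over time, here is a minimal Python sketch. The indicator names, the weights and the 0 to 100 scale are assumptions invented for this example; they do not describe the actual indicator built at Deyde DataCentric.

```python
# Hypothetical indicators and weights, chosen only for illustration.
WEIGHTS = {
    "has_https": 2,
    "has_online_store": 4,
    "social_links": 2,
    "mobile_friendly": 2,
}


def maturity_score(indicators: dict[str, float]) -> float:
    """Weighted average of indicators in [0, 1], scaled to 0-100."""
    total_weight = sum(WEIGHTS.values())
    weighted = sum(weight * indicators.get(name, 0.0) for name, weight in WEIGHTS.items())
    return round(100 * weighted / total_weight, 1)


# Invented snapshots of the same company's digital footprint in two periods.
snapshots = {
    "2022": {"has_https": 1.0, "has_online_store": 0.0, "social_links": 0.5, "mobile_friendly": 1.0},
    "2023": {"has_https": 1.0, "has_online_store": 1.0, "social_links": 0.5, "mobile_friendly": 1.0},
}
evolution = {period: maturity_score(values) for period, values in snapshots.items()}
```

With these made-up values the score rises from 50.0 to 90.0 once the company opens an online store, which is the kind of evolution over time the indicator is meant to capture.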

Home Reconstruction Value

One of the main parameters to estimate the value of a home is the reconstruction value of the property. The value is obtained by multiplying the square meters of built area by the average reconstruction value of a house with the same characteristics. In other words, it is not just about square meters, but also the type of housing, predominant construction materials and geographical area.
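The calculation described above can be sketched in a few lines of Python. The reference table of average costs per square meter, its keys and its values are hypothetical placeholders; only the multiplication of built area by the average value for comparable housing comes from the text.

```python
# Hypothetical average reconstruction costs in EUR per built square meter,
# keyed by (housing type, predominant material, geographical area).
AVERAGE_COST_PER_M2 = {
    ("flat", "brick", "Madrid"): 1200.0,
    ("detached", "brick", "Madrid"): 1500.0,
    ("flat", "concrete", "Sevilla"): 1000.0,
}


def reconstruction_value(built_area_m2: float, housing_type: str,
                         material: str, area: str) -> float:
    """Built area multiplied by the average cost for comparable housing."""
    key = (housing_type, material, area)
    if key not in AVERAGE_COST_PER_M2:
        raise KeyError(f"No reference cost for {key}")
    return built_area_m2 * AVERAGE_COST_PER_M2[key]


value = reconstruction_value(90, "flat", "brick", "Madrid")
```

With these invented figures, a 90 m² brick flat in Madrid gets a reconstruction value of 108,000 EUR. The hard part in practice is not the multiplication but obtaining reliable, up-to-date reference costs and property characteristics.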

The banking sector, for example, needs a value as close to reality as possible in order to have a reliable estimate of a property when a mortgage is foreclosed, and the insurance sector uses it to calculate premiums.

We see that in both cases there is a lack of processes that allow these data to be extracted, normalized and transformed into useful information for the business areas. This set of processes will have to be executed cyclically with tools and methodologies that ensure accurate and up-to-date information.

Alignment of the data strategy in the company

Often the business areas know what information could improve their decision-making, but they do not know where to find the data that would produce this information.


The Data factory is a joint responsibility of several areas within the company that must align their objectives to give coherence and meaning to the data strategy.

  • Business areas by:

-> exposing their business strategies

-> adapting to change

  • The systems and operations areas by:

-> making available to everyone the necessary means for handling data throughout the life cycle

  • The areas of analytics and advanced analytics by:

-> generating and exposing solutions adapted to each one.

In this sense, with the end-to-end Pyramid solution, at Deyde DataCentric we help activate the data, enriching it with valuable information and giving the different business areas a “data oriented” vision, so that the teams and managers of your company make better decisions in less time.

In Pyramid you will find a set of unique, already processed data that has been converted into valuable information.

  • Business (B2B): Universe of companies, individual entrepreneurs and organizations, together with their associated commercial information.
  • Context (B2C): Sociodemographic, economic, real estate and meteorological indicators that qualify the environment of a location.
  • Geo: Cartographic layers and associated physical information that allow the description, division and geographical characterization of the Spanish territory.
  • Digital: Spanish web data and audiences with online origin, digital footprint of companies through weekly crawling of more than 250 million web pages.

The advantages of this solution for integrating data consumption into standardized cycles, as understood in the concepts of Data Mesh or Data Fabric, are:

  • Simple and direct access for SQL-like queries
  • Generic APIs for single-record or bulk consumption
  • The creation of custom scripts (Python, R…)
  • Connectors for Salesforce and other market tools
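To give an idea of what SQL-like consumption from a custom script might look like, here is a small Python sketch. It uses an in-memory SQLite database as a stand-in, and the table names, columns and values are entirely hypothetical; it does not reflect the actual Pyramid schema or API.

```python
import sqlite3

# In-memory SQLite database used only as a stand-in for a real data source.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (tax_id TEXT PRIMARY KEY, name TEXT);
    CREATE TABLE company_enrichment (tax_id TEXT PRIMARY KEY, digital_maturity REAL);
    INSERT INTO customers VALUES ('A001', 'Acme SL'), ('B002', 'Beta SA');
    INSERT INTO company_enrichment VALUES ('A001', 72.5);
""")

# Enrich the customer list with an external indicator; customers without
# a match keep a NULL (None in Python) value.
rows = conn.execute("""
    SELECT c.tax_id, c.name, e.digital_maturity
    FROM customers AS c
    LEFT JOIN company_enrichment AS e ON e.tax_id = c.tax_id
    ORDER BY c.tax_id
""").fetchall()
conn.close()
```

The left join keeps every customer in the result, so the business area can see at a glance which records were enriched and which still lack the indicator.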

This creates an agile dynamic for the use of data, allowing its integration into DataOps or MLOps methodologies.

Olivier Lefauconnier

Business Development Manager