by Tommaso Motta
Data risk management is the controlled process an organization applies when acquiring, storing, transforming, and using its data, from creation to retirement, in order to eliminate data risk; all of these activities are carried out by information systems. Within organizations, the role of information systems has become increasingly important because of the growing importance of the resource they manage: information. This resource is characterized by three dimensions:
- Intangibility: information is not physical matter that can be exchanged; it is abstract.
- Self-regeneration: using information generates new information.
- Non-perishability: using information does not consume it; rather, it regenerates it.
But what is information?
To answer this, it is necessary to define the concepts that build up into information:
- Data: a fact, a measure, or an instance of reality that has a unit of measure and that, taken alone, has no meaning.
- Information: a set of data that, put into a context, starts making sense when analysed by a competent person. It can be considered the output of an interrogation of several data.
- Knowledge: a set of information that together allows the decision maker to take the optimal route before an event happens. In a sense, knowledge is the art of predicting the future based on the information available.
How data are acquired
In the past, data were acquired manually by reading through company reports or Excel files; this process was burdensome in terms of the time and resources needed to acquire the information and then store it properly. Nowadays the process is far more rapid and flexible, since data are collected and acquired autonomously and are immediately ready to be stored. One instrument developed by FinScience is the Knowledge Discovery Machines, whose goal is to take documents as input, regardless of their format, and extract and synthesize their data so that they are ready and available to the user. The extraction process starts with the digitalization and synthesis of documents, carried out entirely by deep learning algorithms; the data are then classified into categories based on the subject of the synthesis, again using machine learning tools. Finally, data are labelled and enriched with further information obtained through alternative data analysis, so that they are kept constantly up to date. A minimal sketch of such a pipeline follows.
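As an illustration only, the sketch below mimics the three stages described above (extract, classify, enrich) with plain Python stand-ins; FinScience's actual Knowledge Discovery Machines rely on deep learning models, and every function, category, and signal name here is a hypothetical placeholder.

```python
from dataclasses import dataclass, field

@dataclass
class Document:
    raw_text: str                                 # text obtained from the digitalization step
    category: str = "uncategorized"               # assigned during classification
    labels: list = field(default_factory=list)    # enrichment from alternative data

def extract(raw_text: str) -> Document:
    """Stand-in for the digitalization/synthesis step (deep learning in practice)."""
    return Document(raw_text=raw_text.strip())

def classify(doc: Document) -> Document:
    """Toy keyword classifier standing in for the ML classification step."""
    keywords = {"revenue": "financial", "acquisition": "corporate-events"}
    for word, category in keywords.items():
        if word in doc.raw_text.lower():
            doc.category = category
            break
    return doc

def enrich(doc: Document, alternative_signals: list) -> Document:
    """Attach labels derived from alternative data so the record stays up to date."""
    doc.labels.extend(alternative_signals)
    return doc

if __name__ == "__main__":
    doc = enrich(classify(extract("Quarterly revenue grew 12%...")),
                 alternative_signals=["social-buzz:high"])
    print(doc.category, doc.labels)
```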
How data are stored
Nowadays data can be stored in two ways: in physical storage or in virtual storage. Physical storage includes hard drives and physical document folders; it has the advantage of being easily identifiable, but several drawbacks: it is hard to update, hard to manage when there is too much information, and hard to search for the specific information the user needs. For this reason, the virtual solution is the one gaining ground among companies, particularly those that handle a lot of data. The main solutions are:
- Data warehouses, an OLAP (online analytical processing) system: essentially an online database used as a decision-support system, characterized by the limited number of transactions it can execute simultaneously and by the strategic importance of the data it stores. Its architecture is composed of three components (a minimal sketch follows this list):
- The sources, all the internal and external sources from which the data that will populate the data warehouse are extracted
- The staging area, an intermediate database between the sources and the data warehouse, used in the population phase
- The data marts, small thematic data warehouses aimed at decentralizing and distributing information when the data warehouse contains too much of it; they are also very useful for protecting parts of the data in case of an attack on the data warehouse
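A minimal sketch of this layered architecture, using Python's built-in sqlite3 in place of a real warehouse engine; the table names, column names, and sample rows are hypothetical.

```python
import sqlite3

# In-memory database standing in for the warehouse environment (hypothetical schema).
conn = sqlite3.connect(":memory:")

# Staging area: raw rows landed from the sources before any cleaning.
conn.execute("CREATE TABLE staging_sales (source TEXT, raw_amount TEXT, raw_date TEXT)")
conn.execute("INSERT INTO staging_sales VALUES ('erp_export', '1200.50', '2023-01-15')")
conn.execute("INSERT INTO staging_sales VALUES ('web_shop',  '89.99',   '2023-01-16')")

# Data warehouse table: cleaned, typed data ready for decision support.
conn.execute("CREATE TABLE dw_sales (source TEXT, amount REAL, sale_date TEXT)")
conn.execute("""
    INSERT INTO dw_sales
    SELECT source, CAST(raw_amount AS REAL), raw_date FROM staging_sales
""")

# Data mart: a small thematic slice of the warehouse (here, web sales only).
conn.execute("CREATE VIEW mart_web_sales AS SELECT * FROM dw_sales WHERE source = 'web_shop'")

print(conn.execute("SELECT * FROM mart_web_sales").fetchall())
```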
The process of loading data into the data warehouse is fairly simple. After the extraction sub-process, in which the data selected by the user are pulled from the sources, there is only one intermediate step: the data are translated into a single format so that the information becomes homogeneous and legible for the user. This is where the Knowledge Discovery Machines, cited earlier as tools to acquire data, come in handy: data that start out heterogeneous because they come from different sources (some documents may have been stored as photos, some as .pdf, some as .csv, etc.) are converted into one single format, which the user can easily read and use for analysis. Only after this step are the data loaded into the data warehouse. The sketch below illustrates this translation step.
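As a simple illustration of the translation step, the sketch below normalizes records coming from two hypothetical source formats (a CSV export and a JSON feed) into one shared record layout before loading; the field names and sample contents are assumptions, not FinScience's actual schema.

```python
import csv
import io
import json

# Two heterogeneous sources: a CSV export and a JSON feed (contents are made up).
csv_source = "company,revenue\nAcme,1200.50\nGlobex,890.00\n"
json_source = '[{"name": "Initech", "rev": "450.25"}]'

def from_csv(text: str) -> list[dict]:
    """Translate CSV rows into the common record layout."""
    return [{"company": row["company"], "revenue": float(row["revenue"])}
            for row in csv.DictReader(io.StringIO(text))]

def from_json(text: str) -> list[dict]:
    """Translate JSON entries into the same common layout."""
    return [{"company": item["name"], "revenue": float(item["rev"])}
            for item in json.loads(text)]

# Homogeneous records, ready to be loaded into the data warehouse.
records = from_csv(csv_source) + from_json(json_source)
print(records)
```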
- Cloud computing storage, another solution that is gaining popularity. The main difference from the previous one is that it is not necessary to access the system from a specific location (the company's network, the company's computer room, and so on): anyone connected to the network with the right credentials can enter the database, regardless of where the user is situated. There are three types of cloud computing services (a usage sketch follows the list):
- IaaS (Infrastructure as a Service): the physical infrastructure elements (servers, storage, and so on) are exposed as virtual machines on which users can install their own software and applications
- PaaS (Platform as a Service): a platform capable of supporting developers in creating cloud-ready applications
- SaaS (Software as a Service): applications are delivered via the Web without the need for local installation
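As a hedged illustration of credential-based access from anywhere, the sketch below writes and reads an object in a cloud object store using the boto3 SDK; the bucket name, object key, and the choice of an S3-compatible service are assumptions made purely for the example.

```python
import boto3

# Credentials are resolved by the SDK (environment variables, config files, or an
# instance role), so the same code works from any machine connected to the network.
s3 = boto3.client("s3")

bucket = "example-company-data"   # hypothetical bucket name
key = "reports/2023/q1.json"      # hypothetical object key

# Upload a document to the cloud store...
s3.put_object(Bucket=bucket, Key=key, Body=b'{"revenue": 1200.5}')

# ...and read it back from anywhere with the right credentials.
response = s3.get_object(Bucket=bucket, Key=key)
print(response["Body"].read().decode("utf-8"))
```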
How data are used
Having collected and stored data efficiently is not a sufficient condition for a successful business; data that sit still in a database and are never used are as good as no data at all.
For this reason, FinScience has developed a platform that uses both data stored in the databases of the company using the service and alternative data, which are external to the company and typically extracted from non-conventional sources such as social media and Web pages. Alternative data are rich in weak signals, pieces of added information that can be identified within another piece of information, which provide further material to analyse.
The results of the analysis are then shown in a simple dashboard that calculates, among other metrics:
- The Digital Popularity Value (DPV), which measures the popularity of a digital signal on the web related to specific companies or topics.
- The Sentiment, which measures an entity's perception within a specific environment (e.g. news, blog posts, social posts). It can be positive, negative, or neutral.
- The Investors DPV, a DPV component calculated considering exclusively digital content related to the financial ecosystem.
- The DPV Volatility, the amount of DPV change an entity experiences over a given period of time; a toy computation of this last metric is sketched below.
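As a purely illustrative reading of that definition (FinScience's actual formula is not given here), the sketch below treats DPV Volatility as the standard deviation of period-over-period DPV changes over a window; the sample values are made up.

```python
from statistics import stdev

def dpv_volatility(dpv_series: list[float]) -> float:
    """Toy proxy for DPV Volatility: standard deviation of period-over-period DPV changes."""
    changes = [b - a for a, b in zip(dpv_series, dpv_series[1:])]
    return stdev(changes)

# Hypothetical daily DPV readings for one entity.
daily_dpv = [10.2, 11.0, 10.8, 12.5, 12.1, 13.0]
print(round(dpv_volatility(daily_dpv), 3))
```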