Master data management

Master data management (MDM) is a technology-enabled discipline in which business and information technology work together to ensure the uniformity, accuracy, stewardship, semantic consistency and accountability of the enterprise's official shared master data assets.[1][2]

Drivers for master data management

Organisations, or groups of organisations, may establish the need for master data management when they hold more than one copy of data about a business entity. Holding more than one copy of this master data inherently means that there is an inefficiency in maintaining a "single version of the truth" across all copies. Unless people, processes and technology are in place to ensure that the data values are kept aligned across all copies, it is almost inevitable that different versions of information about a business entity will be held. This causes inefficiencies in operational data use, and hinders the ability of organisations to report and analyze. At a basic level, master data management seeks to ensure that an organization does not use multiple (potentially inconsistent) versions of the same master data in different parts of its operations, which can occur in large organizations.

Other problems include issues with data quality, consistent classification and identification of data, and data reconciliation. Master data management of disparate data systems requires data transformations: data extracted from each source system is transformed and loaded into the master data management hub and, to keep the sources synchronized, managed master data extracted from the hub is transformed and loaded back into each source system as the master data is updated. As with other extract, transform, load (ETL) based data movement, these processes are expensive and inefficient to develop and maintain, which greatly reduces the return on investment for the master data management product.
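This round-trip transformation can be made concrete with a minimal sketch (Python is used here; the field names and mapping rules are hypothetical, not those of any particular MDM product):

```python
# Hub-and-spoke transformations between a source system and an MDM hub.
# All schemas and rules below are illustrative assumptions.

def to_hub(source_record: dict) -> dict:
    """Extract-transform step: map a source record into the hub schema."""
    return {
        "customer_id": source_record["cust_no"],
        "full_name": f'{source_record["first"]} {source_record["last"]}'.strip(),
        "country": source_record.get("ctry", "").upper(),  # normalise country code
    }

def from_hub(hub_record: dict) -> dict:
    """Reverse step: push managed master data back into the source schema."""
    first, _, last = hub_record["full_name"].partition(" ")
    return {
        "cust_no": hub_record["customer_id"],
        "first": first,
        "last": last,
        "ctry": hub_record["country"],
    }

source = {"cust_no": "C042", "first": "Ada", "last": "Lovelace", "ctry": "gb"}
hub = to_hub(source)                   # load into the MDM hub
assert from_hub(hub)["ctry"] == "GB"   # synchronised back, normalised
```

Every source system needs its own pair of such mappings, which is precisely the development and maintenance cost noted above.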

There are a number of root causes for master data issues in organisations. These include:

  1. Business unit and product line segmentation
  2. Mergers and acquisitions

Business unit and product line segmentation

As a result of business unit and product line segmentation, the same business entity (such as a customer, supplier or product) will be serviced by different product lines, and redundant data about the entity will be entered in order to process transactions. The redundancy of business entity data is compounded in the front- to back-office life cycle, where an authoritative single source for party, account and product data is needed but the data is often once again redundantly entered or augmented.

A typical example is the scenario of a bank at which a customer has taken out a mortgage and the bank begins to send mortgage solicitations to that customer, ignoring the fact that the person already has a mortgage account relationship with the bank. This happens because the customer information used by the marketing section within the bank lacks integration with the customer information used by the customer services section of the bank. Thus the two groups remain unaware that an existing customer is also considered a sales lead. The process of record linkage is used to associate different records that correspond to the same entity, in this case the same person.
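Record linkage can be sketched minimally with heuristic string similarity from the Python standard library (production systems typically use probabilistic or machine-learning matching; the fields compared and the 0.85 threshold are illustrative assumptions):

```python
from difflib import SequenceMatcher

def normalise(name: str) -> str:
    """Lower-case and collapse whitespace before comparison."""
    return " ".join(name.lower().split())

def likely_same_person(a: dict, b: dict, threshold: float = 0.85) -> bool:
    """Heuristic match: identical postcode and highly similar names."""
    if a["postcode"] != b["postcode"]:
        return False
    ratio = SequenceMatcher(None, normalise(a["name"]), normalise(b["name"])).ratio()
    return ratio >= threshold

crm_record = {"name": "Jane  Q. Smith", "postcode": "SW1A 1AA"}
mortgage_record = {"name": "jane q smith", "postcode": "SW1A 1AA"}
print(likely_same_person(crm_record, mortgage_record))  # True
```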

Mergers and acquisitions

One of the most common reasons some large corporations experience massive issues with master data management is growth through mergers or acquisitions. Any organizations which merge will typically create an entity with duplicate master data (since each likely had at least one master database of its own prior to the merger). Ideally, database administrators resolve this problem through deduplication of the master data as part of the merger. In practice, however, reconciling several master data systems can present difficulties because of the dependencies that existing applications have on the master databases. As a result, more often than not the two systems do not fully merge, but remain separate, with a special reconciliation process defined that ensures consistency between the data stored in the two systems. Over time, however, as further mergers and acquisitions occur, the problem multiplies, more and more master databases appear, and data-reconciliation processes become extremely complex, and consequently unmanageable and unreliable. Because of this trend, one can find organizations with 10, 15, or even as many as 100 separate, poorly integrated master databases, which can cause serious operational problems in the areas of customer satisfaction, operational efficiency, decision support, and regulatory compliance.

Another problem concerns determining the proper degree of detail and normalization to include in the master data schema. For example, in a federated HR environment, the enterprise may focus on storing people data as a current status, adding a few fields to identify date of hire, date of last promotion, and so on. However, this simplification can introduce business-impacting errors into dependent systems for planning and forecasting. The stakeholders of such systems may be forced to build a parallel network of new interfaces to track the onboarding of new hires, planned retirements and divestments, which works against one of the aims of master data management.
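The trade-off can be illustrated by contrasting the two schema choices in a short sketch (the entity and field names are hypothetical):

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass
class EmployeeCurrent:
    """'Current status' master record: simple, but history is lost."""
    employee_id: str
    grade: str
    date_of_hire: date

@dataclass
class EmployeeHistoryRow:
    """Effective-dated master record: each change adds a new row."""
    employee_id: str
    grade: str
    valid_from: date
    valid_to: Optional[date]  # None means the row is still current

history = [
    EmployeeHistoryRow("E1", "analyst", date(2019, 3, 1), date(2022, 6, 30)),
    EmployeeHistoryRow("E1", "manager", date(2022, 7, 1), None),
]
# A forecasting system can now answer "what was E1's grade on 2021-01-01?",
# a question the current-status schema cannot answer.
```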

People, process and technology

Master data management is enabled by technology, but it is more than the technologies that enable it. An organisation's master data management capability also includes people and processes in its definition.

People

Several roles should be staffed within MDM, most prominently the Data Owner and the Data Steward. Several people are likely to be allocated to each role, each person responsible for a subset of master data (e.g. one data owner for employee master data, another for customer master data).

The Data Owner is responsible for the requirements for data quality, data security and so on, as well as for compliance with data governance and data management procedures. The Data Owner should also fund improvement projects in case of deviations from the requirements.

The Data Steward runs master data management on behalf of the Data Owner, and will probably also act as an advisor to the Data Owner.

Process

Master data management can be viewed as a "discipline for specialized quality improvement"[3] defined by the policies and procedures put in place by a data governance organization. Its objective is to provide processes for collecting, aggregating, matching, consolidating, quality-assuring, persisting and distributing master data throughout an organization, to ensure a common understanding, consistency, accuracy and control [4] in the ongoing maintenance and application use of that data.

Processes commonly seen in master data management include source identification, data collection, data transformation, normalization, rule administration, error detection and correction, data consolidation, data storage, data distribution, data classification, taxonomy services, item master creation, schema mapping, product codification, data enrichment, hierarchy management, business semantics management and data governance.
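The normalization and rule-administration steps can be sketched as a configurable pipeline of cleansing rules applied to each record (the three rules shown are illustrative, not a standard rule set):

```python
import re

# Named, ordered cleansing rules; administering rules means editing this list.
RULES = [
    ("trim whitespace",   lambda r: {**r, "name": r["name"].strip()}),
    ("title-case name",   lambda r: {**r, "name": r["name"].title()}),
    ("digits-only phone", lambda r: {**r, "phone": re.sub(r"\D", "", r["phone"])}),
]

def normalise(record: dict) -> dict:
    """Apply every rule in order and return the cleansed record."""
    for _name, rule in RULES:
        record = rule(record)
    return record

print(normalise({"name": "  ada lovelace ", "phone": "+44 (0)20 7946 0000"}))
# {'name': 'Ada Lovelace', 'phone': '4402079460000'}
```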

Technology

A master data management tool can be used to support master data management by removing duplicates, standardizing data (mass maintaining),[5] and incorporating rules to prevent incorrect data from entering the system, in order to create an authoritative source of master data. Master data are the products, accounts and parties for which business transactions are completed.

Where the technology approach produces a "golden record" or relies on a "source of record" or "system of record", it is common to talk of where the data is "mastered". This is accepted terminology in the information technology industry, but care should be taken, both with specialists and with the wider stakeholder community, to avoid confusing the concept of "master data" with that of "mastering data".
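Producing a "golden record" is commonly implemented with survivorship rules: when duplicates are merged, each attribute survives from the most trusted source that supplies a value. A minimal sketch, with an assumed source ranking:

```python
SOURCE_PRIORITY = ["crm", "billing", "web_signup"]  # most trusted first (assumed)

def golden_record(duplicates: list) -> dict:
    """Merge duplicates attribute by attribute, preferring trusted sources."""
    ranked = sorted(duplicates, key=lambda r: SOURCE_PRIORITY.index(r["source"]))
    merged = {}
    for record in ranked:
        for field, value in record.items():
            if field != "source" and field not in merged and value:
                merged[field] = value
    return merged

dupes = [
    {"source": "web_signup", "name": "J. Smith", "email": "j@example.com", "phone": ""},
    {"source": "crm", "name": "Jane Smith", "email": "", "phone": "020 7946 0000"},
]
print(golden_record(dupes))
# {'name': 'Jane Smith', 'phone': '020 7946 0000', 'email': 'j@example.com'}
```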

Implementation models

There are a number of models for implementing a technology solution for master data management, depending on an organisation's core business, its corporate structure and its goals. These include:

  1. Source of record
  2. Registry
  3. Consolidation
  4. Coexistence
  5. Transaction/centralized

Source of record

This model identifies a single application, database or simpler source (e.g. a spreadsheet) as being the "source of record" (or "system of record" where solely application databases are relied on). The benefit of this model is its conceptual simplicity, but it may not fit with the realities of complex master data distribution in large organisations.

The source of record can be federated, for example by groups of attribute (so that different attributes of a master data entity may have different sources of record) or geographically (so that different parts of an organisation may have different master sources). Federation is only applicable in certain use cases, where there is clear delineation of which subsets of records will be found in which sources.
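Federation by attribute group can be as simple as a routing table that maps each group of attributes to the system that masters it, as in this sketch (the systems and groupings named are hypothetical):

```python
SOURCE_OF_RECORD = {
    "identity": "hr_system",       # name, date of birth
    "contact":  "crm",             # email, phone, address
    "payroll":  "finance_system",  # bank details, tax code
}

def source_for(attribute_group: str) -> str:
    """Route a read or update to the system mastering this attribute group."""
    if attribute_group not in SOURCE_OF_RECORD:
        raise ValueError(f"no source of record defined for {attribute_group!r}")
    return SOURCE_OF_RECORD[attribute_group]

print(source_for("contact"))  # crm
```

The clear delineation that federation requires corresponds to this table being unambiguous and complete.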

The source of record model can be applied more widely than simply to master data, for example to reference data.

Transmission of master data

There are several ways in which master data may be collated and distributed to other systems.[6] These include the following (a brief sketch of the consolidation pattern appears after the list):

  1. Data consolidation – The process of capturing master data from multiple sources and integrating it into a single hub (operational data store) for replication to other destination systems.
  2. Data federation – The process of providing a single virtual view of master data from one or more sources to one or more destination systems.
  3. Data propagation – The process of copying master data from one system to another, typically through point-to-point interfaces in legacy systems.
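The sketch below reduces the consolidation style to its essential data flow: records captured from several sources are merged into a single hub keyed by the master identifier (the sources and fields are illustrative):

```python
def consolidate(sources: list) -> dict:
    """Data consolidation: merge records from many sources into one hub,
    keyed by master identifier, ready for replication to destinations."""
    hub = {}
    for source in sources:
        for record in source:
            hub.setdefault(record["id"], {}).update(
                {k: v for k, v in record.items() if v}  # ignore empty values
            )
    return hub

erp = [{"id": "P1", "name": "Widget", "weight_kg": 1.2}]
shop = [{"id": "P1", "name": "", "list_price": 9.99}]
print(consolidate([erp, shop]))
# {'P1': {'id': 'P1', 'name': 'Widget', 'weight_kg': 1.2, 'list_price': 9.99}}
```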

Change management in implementation

Master data management can suffer in its adoption within a large organization if the "single version of the truth" concept is not affirmed by stakeholders, who believe that their local definition of the master data is necessary. For example, the product hierarchy used to manage inventory may be entirely different from the product hierarchies used to support marketing efforts or pay sales reps. It is above all necessary to identify whether different master data is genuinely required. If it is, then the solution implemented (technology and process) must allow multiple versions of the truth to exist while providing simple, transparent ways to reconcile the necessary differences. If it is not, processes must be adjusted. Without this active management, users who need the alternate versions will simply "go around" the official processes, reducing the effectiveness of the company's overall master data management program.
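Where multiple versions are genuinely required, one transparent arrangement is to hold the sanctioned hierarchies side by side, reconciled through the shared product identifier, as in this illustrative sketch:

```python
# Two sanctioned product hierarchies keyed by the same master identifier.
# The hierarchy names and paths are hypothetical.
HIERARCHIES = {
    "inventory": {"P1": ["Hardware", "Fasteners", "Bolts"]},
    "marketing": {"P1": ["DIY", "Workshop Essentials"]},
}

def classify(product_id: str, view: str) -> list:
    """Return the product's classification path in the given functional view."""
    return HIERARCHIES[view][product_id]

print(classify("P1", "inventory"))  # ['Hardware', 'Fasteners', 'Bolts']
print(classify("P1", "marketing"))  # ['DIY', 'Workshop Essentials']
```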

Related Research Articles

<span class="mw-page-title-main">Data warehouse</span> Centralized storage of knowledge

In computing, a data warehouse, also known as an enterprise data warehouse (EDW), is a system used for reporting and data analysis and is considered a core component of business intelligence. Data warehouses are central repositories of integrated data from one or more disparate sources. They store current and historical data in a single place and are used for creating analytical reports for workers throughout the enterprise, enabling companies to interrogate their data, draw insights from it and make decisions.

A management information system (MIS) is an information system used for decision-making, and for the coordination, control, analysis, and visualization of information in an organization. The study of the management information systems involves people, processes and technology in an organizational context.

A business process, business method or business function is a collection of related, structured activities or tasks by people or equipment in which a specific sequence produces a service or product for a particular customer or customers. Business processes occur at all organizational levels and may or may not be visible to the customers. A business process may often be visualized (modeled) as a flowchart of a sequence of activities with interleaving decision points or as a process matrix of a sequence of activities with relevance rules based on data in the process. The benefits of using business processes include improved customer satisfaction and improved agility for reacting to rapid market change. Process-oriented organizations break down the barriers of structural departments and try to avoid functional silos.

<span class="mw-page-title-main">Information management</span> Organisational activity concerning information lifecycle

Information management (IM) is the appropriate and optimized capture, storage, retrieval, and use of information. It may be personal information management or organizational. IM for organizations concerns a cycle of organizational activity: the acquisition of information from one or more sources, the custodianship and the distribution of that information to those who need it, and its ultimate disposal through archiving or deletion.

Record linkage is the task of finding records in a data set that refer to the same entity across different data sources. Record linkage is necessary when joining different data sets based on entities that may or may not share a common identifier, which may be due to differences in record shape, storage location, or curator style or preference. A data set that has undergone RL-oriented reconciliation may be referred to as being cross-linked.

Data migration is the process of selecting, preparing, extracting, and transforming data and permanently transferring it from one computer storage system to another. Additionally, the validation of migrated data for completeness and the decommissioning of legacy data storage are considered part of the entire data migration process. Data migration is a key consideration for any system implementation, upgrade, or consolidation, and it is typically performed in such a way as to be as automated as possible, freeing up human resources from tedious tasks. Data migration occurs for a variety of reasons, including server or storage equipment replacements, maintenance or upgrades, application migration, website consolidation, disaster recovery, and data center relocation.

Enterprise content management (ECM) extends the concept of content management by adding a timeline for each content item and, possibly, enforcing processes for its creation, approval and distribution. Systems using ECM generally provide a secure repository for managed items, analog or digital. They also include one or more methods for importing content to bring new items under management, and several presentation methods to make items available for use. Although ECM content may be protected by digital rights management (DRM), DRM is not required. ECM is distinguished from general content management by its cognizance of the processes and procedures of the enterprise for which it is created.

Enterprise software, also known as enterprise application software (EAS), is computer software used to satisfy the needs of an organization rather than individual users. Such organizations include businesses, schools, interest-based user groups, clubs, charities, and governments. Enterprise software is an integral part of a (computer-based) information system; a collection of such software is called an enterprise system. These systems handle a number of operations in an organization to enhance the business and management reporting tasks. The systems must process the information at a relatively high speed and can be deployed across a variety of networks.

Product information management (PIM) is the process of managing all the information required to market and sell products through distribution channels. This product data is created by an internal organization to support a multichannel marketing strategy. A central hub of product data can be used to distribute information to sales channels such as e-commerce websites, print catalogues, marketplaces such as Amazon and Google Shopping, social media platforms like Instagram and electronic data feeds to trading partners. PIM also plays a significant role in reducing abandonment rates by providing better product information.

Data architecture consist of models, policies, rules, and standards that govern which data is collected and how it is stored, arranged, integrated, and put to use in data systems and in organizations. Data is usually one of several architecture domains that form the pillars of an enterprise architecture or solution architecture.

Data integration involves combining data residing in different sources and providing users with a unified view of them. This process becomes significant in a variety of situations, both commercial and scientific. Data integration appears with increasing frequency as the volume of data and the need to share existing data explode. It has become the focus of extensive theoretical work, and numerous open problems remain unsolved. Data integration encourages collaboration between internal as well as external users. The data being integrated must be received from heterogeneous database systems and transformed into a single coherent data store that provides synchronous data across a network of files for clients. A common use of data integration is in data mining, when analyzing and extracting information from existing databases that can be useful for business information.

Real-time business intelligence (RTBI) is a concept describing the process of delivering business intelligence (BI) or information about business operations as they occur. Real time means near to zero latency and access to information whenever it is required.

In information science and information technology, single source of truth (SSOT) architecture, or single point of truth (SPOT) architecture, for information systems is the practice of structuring information models and associated data schemas such that every data element is mastered in only one place, providing data normalization to a canonical form. Any possible linkages to this data element are by reference only. Because all other locations of the data just refer back to the primary "source of truth" location, updates to the data element in the primary location propagate to the entire system, providing multiple advantages simultaneously: greater efficiency/productivity, easy prevention of mistaken inconsistencies, and greatly simplified version control. Without SSOT architecture, rampant forking impairs clarity and productivity, imposing laborious maintenance needs.

Master data represents "data about the business entities that provide context for business transactions". The most commonly found categories of master data are parties, products, financial structures and locational concepts.

An information server is an integrated software platform consisting of a set of core functional modules that enables organizations to integrate data from disparate sources and deliver trusted and complete information, at the time it is required and in the format it is needed. Similar to how an application server is a software engine that delivers applications to client computers, an information server delivers consistent information to consuming applications, business processes and portals.

Microsoft SQL Server Master Data Services (MDS) is a Master Data Management (MDM) product from Microsoft that ships as a part of the Microsoft SQL Server relational database management system. Master data management (MDM) allows an organization to discover and define non-transactional lists of data, and compile maintainable, reliable master lists. Master Data Services first shipped with Microsoft SQL Server 2008 R2. Microsoft SQL Server 2016 introduced enhancements to Master Data Services, such as improved performance and security, and the ability to clear transaction logs, create custom indexes, share entity data between different models, and support for many-to-many relationships.

Reference data is data used to classify or categorize other data. Typically, they are static or slowly changing over time.

ISO 8000 is the global standard for Data Quality and Enterprise Master Data. It describes the features and defines the requirements for standard exchange of Master Data among business partners. It establishes the concept of Portability as a requirement for Enterprise Master Data, and the concept that true Enterprise Master Data is unique to each organization.

Data virtualization is an approach to data management that allows an application to retrieve and manipulate data without requiring technical details about the data, such as how it is formatted at source, or where it is physically located, and can provide a single customer view of the overall data.

A metadata repository is a database created to store metadata. Metadata is information about the structures that contain the actual data. Metadata is often said to be "data about data", but this is misleading. Data profiles are an example of actual "data about data". Metadata adds one layer of abstraction to this definition: it is data about the structures that contain data. Metadata may describe the structure of any data, of any subject, stored in any format.

References

  1. "Gartner Glossary: Master Data Management". Gartner. Retrieved 6 June 2020.
  2. Rouse, Margaret (2018-04-09). "Definition from WhatIs.com". SearchDataManagement. Retrieved 2018-04-09.
  3. DAMA-DMBOK Guide, 2010 DAMA International
  4. "Learn how to create a MDM change request – LightsOnData". LightsOnData. 2018-05-09. Retrieved 2018-08-17.
  5. Jürgensen, Knut (2016-05-16). "Master Data Management (MDM): Help or Hindrance?". Simple Talk. Retrieved 2018-04-09.
  6. "Creating the Golden Record: Better Data Through Chemistry", DAMA, slide 26, Donald J. Soulsby, 22 October 2009