Becoming a data-driven organization means using available data to make informed business decisions, but there’s an important prerequisite: the available data needs to be correct. To achieve a high level of accuracy and consistency, organizations often turn to a set of core data entities known as “master data.”
Master data serves as the foundation for an organization’s data-driven activities, providing a single, consistent view of critical data such as customer information, product details, supplier data, and employee records. The practice of improving your organization’s data quality by ensuring that identifiers and key data dimensions about entities are accurate and consistent is called master data management (MDM).
While it seems obvious that data quality is important, defining your master data management strategy is not an endeavor to be taken lightly. If your organization is currently using a more ad-hoc approach to data management (e.g., every department handles data differently, data sanitization practices are unclear, etc.), it’s very likely that some of your data doesn’t match expectations and is leading your teams astray.
In this article, you’ll learn what an MDM strategy is, why you need it, and the components of a strong MDM strategy. You’ll also learn a bit about the tools and organizational structures required to implement an MDM strategy, regardless of your company size or industry.
Why You Need an MDM Strategy
Data is likely being produced in all corners of your organization. Consequently, many stakeholders are involved. Before companies adopt an MDM strategy, they might be said to have an “ad-hoc” data management approach, meaning that each department or team handles data in its own way.
An ad-hoc approach often leads to failure because individual initiatives operate in isolation. Instead of working towards common goals, departments or teams end up working in silos, duplicating efforts, and wasting resources.
Ad-hoc data management strategies also tend to focus on technology rather than business effort or results. This may lead to a situation where technology solutions are implemented without a clear understanding of the business requirements, resulting in costly investments in technology that do not deliver the intended value.
What is an MDM Strategy?
A master data management strategy is used to manage and maintain a centralized, consistent, and accurate view of the data used across an organization. This helps ensure the quality, accuracy, and consistency of data across business units. Teams leading MDM efforts are taking on significant responsibility, and their success can have a substantial impact on an organization.
An MDM strategy typically considers the following areas:
Governance is designed to ensure data quality, consistency, and accuracy. It involves defining data standards, identifying data owners, and establishing guidelines for data use and access. Questions that should be answered include:
- What are the critical data elements (CDEs) that need to be managed across the organization, and who owns them?
- What are the data retention policies, and how are they enforced to ensure compliance with regulations?
- What are the procedures for managing data changes and updates, and how are they documented and tracked?
- How is data governance enforced and audited to ensure compliance and adherence to data policies and standards?
Modeling involves defining an organization-wide data model that provides a clear understanding of the relationships between different entities. It ensures that data is organized in a consistent and logical way across the organization, and answers questions like:
- How will data entities be identified and defined (including their key attributes) in the organization’s data model?
- What are the relationships between different data entities, and how will these relationships be represented?
- What are the rules for creating, updating, and deleting data entities, and how will these rules be enforced?
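To make the modeling questions above more concrete, here is a minimal Python sketch of two entities with key attributes, a relationship between them, and one rule enforced at creation time. The `Customer` and `Order` entities, their fields, and the country code rule are hypothetical examples chosen for illustration, not a prescribed model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Customer:
    """Hypothetical master data entity, identified by customer_id."""
    customer_id: str
    name: str
    country_code: str  # assumed rule: two uppercase letters

    def __post_init__(self):
        # Enforce a creation rule in the model itself.
        if len(self.country_code) != 2 or not self.country_code.isupper():
            raise ValueError(f"invalid country code: {self.country_code!r}")


@dataclass(frozen=True)
class Order:
    """Hypothetical entity that references a Customer through its key."""
    order_id: str
    customer_id: str  # relationship: many orders to one customer


def orphaned_orders(orders, customers):
    """Return orders whose customer_id matches no known customer."""
    known = {c.customer_id for c in customers}
    return [o for o in orders if o.customer_id not in known]


customers = [Customer("C1", "Acme", "BE")]
orders = [Order("O1", "C1"), Order("O2", "C9")]
print(orphaned_orders(orders, customers))
```

In practice these rules usually live in a database schema or a modeling tool rather than in application code, but the questions to answer are the same.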
Overseeing data quality requires a set of data quality standards, the identification of existing and potential data quality issues, and implementing processes to improve data quality over time. It will answer questions like:
- How will data quality be measured and monitored?
- What are the consequences and costs of poor data quality?
- How will data quality metrics be communicated across the organization?
- What processes will be put in place to ensure ongoing data quality?
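As an illustration of how data quality could be measured, the sketch below computes two simple metrics over a list of records: completeness of a field and uniqueness of a key. The function names, the sample records, and the choice of these two metrics are assumptions made for the example; your own standards may call for different measures.

```python
def completeness(records, field):
    """Share of records where `field` is present and non-empty."""
    if not records:
        return 1.0
    filled = sum(1 for r in records if r.get(field) not in (None, ""))
    return filled / len(records)


def uniqueness(records, key):
    """Ratio of distinct `key` values to the total number of records."""
    if not records:
        return 1.0
    values = [r.get(key) for r in records]
    return len(set(values)) / len(values)


# Hypothetical sample data with one duplicate id and two missing emails.
records = [
    {"id": 1, "email": "a@example.com"},
    {"id": 2, "email": ""},
    {"id": 2, "email": "b@example.com"},
    {"id": 3, "email": None},
]

print(completeness(records, "email"))
print(uniqueness(records, "id"))
```

Tracking numbers like these over time is one way to turn monitoring into something that can be communicated across the organization.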
As the number of data sources you have grows, integrating this data from across your organization becomes an increasingly critical job. As you consider this part of your MDM strategy, consider:
- What are the different data sources within the organization?
- How might these data sources vary or be misaligned?
- What are the data integration technologies and tools that can be used to automate this process?
Finally, your MDM strategy should define the security measures required to protect data from unauthorized access, loss, or corruption. This will answer questions like:
- Who has access to the master data, and how is access granted and managed?
- How is data encrypted, and what key management processes are in place to secure data?
- How is data backed up and restored in the event of a data loss or security breach?
- How is data privacy and confidentiality maintained, and does it vary by region as legal requirements vary?
I’ve already hinted at some of the benefits of having an MDM strategy, but in the next section, we’ll look at them more deeply.
Benefits of Having an MDM Strategy
In an ad-hoc approach, business requirements are often defined a posteriori and are the result of progressive insight.
Since an MDM strategy pinpoints the current state of data management and focuses on the business and functional requirements, the target state is clear from the start. The result is that fewer resources are wasted because the priorities and the accompanying efforts are all part of an established roadmap aimed at reaching the target state.
With all business and functional requirements outlined, defining an adequate MDM technology stack is much easier, whether it results in a build or buy approach. Requests for proposals will be closer to the mark, a proof of concept will generate business value faster, and the evaluation of various solutions should become a much simpler exercise.
Finally, with the current state, the target state, the technology stack, and the roadmap defined, the definition of success and the metrics used to measure it can easily be formulated. Some typical MDM-related metrics are:
- The number of identified inaccuracies in the data
- The percentage of data products containing measures and metrics to evaluate their adoption and accuracy
- The speed and efficiency of data reconciliation processes
- The number of duplicate records in the system
- The frequency and number of data requests from business units
- The number of data quality issues reported and resolved
- The number of successful data merges
Beyond these metrics, implementing an MDM strategy helps your entire organization see the benefits of working together to carry it out.
Drafting an MDM Strategy
Rather than reinventing the wheel, organizations should rely on established best practices and frameworks when defining their master data management strategy. The COBIT Governance and Management Objectives provides a great starting point for building an MDM strategy, so this section is heavily influenced by that framework.
In a previous section, I described how MDM involves many stakeholders because of the omnipresent character of data. This is why the number of processes that need to be established is very high. While these processes will vary based on the nature of your data and business, there are four I’d like to highlight:
1. Systematic Data Quality Approach
The most popular descriptions of data quality involve six dimensions, commonly listed as accuracy, completeness, consistency, timeliness, validity, and uniqueness.
Since MDM is all about establishing a golden standard for your organization’s data, an MDM strategy should contain the processes and techniques that should be used to perform data quality tests.
A systematic approach to data quality implies that data is tested on a regular basis. Typically, data testing can occur in three different phases:
- Tests on the serving layer
- Tests throughout the data processing pipelines
- Tests embedded in the data modeling
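Whichever phase a test runs in, it usually boils down to a named rule applied to rows of data. The following sketch shows one possible shape for such tests in Python. The check factories `not_null` and `in_range`, the `run_checks` helper, and the sample rows are all hypothetical. Dedicated data testing tools offer richer versions of the same idea.

```python
def not_null(field):
    """Build a check that passes when `field` is not None."""
    def check(row):
        return row.get(field) is not None
    check.__name__ = f"not_null_{field}"
    return check


def in_range(field, low, high):
    """Build a check that passes when low <= row[field] <= high."""
    def check(row):
        value = row.get(field)
        return value is not None and low <= value <= high
    check.__name__ = f"in_range_{field}"
    return check


def run_checks(rows, checks):
    """Return (row_index, check_name) for every failed check."""
    return [
        (i, check.__name__)
        for i, row in enumerate(rows)
        for check in checks
        if not check(row)
    ]


# Hypothetical location rows; the second and third each break one rule.
rows = [
    {"postcode": "1000", "latitude": 50.85},
    {"postcode": None, "latitude": 50.85},
    {"postcode": "9000", "latitude": 123.0},
]
checks = [not_null("postcode"), in_range("latitude", -90, 90)]
failures = run_checks(rows, checks)
print(failures)
```

The same list of checks could be run against the serving layer, between pipeline steps, or as part of the data model, which keeps the rules consistent across phases.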
However, not all data quality issues are detected a priori. For the unknown unknowns, there should be a systematic collection and processing of data quality issues reported by stakeholders.
Finally, you have to consider the source of all your data and its impact on quality. For example, if your business relies on location data, it’s important to keep it up to date, as boundaries, addresses, and postcodes change quite often. Reliable data sources (GeoPostcodes, for instance) can help, but your organization is ultimately responsible for all imported data, regardless of the source.
2. Systematic Data Cleansing Approach
Data cleansing is known by many names and done in many ways, but it is highly contingent on the chosen data integration paradigm: ETL or ELT.
The former cleanses data before loading it into the data warehouse. The latter loads everything into the data warehouse first, and data cleansing becomes part of data modeling. By defining the data cleansing approach and standards, data providers can be held accountable for their data products through SLAs.
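The difference between the two paradigms is mostly about where the cleansing step runs. Here is a deliberately simplified Python sketch in which plain lists and dictionaries stand in for a data warehouse. The `cleanse` rule and the `etl` and `elt` functions are hypothetical and only illustrate the order of operations.

```python
def cleanse(row):
    """Hypothetical cleansing rule: trim whitespace, uppercase the country."""
    return {
        "name": row["name"].strip(),
        "country": row["country"].strip().upper(),
    }


def etl(source_rows):
    """ETL: cleanse before loading, so the warehouse only holds clean rows."""
    warehouse = [cleanse(r) for r in source_rows]
    return warehouse


def elt(source_rows):
    """ELT: load raw rows as-is, then cleanse inside the warehouse."""
    warehouse = {"raw": list(source_rows)}
    # The cleansed table is derived from the raw table, like a data model.
    warehouse["cleansed"] = [cleanse(r) for r in warehouse["raw"]]
    return warehouse


source = [{"name": " Acme ", "country": "be"}]
print(etl(source))
print(elt(source))
```

With ELT the raw rows remain available in the warehouse, which makes it possible to revise cleansing rules later and rebuild the cleansed data.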
3. Systematic Data Product Lifecycle Approach
The demand and supply of data assets are never in equilibrium. Generally, the volume of data that an enterprise produces is too large to maintain forever, especially if the dataset grows exponentially over time.
That’s why there should be a systematic approach to deciding which data is onboarded, cleansed, and served. Furthermore, the approach should define how and when data is documented and how changes to the data are implemented.
The fourth process concerns communication. Establishing processes that describe how and which data should be onboarded or cleansed is one thing, but making these processes clear to all stakeholders is another challenge entirely. An important part of any MDM process is to establish the way data issues are communicated to ensure that MDM efforts gain traction across the entire organization.
Besides the processes, there should be organizational structures that are accountable and responsible for their implementation.
A popular technique is the RACI matrix, which maps responsibilities to roles within an organization. For most of the processes, a data management function (such as a Director of Data Management) should be ultimately accountable.
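A RACI matrix can be kept as simple structured data, which makes it easy to check mechanically. The sketch below assumes a hypothetical matrix that maps each process to roles marked R (responsible), A (accountable), C (consulted), or I (informed), and flags processes that do not have exactly one accountable role. The process and role names are illustrative only.

```python
# Hypothetical RACI matrix: process -> {role: letter}.
raci = {
    "data cleansing": {
        "Director of Data Management": "A",
        "Data Engineering Manager": "R",
        "Business Intelligence Manager": "C",
    },
    "data quality": {
        "Director of Data Management": "A",
        "Business Intelligence Manager": "R",
        "Data Engineering Manager": "I",
    },
    "data lifecycle": {
        "Portfolio Manager": "R",  # nobody is accountable yet
    },
}


def missing_single_accountable(matrix):
    """Return processes that do not have exactly one Accountable role."""
    return [
        process
        for process, roles in matrix.items()
        if list(roles.values()).count("A") != 1
    ]


print(missing_single_accountable(raci))
```

A check like this is a small safeguard against the common situation where a process has people doing the work but no one who ultimately answers for it.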
When it comes to implementing processes and managing policies, responsibilities could be set up horizontally or vertically.
A vertical approach implies that responsibilities are scoped by the data product lifecycle. Data cleansing policies are the responsibility of the data engineering manager, data quality policies are part of the business intelligence manager’s job, and the data lifecycle policies are assigned to the portfolio manager.
A horizontal approach implies that responsibilities are scoped by the data entities. If there is a product owner for each entity (customer, product, partner, etc.), each of the product owners will be responsible for all policies, but only within their entity.
A horizontal approach can be more scalable, but it may also lead to divergence in implementation as each entity’s owner may interpret the strategy differently.
People, Skills, and Capabilities
Your MDM strategy should also outline the people, skills, or capabilities required for implementing and maintaining the master data. Essential skillsets include:
- Data and/or analytics engineering: extracting, loading, and transforming the data to turn it into a data product that can easily be consumed;
- Data analytics: supporting business stakeholders with their business questions regarding the data and how it should be interpreted or used;
- Data management: setting the principles and guidelines for the data, defining the data model, specifying the serving layer, etc.
One of the most common pitfalls in an MDM effort is handling it exclusively from a technological lens. Tools should not dictate the strategy; strategies should dictate the tools.
Nevertheless, infrastructural decisions need to be made, and there are four architectural MDM paradigms to consider:
A registry approach is the most lightweight way to handle master data management. Instead of processing and cleaning the data, it lets the data remain where it sits. It is read-only and creates a registry of existing data assets by matching data from various sources via unique keys. This approach holds the benefits that no data is duplicated, there are no regulatory implications, and no data is overwritten in source systems.
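To show the idea of matching without copying, the following Python sketch builds a registry that maps each unique key to pointers into the source systems. The `build_registry` function, the `vat` key, and the sample systems are hypothetical. A real registry would also need fuzzy matching for records that lack a shared key.

```python
from collections import defaultdict


def build_registry(sources, key):
    """Map each key value to (system, position) pointers.

    Only pointers are stored. The attributes stay in the source systems.
    """
    registry = defaultdict(list)
    for system, rows in sources.items():
        for position, row in enumerate(rows):
            registry[row[key]].append((system, position))
    return dict(registry)


# Hypothetical source systems that share a VAT number as unique key.
sources = {
    "crm": [
        {"vat": "BE0123", "name": "Acme"},
        {"vat": "BE0456", "name": "Globex"},
    ],
    "billing": [
        {"vat": "BE0123", "name": "ACME NV"},
    ],
}

registry = build_registry(sources, "vat")
print(registry["BE0123"])
```

Because the registry only reads from the sources, the conflicting names ("Acme" and "ACME NV") remain as they are. The registry merely records that both rows describe the same entity.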
The big data revolution has enabled near-free storage and low-cost processing in the cloud, so a consolidation approach, in which data from source systems is copied into a central hub, has started to top many enterprise priority lists. The benefits are that the process of integrating data is unidirectional and fairly easy to manage. Many DataOps best practices are built around this data pipeline approach, in which data at rest (but nowadays often also in-flight) is cleaned and processed in a single stack to be consumed for analytical purposes.
A coexistence approach adds complexity to the consolidation approach. Instead of only synchronizing data from production systems to the central hub (most often a data warehouse), the synchronization also happens in the other direction. This approach has two main benefits: there is a single version of the truth in each and every system, and the processed data can be used for operational purposes.
Instead of duplicating data to a centralized hub, as in the consolidation or coexistence approach, the transaction approach sees the centralized hub as a data cleaning and enhancement tool. Source systems can subscribe to the hub to enhance or correct their data. The benefit is that you get to update or correct the data in the source systems via a transparent set of rules while also using it for operational purposes, without compromising legal obligations due to data duplication.
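One way to picture the transaction approach is a hub that applies a transparent list of rules and pushes corrections back to subscribed source systems. The Python sketch below is a toy model under that assumption. The `Hub` class, the `normalize_postcode` rule, and the in-memory `crm` store are hypothetical and do not represent any particular MDM product.

```python
class Hub:
    """Hypothetical hub that corrects records and notifies subscribers."""

    def __init__(self, rules):
        self.rules = rules      # list of functions: record -> record
        self.subscribers = []   # callbacks invoked with corrected records

    def subscribe(self, callback):
        self.subscribers.append(callback)

    def submit(self, record):
        corrected = dict(record)
        for rule in self.rules:
            corrected = rule(corrected)
        # Only notify source systems when a rule changed something.
        if corrected != record:
            for callback in self.subscribers:
                callback(corrected)
        return corrected


def normalize_postcode(record):
    """Example rule: remove spaces and uppercase the postcode."""
    cleaned = record["postcode"].replace(" ", "").upper()
    return {**record, "postcode": cleaned}


crm = {}  # stands in for a source system's own data store
hub = Hub([normalize_postcode])
hub.subscribe(lambda rec: crm.__setitem__(rec["id"], rec))

result = hub.submit({"id": 7, "postcode": "sw1a 1aa"})
print(result)
```

Note that the hub itself keeps no copy of the record. The corrected data ends up in the subscribing system, which is the property this approach relies on.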
Policies and Procedures
An MDM strategy should outline, at least loosely, the policies that govern how master data is handled. Some policies that may be defined include:
- Data cleansing policy: Prescribes frequency, guidelines, and accountability; documents available means (methods, solutions, and tools) through which data cleansing challenges are tackled
- Data management policy: Prescribes how a data product moves through its lifecycle, from ideation to creation, production, deployment, and archival
- Data quality assessment policy: Prescribes how data quality tests are implemented and what they test for
A final aspect of a data management strategy is to outline the desired organizational culture in which to embed all previous components. Two cultural properties are relevant with regard to MDM:
- Shared responsibility of data assets: although the accountability and responsibilities are clearly described within an MDM strategy, everyone is responsible for spotting and reporting data quality issues.
- Awareness around integrity, quality, accuracy & completeness: various efforts should be set up to ensure that all stakeholders know what to expect from the master data.
Hopefully, you can see that an ad-hoc approach to master data management has flaws and that a well-defined strategy is your best bet for achieving success with MDM. Data integrity and accuracy are paramount to maintaining business agility and profitability in the long run, so carefully considering your MDM strategy is essential.
Because data is so omnipresent within modern organizations, your MDM strategy will influence many processes, actors, and organizational structures. Creating this strategy isn’t going to happen overnight, but this article has given you a high-level blueprint on how to get started.
If you’d like to learn more about MDM tools, check out our article comparing the most popular ones.
Note: this article applies the APO02 and APO14 management objectives of the Control Objectives for Information and Related Technology (COBIT) IT governance control framework to MDM strategy.