Improve the data! That’s the goal of every Product Information Management (PIM) project, right? Make it more valuable. Make it more accessible. Standardize it. Make it more trustworthy. Improve the data! A great way to make sure this happens is to dedicate a work stream to Data Quality (DQ). The talking points below on profiling product data are the first in a series of entries dedicated to using DQ in PIM.
When starting a new PIM implementation, there are a lot of moving parts. To get everything off on the right foot, the best place to start is determining how the Product information will best be modeled and managed within the new PIM system. Early on, the team will constantly be crossing the bridge of ‘the way we do things’ vs ‘the way we SHOULD be doing things’. One critical success factor is a common theme on PIM implementations and can determine your long-term success, yet it is not always planned and executed the right way.
So what are we looking at?
In the early stages of the implementation, it is always recommended to perform structural profiling of the source system Product data. This analysis validates the design by providing volume counts, data types, completeness percentages, and a view of how consistent the tables/extracts are. It also gives the data conversion effort an accurate baseline for expected data loads.
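To make the idea concrete, here is a minimal sketch of structural profiling in plain Python. It assumes the source extract has already been read into a list of dictionaries with string values (for example via `csv.DictReader`). The column names, the sample rows, and the `MM/DD/YYYY` date format are all made up for illustration, so treat this as a starting point rather than a finished profiler.

```python
from collections import Counter
from datetime import datetime


def infer_type(value: str) -> str:
    """Guess a coarse type for one raw string value from an extract."""
    text = value.strip()
    try:
        int(text)
        return "integer"
    except ValueError:
        pass
    try:
        float(text)
        return "decimal"
    except ValueError:
        pass
    try:
        # Assumption: dates in the extract look like 03/14/2015.
        datetime.strptime(text, "%m/%d/%Y")
        return "date"
    except ValueError:
        return "text"


def profile(rows):
    """Return the volume count plus per-column completeness and type mix."""
    total = len(rows)
    columns = sorted({key for row in rows for key in row})
    report = {}
    for col in columns:
        values = [(row.get(col) or "").strip() for row in rows]
        filled = [v for v in values if v]
        report[col] = {
            "completeness_pct": round(100.0 * len(filled) / total, 1) if total else 0.0,
            "types": dict(Counter(infer_type(v) for v in filled)),
            "distinct": len(set(filled)),
        }
    return {"row_count": total, "columns": report}


# Hypothetical sample extract, not real client data.
rows = [
    {"sku": "1001", "color": "Red", "weight": "1.5", "created": "03/14/2015"},
    {"sku": "1002", "color": "", "weight": "2", "created": "01/01/2101"},
    {"sku": "1003", "color": "Blue", "weight": "n/a", "created": ""},
    {"sku": "1004", "color": "Blue", "weight": "", "created": "07/04/2016"},
]

report = profile(rows)
for name, stats in report["columns"].items():
    print(name, stats)
```

A column such as `weight` that comes back with a mix of integer, decimal, and text values is exactly the kind of finding worth raising with the business before the target data type is locked in.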
How does my data best fit the target model structure in PIM?
If the goal is to move from a flat, SKU-based product data structure to a two- or three-tiered data structure, it is vital to perform proper data profiling to validate the integrity and mapping of the source product data into the PIM data structure. Doing this also identifies the level at which each field should reside in your PIM solution (such as Product Level, Variant Level, or Item Level). Profiling also gives insight into whether a field is applicable to all products or only to a certain subset of products (PIM Attribute). This exercise will ensure the design is safe and sound, and that the correct attribution is defined within the PIM data model.
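One way to approach the level question is to check how a field's values behave inside each group of SKUs. The sketch below assumes the flat source already carries something that groups SKUs into products and variants. The `style_id` and `color_code` keys, the sample rows, and the function name are hypothetical, and the result is only a suggestion to review with the business, not a final design decision.

```python
from collections import defaultdict


def suggest_level(rows, field, product_key="style_id", variant_key="color_code"):
    """Suggest the highest tier at which a field holds a single value per group."""
    by_product = defaultdict(set)
    by_variant = defaultdict(set)
    for row in rows:
        by_product[row[product_key]].add(row[field])
        by_variant[(row[product_key], row[variant_key])].add(row[field])

    # One value per product everywhere: the field can live on the Product.
    if all(len(values) == 1 for values in by_product.values()):
        return "Product"
    # One value per product/variant pair: the field can live on the Variant.
    if all(len(values) == 1 for values in by_variant.values()):
        return "Variant"
    # Otherwise the value changes from SKU to SKU.
    return "Item"


# Hypothetical flat, SKU-based extract.
flat_rows = [
    {"sku": "A-RED-S", "style_id": "A", "color_code": "RED", "brand": "Acme", "image": "a_red.jpg", "upc": "111"},
    {"sku": "A-RED-M", "style_id": "A", "color_code": "RED", "brand": "Acme", "image": "a_red.jpg", "upc": "112"},
    {"sku": "A-BLU-S", "style_id": "A", "color_code": "BLU", "brand": "Acme", "image": "a_blu.jpg", "upc": "113"},
    {"sku": "B-RED-S", "style_id": "B", "color_code": "RED", "brand": "Bolt", "image": "b_red.jpg", "upc": "114"},
]

levels = {field: suggest_level(flat_rows, field) for field in ("brand", "image", "upc")}
print(levels)
```

A field that is almost constant within a product, with only a handful of exceptions, usually points to dirty data rather than a true Item-level field, so it is worth looking at the exceptions before accepting the suggestion.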
Other common DQ problems
Implementing a PIM solution without performing a data quality analysis effort can lead to headaches throughout the project. In the requirements phase, there are a handful of ad-hoc data profiling exercises that can and should be performed. Implementing a new solution is a great chance for a ‘clean slate’, and some proper housekeeping is needed. Some common data quality issues that we’ve seen are:
- Primary keys not exactly ‘unique’
- Duplication of IDs or Product records
- Fields that are currently or have been ‘free form’, leading to non-standardized fields
- Invalid date fields (such as a Create Date of 01/01/2101… is there a time traveler inputting data?!?)
- Orphaned items
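Several of the issues in this list can be caught with very small ad-hoc checks. The sketch below is one possible take in plain Python. The field names (`sku`, `product_id`, `created`), the date format, the 1950 cutoff, and the choice to compare keys case-insensitively are all assumptions for illustration and should be adjusted to your own extracts.

```python
from collections import Counter
from datetime import date, datetime


def duplicate_keys(rows, key):
    """Return key values that appear more than once.

    Assumption: keys are compared after trimming spaces and upper-casing,
    so 'ab-1 ' and 'AB-1' count as the same identifier.
    """
    counts = Counter(row[key].strip().upper() for row in rows)
    return sorted(k for k, n in counts.items() if n > 1)


def implausible_dates(rows, field, earliest=date(1950, 1, 1), latest=None):
    """Return raw values that are blank, unparseable, or outside the allowed window."""
    latest = latest or date.today()
    bad = []
    for row in rows:
        raw = row.get(field) or ""
        try:
            parsed = datetime.strptime(raw.strip(), "%m/%d/%Y").date()
        except ValueError:
            bad.append(raw)
            continue
        if not earliest <= parsed <= latest:
            bad.append(raw)
    return bad


def orphaned_items(items, products, parent_key="product_id", id_key="product_id"):
    """Return items whose parent product does not exist in the product extract."""
    known = {product[id_key] for product in products}
    return [item for item in items if item[parent_key] not in known]


# Hypothetical extracts.
products = [{"product_id": "P1"}]
items = [
    {"sku": "AB-1", "product_id": "P1", "created": "03/14/2015"},
    {"sku": "ab-1 ", "product_id": "P1", "created": "01/01/2101"},
    {"sku": "CD-2", "product_id": "P9", "created": "not a date"},
]

print(duplicate_keys(items, "sku"))
print(implausible_dates(items, "created"))
print([item["sku"] for item in orphaned_items(items, products)])
```

Each function returns the offending values rather than a simple pass/fail, which makes it easier to hand the findings to the data owners for cleanup.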
Without a DQ work stream during the implementation, it is possible the solution could even be built with an incorrect Product identifier (making assumptions on identifiers is an industrial-sized no-no). Not discovering this architectural mistake until late in the game can leave the team scrambling to recover.
Another issue commonly encountered during DQ analysis is Country values not being standardized. One client of ours had more than seven different ways of storing the United States (US, U.S.A., United States of America, etc.), and this field directly fed their ecommerce website! Nothing would be more embarrassing than seeing on a highly visible product page that your goods are made in the “Untied States”.
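A simple alias table goes a long way here. The sketch below maps known spellings to the ISO 3166-1 alpha-2 code `US` and reports anything it does not recognize instead of guessing. The alias list and function names are illustrative assumptions. A real project would maintain the list with the business and cover every country in the data.

```python
import re
from collections import Counter

# Assumption: a hand-maintained alias table, keyed by a cleaned-up spelling.
COUNTRY_ALIASES = {
    "us": "US",
    "usa": "US",
    "united states": "US",
    "united states of america": "US",
}


def normalize_key(raw: str) -> str:
    """Lower-case, drop punctuation, and collapse whitespace before lookup."""
    cleaned = re.sub(r"[^a-z\s]", "", raw.casefold())
    return " ".join(cleaned.split())


def standardize_countries(values):
    """Return (standardized values, counts of raw values with no known alias)."""
    standardized = []
    unmapped = Counter()
    for raw in values:
        code = COUNTRY_ALIASES.get(normalize_key(raw))
        if code is None:
            unmapped[raw] += 1  # flag for manual review, never guess
        standardized.append(code)
    return standardized, unmapped


# Hypothetical source values.
source_values = ["US", "U.S.A.", "United States of America", "usa", "Untied States"]
standardized, unmapped = standardize_countries(source_values)
print(standardized)
print(unmapped)
```

Leaving “Untied States” unmapped is deliberate: a typo like that should land on someone's review list, not be silently corrected by the load process.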
The next blog in this series will discuss how to utilize Informatica’s IDQ Tool with PIM. Stay tuned!