I have recently been involved in a business intelligence (BI)/data warehousing project focused on providing spend visibility to procurement teams across the electricity generation and transmission/distribution operating units of a public utility. The project uses an agile framework and receives phase-based funding only, so our ability to deliver incremental value is essential to the program’s survival: data quality issues, no matter their source, can be fatal to the program if identified too late in a sprint. As we transition from our most recent delivery phase into the next, I thought I’d share a few observations from the last three sprints.
Coming into our final supply chain release for an Oracle Business Intelligence Enterprise Edition (OBIEE) rollout, the functional team reviewed the remaining metrics to deliver and found that many of them had been bumped from previous releases due to data quality issues in the enterprise resource planning (ERP) system. The business anticipated that some level of data cleanup would be required, but they could not quantify how much and underestimated the level of effort involved. The data issues stemmed from typos and from operating units not finding (or not looking for) an existing supplier in the vendor master.
Our agile approach to BI delivery allowed Infoverity to largely absorb the delays by shifting to the next priority metric while the business did cleanup. However, it did result in a number of false starts and in time lost researching issues to determine whether they were data problems or defects. Working with the business and program management, Infoverity created a prioritized list of data issues for the business to work through. As the business completed their corrections, the metric would be released to the development team to resume their work. As work piled up, project status would move to ‘yellow’ to encourage cleanup effort by the business. However, being able to quickly scope the data cleanup for the elements to be transformed or used in hierarchies would have empowered the project team to secure a larger commitment from the business for cleanup, or to incorporate systematic data cleanup as an enablement phase of the BI project.
Lasting Damage of Duplicate Vendors
Duplicate suppliers presented the greatest risk. Supplier Spend and Open Purchase Order Dollars by Supplier metrics were understated because spend was spread across the duplicates rather than aggregated under a single supplier. The Infoverity project team identified the duplicate vendors to be merged, triggering all associated transactions to move to the surviving supplier in the ERP system. The fix was easy enough; however, it opened the door to questions about the accuracy of the BI solution, even though the metric was merely revealing a data issue within the ERP system. The effort to satisfy ongoing concerns greatly outweighed the data fix effort. An important lesson learned by Infoverity was to include basic data profiling for any aggregate or drill element as a prerequisite prior to releasing a prototype to the functional team for review.
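As a rough illustration, the kind of basic profiling that flags duplicate vendors before a prototype review can be sketched in a few lines. This is a minimal example, not the approach the team actually used: the normalization rules, suffix list, and similarity threshold here are all illustrative assumptions, and a real vendor-master cleanup would use far richer matching logic.

```python
import difflib
import re
from collections import defaultdict

def normalize(name):
    """Lowercase, strip punctuation, and drop common legal suffixes
    so 'ACME Corp.' and 'Acme Corporation' compare as the same name."""
    name = re.sub(r"[^\w\s]", "", name.lower())
    # Illustrative suffix list; a production rule set would be broader.
    for suffix in ("incorporated", "corporation", "company", "inc", "corp", "llc", "ltd"):
        name = re.sub(rf"\b{suffix}\b", "", name)
    return " ".join(name.split())

def find_duplicate_groups(suppliers, threshold=0.9):
    """Group supplier names whose normalized forms are near-identical,
    returning only groups with more than one member (likely duplicates)."""
    groups = defaultdict(list)
    canonical = []  # representative normalized names seen so far
    for supplier in suppliers:
        norm = normalize(supplier)
        match = difflib.get_close_matches(norm, canonical, n=1, cutoff=threshold)
        key = match[0] if match else norm
        if not match:
            canonical.append(norm)
        groups[key].append(supplier)
    return [g for g in groups.values() if len(g) > 1]

# Example: two spellings of the same vendor are grouped; Globex stands alone.
dupes = find_duplicate_groups(["ACME Corp.", "Acme Corporation", "Globex LLC"])
# → [['ACME Corp.', 'Acme Corporation']]
```

Running even a simple pass like this against the vendor master before a sprint review makes the "understated spend" conversation about ERP data quality rather than about BI solution accuracy.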
ERP users can manually enter a promised date or first-needed-by date for purchase orders in the ERP system. They usually get it right. However, a few mistakes are inevitable, and they can wreak havoc when transforming transactional dates into normalized foreign keys. It took 90 minutes to hit a date conversion error during a full load, and if you have a dozen date errors, that is a lot of wasted processing and waiting. An important lesson learned by Infoverity was to press for data profiling using pattern matching to catch transformation errors as part of project setup. This saves the project team from burning extra cycles and gets data issues identified early, allowing maximum resolution time before they become a roadblock.
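The pattern-matching idea can be sketched as a quick pre-load scan of the date columns: rather than discovering a bad value 90 minutes into a full load, a profiling pass surfaces every unparseable date in seconds. The field name and expected format below are assumptions for illustration; the actual ERP extract would have its own column names and formats.

```python
from datetime import datetime

def profile_dates(rows, field, fmt="%Y-%m-%d"):
    """Scan a date column before the full load and report values that
    will not convert, as (row_index, bad_value) pairs."""
    bad = []
    for i, row in enumerate(rows):
        value = row.get(field, "")
        try:
            datetime.strptime(value, fmt)
        except (ValueError, TypeError):
            bad.append((i, value))
    return bad

# Example: an impossible date and a wrong-format date are both caught
# before any transformation work begins. ('promised_date' is illustrative.)
rows = [
    {"promised_date": "2024-02-30"},   # February 30th does not exist
    {"promised_date": "2024-03-01"},
    {"promised_date": "03/15/2024"},   # wrong pattern for this extract
]
issues = profile_dates(rows, "promised_date")
# → [(0, '2024-02-30'), (2, '03/15/2024')]
```

Handing a list like this back to the business at project setup gives them the maximum window to correct the source records before the load schedule depends on them.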
Clean Data Opportunities
Accurate, complete data does not happen by chance. Realizing information as an asset is not the result of a single project or a single stewardship team. Utilizing data quality processes improves scoping and deliverable quality and reduces schedule surprises. Profiling can be done as an ad-hoc analysis or as an ongoing gatekeeper to ensure that only accurate and complete information moves between systems and is promoted into decision making.
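The "ongoing gatekeeper" pattern can be as simple as a completeness check that blocks promotion of an extract when required fields exceed a missing-value threshold. This is a hypothetical sketch: the field names and the 2% threshold are illustrative assumptions, not values from the project.

```python
def completeness_gate(rows, required_fields, max_missing_rate=0.02):
    """Return (passed, report): 'passed' is False if any required field's
    missing-value rate exceeds the threshold; 'report' maps each field
    to its observed missing rate."""
    report = {}
    total = len(rows) or 1  # avoid division by zero on an empty extract
    for field in required_fields:
        missing = sum(1 for r in rows if not str(r.get(field) or "").strip())
        report[field] = missing / total
    passed = all(rate <= max_missing_rate for rate in report.values())
    return passed, report

# Example: a blank supplier on one of two rows (50% missing) fails the gate.
rows = [
    {"supplier": "Acme", "po_amount": "100"},
    {"supplier": "", "po_amount": "50"},
]
ok, report = completeness_gate(rows, ["supplier", "po_amount"])
# → ok is False; report["supplier"] is 0.5
```

Wired into the load pipeline, a gate like this turns data quality from a late-sprint surprise into a routine, visible checkpoint.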