
The importance of data quality automation to healthcare

Ray Wright

How data automation is used in healthcare

When we think of automation for healthcare, the first thing that comes to mind is RPA (robotic process automation). With RPA, mundane and repetitive processes are the ideal target for improvement. Examples include gathering information from multiple sources, consolidating it, and producing a monthly management report on topics such as bed availability and utilization, or COVID-19 cases and outcomes. Beyond reporting, automated processes can support many other business needs, including billing, scheduling, patient intake, and claims. 

Automation can also cover a wide range of activities from an IT perspective, including automatic backup and recovery, file transfers and consolidations, automated systems management and software updates, and user self-service and issue tracking. 

Beyond processes and systems, healthcare professionals are now using automation in the form of machine learning and AI to better understand and analyze health data. Patient data is being used in health research to find more effective treatments and improve the quality of care. Natural language processing is being used to answer patient questions and facilitate appointments, and startups are using AI applications to help prevent errors, for instance, by flagging unusual prescriptions. 

However, IT specialists and health data analysts must ensure that data used in patient communication and healthcare decision-making is complete and accurate before it’s used to train AI models or produce actionable insights or answers. 

What role does data quality automation play?

Underlying all of this automation is the patient data itself, and like most other datasets, it's prone to misinterpretation, incompleteness, duplication, and outright error. Yet data accuracy and completeness (in other words, data quality) is a key requirement when making treatment decisions, particularly if the patient is unable to communicate when admitted. Data is also increasingly used to predict future requirements, such as health trends and patients' needs, and to determine optimal treatment plans on a more immediate basis. If the data has errors, is incomplete, or, worse, is duplicated, then the outcomes may be less than optimal. In fact, earlier studies have suggested that misidentifying patients because of duplicate records can have seriously detrimental results. And beyond resources and treatments, data errors can affect claims, billing, patient management, and ultimately costs and revenues. 

It's a commonly quoted statistic that data analysts spend 80% of their time fixing data quality issues and only 20% doing analytics. [1] Not only is this a waste of a scarce and important resource, but it also adds delay and slows decision-making. The challenge when analyzing large datasets is that data quality errors are often difficult to diagnose. If specific errors are known, for example, when dates are incorrectly formatted, fixes can be coded and easily implemented. But when the errors or inaccuracies are unknown, different approaches are required. 
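For the known-error case, a fix can be as small as the Python sketch below, offered only as an illustration. It assumes, hypothetically, that the date formats present in the data have already been identified (the `KNOWN_FORMATS` list and `normalize_date` name are invented for this example) and that slash-separated dates are month/day/year. Anything that matches no known format is returned as `None`, so it can be reviewed rather than guessed at.

```python
from datetime import date, datetime
from typing import Optional

# Hypothetical formats found in the data. Slash dates are assumed to be
# month/day/year; a day/month/year source would need its own rule, and
# ambiguous values such as 03/04/2023 cannot be resolved by parsing alone.
KNOWN_FORMATS = ["%Y-%m-%d", "%m/%d/%Y", "%d-%b-%Y", "%Y%m%d"]


def normalize_date(raw: str) -> Optional[date]:
    """Parse raw with the first matching known format; None if none match."""
    value = raw.strip()
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(value, fmt).date()
        except ValueError:
            continue  # try the next known format
    return None  # unrecognized: route to an analyst rather than guess


print(normalize_date("01/05/2023"))  # 2023-01-05
```

Returning `None` instead of raising keeps a batch job running while still separating the known, fixable errors from the unknown ones.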

Looking for and fixing data issues is time-consuming, tedious, and often frustrating. And fixing a problem once doesn't mean it's fixed forever, because data decays: data at rest gets old and can become incorrect or unusable over time. By some estimates, contact data changes at a rate of about 2% per year. [2] 
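To make the decay figure concrete: if roughly 2% of contact records change each year, and changes are assumed to be independent from year to year (a simplification), the share expected to have changed at least once after n years is 1 - 0.98^n, or about 9.6% after five years. The sketch below computes that estimate and flags records for re-verification. The `last_verified` date and the 365-day limit are hypothetical choices for illustration, not part of the source.

```python
from datetime import date


def expected_changed_fraction(annual_rate: float, years: float) -> float:
    """Share of records expected to change at least once, assuming a constant
    annual change rate that is independent from year to year."""
    return 1 - (1 - annual_rate) ** years


def is_stale(last_verified: date, today: date, max_age_days: int = 365) -> bool:
    """True when a record has not been re-verified within max_age_days."""
    return (today - last_verified).days > max_age_days


print(round(expected_changed_fraction(0.02, 5), 3))  # 0.096
```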

Where patients live and work can easily change. Home phone services may be discontinued. Insurance carriers and primary care providers change. Medications and treatments change, and so on. Even when data is updated, if it's done manually, there's a chance that errors will be made when new data is entered. Sometimes data errors are the result of fraud: the same Social Security number or address used for different patients can indicate an attempt to conceal a lack of insurance coverage. And when data is collected on paper forms, there are frequently omissions because some information is unavailable at the time, which leads to incomplete records. 
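A first automated pass at the shared-identifier problem can be sketched as follows. The function name, the `patient_id` field, and the `ssn` field in the usage example are hypothetical, and the sample values are fake. A match is only a signal for review, not proof of fraud: family members legitimately share an address, for example.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


def shared_identifiers(records: Iterable[dict], key: str) -> Dict[str, Set[str]]:
    """Map each identifier value to the patient IDs using it, keeping only
    values that appear under more than one distinct patient ID."""
    patients_by_value: Dict[str, Set[str]] = defaultdict(set)
    for rec in records:
        value = rec.get(key)
        if value:  # missing values are a completeness issue, not a conflict
            patients_by_value[value].add(rec["patient_id"])
    return {v: ids for v, ids in patients_by_value.items() if len(ids) > 1}


records = [
    {"patient_id": "P1", "ssn": "000-00-0001"},
    {"patient_id": "P2", "ssn": "000-00-0001"},
    {"patient_id": "P3", "ssn": "000-00-0002"},
]
print(shared_identifiers(records, "ssn"))  # {'000-00-0001': {'P1', 'P2'}} (set order may vary)
```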

The answer to these challenges is data quality automation. 

By establishing acceptable thresholds of data quality and monitoring data on an ongoing basis, it's possible to automate data quality processes. As a first step, the data can be profiled to identify common issues such as multiple formats for the same kind of value, the same patient-unique value appearing in different patient records (duplication), incomplete data fields, and out-of-range values, such as dates of birth that suggest patients have yet to be born or are well over 100 years old! Identifying such issues lets data analysts prioritize their efforts and focus on the most important areas, without having to guess where problems might lie. 
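Three of the profiling checks named above (format variety, completeness, and out-of-range birth dates) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not any product's method: records are plain dicts with `date` objects for birth dates, and the field names and the `MAX_AGE_YEARS` cutoff of 110 are hypothetical.

```python
import re
from collections import Counter
from datetime import date
from typing import Dict, List, Optional

MAX_AGE_YEARS = 110  # hypothetical cutoff for "well over 100 years old"


def format_pattern(value: str) -> str:
    """Reduce a value to its shape: digits become 9, letters become A."""
    return re.sub(r"[A-Za-z]", "A", re.sub(r"\d", "9", value))


def age_in_years(dob: date, today: date) -> int:
    """Whole years between a birth date and today."""
    return today.year - dob.year - ((today.month, today.day) < (dob.month, dob.day))


def profile(records: List[dict], fields: List[str], dob_field: str = "dob",
            today: Optional[date] = None) -> Dict[str, object]:
    """Summarize completeness, format patterns, and out-of-range birth dates."""
    if today is None:
        today = date.today()
    report: Dict[str, object] = {}
    for field in fields:
        values = [r.get(field) for r in records]
        present = [v for v in values if v not in (None, "")]
        report[field] = {
            # Share of records with a non-empty value for this field.
            "completeness": len(present) / len(values) if values else 0.0,
            # How many values share each shape, e.g. "(999) 999-9999".
            "patterns": Counter(format_pattern(str(v)) for v in present),
        }
    # Birth dates in the future, or implying an implausible age, are flagged.
    report["dob_out_of_range"] = [
        r.get("patient_id") for r in records
        if isinstance(r.get(dob_field), date)
        and (r[dob_field] > today or age_in_years(r[dob_field], today) > MAX_AGE_YEARS)
    ]
    return report
```

A field showing several patterns, such as `(999) 999-9999` next to `999.999.9999`, points to a standardization task; low completeness points to a collection problem.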

Next, rules can be created that check data validity and completeness when changes are made or when new data becomes available. For example, length of stay is often a critical KPI for healthcare organizations. Assuming the underlying data is available (and correctly formatted), it's possible to create a rule that checks admission and discharge dates against the recorded length of stay to detect errors. This is, of course, just one simple example. Rules can be created to check any number of conditions, and with today's technology they don't require programming skills. In fact, rule creation itself can be automated with machine learning or simplified with drag-and-drop tools. 
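Expressed in code, the length-of-stay rule might look like the sketch below. It assumes, as one common convention, that length of stay is the number of midnights between admission and discharge, so a same-day stay counts as 0; organizations define this differently, so treat the calculation and the function name as hypothetical.

```python
from datetime import date
from typing import List


def check_length_of_stay(admitted: date, discharged: date, reported_los: int) -> List[str]:
    """Return rule violations for one encounter; an empty list means it passes."""
    if discharged < admitted:
        return ["discharge date precedes admission date"]
    # Assumed convention: LOS = number of midnights between admission and discharge.
    computed = (discharged - admitted).days
    if computed != reported_los:
        return [f"reported LOS {reported_los} does not match computed LOS {computed}"]
    return []


print(check_length_of_stay(date(2023, 3, 1), date(2023, 3, 5), 3))
# ['reported LOS 3 does not match computed LOS 4']
```

Returning a list of messages rather than a single boolean makes it easy to add further checks to the same rule later without changing its callers.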

Beyond profiling and rule creation

Identifying where errors or omissions occur is only half the battle. If every error found had to be investigated manually, analysts would save little time. However, automation can go further. 

Data cataloging solutions can identify not only what data exists but also which datasets are used most frequently, by whom, and for what purpose. This knowledge helps with prioritizing effort. In addition, some incorrect or incomplete data can be corrected or filled in automatically. Invalid or missing contact information, for example, can be updated or appended automatically, including mailing addresses, email addresses, and mobile phone numbers. Other values can be inferred from the data, and the most likely corrections can be suggested to the analyst, reducing further investigation and speeding up the correction process. 
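The two ideas in that paragraph, automatic correction and suggested values, can be illustrated with a small Python sketch. It is deliberately limited: it only standardizes the format of 10-digit North American phone numbers and ranks candidates from a reference list the caller supplies, using the standard library's `difflib.get_close_matches`. It does not verify or append data from external reference sources. The function names and the insurer list in the example are hypothetical.

```python
import difflib
import re
from typing import List, Optional


def normalize_us_phone(raw: str) -> Optional[str]:
    """Standardize a 10-digit North American number to (NNN) NNN-NNNN.

    Returns None when the value cannot be corrected automatically.
    """
    digits = re.sub(r"\D", "", raw)
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]  # drop the leading country code
    if len(digits) != 10:
        return None
    return f"({digits[:3]}) {digits[3:6]}-{digits[6:]}"


def suggest_values(raw: str, valid_values: List[str], n: int = 3) -> List[str]:
    """Rank close matches from a reference list for an analyst to confirm."""
    return difflib.get_close_matches(raw, valid_values, n=n, cutoff=0.6)


print(normalize_us_phone("+1 555.123.4567"))                  # (555) 123-4567
print(suggest_values("Aetan", ["Aetna", "Cigna", "Humana"]))  # ['Aetna']
```

Values that return `None` or no suggestions are exactly the ones that still need an analyst, which keeps manual effort focused where automation cannot decide.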

Once trust in data accuracy has been restored, automated data quality checks can be run on the data to ensure continued compliance with the established rules. Thresholds help identify when further issues occur, whether because of data decay or an influx of new records, and checks can be run periodically or whenever the data changes. 
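Threshold monitoring reduces to comparing each rule's pass rate against an agreed minimum. In the sketch below, the rule names and threshold values are hypothetical; in practice they would come from whatever service levels the organization sets.

```python
from typing import Dict, List

# Hypothetical thresholds: minimum share of records that must pass each rule.
THRESHOLDS = {"phone_complete": 0.95, "los_consistent": 0.99}


def pass_rate(results: List[bool]) -> float:
    """Share of records that passed a rule (1.0 when there is nothing to check)."""
    return sum(results) / len(results) if results else 1.0


def breached_rules(pass_rates: Dict[str, float], thresholds: Dict[str, float]) -> List[str]:
    """Return the rules whose pass rate fell below threshold, sorted by name.

    A rule with no reported pass rate is treated as 0.0, so a check that
    silently failed to run is flagged rather than assumed to have passed.
    """
    return sorted(
        rule for rule, minimum in thresholds.items()
        if pass_rates.get(rule, 0.0) < minimum
    )


print(breached_rules({"phone_complete": 0.97, "los_consistent": 0.98}, THRESHOLDS))
# ['los_consistent']
```

The same function can be called from a scheduled job or from a trigger that fires when new records are loaded, matching the two modes described above.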

The benefit of data quality automation

Data-driven decision-making is now a key component of today's successful organizations. However, decisions based on bad data can have serious consequences. Healthcare organizations run on tight budgets and lean resources and can benefit enormously from access to accurate data. It makes for greater efficiency, improves patient outcomes, and helps avoid costly mistakes. 

[1] Ruiz Gabernet, Armand, and Jay Limburn. “Breaking the 80/20 Rule: How Data Catalogs Transform Data Scientists’ Productivity.” Ibm.com, IBM, 23 Aug. 2017, www.ibm.com/cloud/blog/ibm-data-catalog-data-scientists-productivity. Accessed 14 June 2023.  

[2] Wickey, William. “How Is an Email List like the Golden Gate Bridge?” Social Media Today, 7 Jan. 2015, www.socialmediatoday.com/content/how-email-list-golden-gate-bridge. Accessed 14 June 2023.  


Ready to learn more about Experian’s data quality solutions?
