SSG Blog

Disease Surveillance: You’re only as good as your data

Posted on October 6th, 2021   |   John Schaeffer, CEO

As our nearly two-year struggle with the COVID-19 pandemic has taught us, disease surveillance systems are more important than ever, and we can’t let them fall behind. With the interval between major disease outbreaks growing ever shorter, disease surveillance and pandemic preparedness programs must be on high alert at all times.

Fortunately, advancements in technology have been keeping pace. Data-crunching and analytics capabilities, including artificial intelligence (AI) and Big Data, are evolving at dizzying speeds. Fully evidence-based decision-making is well within our grasp, as long as we incorporate these capabilities into our disease surveillance systems. 

Evidence-based decision-making is built on a foundation of data, which, in turn, is built on a foundation of data collection. The Centers for Disease Control and Prevention (CDC) offers standards for data collection types and intervals for major diseases. Based on these standards, one thing is certain: any health surveillance program is only as good as its data. Now, let’s talk about how to ensure programs can get the most from their data.

  1. Digitize, digitize, digitize. While there are breakthroughs in medical technology almost daily, many physicians still notoriously rely on fax transmissions for communications. In many health care organizations, data is still recorded on handwritten forms. Given the volume of data, analysis by hand is impossible, so these handwritten notes must be digitized to let the machines do their work. Transcription is time-consuming; optical character recognition has improved by leaps and bounds, but it is still vulnerable to misreadings and must be monitored carefully. Getting the digitization of records as close as possible to the origin (the patient’s bedside, the laboratory station) will speed collection by orders of magnitude and ensure maximum accuracy. Handheld wireless and cloud-based technologies are key to bringing data collection to the source.
  2. Cleanliness. Data cleanliness is an ongoing battle, especially when incorporating legacy data into a new system or integrating data from many sources. There are several dimensions to data integrity: accuracy, completeness, consistency of formatting, timeliness and uniqueness (ensuring there is no duplication). Uniqueness offers a challenge within a challenge. It’s not simply deduplication. Entity resolution must recognize that, in different records, William James, Will James and Bill James may all be the same person, and collate the records accordingly. Data screening, which means making sure the data is valid and complete at the point of capture, is the first line of defense against dirty data. If the data is captured by a handheld application, for example, software should flag missing or incorrect fields so the user can correct them before the data goes into the system. The use of drop-down menus and check boxes, where possible, can be a timesaver for the collector and a boon for data consistency.
  3. Accommodate the types of surveillance. The type of data collected, and how it is collected, varies according to the type of surveillance.

* Passive surveillance: The challenge in passive surveillance is the completeness and timeliness of the reports, since it relies on a huge network of health workers. Techniques similar to data screening can help, but with diverse systems, coordination among entities and their systems is key. This may call for actual data cleansing or scrubbing to align the exchanged data field by field.

* Sentinel surveillance: This requires more detailed data, specialized staff and sophisticated laboratories. If at all possible, health authorities should have a data relationship with sentinel hospitals built on identical input and output formats. 

* Active surveillance and active search: The challenge for active surveillance and search is integrating data from many formats and media. This includes input from manual records searches, interviews with staff, clinic and hospital ward visits, and sometimes door-to-door canvassing. These methods are resource-intensive and expensive, and they make heavy demands on data management systems.
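To make the data screening and entity resolution ideas from item 2 concrete, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the field names, the `screen_record` and `group_records` helpers, and the tiny nickname table are hypothetical, and real entity resolution usually relies on much richer matching than a lookup table.

```python
from datetime import date

# Hypothetical required fields for a case report; a real form defines its own.
REQUIRED_FIELDS = ("first_name", "last_name", "date_of_birth", "onset_date")
DATE_FIELDS = ("date_of_birth", "onset_date")

# Tiny illustrative nickname table (an assumption, not a complete resource).
NICKNAMES = {"will": "william", "bill": "william"}


def screen_record(record):
    """Return a list of problems found at the point of capture (empty if none)."""
    problems = []
    for field in REQUIRED_FIELDS:
        value = record.get(field) or ""
        if not str(value).strip():
            problems.append(f"missing: {field}")
    for field in DATE_FIELDS:
        value = str(record.get(field) or "").strip()
        if value:
            try:
                date.fromisoformat(value)  # expects YYYY-MM-DD
            except ValueError:
                problems.append(f"invalid date: {field}")
    return problems


def match_key(record):
    """Build a crude matching key: canonical first name, last name, birth date."""
    first = record["first_name"].strip().lower()
    first = NICKNAMES.get(first, first)  # map known nicknames to one form
    last = record["last_name"].strip().lower()
    return (first, last, record["date_of_birth"].strip())


def group_records(records):
    """Collate records that share a matching key."""
    groups = {}
    for record in records:
        groups.setdefault(match_key(record), []).append(record)
    return groups


records = [
    {"first_name": "William", "last_name": "James",
     "date_of_birth": "1980-02-11", "onset_date": "2021-09-30"},
    {"first_name": "Bill", "last_name": "James",
     "date_of_birth": "1980-02-11", "onset_date": "2021-09-30"},
    {"first_name": "Will", "last_name": "James",
     "date_of_birth": "1980-02-11", "onset_date": ""},
]

print(screen_record(records[2]))     # ['missing: onset_date']
print(len(group_records(records)))   # 1
```

In a handheld application, the list returned by a check like `screen_record` would drive the on-screen prompts, so the collector fixes a field before the record is saved rather than after it has reached the central system.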

The CDC notes, however, that the technology-driven change in field surveillance data collection is working. Thanks to mobile devices, smart devices, electronic health records (EHR), social media and automated information systems, “a shift is occurring to a new normal in which field response data collection is integrated with existing infrastructure [and] uses jurisdictional surveillance and informatics staff.” Skilled staff at the health authority can perform the analysis without having to be on-site.

Business process management (BPM) technologies can also play a role. BPM technologies can model processes to weed out inefficiencies, apply externally generated business rules, and monitor performance. But it’s the reporting tools that really shine. Processes are essentially self-documenting, activity can be constantly monitored, and performance reporting can be integrated, allowing authorities to focus on key performance indicators and processes that can improve them. 

Data is the foundation of evidence-based decision-making in disease surveillance. Three things are key to meeting the evolving challenge of tracking an ever-increasing number of epidemics and pandemics: moving data collection closer to the patient by eliminating manual processes, ensuring the integrity of the data, and using emerging technologies to streamline investigative processes.