Friday, July 07, 2017

Digitizing History

This post comes from Grayce Mack, Student Assistant for the LSTA digitization project. Grayce is an MLIS student at Emporia State University who is about to graduate from the program and head off into the professional library world.

Last week we wrapped up work on our two-year digitization project Public Health in Oregon: Discovering Historical Data. Since June 2015, student archives assistants have been digitizing collection materials that document the fascinating history of public health in Oregon during the 19th and early 20th centuries. The collection includes county health surveys, patient records, reports from medical institutions such as the Portland Free Dispensary and the Oregon State Tuberculosis Hospital, and much more.

After scanning the collection, we began transcribing tables of data from the records into Excel. Creating digital spreadsheets of the information in these records lets researchers use the historical data more easily and efficiently, and allows researchers from data-driven professions to analyze records that are usually studied only by historians. The content varies widely, from vital statistics to tuberculosis fatalities, even an annual budget of hospital groceries. We soon found that working with legacy health data presents unique challenges, including issues of patient privacy and data loss.

Our goal was to transcribe records as closely as possible, but this often proved difficult because they were written in a time before records needed to be machine-readable. Many records contained inconsistent terms and abbreviations, spelling mistakes, and data placed in the wrong columns. Combine these errors with century-old cursive penmanship, and we had our work cut out for us. On most records, when small corrections to original errors could be reasonably made, we changed the data and included a note on the digital record.

Other data we could not alter so easily. In order to maintain the integrity of the original record, we decided not to transcribe any data presented in an unstructured (or non-tabular) format, even when this resulted in some data loss. We also could not always transcribe non-textual symbols on analog records. Certain words on the original may be circled, written over, or underlined in red. These may have been important notations to the original users, but we often did not have a key to interpret their meaning and, more importantly, had no way to adequately represent these symbols in the Excel document.

We also ran into issues with transcribing and publishing protected health information (PHI) in our digitized records. While the medical records we transcribed are between 50 and 120 years old, HIPAA protections for patients still apply until 50 years after the death of the patient. If the date of death was unknown (which was frequently the case), we redacted PHI for patients born after 1867 or anyone whose birth date was also unknown.
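For readers who think in code, the redaction rule described above can be sketched in a few lines of Python. This is only an illustration, not the tool used on the project: the function name `needs_redaction`, the year-only inputs, and the 2017 reference year are assumptions. Because only years are compared, a death exactly 50 years before the reference year is treated as still protected, which is a deliberately cautious assumption.

```python
from typing import Optional

PROTECTION_YEARS = 50      # protections apply until 50 years after death
BIRTH_YEAR_CUTOFF = 1867   # used only when the date of death is unknown


def needs_redaction(birth_year: Optional[int],
                    death_year: Optional[int],
                    current_year: int = 2017) -> bool:
    """Return True if a patient's PHI should be redacted (hypothetical helper)."""
    if death_year is not None:
        # Known death date: redact unless more than 50 years have passed.
        # Year-only comparison, so the boundary year is treated as protected.
        return current_year - death_year <= PROTECTION_YEARS
    if birth_year is None:
        # Neither date is known: redact to be safe.
        return True
    # Death date unknown: redact anyone born after the cutoff year.
    return birth_year > BIRTH_YEAR_CUTOFF
```

Under these assumptions, a patient born in 1850 with no recorded death date would not be redacted, while a patient with no recorded dates at all would be.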

Redaction goes beyond just omitting the patient’s name. Following the Safe Harbor methodology for satisfying HIPAA, we redacted names, addresses, cities of residence, phone numbers, family members, and admission and discharge dates. While it can be frustrating to lose valuable data during this process, redaction is a necessary and ethical safety measure to protect the legacy and families of patients of OHSU’s past institutions.
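As a rough illustration of what this means for a single transcribed row, the sketch below masks the categories listed above. The column names, the `[REDACTED]` marker, and the `redact_record` function are hypothetical; the project's actual spreadsheets were prepared in Excel and may have been laid out differently.

```python
# Hypothetical column names for the categories of PHI listed above.
REDACTED_FIELDS = {
    "name",
    "address",
    "city_of_residence",
    "phone_number",
    "family_members",
    "admission_date",
    "discharge_date",
}
REDACTION_MARK = "[REDACTED]"  # assumed placeholder text


def redact_record(record: dict) -> dict:
    """Return a copy of the record with PHI fields masked."""
    return {
        field: (REDACTION_MARK if field in REDACTED_FIELDS else value)
        for field, value in record.items()
    }


# Made-up example row, not taken from the collection.
row = {"name": "Jane Doe", "admission_date": "1912-03-04", "diagnosis": "tuberculosis"}
redacted = redact_record(row)
```

In this sketch the diagnosis survives while the name and admission date are masked, and the original row is left unchanged.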

As the project comes to a close, we have scanned 8,634 pages of archival documents, redacted over 3,500 pages, and transcribed the data from around 6,700 pages. By redacting and transcribing historical health records, we learned about the challenges of preserving legacy health information. While it is impossible to digitize archival materials without losing some original data, we have expanded the usability of these enlightening historical records and can now share this collection with researchers all over the world.

Check out the collection to view the digitized records and transcribed data files. You can also visit our online exhibit for context about the history of public health in Oregon documented by this project.
