This post comes from Grayce Mack, Student Assistant for the LSTA digitization project. Grayce is an MLIS student at Emporia State University who is about to graduate from the program and head off into the professional library world.
Last week we wrapped up work on our two-year
digitization project, Public Health in Oregon: Discovering Historical Data. Since June 2015, student archives
assistants have been digitizing collection materials that document the
fascinating history of public health in Oregon during the 19th and
early 20th centuries. The collection includes county health surveys,
patient records, reports from medical institutions such as the Portland Free
Dispensary and the Oregon State Tuberculosis Hospital, and much more.
After scanning the collection, we began transcribing tables
of data from the records into Excel. Digital spreadsheets of the
information in these records will let researchers use the historical
data more easily and efficiently, and will allow researchers from data-driven
professions to analyze records that are usually studied only by historians.
The content varies widely, from vital statistics to tuberculosis fatalities
to an annual budget of hospital groceries. We soon found that working with
legacy health data presents unique challenges, including issues of patient
privacy and data loss.
Our goal was to transcribe records as closely as possible,
but this often proved difficult because they were written in a time before records
needed to be machine-readable. Many records contained inconsistent terms and
abbreviations, spelling mistakes, and data placed in the wrong columns. Combine
these errors with century-old cursive penmanship, and we had our work cut out
for us. On most records, when small corrections to original errors could be
reasonably made, we changed the data and included a note on the digital record.
Other data we could not alter so easily. In order to maintain
the integrity of the original record, we decided not to transcribe any data
presented in an unstructured (or non-tabular) format, even when this resulted
in some data loss. We also could not always transcribe non-textual symbols on
analog records. Certain words on the original may be circled, written over, or
underlined in red. These may have been important notations to the original
users, but we often did not have a key to interpret their meaning and, more
importantly, had no way to adequately represent these symbols in the Excel document.
We also ran into issues with transcribing and publishing protected
health information (PHI) in our digitized records. While the medical records we
transcribed are between 50 and 120 years old, HIPAA protections for patients still
apply until 50 years after the death of the patient. If the date of death was
unknown (which was frequently the case), we redacted PHI for patients born
after 1867 or anyone whose birth date was also unknown.
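
For readers who work with data, the redaction rule described above can be written as a small decision function. This is only a sketch of the rule as stated in this post, not the tool we used: the function name, the year-only inputs, and the `reference_year` parameter are assumptions made for illustration, and treating the 50th year after death as still protected is a conservative guess on our part.

```python
from typing import Optional

# Values taken from the rule described in the post.
PROTECTION_YEARS = 50      # protections apply until 50 years after death
BIRTH_YEAR_CUTOFF = 1867   # fallback cutoff when the date of death is unknown


def needs_redaction(birth_year: Optional[int],
                    death_year: Optional[int],
                    reference_year: int) -> bool:
    """Return True if a patient's PHI should be redacted.

    Hypothetical helper: works on years only and treats the boundary
    year (exactly 50 years after death) as still protected.
    """
    if death_year is not None:
        # Known date of death: protected until 50 years have passed.
        return reference_year - death_year <= PROTECTION_YEARS
    if birth_year is None:
        # Neither date is known: redact to be safe.
        return True
    # Date of death unknown: fall back to the birth-year cutoff.
    return birth_year > BIRTH_YEAR_CUTOFF


if __name__ == "__main__":
    print(needs_redaction(1850, None, 2017))   # born before the cutoff
    print(needs_redaction(1870, None, 2017))   # born after the cutoff
    print(needs_redaction(None, None, 2017))   # no dates at all
```

Note that a known date of death takes priority over the birth year, and that a record with no dates at all is always redacted.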
Redaction goes beyond just omitting the patient’s name. Following the
Safe Harbor methodology for satisfying HIPAA, we redacted names, addresses, cities
of residence, phone numbers, family members, and admission and discharge dates.
While it can be frustrating to lose valuable data during this process,
redaction is a necessary and ethical safety measure to protect the legacy and
families of patients of OHSU’s past institutions.
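
As a rough illustration of what this means for a single transcribed row, the sketch below blanks out the categories of information listed above. The column names and the `[REDACTED]` marker are hypothetical; our actual spreadsheets use whatever headings appear on the original records, and the redaction itself was done by hand.

```python
# Hypothetical column names standing in for the categories named in the post.
REDACTED_FIELDS = {
    "name",
    "address",
    "city_of_residence",
    "phone_number",
    "family_members",
    "admission_date",
    "discharge_date",
}


def redact_row(row: dict, marker: str = "[REDACTED]") -> dict:
    """Return a copy of the row with protected fields replaced by a marker.

    Empty cells are left empty so the marker only appears where
    information was actually removed.
    """
    redacted = {}
    for column, value in row.items():
        if column in REDACTED_FIELDS and value not in (None, ""):
            redacted[column] = marker
        else:
            redacted[column] = value
    return redacted


if __name__ == "__main__":
    example = {
        "name": "Jane Doe",            # made-up sample data
        "city_of_residence": "Portland",
        "phone_number": "",
        "diagnosis": "tuberculosis",
    }
    print(redact_row(example))
```

Columns outside the list, such as a diagnosis, pass through unchanged, which is why the redacted spreadsheets remain useful for analysis.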
As the project comes to a close, we have scanned 8,634 pages
of archival documents, redacted over 3,500 pages, and transcribed the data from
around 6,700 pages. By redacting and transcribing historical health records, we
learned about the challenges of preserving legacy health information. While it
is impossible to digitize archival materials without losing some original data,
we have expanded the usability of these enlightening historical records and can
now share this collection with researchers all over the world.
Check out the collection to
view the digitized records and transcribed data files. You can also visit our online
exhibit for context about the history of public health in Oregon documented
by this project.