Mining for gold in patient records

The machine reading of free-form patient records is challenging, as even the simplest matters can be expressed in a myriad of ways. Lingsoft is responsible for indexing and text mining the Southwest Finland Hospital District's data lake in cooperation with Auria Services.

Auria Biobank is the first of its kind in Finland. Its collection contains approximately one million samples that are used for medical research. Although the biobank samples are valuable to researchers in themselves, their value increases considerably when they are combined with the data contained in patient records. Patient record data is managed by Auria Clinical Informatics, which is also the first of its kind in Finland. Its purpose is to organise, harmonise and maintain the countless patient records stored in the hospital district's information systems, so that the data is as easily available as possible to researchers and other experts.

Lingsoft has specialised in the analysis of clinical language since 2008, and health care is one of our fastest-growing client segments. Our extensive expertise and unique core technology also convinced Auria Services, with whom we began indexing and text mining the patient record data lake in autumn 2017.

Proven data, safe treatment 

The machine reading of free-form patient records is challenging, as even the simplest things, such as smoking, can be expressed in a myriad of ways, not to mention more complex phenomena. Spelling errors and slang also hinder machine reading: in Finnish alone, nurses have been found to write the term noradrenaline in a staggering 60 different base forms.

Searching for information in free-form text, known as text mining, requires language technology that, in Auria's case, can analyse both Finnish and clinical language. Lingsoft's language technology makes it possible to find the desired phenomena in patient records: we can reduce words to their base forms and enrich them with semantic data. Ontology-based semantic data, including broader and narrower concepts and entry terms, adds a new dimension to information searches and helps overcome some of the challenges caused by synonymy. Our anonymisation and pseudonymisation technology also makes it possible to identify other desired structures in the texts, such as names or social security numbers. The indexing work that enables text mining is carried out in a firewall-protected data lake maintained by Auria Clinical Informatics, so no patient data is transferred anywhere else.
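
The pseudonymisation step can be illustrated with a minimal sketch. It assumes the identifiers are Finnish personal identity codes (henkilötunnus), which end in a check character computed from the date of birth and the individual number, and it replaces each valid code with a keyed hash so that the same person receives the same pseudonym across records. The HMAC scheme, the `<PERSON_...>` token format, the key and the function names are hypothetical illustrations, not a description of Lingsoft's actual technology.

```python
import hashlib
import hmac
import re

# Check characters of the Finnish personal identity code; the correct one is
# CHECK_CHARS[int(DDMMYY + ZZZ) % 31]. G, I, O, Q and Z are never used.
CHECK_CHARS = "0123456789ABCDEFHJKLMNPRSTUVWXY"

# DDMMYY, century sign (+ = 1800s, - or U-Y = 1900s, A-F = 2000s),
# three-digit individual number ZZZ, check character. ASCII-only matching.
HETU_RE = re.compile(r"\b(\d{6})([-+A-FU-Y])(\d{3})([0-9A-Y])\b", re.ASCII)


def is_valid_check(date: str, individual: str, check: str) -> bool:
    """Return True if the check character matches the date and individual number."""
    return CHECK_CHARS[int(date + individual) % 31] == check


def pseudonymise(text: str, secret: bytes) -> str:
    """Replace every valid identity code with a stable, keyed pseudonym (hypothetical scheme)."""

    def replace(match: re.Match) -> str:
        date, _century, individual, check = match.groups()
        if not is_valid_check(date, individual, check):
            # Looks like a code but fails the checksum: leave the text unchanged.
            return match.group(0)
        digest = hmac.new(secret, match.group(0).encode("ascii"), hashlib.sha256).hexdigest()
        return f"<PERSON_{digest[:10]}>"

    return HETU_RE.sub(replace, text)


if __name__ == "__main__":
    key = b"example-secret-kept-inside-the-data-lake"  # hypothetical key
    print(pseudonymise("Potilas 131052-308T tupakoi päivittäin.", key))
```

Keying the hash with a secret kept inside the data lake matters here: the space of possible identity codes is small enough that an unkeyed hash could be reversed simply by hashing every candidate code.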

“Effective, safe treatment is evidence-based. I have high expectations for text mining, and time will tell what kinds of tools we can develop to help doctors and researchers,” says Arho Virkki, Head of Auria Clinical Informatics.

International interest in Finnish biobank and patient data has also been extensive: the data is reliable, digital, linkable with different systems and covers a long time period. The Finnish genome is also of interest to many researchers. However, without the right tools, one cannot hope to strike gold. The same applies to other fields whose challenges can be tackled using solutions provided by Lingsoft.