Q&A: Dr Anoop Shah on using data from health records

Dr Anoop Shah

Information from health records can be extremely useful in medical research, but at the moment not all the data can be extracted automatically. At University College London, Dr Anoop Shah is developing software to pull out more detail without compromising patient anonymity. By Michael Regnier.

How are health records used?
The General Practice Research Database (GPRD) has been collecting data from GPs’ electronic health record systems for 25 years. The information is anonymised and can be used for medical research, such as studies on drug safety. About 5 per cent of the UK population is covered by the database, so it has millions of patient records, many more than you could access by setting up a new research project.

Why isn’t all the data available?
GPs record major diagnoses using a system called Read Codes, which is standardised across the NHS. But doctors also enter information as free text: this could include specific symptoms, a suspected diagnosis or even a negative diagnosis. If a researcher wants to use the information in the free text, someone has to manually look at each record, anonymise it and pull out the relevant data. It’s not very practical.

How does your program work?
I have developed software that identifies pieces of free text that could potentially be coded in Read, even if they have been written using non-standard terms. It also detects the context of the diagnosis. For example, the coded term in a patient’s record might be “chest pain” but in the free text the GP may have written “not myocardial infarction”. Both bits of information are relevant and the program would pick them up while recognising that myocardial infarction was a negative diagnosis. On the other hand, the program deliberately omits any information that could identify the patient, such as names and locations.
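As a rough illustration of the idea described above, the following Python sketch maps non-standard wording onto a standard term and labels its context. It is not the actual program: the `SYNONYMS` table, the cue-word lists and the `extract_diagnoses` function are all hypothetical, real Read Codes are not reproduced, and the two-word look-back for context is an assumption made for brevity. Because only matched terms are returned, anything else in the text, such as names or locations, is never copied into the output.

```python
import re

# Hypothetical lookup table: free-text phrase -> standardised term.
# A real system would map to Read Codes; none are reproduced here.
SYNONYMS = {
    "myocardial infarction": "myocardial infarction",
    "heart attack": "myocardial infarction",
    "chest pain": "chest pain",
}

# Assumed cue words that change the context of the term that follows.
NEGATION_CUES = {"not", "no", "without"}
SUSPICION_CUES = {"possible", "suspected", "probable"}


def extract_diagnoses(free_text):
    """Return (standard_term, context) pairs found in free_text.

    Only terms present in SYNONYMS are returned, so other words
    (including patient names and places) never reach the output.
    """
    tokens = re.findall(r"[a-z]+", free_text.lower())
    # Try longer phrases first so multi-word terms win over shorter ones.
    phrases = sorted(SYNONYMS, key=lambda p: -len(p.split()))
    results = []
    i = 0
    while i < len(tokens):
        for phrase in phrases:
            words = phrase.split()
            if tokens[i:i + len(words)] == words:
                # Look at the two words before the match for context cues.
                preceding = set(tokens[max(0, i - 2):i])
                if preceding & NEGATION_CUES:
                    context = "negated"
                elif preceding & SUSPICION_CUES:
                    context = "suspected"
                else:
                    context = "positive"
                results.append((SYNONYMS[phrase], context))
                i += len(words) - 1  # skip the rest of the matched phrase
                break
        i += 1
    return results


if __name__ == "__main__":
    note = "Mr Smith of Leeds: chest pain, not myocardial infarction"
    print(extract_diagnoses(note))
```

Run on the example note, the sketch reports "chest pain" as a positive finding and "myocardial infarction" as negated, while the name and town are dropped. A fixed look-back window is easy to follow but crude; longer sentences and letters would need proper language analysis.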

Initially, we tested whether the program could detect causes of death recorded in free text. It was very successful: of the diagnoses detected by the program, 98 per cent were correct. Extracting information from free text more generally can be harder, because it has become common for records to include correspondence between GPs, hospitals and patients. These letters can have complex language structures that confuse the program, but we are continually developing and improving it.

What could it be used for?
If you are studying patients with a particular disease, you probably want to know whether they eventually died of that disease or some complication linked to their treatment. Much of that information will be in the free text. It may also contain suspected diagnoses, which cannot be recorded in the coded data.

With further development, this type of software could also help doctors to fill in records: analysing their free text in real time, the software could suggest standardised terms. If it wasn’t right, the doctor could rephrase the information.

What drew you to this problem?
I started developing the program when I worked at the Medicines and Healthcare products Regulatory Agency, which holds the GPRD. I’ve had an interest in computing for years but it had always been more of a hobby. Now I’ve combined it with my research.

My medical training was in clinical pharmacology and general medicine, and now I have a Wellcome Trust Research Training Fellowship to do a PhD looking at biomarkers and prognosis of coronary heart disease, mostly using electronic health records. This program is intended to be a resource for anyone but it would certainly help me as well.

This feature also appears in issue 72 of ‘Wellcome News’.

Further reading

Shah AD et al. The Freetext Matching Algorithm: a computer program to extract diagnoses and causes of death from unstructured text in electronic health records. BMC Med Inform Decis Mak 2012;12:88.

