By Melanie Senior, Analyst and Specialist Writer in Pharma and Life Sciences
Machine reading allows pharma firms to tune into customers’ views of drug value — in real-time
You have launched a new drug. Imagine if you could tune into all the conversations about that drug — among physicians, between physicians and their patients, and among patients and caregivers. Imagine you could rapidly capture a summary of all written materials prescribing, describing and judging the impact of that product. What do patients like or dislike about it? What aspects benefited patients the most? Why do doctors prescribe it — and, as importantly, why don’t they?
That level of real-world feedback, from multiple stakeholders, would allow you to immediately improve how you position your product, who you talk to about it, and how you further develop it. It would help you ensure that your drug’s value is understood, and that it reaches those who can most benefit from it. The most common complaints or perceived drawbacks could be identified. Such a comprehensive picture of a product’s impact could save money and resources for drug manufacturers and for health systems.
It is not a fairy tale. The data is out there — lots and lots of it, from multiple sources. It is in health system documentation, on social media, in drug companies’ sales force records. The challenge is capturing what is relevant and making sense of it all — turning it from real-world data into real-world evidence.
Our brains aren’t up to it. We would need months or years to read everything about drug X, from tens of thousands of patients and their doctors. And that is before summarizing or interpreting it.
Fortunately, computers are now very good at ‘reading’ and interpreting text. Decades ago, machines could examine large collections of written information to find high-frequency words or associations — so-called “text mining”.
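As a rough illustration, that older style of text mining can be sketched in a few lines of Python: count the most frequent terms and the word pairs that co-occur within the same document. The sample documents and stopword list below are invented for illustration.

```python
# Classic "text mining": high-frequency words and simple co-occurrence
# counts across a small collection of documents.
from collections import Counter
from itertools import combinations

documents = [
    "injection pen was less painful than the old syringe",
    "the pen caused less pain and was easy to use",
    "syringe was painful and hard to use",
]

STOPWORDS = {"the", "was", "and", "to", "than", "a", "old"}

word_counts = Counter()
pair_counts = Counter()
for doc in documents:
    words = [w for w in doc.lower().split() if w not in STOPWORDS]
    word_counts.update(words)
    # count word pairs that co-occur within the same document
    pair_counts.update(combinations(sorted(set(words)), 2))

print(word_counts.most_common(3))
print(pair_counts.most_common(2))
```

Frequencies and co-occurrences like these are cheap to compute but carry no sense of opinion or causation, which is where the newer machine-reading systems go further.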
Today, “text mining has evolved and matured into machine-reading,” says Janik Jaskolski, founder and CEO of Semalytix, which provides AI-based business insights to pharmaceutical firms.
Data scientists now have the tools and computer processing power needed to construct neural networks and deep learning systems that enable machines to find and extract meaning from written words. Computers can already recognize voices and talk to us — think of Alexa or Siri. They can recognize faces and interpret medical images — as in photo-tagging, facial recognition or AI-enabled diagnoses. Text recognition and so-called ‘Machine Reading’ (MR) are next. That said, machines cannot (yet) reliably read and interpret all kinds of text.
“We don’t have a generalized language model,” says Jaskolski. “But we have a very good domain-specific system that understands medical and pharmaceutical language.”
Semalytix’s Pharos business intelligence platform extracts meaning and insight from the multiple sources of unstructured text in the pharmaceuticals space — from sales force transcripts or healthcare provider surveys, to healthcare blogs, patient forums or indeed Twitter. Their machines work fast, “reading” an average of 1 million characters per minute (that’s about five hundred times faster than an average adult could manage). Semalytix’s programs pick up not just key words such as a product name or a particular side-effect, but also judgement, opinion and choice. Why does a patient prefer this product to another? The algorithms are designed and trained to determine causal relationships between particular ideas or entities — such as an association between “more painful” and a specific delivery device.
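A toy sketch of that last idea, pulling out opinion-aspect pairs rather than bare keywords, might look like the following. The regex pattern and sentences are invented for illustration; Semalytix’s actual system uses trained neural models, not regular expressions.

```python
# Extract simple comparative opinion-aspect pairs such as
# ("less", "painful") or ("more", "convenient") from sentences.
import re

# matches phrases like "less painful" or "more convenient"
PATTERN = re.compile(r"\b(less|more)\s+(\w+)\b")

sentences = [
    "the new pen is less painful than the syringe",
    "this device is more convenient for travel",
]

pairs = []
for sentence in sentences:
    for comparative, aspect in PATTERN.findall(sentence):
        pairs.append((comparative, aspect))

print(pairs)
```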
The result is “transparent, directly actionable data visualizations that are more sophisticated than word-clouds,” says Jaskolski.
Importantly, Semalytix’s approach is designed to provide concrete, actionable answers to specific questions — in something close to real-time. The insights provided by Pharos can help companies understand the key motives driving prescriptions, and the relationship between investment in a given product or therapy area and product sales. And since the Pharos platform works across 10 languages, companies can perform the analysis in each local market, and compare across markets, too.
The idea is to rapidly generate business intelligence that is better and cheaper than traditional market research or customer survey methods, which are typically expensive, resource-heavy and provide only a tiny part of the picture.
“Our primary goal is to improve on [current techniques] and create a comprehensive, 360-degree perspective in response to a given query,” says Jaskolski.
How does it work?
Given a particular question, such as ‘what are the perceived advantages, among patients, of drug X over drug Y?’, Semalytix first combs through 20–30 million data sources and websites to find the most relevant, credible and useful ones. It then isolates the most pertinent individual posts or threads, continuously sifting for more targeted information. So for example, having isolated credible patient-generated threads on the subject of cancer, it may then select further for a specific type of cancer, such as prostate cancer. This is also known as removing all the “noise”, or data that is not relevant to the question.
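The successive-narrowing step can be sketched as a chain of filters over a pool of posts: keep everything matching the broad topic, then narrow again to the sub-topic. The posts and keywords below are invented, and a production system would use trained classifiers rather than keyword matching, but the shape of the pipeline is the same.

```python
# Successive filtering: broad topic first, then a narrower sub-topic,
# discarding the "noise" at each stage.
posts = [
    "my father was diagnosed with prostate cancer last year",
    "new cancer screening guidelines announced",
    "great weather for the marathon today",
    "prostate cancer treatment left me fatigued for weeks",
]

def keep(posts, keywords):
    """Keep only posts mentioning at least one keyword."""
    return [p for p in posts if any(k in p.lower() for k in keywords)]

cancer_posts = keep(posts, ["cancer"])             # broad topic filter
prostate_posts = keep(cancer_posts, ["prostate"])  # narrower sub-topic

print(len(posts), len(cancer_posts), len(prostate_posts))
```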
That process may still leave several million relevant posts, however — still too large a collection of real-world evidence to connect directly to decision-making. “We go the last mile by further prioritizing the data, given the actual intelligence question,” says Jaskolski. The machines are trained in ‘question-answering’ — an information-retrieval technique that enables computer systems to predict the likelihood that a single data point or term, like ‘painful’ or ‘uncomfortable’, is relevant to a given question or purpose.
So for example, the ‘painful’ in “I prefer this range of injection pen because it is less painful,” is relevant to a question about a particular delivery device; while “I had a painful stomach ache so did not want to take any injections today” is less so.
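That relevance judgement can be illustrated with a deliberately crude heuristic: score each sentence by how many of the question’s words it shares. Real question-answering systems use trained neural rankers, not word overlap, and the question text below is invented, but even this toy score separates the two example sentences.

```python
# Toy relevance score: fraction of the question's words that also
# appear in the candidate sentence.
def relevance(question, sentence):
    q_words = set(question.lower().split())
    s_words = set(sentence.lower().replace(",", "").split())
    return len(q_words & s_words) / len(q_words)

question = "is the injection pen painful to use"
s1 = "I prefer this range of injection pen because it is less painful"
s2 = "I had a painful stomach ache so did not want to take any injections today"

print(relevance(question, s1), relevance(question, s2))
```

The device-related mention of “painful” scores higher than the unrelated one, which is exactly the prioritization the article describes.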
Of course, each type of data, and each voice, has its own style and peculiarities. Not all patient or physician feedback is written in the same way — the language used on a patient forum is different to that used in field force logs, or in physician surveys, or indeed on Twitter. Semalytix’s systems take that into account. “We try to cluster sources by the way people talk,” says Jaskolski, and according to whose voice it is — that of a physician, patient or sales rep. But patients’ voices are never mixed with physician voices, or indeed with sales rep feedback. All forms of stakeholder feedback have their own bias; each is valuable in its own right.
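Keeping the voices separate while grouping sources by style can be sketched like this. The posts, voice labels and the average-word-length feature are all invented stand-ins; a real system would infer style clusters from learned features.

```python
# Partition posts strictly by stakeholder voice, then compute a crude
# stylistic feature (mean word length) per post within each voice.
from collections import defaultdict

posts = [
    ("patient", "this pen hurts way less tbh"),
    ("patient", "switched devices, much easier now"),
    ("physician", "Patient reports reduced injection-site pain."),
    ("sales_rep", "HCP receptive; raised adherence concerns."),
]

by_voice = defaultdict(list)
for voice, text in posts:
    by_voice[voice].append(text)   # voices are never mixed

def style_score(text):
    """Crude stylistic feature: mean word length."""
    words = text.split()
    return sum(len(w) for w in words) / len(words)

for voice, texts in by_voice.items():
    print(voice, [round(style_score(t), 1) for t in texts])
```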
“We are very up-front about the bias inherent in each and every data source, and that is part of the analysis,” says Jaskolski.
Patients may not express everything in a survey, or to their physician. Twitter conversations tend to be self-promotional. Sales reps may over-estimate their own persuasiveness. These challenges affect every type of market research, however. Understanding them helps overcome them, enabling as clear a picture as possible of all stakeholders’ views.
Semalytix has worked with multiple pharmaceutical partners to provide insights that have directly influenced business decisions. The four-year-old company, spun out of the University of Bielefeld in northwestern Germany, now has over 60 employees from 25 different countries.
It is nevertheless up against some very large competitors. Google and other “Big Tech” are working hard on machine reading, including training general purpose NLP models to “read” any kind of text. (Reports suggest that one of Google’s newest NLP models is better at finding answers within small sections of text than humans are, though this was a very specific kind of activity.) East-Coast-based Signals Analytics and Meaningcloud both serve the pharmaceutical industry — but not exclusively. Cambridge, UK-based SciBite and Linguamatics focus on machine reading scientific texts. (Linguamatics was bought by IQVIA in January 2019.)
Yet the need, and demand, for specialist real-world evidence is enormous. The drug industry is spending many hundreds of millions of dollars gathering real world data, as sources and types multiply. Such data is being used along the value chain, from drug discovery and development, through to market access and commercialization.
Few companies are effectively making use of all the data at their disposal, though — most do not comprehensively mine their own field-force records, let alone the multiple external sources of commercial data. So there is plenty of real-world data to go around.