Deep Learning in Medical Diagnosis: How AI Saves Lives and Cuts Treatment Costs

“Symptoms never lie,” said Dr. House, the most brilliant diagnostician of all time, who, alas, existed on TV screens only. In real life, symptoms are often tricky to spot even for the best experts, and diagnostic mistakes are acknowledged as the most frequent and harmful medical errors, with between 12 and 18 million Americans facing some type of misdiagnosis each year.

There is hope that artificial intelligence (AI) and machine learning (ML) can change this unsettling situation for the better. This article reviews the most successful examples of machine learning applications in diagnosis, highlights their potential, and outlines current limitations.

AI in disease detection: the current state of things

In 2016, Geoffrey Hinton, a notable computer scientist often referred to as the “Godfather of Deep Learning,” predicted that radiologists, the specialists who diagnose diseases from medical imaging such as X-rays, computed tomography (CT) scans, and magnetic resonance imaging (MRI), would soon lose their jobs. “People should stop training radiologists right now,” he announced. “It’s obvious that within five years deep learning is going to do better than humans.”

Four years later, deep learning remains the most promising and widely used ML technique for radiology in particular and disease detection in general. This comes as no surprise: Diagnostic imaging dominates clinical diagnosis, and image recognition is a natural fit for deep learning algorithms. That’s what they do best.

However, it’s no less obvious that machines still can’t replace live experts. “What you see is that [deep learning] is used to support the doctors or do a pre-selection and prioritize cases if there are many patients in the queue,” says Erwin Bretscher, a healthcare consultant at Conclusion who, among other things, advises businesses on artificial intelligence.
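The prioritization Bretscher mentions can be pictured as a worklist sorted by a model's risk score. The snippet below is only a minimal sketch of that idea: the patient IDs, the scores, and the function name `build_worklist` are hypothetical, and a real system would take its scores from a validated model rather than from hard-coded numbers.

```python
import heapq
from itertools import count


def build_worklist(cases):
    """Return patient IDs ordered from highest to lowest model risk score.

    `cases` is an iterable of (patient_id, risk_score) pairs, where the
    score is a hypothetical model output between 0 and 1.
    """
    heap = []
    arrival = count()  # keeps arrival order for cases with equal scores
    for patient_id, risk_score in cases:
        # heapq is a min-heap, so the score is negated to pop the
        # highest-risk case first.
        heapq.heappush(heap, (-risk_score, next(arrival), patient_id))

    ordered = []
    while heap:
        _, _, patient_id = heapq.heappop(heap)
        ordered.append(patient_id)
    return ordered


# Illustrative queue: made-up IDs and scores.
incoming = [
    ("patient-A", 0.12),
    ("patient-B", 0.91),
    ("patient-C", 0.47),
    ("patient-D", 0.91),
]
worklist = build_worklist(incoming)
print(worklist)
```

The radiologist still reads every scan; the algorithm only changes the order in which cases reach the expert.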

There are several drivers that push forward the use of deep learning in radiology and other diagnostic practices:
  • the continued growth of computing power and storage technologies,
  • declining cost of hardware,
  • rising cost of healthcare,
  • the shortage of healthcare workers, and
  • an abundance of medical data to train models. In the US alone, 60 billion radiology images are generated annually, not to mention other data.

Today, most deep learning algorithms augment the diagnostic workflow, but by no means replace human specialists. Below, we’ll explore the most promising use cases of AI in healthcare and give examples of ML-driven solutions commercially available in North America (FDA-cleared), Europe (CE-marked), or both.

Breast cancer screening

According to the World Health Organization (WHO), breast cancer is the most common cancer among women, causing around 627,000 deaths annually. To save lives, many countries have introduced screening programs aiming to detect the cancer at an early stage.

The procedures vary from country to country. For instance, American women go for a mammogram (X-ray of the breast) every one to two years and each image is analyzed by a single radiologist. British women are screened once every three years but with two experts providing results. Though neither approach is perfect, double reading shows better accuracy.

AI advancement and promised benefits

At the very beginning of 2020, Google’s artificial intelligence division DeepMind introduced a deep learning model that reportedly outperformed the average radiologist by 11.5 percent and significantly reduced the workload of the second reader in the British scenario.

Another recent study run by Korean academic hospitals revealed that AI had higher sensitivity in detecting cancer than human experts, especially when dealing with fatty breasts (90 vs 78 percent).
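Sensitivity, the metric quoted above, is the share of actual cancers that a reader correctly flags: true positives divided by the sum of true positives and false negatives. The sketch below shows the arithmetic. The counts are hypothetical, chosen only so that the results match the 90 and 78 percent figures, and are not taken from the study.

```python
def sensitivity(true_positives, false_negatives):
    """Fraction of actual positive cases that were detected."""
    total_positives = true_positives + false_negatives
    if total_positives == 0:
        raise ValueError("sensitivity is undefined without positive cases")
    return true_positives / total_positives


def specificity(true_negatives, false_positives):
    """Fraction of actual negative cases that were correctly cleared."""
    total_negatives = true_negatives + false_positives
    if total_negatives == 0:
        raise ValueError("specificity is undefined without negative cases")
    return true_negatives / total_negatives


# Hypothetical counts for 100 confirmed cancers (illustration only).
ai_sensitivity = sensitivity(true_positives=90, false_negatives=10)
human_sensitivity = sensitivity(true_positives=78, false_negatives=22)

print(f"AI: {ai_sensitivity:.0%}, radiologists: {human_sensitivity:.0%}")
```

High sensitivity alone is not enough: a model that flags every scan would reach 100 percent sensitivity, which is why specificity is reported alongside it.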

The studies are still in their early stages with more clinical trials needed. For now, models can serve as an additional reader to automatically produce the second opinion. Potentially, they will plug a growing shortage of trained radiologists.

Commercially available solutions

Breast Health Solutions by iCAD (based in New Hampshire, USA, FDA-cleared, CE-marked). The AI suite applies deep learning algorithms to 2D mammography, 3D mammography (digital breast tomosynthesis or DBT), and breast density assessment. Its ProFound AI technology became the first artificial intelligence solution for 3D mammography approved by the FDA.

Transpara by ScreenPoint Medical (based in the Netherlands, FDA-cleared, CE-marked). Trained on over a million mammograms, the Transpara deep learning algorithm helps radiologists analyze both 2D and 3D mammograms. The solution is already in use in 15 countries, including the USA, France, and Turkey.

Early melanoma detection

Skin diseases are the fourth most frequent cause of disability worldwide while skin cancer is the world’s most common malignancy, hitting 20 percent of people by age 70. Luckily, 99 percent of the cases are curable if they are spotted and treated on time. And that’s where AI can play a meaningful role. Similar to radiologists, dermatologists largely rely on visual pattern recognition.

AI advancement and promised benefits

In 2017, computer scientists from Stanford University created a convolutional neural network (CNN) model that was trained on 130,000 clinical images of skin pathologies to detect cancer. The algorithm reached the accuracy demonstrated by dermatologists.

That’s how a CNN developed at Stanford classifies skin lesions from images. Source: ExtremeTech

A year later, the European Society for Medical Oncology (ESMO) reported even better results: The CNN correctly detected melanomas in 95 percent of cases while the accuracy of dermatologists was 86.6 percent.

Finally, in March 2020, the Journal of Investigative Dermatology published a study by researchers from Seoul National University. Their CNN model learned from over 220,000 images to predict malignancy and classify 134 skin disorders. Again, AI proved its capability to distinguish between melanoma and birthmarks at human expert level.

Besides enhancing the speed and accuracy of diagnosis, there are plans to run CNN algorithms on smartphones for non-professional skin exams. This can encourage people to visit dermatologists for lesions that might be ignored otherwise.

Commercially available solutions

Despite all the promising studies, no AI skin cancer detection software is currently authorized by the FDA for marketing in North America due to the potential harm from poor diagnosis. At the same time, two solutions for spotting melanoma have been granted CE marks, which means they meet European safety standards.

SkinVision (based in the Netherlands, CE-marked). The app is designed for assessing the risk of cancer based on photos of suspicious moles or other marks. Its AI algorithm was trained to spot warning signs on 3.5 million pictures. SkinVision has already contributed to the diagnosis of 40,000 cases of skin cancer. The app is available for iOS and Android worldwide, except for the US and Canada. However, it is by no means a substitute for a visit to a dermatologist.

skinScan by TeleSkin ApS (based in Denmark, CE-marked). The iOS app, available for download in Scandinavia, New Zealand, and Australia, uses an AI algorithm to distinguish a typical mole from an atypical one.

Lung cancer screening

Lung cancer is the world’s deadliest oncology disease: It leads the list of cancer-related mortality and is second only to skin cancer in the prevalence rate. As with other malignancies, early detection may be lifesaving. Unfortunately, lung cancer symptoms are very similar to those of pneumonia or bronchitis. And this is why it is spotted only in advanced stages in about 70 percent of cases.

AI advancement and promised benefits

The 2019 research by Google showed a promising result: A deep learning model created in collaboration with Northwestern Medicine and trained on 42,000 chest CT scans was better at diagnosing lung cancer than radiologists with eight years of experience. The algorithm was able to find malignant lung nodules 5 to 9.5 percent more often than human specialists. Earlier, another CNN model proved its ability to spot chronic obstructive pulmonary disease (COPD), which often develops into cancer.

The odds are that before long AI systems will assist radiologists in analyzing large numbers of CT images, thus contributing to successful treatment and increasing survival rate.

Commercially available solutions

Veye Chest by Aidence (based in the Netherlands, CE-marked). The AI solution automatically detects suspicious nodules in the lungs from low-dose CT scans, measures them, and compares them with previous images to identify the growth rate.

Veye Chest analyzes nodules using AI.
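One common way to express nodule growth between two scans is the volume doubling time (VDT), computed as t · ln 2 / ln(V2 / V1), where t is the number of days between scans. The sketch below illustrates that formula. Whether Veye Chest reports growth in exactly this form is an assumption, and the function name and sample volumes are hypothetical.

```python
import math


def volume_doubling_time(volume_before_mm3, volume_after_mm3, days_between):
    """Estimate how many days a nodule needs to double in volume.

    Assumes exponential growth between the two scans. A negative result
    means the nodule shrank; infinity means no change in volume.
    """
    if volume_before_mm3 <= 0 or volume_after_mm3 <= 0:
        raise ValueError("volumes must be positive")
    if days_between <= 0:
        raise ValueError("days_between must be positive")
    if volume_after_mm3 == volume_before_mm3:
        return math.inf
    growth_ratio = volume_after_mm3 / volume_before_mm3
    return days_between * math.log(2) / math.log(growth_ratio)


# Hypothetical follow-up: a nodule grows from 100 to 200 cubic mm in 90 days,
# so its doubling time is about 90 days.
vdt_days = volume_doubling_time(100.0, 200.0, days_between=90)
print(f"Estimated doubling time: {vdt_days:.0f} days")
```

A short doubling time signals fast growth and is one of the cues radiologists weigh when deciding on follow-up.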

ClariCT.AI by ClariPi (based in South Korea, FDA-cleared). This solution doesn’t detect cancer, but denoises low-dose and ultra-low-dose CT scans, thus improving the confidence of radiologists. The CNN model was trained on over a million images of different parts of the body, but ClariPi highlights lung cancer screening as a key application of its algorithm.

Diabetic retinopathy screening

In the field of ophthalmology, AI is mostly used for retina image analysis, and specifically for diabetic retinopathy (DR) detection. This eye complication can cause blindness and strikes one in three of the 422 million people living with diabetes globally. Early detection reduces the risk of vision loss. But the problem is that DR often shows no symptoms until it becomes difficult to treat.

AI advancement and promised benefits

IBM’s deep learning technology launched in 2017 reached an accuracy score of 86 percent in detecting DR and classifying its severity — from mild to proliferative.

This result was outperformed by Google. In collaboration with its sister organization, Verily, the tech giant had been training a deep neural network for three years, using a dataset of 128,000 retinal images. In 2018, Google’s AI Eye Doctor demonstrated 98.6 percent accuracy, on par with human experts. Now the algorithm serves to help doctors at Aravind Eye Hospital in India.

Five levels of DR severity detected on retinal images. Source: Adafruit

In view of the growing number of people with diabetes, AI-fueled screening systems may reduce the burden on eye technicians and ophthalmologists. Early detection also means a cheaper treatment: the drug cost for severe pathology may increase more than tenfold compared with early phase treatment.

Commercially available solutions

IDx-DR by IDx (based in Iowa, USA, FDA-cleared, CE-marked). Known as the first AI system for DR diagnosis approved by the FDA, IDx-DR software can be paired only with a particular retinal camera made by Topcon. The deep learning algorithm provides one of two results:

1) visit an ophthalmologist (for more than mild DR spotted) or

2) rescreen in 12 months (for mild and negative results).
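The two-way output described above boils down to a threshold on the severity grade. Here is a minimal sketch of such a rule, assuming the common five-level grading scale (none, mild, moderate, severe, proliferative). The enum and function names are hypothetical and say nothing about how IDx-DR is implemented internally.

```python
from enum import IntEnum


class DRSeverity(IntEnum):
    """Five-level DR grading scale (assumed for this illustration)."""
    NONE = 0
    MILD = 1
    MODERATE = 2
    SEVERE = 3
    PROLIFERATIVE = 4


def screening_outcome(severity: DRSeverity) -> str:
    """Map a severity grade to one of the two screening results."""
    if severity > DRSeverity.MILD:
        # More than mild DR: refer the patient to a specialist.
        return "visit an ophthalmologist"
    # Negative or mild result: routine follow-up.
    return "rescreen in 12 months"


for grade in DRSeverity:
    print(f"{grade.name.lower()}: {screening_outcome(grade)}")
```

The hard part, of course, is grading the retinal image in the first place; that is the job of the deep learning model, while the final mapping stays deliberately simple.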

IRIS (based in Florida, USA, FDA-cleared). Intelligent Retinal Imaging Systems can work with different cameras as it automatically enhances the quality of original images. The company benefits from Microsoft’s Azure Machine Learning Package for Computer Vision.

Cardiac risk assessment from electrocardiograms (ECGs)

Heart disease is the number one cause of death among men and women in the US and worldwide. Timely risk assessment based on ECGs — the quickest and simplest test of heart activity — may significantly decrease mortality and prevent heart attacks.

AI advancement and promised benefits

With more than 300 million ECGs performed globally each year, algorithms have a huge data pool to learn from. Multiple studies show that AI already not only spots current abnormalities from ECGs but predicts future risks as well. For example, RiskCardio technology developed in 2019 at MIT assesses the likelihood of cardiovascular death within 30 to 365 days for patients who have already survived acute coronary syndrome (ACS).

In turn, a group of researchers from Geisinger Medical Center used over two million ECGs for training deep neural networks to pinpoint patients at a higher risk of dying within a year. The key finding is that algorithms were able to recognize risk patterns overlooked by cardiologists.

AI is expected to save human experts considerable time and cut the number of misdiagnoses. Paired with low-cost hardware, deep learning algorithms may potentially enable the use of ECG as a diagnostic tool in places where cardiologists are rare or absent.

Commercially available solution

KardiaMobile by AliveCor (based in California, USA, FDA-cleared, CE-marked). The personal ECG solution consists of a small recording device that captures an ECG in 30 seconds and a mobile app that utilizes a deep neural network to detect slow and fast heart rhythms (bradycardia and tachycardia), atrial fibrillation (AF), and normal rhythms. Once taken, the ECG recording can be sent to a clinician for further analysis.
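To make the terms concrete: bradycardia and tachycardia are conventionally defined as a resting heart rate below 60 and above 100 beats per minute. The sketch below applies that textbook rule to the intervals between heartbeats (RR intervals). It is a plain rule-based illustration, not the deep neural network the app uses, and it does not attempt to detect atrial fibrillation. All names and sample values are hypothetical.

```python
from statistics import mean


def heart_rate_bpm(rr_intervals_s):
    """Average heart rate from RR intervals given in seconds."""
    if not rr_intervals_s:
        raise ValueError("at least one RR interval is required")
    if any(rr <= 0 for rr in rr_intervals_s):
        raise ValueError("RR intervals must be positive")
    return 60.0 / mean(rr_intervals_s)


def classify_rate(bpm):
    """Label a resting heart rate using conventional thresholds."""
    if bpm < 60:
        return "bradycardia"
    if bpm > 100:
        return "tachycardia"
    return "normal rate"


# Hypothetical recordings: seconds between consecutive beats.
slow_rhythm = [1.25, 1.25, 1.25, 1.25]   # 48 beats per minute
fast_rhythm = [0.5, 0.5, 0.5, 0.5]       # 120 beats per minute

print(classify_rate(heart_rate_bpm(slow_rhythm)))
print(classify_rate(heart_rate_bpm(fast_rhythm)))
```

Rate alone says nothing about rhythm irregularity, which is where learned models add value over simple thresholds.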

Early stroke diagnosis from head CT scans

Stroke, the sudden death of brain cells due to lack of oxygen, is the second leading cause of death and the third leading cause of long-term disability globally. This dangerous condition requires immediate diagnosis and treatment: Statistics show that patients who receive professional help within three hours after the first symptoms typically make a better and faster recovery. But, unfortunately, emergency medical service (EMS) workers overlook roughly 15 percent of strokes, which leads to delays in critical care and increases the risk of fatal outcomes.

AI advancement and promised benefits

Data scientists from Geisinger collected over 46,000 brain CT scans to create a model capable of flagging the signs of intracerebral hemorrhage (ICH), the deadliest type of stroke, with 40 percent mortality within 30 days and severe disability in most survivors. They implemented the algorithm into routine care and tested it for three months. In some cases, this translated to a decrease of diagnostic time by 96 percent. Researchers also reported the ability of the algorithm to spot subtle symptoms of ICH missed by radiologists.

According to multiple studies, AI can also be successfully applied in diagnosing ischemic stroke caused by large vessel occlusion (LVO). And experiments with Google’s Teachable Machine showed that trained algorithms correctly identify the type of stroke in 77.4 percent of cases.

In most cases, AI algorithms sufficiently differentiate ischemic strokes caused by blood clots from hemorrhagic strokes caused by bleeding. Source: Young Scientist Journal

Potentially, AI trained by neuroradiologists may deliver a reliable “second opinion” to non-expert medical service providers so that they can make fast decisions and minimize damages.

Commercially available solutions

Viz LVO and Viz-ICH by (based in California, USA, and Israel, FDA-cleared and CE-marked). The deep learning algorithms analyze CT scans to detect suspected ICH and LVO strokes. The system automatically alerts specialists, saving precious time and brain cells.

AI Stroke by Aidoc (based in Israel, FDA-cleared and CE-marked). The AI Stroke package covers two types of stroke: ICH and LVO. The system automatically flags suspected cases, enabling radiologists to quickly decide on the course of action.

e-Stroke Suite by Brainomix (based in the UK, CE-marked). The AI-driven imaging software automatically assesses CT scans of stroke patients. Currently, the algorithm identifies only ischemic stroke, which accounts for 85 percent of all cases.

Barriers to ML adoption in healthcare

The use of AI in diagnostic workflow might be much more extensive if it were not for several obstacles. What slows down AI adoption in medical diagnostics? The first thing that comes to mind is money: ML projects are costly, labor-intensive, and require huge computing resources. Health facilities often operate on tight budgets, while potential investors may doubt future profitability — in view of the dearth of validated use cases.

But besides financial issues, which are common to many fields, the healthcare sector adds industry-specific layers of complexity.

Regulatory issues

Software meant for diagnostic purposes is subject to strict regulations, protecting patient safety. To sell AI-based solutions in Europe, a company needs to obtain a CE (Conformité Européenne) mark while entering the US market requires authorization from the FDA (the US Food and Drug Administration). In both cases, the certification process takes a lot of time, money, and energy, requiring clinical trials, evaluations, and tons of technical documentation. This can pose a huge challenge for small businesses and startups.

Shortage of data on new diseases

The overwhelming majority of diseases have been around for decades or centuries, with mountains of data amassed on them. Yet, that’s not the case with novel infections like COVID-19. The shortage of large datasets is a key reason why machine learning has not been effective for tracing coronavirus symptoms so far.

AI-assisted diagnosis for COVID-19 from computed tomography scans. Built in China, the intelligent system still lacks data to be broadly adopted. Source: medRxiv

Why is a wealth of data so important for the success of ML algorithms? Roughly, the more images of a pathology you run through the machine in the training stage, the better it can recognize particular anomalies on its own. For coronavirus, the current lack of historical data is worsened by another, more permanent problem — limitations on sharing health information.

Data silos and privacy rules

More often than not, hospitals and research institutions keep medical data siloed and separated, beyond the reach of the scientific community. This fragmentation is additionally supported by data-protection regulations like GDPR or HIPAA that put restrictions on the sharing of patient information. The idea of centralizing sensitive data in a cloud server accessible for tech companies is extremely unpopular in the US, UK, and other countries.

To address the problem of privacy, Google offered a new approach, called federated learning. It allows for training the current algorithm at different hospitals using local datasets. Then, the updates are sent to central storage to improve a shared model. This way, institutions exchange models, not sensitive data. However, the privacy-first technique is not without its pitfalls. For example, it requires hospitals to have infrastructures and personnel capable of training models.
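The round trip described above (train locally, send updates, improve the shared model) can be illustrated with a toy version of federated averaging. The sketch below is an assumption-laden simplification: the "model" is a straight line y = w·x + b rather than a deep network, the two hospital datasets are made up, and all function names are hypothetical. Real deployments add safeguards such as secure aggregation, which are omitted here.

```python
def local_update(weights, data, learning_rate=0.1, epochs=5):
    """One hospital refines a copy of the shared model on its own data.

    The data never leaves this function; only the new weights do.
    """
    w, b = weights
    n = len(data)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in data:
            error = (w * x + b) - y
            grad_w += 2 * error * x
            grad_b += 2 * error
        w -= learning_rate * grad_w / n
        b -= learning_rate * grad_b / n
    return (w, b)


def federated_average(updates, sizes):
    """Central server: average the weights, weighted by dataset size."""
    total = sum(sizes)
    w = sum(u[0] * n for u, n in zip(updates, sizes)) / total
    b = sum(u[1] * n for u, n in zip(updates, sizes)) / total
    return (w, b)


def train_federated(hospital_datasets, rounds=30):
    """Repeat the train-locally, average-centrally cycle."""
    shared = (0.0, 0.0)
    sizes = [len(data) for data in hospital_datasets]
    for _ in range(rounds):
        updates = [local_update(shared, data) for data in hospital_datasets]
        shared = federated_average(updates, sizes)
    return shared


# Made-up local datasets, both following y = 2x + 1.
hospital_a = [(-1.0, -1.0), (0.0, 1.0), (1.0, 3.0)]
hospital_b = [(-0.5, 0.0), (0.5, 2.0)]

w, b = train_federated([hospital_a, hospital_b])
print(f"shared model: y = {w:.3f} * x + {b:.3f}")
```

Note that `federated_average` only ever sees weights and dataset sizes, never patient records, which is the whole point of the approach.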

Lack of standardization

Even if health records were open to the public, it wouldn’t solve the quality and standardization issues. Medical information is collected in many formats, with standards varying greatly across organizations. So, it takes scientists significant time to clean and label data before feeding it to models.

Black box aspect and lack of trust

Typically, deep learning algorithms work as black boxes: They don’t explain why they jump to certain conclusions. While for many areas the lack of interpretability is not a problem, it certainly matters in healthcare, where people’s lives are at stake. Clinicians and their patients need to know what makes the machine generate its verdicts and if there is evidence behind them. Otherwise, they can hardly rely on diagnoses suggested by IT systems.

To illustrate the issue with trust, Erwin Bretscher gives the example of a project detecting cardiomyopathy, a disease of the heart muscle, from diagnostic images. “The anomaly is recognizable [to machines],” he explains. “However, specialists often see a problem on scans, where everything seems to be fine. And most of the time they are right! Which brings me to the question: Can a computer replace human intuition? And who is responsible for the outcome?”

In the long run, the trust problem can be solved by so-called explainable AI (XAI) — an emerging area in machine learning that aims to provide domain experts with clear justifications for results produced by models.

The difference between today’s ML models and XAI. Source: DARPA

The XAI solutions developed so far are simple and see limited use. Yet, it is expected that such algorithms will eventually dominate in healthcare as they bring transparency into decision-making processes.
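One of the simpler explanation techniques is occlusion sensitivity: hide one patch of the image at a time and record how much the model's score drops, so that regions with the largest drop are the ones the model relied on. Below is a minimal sketch of the idea. The "model" is a hypothetical stand-in that just averages pixel brightness, and the 4x4 "scan" is made up; a real setup would plug in a trained classifier.

```python
def toy_model(image):
    """Stand-in for a trained classifier: mean pixel brightness."""
    pixels = [value for row in image for value in row]
    return sum(pixels) / len(pixels)


def occlusion_map(model, image, patch=2, baseline=0.0):
    """Score drop caused by blanking out each patch of the image."""
    rows, cols = len(image), len(image[0])
    reference = model(image)
    heat = [[0.0] * cols for _ in range(rows)]
    for top in range(0, rows, patch):
        for left in range(0, cols, patch):
            occluded = [row[:] for row in image]  # copy, keep original intact
            patch_cells = [
                (r, c)
                for r in range(top, min(top + patch, rows))
                for c in range(left, min(left + patch, cols))
            ]
            for r, c in patch_cells:
                occluded[r][c] = baseline
            drop = reference - model(occluded)
            for r, c in patch_cells:
                heat[r][c] = drop
    return heat


# Made-up 4x4 "scan" with a bright spot in the bottom-right corner.
scan = [
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.9, 0.8],
    [0.0, 0.0, 0.7, 1.0],
]
heat = occlusion_map(toy_model, scan)
for row in heat:
    print([round(value, 3) for value in row])
```

Overlaid on the original image, such a map lets a clinician check whether the model looked at the lesion or at something irrelevant, like a scanner artifact.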

AI vs MD: who’s the boss here?

Human brains, even that of a genius like Dr. House, have limits on the volume of data they can store and process. AI may address this, accelerating time to diagnosis and treatment. With smart algorithms, physicians get “a second pair of eyes” to detect a problem that can be overlooked due to weariness, distractions, lack of expertise, or other human factors.

“AI can relieve the pressure on healthcare systems,” Erwin Bretscher adds. “In many countries, the population is getting older and demanding more care, but the sector fails to grow equally.”

In the coming years, we’ll see more diagnostic solutions utilizing deep learning algorithms to bring enormous improvements to patient care. But who will make a final decision and bear responsibility? Apparently, a live professional: AI is still too young for this.