Cleaning and preprocessing public medical datasets
Medical imaging datasets are extremely hard to obtain. Healthcare privacy regulations are strict, and manual annotation is expensive because each instance must be examined by three professionals. Building such a dataset from scratch was beyond the time and budget of a prototype project. Instead, our team turned to Kaggle, the world’s largest community of data scientists, which hosts 50,000 public datasets.
After some research, we selected, cleaned, and preprocessed the three datasets most relevant to our goals. These were de-identified chest X-rays with corresponding lung masks, collected for a tuberculosis control program; over 112,000 radiographs with disease labels from about 30,000 unique patients; and a pneumothorax segmentation dataset provided by the Society for Imaging Informatics in Medicine.
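The cleaning steps themselves are not detailed here. One precaution worth illustrating applies to the second dataset, which holds over 112,000 radiographs from only about 30,000 patients. Splitting it by patient rather than by image ensures that images of the same person never land in both the training and validation sets. The following is a minimal sketch under assumptions: the `(image_id, patient_id)` record layout, the function name `patient_level_split`, and the 20% validation default are hypothetical, not the team's actual pipeline.

```python
import random
from collections import defaultdict


def patient_level_split(records, val_fraction=0.2, seed=42):
    """Split (image_id, patient_id) pairs so no patient appears in both sets.

    `records` is a hypothetical list of (image_id, patient_id) tuples; this
    layout is an assumption, not the datasets' actual schema.
    """
    # Group image IDs by the patient they belong to.
    by_patient = defaultdict(list)
    for image_id, patient_id in records:
        by_patient[patient_id].append(image_id)

    # Sort before shuffling so the split is reproducible for a given seed.
    patients = sorted(by_patient)
    random.Random(seed).shuffle(patients)
    n_val = int(len(patients) * val_fraction)
    val_patients = set(patients[:n_val])

    # Assign every image of a patient to the same side of the split.
    train, val = [], []
    for patient_id in patients:
        target = val if patient_id in val_patients else train
        target.extend(by_patient[patient_id])
    return train, val


records = [("img1", "p1"), ("img2", "p1"), ("img3", "p2"), ("img4", "p3")]
train_ids, val_ids = patient_level_split(records, val_fraction=0.25)
print(train_ids, val_ids)
```

Shuffling patients instead of images keeps the validation score from being inflated by repeat scans of a person the model has already seen during training.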
Defining three CNN architectures to produce accurate predictions
Our tasks belong to computer vision, a field where convolutional neural networks (CNNs) excel. After numerous experiments, we settled on the CNN architecture that produced the best results for each of our three problems: lung segmentation, disease classification, and pneumothorax localization.
In our tool, lung segmentation relies on a U-Net architecture with an EfficientNet-B5 backbone (feature extractor). The model achieves an IoU (Intersection-over-Union) of 0.93; IoU is the area of overlap between the predicted segmentation and the ground truth divided by the area of their union. Disease classification is handled by EfficientNet-B4. Pneumothorax localization is based on a Feature Pyramid Network (FPN), also with an EfficientNet-B5 backbone, which reaches a Dice score of 0.83. The Dice score measures the similarity of two samples as twice their overlap divided by their combined size.
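To make the two reported metrics concrete, here is a minimal NumPy sketch of IoU and Dice for a pair of binary masks. The function names and the small `eps` smoothing term are illustrative choices. The text does not say how the scores were aggregated across images, for example as a per-image mean or over all pixels at once.

```python
import numpy as np


def iou(pred: np.ndarray, target: np.ndarray, eps: float = 1e-7) -> float:
    """Intersection-over-Union of two binary masks: |A ∩ B| / |A ∪ B|."""
    pred, target = pred.astype(bool), target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    # eps avoids 0/0 and scores two empty masks as a perfect match (1.0).
    return float((intersection + eps) / (union + eps))


def dice(pred: np.ndarray, target: np.ndarray, eps: float = 1e-7) -> float:
    """Dice coefficient of two binary masks: 2|A ∩ B| / (|A| + |B|)."""
    pred, target = pred.astype(bool), target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    return float((2 * intersection + eps) / (pred.sum() + target.sum() + eps))


pred = np.array([[1, 1, 0, 0]])
target = np.array([[0, 1, 1, 0]])
print(round(iou(pred, target), 3), round(dice(pred, target), 3))  # 0.333 0.5
```

The two metrics are monotonically related for a single pair of masks (Dice = 2·IoU / (1 + IoU)), so Dice always reads higher than IoU on the same prediction. Dice and IoU figures should therefore not be compared directly across tasks.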
Creating a user-friendly UI with both doctors and patients in mind
All three CNNs were combined into a single tool with a simple interface that anyone can understand, regardless of medical expertise. Users upload X-rays, which the AI-based engine then processes. The output is the original image with a lung mask that outlines the borders of the organ; if a pneumothorax is present, the affected area is marked in red. The tool also estimates the probability of three lung conditions: pneumothorax, pneumonia, and fibrosis. Results can be exported in one click and shared with a doctor or colleague.
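The two outputs described above can be sketched in a few lines of NumPy. This is a hedged illustration, not the tool's code. The alpha-blended red overlay and the `alpha=0.5` default are assumptions about how "marked with red" might be rendered. Treating the three conditions as independent sigmoid outputs (multi-label, since a patient can have more than one) is also an assumption; the text does not specify the classifier's output layer. The names `overlay_red`, `condition_probabilities`, and `CONDITIONS` are hypothetical.

```python
import numpy as np

# Hypothetical label order for the classifier's three outputs.
CONDITIONS = ("pneumothorax", "pneumonia", "fibrosis")


def overlay_red(xray: np.ndarray, mask: np.ndarray, alpha: float = 0.5) -> np.ndarray:
    """Blend red into a grayscale X-ray wherever the binary mask is set.

    xray: (H, W) uint8 image; mask: (H, W) binary array of the same shape.
    Returns an (H, W, 3) uint8 RGB image.
    """
    rgb = np.stack([xray] * 3, axis=-1).astype(np.float32)
    red = np.array([255.0, 0.0, 0.0], dtype=np.float32)
    m = mask.astype(bool)
    # rgb[m] selects the (N, 3) masked pixels; red broadcasts across them.
    rgb[m] = (1 - alpha) * rgb[m] + alpha * red
    return rgb.round().astype(np.uint8)


def condition_probabilities(logits: np.ndarray) -> dict:
    """Independent sigmoid per condition (multi-label), not a softmax."""
    probs = 1.0 / (1.0 + np.exp(-logits))
    return {name: float(p) for name, p in zip(CONDITIONS, probs)}


xray = np.full((2, 2), 100, dtype=np.uint8)
mask = np.array([[1, 0], [0, 0]])
print(overlay_red(xray, mask)[0, 0])                       # [178  50  50]
print(condition_probabilities(np.array([2.0, 0.0, -2.0])))
```

With independent sigmoids the three probabilities need not sum to 1, which matches a report that lists a separate likelihood for each condition.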
Approach and technical info
The prototype was completed in 2 months by a team of 4 experts: 2 ML engineers, 1 software engineer, and 1 UX/UI designer. The total effort amounted to 4 to 5 person-months.