Identifying whether a biopsy contains cancerous cells is often a difficult and time-consuming task.
Colorectal cancer is the second most common cause of cancer-related death in the United States, with an estimated 53,000 deaths in 2021 alone. An important part of cancer treatment is proper diagnosis, which is generally done by a pathologist examining a stained tissue sample from the patient's biopsy. Two stains commonly used for this are pan-cytokeratin and H&E. Pan-cytokeratin is sensitive but not specific: it will pick up all tumor cells but may also falsely identify non-tumor cells as tumor. H&E is the opposite, specific but not sensitive, so the two must be used in conjunction for optimal results.
The drawbacks of pan-cytokeratin are that it is expensive, not widely used, and consumes additional tissue, since it must be paired with H&E staining for the most accurate tumor identification.
Machine learning has been highlighted as a potential solution to this problem. There are generally two types of machine learning models for image segmentation: convolutional neural networks (CNNs) and vision transformers.
Vision transformers have specifically been noted as more accurate than CNNs for the semantic segmentation of images, which is essentially the labeling of each pixel with a class. The Swin transformer is a recent variant of the standard vision transformer. It differs from other vision transformers in two ways: it uses a shifted-window approach so that image patches that would otherwise be isolated in separate windows can attend to one another, and it uses patch merging to downsample the feature map as the image passes deeper through the model.
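To make those two ideas concrete, the toy PyTorch sketch below (illustrative only, not code from this project) partitions a feature map into non-overlapping windows, cyclically shifts it by half a window so patches that sat on window borders share a window in the next layer, and finally applies a patch-merging step that halves the spatial resolution the way Swin does between stages.

```python
import torch

def window_partition(x, window_size):
    """Split a feature map (B, H, W, C) into non-overlapping windows,
    returning (num_windows * B, window_size * window_size, C)."""
    B, H, W, C = x.shape
    x = x.view(B, H // window_size, window_size, W // window_size, window_size, C)
    return x.permute(0, 1, 3, 2, 4, 5).reshape(-1, window_size * window_size, C)

# Toy feature map: batch of 1, an 8x8 grid of patches, 4 channels.
x = torch.randn(1, 8, 8, 4)
window_size = 4

# Regular windows: attention is computed independently inside each 4x4 window.
regular = window_partition(x, window_size)                 # (4, 16, 4)

# Shifted windows: cyclically roll the map by half a window before
# partitioning, so patches near former window borders now share a window.
shifted = torch.roll(x, shifts=(-window_size // 2, -window_size // 2), dims=(1, 2))
shifted_windows = window_partition(shifted, window_size)   # (4, 16, 4)

# Patch merging: concatenate each 2x2 neighborhood (C -> 4C) and project it
# down (here to 2C), halving the resolution as in Swin's stage transitions.
merge = torch.nn.Linear(4 * 4, 2 * 4)
grouped = x.view(1, 4, 2, 4, 2, 4).permute(0, 1, 3, 2, 4, 5).reshape(1, 4, 4, 16)
merged = merge(grouped)                                    # (1, 4, 4, 8)

print(regular.shape, shifted_windows.shape, merged.shape)
```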
Given this promising new transformer model, the overall objective of this study is to use the Swin transformer to identify the tumor regions within H&E images of patient biopsies.
For segmenting the tumor region, the Swin transformer model from the demo provided by the paper that introduced it was first adapted to process images of the cells.
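As an illustration of that adaptation (not the exact code used here), the official Swin segmentation demo is built on MMSegmentation, so loading and running a model looks roughly like the sketch below; the config and checkpoint paths are hypothetical placeholders.

```python
# A minimal sketch, assuming the MMSegmentation-based Swin demo; the
# paths below are hypothetical, and the config's decode head is assumed
# to have been edited so num_classes=2 (tumor vs. non-tumor).
from mmseg.apis import init_segmentor, inference_segmentor

config_file = 'configs/swin/upernet_swin_tiny_patch4_window7_512x512.py'  # hypothetical
checkpoint_file = 'checkpoints/swin_tumor.pth'                            # hypothetical

# Build the model from the config and trained weights.
model = init_segmentor(config_file, checkpoint_file, device='cuda:0')

# Run inference on one H&E tile; the result is a per-pixel class map.
result = inference_segmentor(model, 'he_tile.png')
```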
A total of 648 images from 119 patients were used to train the models. The 119 patients were split into 5 groups; in each round, 4 of the groups were used for training and the remaining group was held out to test how well the model identified tumor regions. Rotating the held-out group produced 5 models in total, a process known as five-fold cross-validation.
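A patient-level split like this can be expressed with scikit-learn's GroupKFold, which guarantees no patient contributes images to both the training and testing sets; the sketch below uses placeholder file names and patient IDs rather than the study's actual data.

```python
# A sketch of patient-level five-fold cross-validation; image_paths and
# patient_ids are placeholders standing in for the real 648 images and
# 119 patients.
import numpy as np
from sklearn.model_selection import GroupKFold

image_paths = np.array([f'img_{i}.png' for i in range(648)])  # placeholder names
patient_ids = np.random.randint(0, 119, size=648)             # placeholder IDs

gkf = GroupKFold(n_splits=5)
for fold, (train_idx, test_idx) in enumerate(gkf.split(image_paths, groups=patient_ids)):
    train_imgs, test_imgs = image_paths[train_idx], image_paths[test_idx]
    print(f'fold {fold}: {len(train_imgs)} train / {len(test_imgs)} test images')
```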
During the training process, the H&E images and their corresponding ground-truth masks were fed into the untrained model, which trained for 40,000 iterations. Upon completion of training, H&E images from the testing set were given to the trained model, and it produced binary masks of where it predicted the tumor and non-tumor regions to be. To measure how accurate the models were, these predicted masks were then compared to the ground truth for the testing-set images.
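In an MMSegmentation-style config (the framework the Swin demo builds on), an iteration budget like this is set in the runner; the fragment below is a hedged sketch in which only the 40,000-iteration figure comes from this project, and the other values are assumptions.

```python
# Assumed MMSegmentation-style training settings; only max_iters=40000
# is taken from this project.
runner = dict(type='IterBasedRunner', max_iters=40000)   # train for 40k iterations
checkpoint_config = dict(by_epoch=False, interval=4000)  # save weights periodically
evaluation = dict(interval=4000, metric='mIoU')          # validate with IoU during training
```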
The two metrics used to compare the predicted masks with the ground-truth masks are accuracy and intersection over union (IoU). Averaged over the 5 models, the tumor class achieved an accuracy of 95.73% ± 1.71% and an IoU of 92.77% ± 1.25%, and the non-tumor class achieved an accuracy of 99.03% ± 0.31% and an IoU of 97.77% ± 0.31%.
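For a binary mask, both metrics reduce to simple pixel counts; the NumPy sketch below (illustrative, not the project's evaluation code) computes them, assuming pred and gt are boolean arrays in which True marks the tumor class.

```python
# Accuracy and IoU for binary tumor masks; pred and gt are assumed
# boolean arrays of the same shape, True = tumor pixel.
import numpy as np

def accuracy(pred, gt):
    # Fraction of pixels whose predicted class matches the ground truth.
    return (pred == gt).mean()

def iou(pred, gt):
    # Intersection over union: overlapping pixels / pixels in either mask.
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return inter / union

pred = np.random.rand(512, 512) > 0.5   # placeholder predicted mask
gt = np.random.rand(512, 512) > 0.5     # placeholder ground truth
print(accuracy(pred, gt), iou(pred, gt))  # tumor class
print(iou(~pred, ~gt))                    # non-tumor class IoU
```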
These results show that Swin transformers can be used to accurately identify tumor regions in H&E images.
The future direction of this project is to modify the existing models to identify tumor buds. These buds are clusters of 3-4 tumor cells that serve as a very important prognostic tool but are very difficult to identify.
The results from this project will be presented as a poster at the 2021 BMES conference and potentially as part of an oral presentation at the 2022 SPIE Medical Imaging conference.