Dataset - DigestPath2019 - Grand Challenge

In the challenge, participants will be provided with 2 datasets:

Signet ring cell dataset

Signet ring cell carcinoma is a rare type of adenocarcinoma with poor prognosis. Early detection of such cells greatly improves patients' survival rates. However, there is no existing public dataset with annotations for studying signet ring cell detection.

This dataset has positive samples and negative samples. Training positive samples contain 77 images from 20 WSIs, with cell bounding boxes written in XML. Training negative samples contain 378 images from 79 WSIs. These negative WSIs contain no signet ring cells, but could contain other kinds of tumor cells.

Each signet ring cell is labeled by experienced pathologists with a rectangular bounding box tightly surrounding the cell. Each image is 2000x2000 pixels. The training images come from 2 organs: gastric mucosa and intestine. Because manual annotation is difficult, some signet ring cells were missed by the pathologists. In other words, this is a noisy dataset whose positive images are not fully annotated.
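The page does not describe the XML schema, so the following is only a minimal sketch of reading bounding boxes. It assumes a Pascal VOC-style layout (`object` / `bndbox` with `xmin`, `ymin`, `xmax`, `ymax`), which may not match the released files. The function name `read_boxes` and the coordinates in `EXAMPLE_XML` are hypothetical and chosen for illustration only.

```python
import xml.etree.ElementTree as ET

# ASSUMPTION: annotations follow a Pascal VOC-style layout. The actual
# schema of the released XML files is not documented on this page.
# The coordinates below are made up for illustration.
EXAMPLE_XML = """
<annotation>
  <object>
    <bndbox><xmin>10</xmin><ymin>20</ymin><xmax>50</xmax><ymax>60</ymax></bndbox>
  </object>
  <object>
    <bndbox><xmin>100</xmin><ymin>120</ymin><xmax>140</xmax><ymax>170</ymax></bndbox>
  </object>
</annotation>
"""


def read_boxes(xml_text):
    """Return a list of (xmin, ymin, xmax, ymax) integer tuples."""
    root = ET.fromstring(xml_text)
    boxes = []
    for bndbox in root.iter("bndbox"):
        # Coordinates may be stored as "10" or "10.0", so go through float.
        coords = tuple(
            int(float(bndbox.findtext(tag)))
            for tag in ("xmin", "ymin", "xmax", "ymax")
        )
        boxes.append(coords)
    return boxes


boxes = read_boxes(EXAMPLE_XML)
```

Because the positive images are not fully annotated, an empty region in the parsed boxes does not guarantee that no signet ring cell is present there.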

For method evaluation, another 227 images from 56 patients are used, of which 27 images from 11 patients contain signet ring cells. To account for false positives in normal regions, some negative samples, which are either healthy or affected by other types of cancer, are included for training and evaluation.

All whole slide images were stained with hematoxylin and eosin and scanned at 40x magnification.

Sample of parts of images:

Sample of a full image:

Colonoscopy tissue segment dataset

Colonoscopy pathology examination can find cells of early-stage colon tumors in small tissue slices. Pathologists need to examine hundreds of tissue slices every day, which is time-consuming and exhausting work. Here we propose a challenge task on automatic colonoscopy tissue segmentation and screening, aiming at automatic lesion segmentation and classification of the whole tissue (benign vs. malignant).

This dataset has positive samples and negative samples. Training positive samples contain 250 tissue images from 93 WSIs, with pixel-level annotations in jpg format, where 0 means background and 255 means foreground (malignant lesion). A binary mask can be obtained by thresholding at 128. Training negative samples contain 410 tissue images from 231 WSIs. These negative images have no annotations because they do not contain any malignant lesions.
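Since JPEG compression is lossy, a stored mask usually contains values between 0 and 255 near lesion borders, which is why a threshold is needed. A minimal sketch of the thresholding step with NumPy follows. The function name `binarize_mask` is hypothetical, the choice of `>=` rather than `>` at exactly 128 is an assumption, and loading the jpg into an array is left to whichever imaging library is in use.

```python
import numpy as np


def binarize_mask(mask, threshold=128):
    """Return a boolean mask that is True where the pixel value >= threshold.

    ASSUMPTION: a pixel equal to the threshold counts as foreground;
    the source only says to threshold at 128.
    """
    mask = np.asarray(mask)
    if mask.ndim == 3:
        # Some decoders return an RGB array for a grayscale JPEG;
        # average the channels to get a single intensity per pixel.
        mask = mask.mean(axis=2)
    return mask >= threshold


# Tiny array standing in for a decoded annotation image with JPEG noise.
noisy = np.array([[0, 3, 127], [128, 250, 255]], dtype=np.uint8)
binary = binarize_mask(noisy)
```

The boolean result can be converted with `binary.astype(np.uint8)` when a 0/1 label map is needed.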

In colonoscopy pathology examination, a single WSI contains 10 or more tissues. To make this task easier, we selected one or two tissues per WSI, and our pathologists annotated them for segmentation. We also note that a small number of malignant glands could have been missed by the pathologists.

The images average about 5000x5000 pixels, and some are considerably larger. We will also provide another 212 tissue images from 152 patients as the testing set, of which 90 images from 65 patients contain lesions. All whole slide images were stained with hematoxylin and eosin and scanned at 20x magnification.

The data in the challenge will show great variation in appearance because they were collected from 4 medical centers, including several small centers in developing countries/regions. Differences in image style can be an obstacle for the screening task. We expect that holding the challenge and releasing this large quantity of expert-level annotations will attract attention from the medical imaging community and substantially advance research on automatic colonoscopy screening.

The criteria for distinguishing between benign (negative) and malignant (positive) tissue are difficult to apply. Again, to make this task easier for an academic competition, we follow the WHO classification of tumours of the digestive system and regard the following diseases as malignant lesions: high grade intraepithelial neoplasia and adenocarcinoma, including papillary adenocarcinoma, mucinous adenocarcinoma, poorly cohesive carcinoma and signet ring cell carcinoma. Low grade intraepithelial neoplasia and severe inflammation are usually hard cases for pathologists, so this dataset does not include them. Note that in practical clinical diagnosis, pathologists face more difficult and complicated situations.

Sample of part of image:

Sample of a full image:

The challenge releases only the training set and keeps the testing set private.