Task 1: Signet ring cell detection

Each team's submission will first be ranked separately on each of the following evaluation metrics. The average of a team's ranks across the metrics will be used as its overall rank.
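The rank averaging described above can be sketched as follows. This is only an illustration: the team names and metric keys are hypothetical, every metric is assumed to be "higher is better", and ties are broken by input order, none of which is specified by the organizers.

```python
def overall_ranking(scores):
    """Rank teams per metric, then order them by average rank.

    scores: {team: {metric: value}}; assumes higher is better for every metric.
    Returns (teams ordered best to worst, {team: average rank}).
    """
    teams = list(scores)
    metrics = list(next(iter(scores.values())))
    avg_rank = {team: 0.0 for team in teams}
    for metric in metrics:
        # Rank 1 goes to the best value; ties keep input order (assumption).
        ordered = sorted(teams, key=lambda t: scores[t][metric], reverse=True)
        for position, team in enumerate(ordered, start=1):
            avg_rank[team] += position / len(metrics)
    return sorted(teams, key=lambda t: avg_rank[t]), avg_rank


# Hypothetical teams and values, for illustration only.
example_scores = {
    "team_a": {"recall": 0.9, "fp_score": 95, "froc": 0.7},
    "team_b": {"recall": 0.8, "fp_score": 99, "froc": 0.8},
    "team_c": {"recall": 0.7, "fp_score": 90, "froc": 0.6},
}
order, ranks = overall_ranking(example_scores)
```

Here team_b wins two of the three metrics and therefore has the lowest average rank.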

Detection evaluation measures include (1) instance-level recall, (2) normal region false positives, and (3) Free-response Receiver Operating Characteristic (FROC).

The ground truth rectangles for each image form an [n1, 4] array. For each image, a submission shall provide a prediction array of shape [n2, 5], where each row is [x1, y1, x2, y2, confidence score in [0, 1]] and (x1, y1) and (x2, y2) are the top-left and bottom-right corners of the rectangle. Two boxes match if their Intersection over Union (IoU) is greater than 0.3.
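The matching rule can be illustrated with a minimal sketch. The function names are hypothetical, and the box area is assumed to be (x2 - x1) * (y2 - y1) with no "+1" pixel convention, since the text does not say which convention the official evaluation uses.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap, clamped to zero when boxes are disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def is_match(pred_row, gt_box, threshold=0.3):
    """True if a prediction row [x1, y1, x2, y2, score] matches a ground truth box."""
    return iou(pred_row[:4], gt_box) > threshold
```

For example, two 10 x 10 boxes shifted by 5 pixels horizontally overlap with IoU 50 / 150, which is just above the 0.3 threshold.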

Instance-level recall: Because signet ring cells occur in overcrowded regions and vary widely in appearance, a perfect annotation is impossible, as shown in the image above. The yellow cells in the overcrowded region may be signet ring cells but are unlabeled. Pathologists can only guarantee that the labeled cells really are signet ring cells; unlabeled cells may be signet ring cells as well. For this reason, instance-level recall is given serious weight in this problem, provided that precision is above 20%. There are two types of images in the test data: positive images contain signet ring cells and negative images do not. Instance-level recall is the number of matched ground truth boxes divided by the total number of ground truth boxes, and ranges from 0 to 1.
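A minimal sketch of instance-level recall for one image follows. It assumes that a ground truth box counts as matched when at least one prediction overlaps it with IoU above 0.3; whether the official evaluation enforces one-to-one matching is not stated, and the 20% precision condition is not checked here. All function names are hypothetical.

```python
def iou(a, b):
    """Intersection over Union of two (x1, y1, x2, y2) boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def instance_recall(gt_boxes, pred_rows, iou_threshold=0.3):
    """Matched ground truth boxes divided by all ground truth boxes.

    gt_boxes: list of (x1, y1, x2, y2)
    pred_rows: list of (x1, y1, x2, y2, score)
    Assumption: a ground truth box is matched if any prediction overlaps it
    with IoU above the threshold (no one-to-one assignment).
    """
    if not gt_boxes:
        return 0.0
    matched = sum(
        1
        for gt in gt_boxes
        if any(iou(gt, row[:4]) > iou_threshold for row in pred_rows)
    )
    return matched / len(gt_boxes)


gt = [(0, 0, 10, 10), (50, 50, 60, 60)]
preds = [(1, 1, 11, 11, 0.9), (100, 100, 110, 110, 0.8)]
recall = instance_recall(gt, preds)  # one of two ground truth boxes is matched
```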

Normal region false positives: Normal region false positives is the average number of false positive predictions per negative image. For evaluation, this measure is reported as max(100 - normal region false positives, 0), so that a higher value is better.
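As a small sketch of this measure: since negative images contain no signet ring cells, every prediction made on them is a false positive. The function name is hypothetical, and the predictions are assumed to be already filtered by whatever confidence threshold the submission uses.

```python
def normal_region_fp(negative_image_predictions):
    """Average false positives per negative image and the reported score.

    negative_image_predictions: one list of prediction rows per negative image.
    Every prediction on a negative image counts as a false positive.
    Returns (average false positives, max(100 - average, 0)).
    """
    counts = [len(rows) for rows in negative_image_predictions]
    avg_fp = sum(counts) / len(counts)
    return avg_fp, max(100 - avg_fp, 0)


# Two negative images with 3 and 1 predictions respectively.
avg_fp, score = normal_region_fp([
    [(0, 0, 5, 5, 0.9), (10, 10, 20, 20, 0.6), (30, 30, 40, 40, 0.4)],
    [(2, 2, 8, 8, 0.7)],
])
```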

FROC: By adjusting the confidence threshold, we obtain different versions of the prediction array. FROC is the average recall over the versions for which the number of normal region false positives is 1, 2, 4, 8, 16, and 32.
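One possible reading of this definition is sketched below. It assumes the inputs have already been reduced to two lists: for each ground truth box, the highest confidence among the predictions matching it (None if nothing matches), and the confidences of all predictions on negative images. For each target false positive level, the loosest threshold that does not exceed that level is used. The organizers' exact thresholding and interpolation are not given, so the names and details here are assumptions.

```python
def froc(gt_best_scores, neg_scores, n_neg_images,
         fp_rates=(1, 2, 4, 8, 16, 32)):
    """Average recall at the given false positives per negative image.

    gt_best_scores: per ground truth box, the best matching confidence or None.
    neg_scores: confidences of all predictions on negative images.
    """
    neg_sorted = sorted(neg_scores, reverse=True)
    recalls = []
    for rate in fp_rates:
        allowed = int(rate * n_neg_images)  # total false positives permitted
        if allowed >= len(neg_sorted):
            cutoff = float("-inf")  # every prediction can be kept
        else:
            # Keep scores strictly above the first false positive that
            # would exceed the allowance.
            cutoff = neg_sorted[allowed]
        hits = sum(1 for s in gt_best_scores if s is not None and s > cutoff)
        recalls.append(hits / len(gt_best_scores))
    return sum(recalls) / len(recalls)


# Four ground truth boxes (one never detected), one negative image.
froc_value = froc([0.9, 0.6, 0.2, None], [0.8, 0.5, 0.3, 0.1], n_neg_images=1)
```

In this toy example the recall is 0.5 at 1 and 2 false positives per image and 0.75 at the remaining levels, because only four false positives exist in total.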





Task 2: Colonoscopy tissue segmentation and classification

Each team's submission will first be ranked separately on each of the following evaluation metrics. The average of a team's ranks across the metrics will be used as its overall rank.

Evaluation measures include the lesion segmentation Dice Similarity Coefficient (DSC) and the classification area under the curve (AUC).

Dice Similarity Coefficient (DSC): The Dice metric measures area overlap between segmentation results and annotations. Dice is computed as

$$\mathrm{DSC}(A, B) = \frac{2\,|A \cap B|}{|A| + |B|}$$

where A is the set of foreground pixels in the annotation and B is the corresponding set of foreground pixels in the segmentation result.
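A minimal sketch of the Dice computation on binary masks, using the standard definition 2|A ∩ B| / (|A| + |B|). The masks are assumed to be nested lists of 0/1 values of equal shape, and returning 1.0 when both masks are empty is a convention chosen here, not something the text specifies.

```python
def dice(annotation, segmentation):
    """Dice Similarity Coefficient between two binary masks (lists of rows)."""
    # A and B are the sets of foreground pixel coordinates.
    a = {(r, c) for r, row in enumerate(annotation)
         for c, v in enumerate(row) if v}
    b = {(r, c) for r, row in enumerate(segmentation)
         for c, v in enumerate(row) if v}
    if not a and not b:
        return 1.0  # assumption: two empty masks agree perfectly
    return 2 * len(a & b) / (len(a) + len(b))


annotation_mask = [[1, 1],
                   [0, 0]]
segmentation_mask = [[1, 0],
                     [1, 0]]
dsc = dice(annotation_mask, segmentation_mask)  # one shared pixel out of 2 + 2
```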

Area under the curve (AUC): AUC is equal to the probability that a classifier will rank a randomly chosen positive instance higher than a randomly chosen negative one (assuming 'positive' ranks higher than 'negative'). This can be seen as follows:

$$\mathrm{AUC} = \int_0^1 \mathrm{TPR} \; d(\mathrm{FPR}) = P(X_1 > X_0)$$

where X_1 is the score for a positive instance and X_0 is the score for a negative instance. TPR means true positive rate and FPR means false positive rate.
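The probabilistic interpretation of AUC can be checked directly by comparing every positive score with every negative score. The sketch below does this by brute force; counting a tie as half a win follows the usual Mann-Whitney convention and is an assumption here, as is the function name.

```python
def auc_pairwise(pos_scores, neg_scores):
    """Estimate P(X_1 > X_0) over all positive/negative score pairs."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # assumption: ties count as half a win
    return wins / (len(pos_scores) * len(neg_scores))


positives = [0.9, 0.8, 0.4]
negatives = [0.7, 0.3, 0.4]
auc_value = auc_pairwise(positives, negatives)  # 7.5 wins out of 9 pairs
```

This quadratic loop is fine for illustration; for large test sets a rank-based computation gives the same value more efficiently.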