## Task 1: Signet ring cell detection¶

Each team's submission will first be ranked by each of the following evaluation metrics separately. The average of a team's ranks across these metrics will be used as the team's overall rank.

Detection evaluation measures include (1) instance-level recall, (2) normal region false positives, and (3) the Free-response Receiver Operating Characteristic (FROC).

**Instance-level recall**: Because signet ring cells occur in
overcrowded regions and vary widely in appearance, perfect annotation is
impossible, as shown in the image above. The yellow cells may be signet
ring cells that remain unlabeled in overcrowded regions. Pathologists
can only guarantee that the labeled cells really are signet ring cells;
the unlabeled cells may be as well. Thus, in this task we primarily
consider instance-level recall, subject to precision being above 20%.
There are two types of images in the test data: positive images contain
signet ring cells, and negative images do not. Instance-level recall is
the number of matched ground truth boxes divided by the total number of
ground truth boxes, ranging from 0 to 1.
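As a minimal sketch, instance-level recall can be computed by matching each ground truth box to at most one prediction. The IoU threshold of 0.3 below is an assumption for illustration; the challenge defines the exact matching rule.

```python
# Hypothetical instance-level recall: a prediction matches a ground-truth
# box when their IoU exceeds a threshold (0.3 here is an assumption).

def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def instance_recall(gt_boxes, pred_boxes, iou_thr=0.3):
    """Fraction of ground-truth boxes matched by at least one prediction."""
    if not gt_boxes:
        return 1.0
    matched = 0
    used = set()  # each prediction may match only one ground-truth box
    for g in gt_boxes:
        for i, p in enumerate(pred_boxes):
            if i not in used and iou(g, p) >= iou_thr:
                used.add(i)
                matched += 1
                break
    return matched / len(gt_boxes)
```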

**Normal region false positives**: Normal region false positives is the
average number of false positive predictions per negative image. For
evaluation, the FP score is reported as Max(100 − normal region false
positives, 0).
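The FP score described above reduces to a small function; this is a sketch of that formula applied to per-image false positive counts:

```python
# Normal-region FP score: average FP count over negative images,
# clipped into the range [0, 100] via Max(100 - avg_fp, 0).

def fp_score(fp_counts_per_negative_image):
    avg_fp = sum(fp_counts_per_negative_image) / len(fp_counts_per_negative_image)
    return max(100 - avg_fp, 0)
```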

**FROC**: By adjusting the confidence threshold, we obtain different
versions of the prediction set. FROC is the average recall over the
versions whose numbers of normal region false positives are 1, 2, 4, 8,
16, and 32.
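A hedged sketch of the FROC computation: sweep the candidate thresholds, pick the one whose average FP count on negative images is closest to each target, and average the recalls at those operating points. `recall_at` and `fps_at` are hypothetical callables the reader would supply from their own evaluation code.

```python
# FROC sketch: average recall at the operating points where the average
# number of FPs per negative image is (closest to) 1, 2, 4, 8, 16, 32.

def froc(recall_at, fps_at, thresholds, fp_targets=(1, 2, 4, 8, 16, 32)):
    """recall_at(t) and fps_at(t) evaluate the predictions kept at
    confidence threshold t; both are supplied by the caller."""
    recalls = []
    for target in fp_targets:
        best_thr = min(thresholds, key=lambda t: abs(fps_at(t) - target))
        recalls.append(recall_at(best_thr))
    return sum(recalls) / len(recalls)
```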

For submission: when you write your prediction results to xml files, please apply your own confidence threshold first. Although FROC handles the threshold problem using the 'confidence' field, recall and FP@normal are calculated over ALL predicted cells.

## Task 2: Colonoscopy tissue segmentation and classification¶

Each team's submission will first be ranked by each of the following evaluation metrics separately. The average of a team's ranks across these metrics will be used as the team's overall rank.

Evaluation measures include the lesion segmentation Dice Similarity Coefficient (DSC) and the classification area under the curve (AUC).

**Dice Similarity Coefficient (DSC)**: The Dice metric measures the area
overlap between the segmentation result and the annotation. Dice is
computed by

DSC = 2|A ∩ B| / (|A| + |B|),

where A is the set of foreground pixels in the annotation and B is the
corresponding set of foreground pixels in the segmentation result.
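A minimal DSC computation on binary masks might look like the following sketch; the handling of two empty masks as DSC = 1 is an assumption, since the definition above leaves that case undefined:

```python
import numpy as np

# DSC = 2|A ∩ B| / (|A| + |B|) over foreground pixels of two binary masks.

def dice(a, b):
    a = np.asarray(a).astype(bool)
    b = np.asarray(b).astype(bool)
    denom = a.sum() + b.sum()
    if denom == 0:
        return 1.0  # both masks empty: define DSC as 1 (an assumption)
    return 2.0 * np.logical_and(a, b).sum() / denom
```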

**Area under the curve (AUC)**: AUC is equal to the probability that a
classifier will rank a randomly chosen positive instance higher than a
randomly chosen negative one (assuming 'positive' ranks higher than
'negative'). This can be seen as follows:

AUC = P(X_1 > X_0),

where X_1 is the score for a positive instance and X_0 is the score for a negative instance. Equivalently, AUC is the area under the ROC curve, which plots the true positive rate (TPR) against the false positive rate (FPR) as the decision threshold varies.
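The probabilistic definition above can be evaluated directly as a sketch: count the fraction of (positive, negative) score pairs in which the positive instance scores higher, with ties counted as one half.

```python
# AUC via its probabilistic definition: P(X_1 > X_0) over all
# (positive, negative) score pairs, ties contributing 0.5.

def auc(pos_scores, neg_scores):
    wins = 0.0
    for x1 in pos_scores:
        for x0 in neg_scores:
            if x1 > x0:
                wins += 1.0
            elif x1 == x0:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))
```

In practice one would use a library routine such as scikit-learn's `roc_auc_score`, which computes the same quantity far more efficiently.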

The mask files are jpg images. We strongly advise participants to binarize the masks with a threshold of 128 when training your model. During evaluation, we also use 128 as the threshold to binarize both your predictions and the reference masks before computing the DSC.
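The binarization step can be sketched as follows; whether 128 itself counts as foreground is an assumption here, since the text does not specify which side of the threshold it falls on:

```python
import numpy as np

# Binarize a grayscale mask at the threshold described above (128).

def binarize(mask, threshold=128):
    """Pixels >= threshold become foreground (1), the rest background (0).
    Treating 128 itself as foreground is an assumption."""
    return (np.asarray(mask) >= threshold).astype(np.uint8)
```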