# Table 2 For each classifier and threshold combination (threshold *τ* picked using validation data), we report a triple of three numbers: the number of “passing” problems (out of 30), i.e., those where some test instance obtains a probability of at least *τ*; the number of “valid” problems, i.e., those passing problems for which the ratio of (true) positive test instances with score at least *τ* to all such instances is at least *τ*; and the average recall at threshold *τ*, averaged over the valid problems only. Note that if we instead average recall over all 30 problems, at *τ* = 0.99 Append^{+} gets 0.06 (i.e., \(0.6 \times \frac{3}{30}\), since Append^{+} achieves 3 valid problems), while NoisyOR and Calibrated AVG get 0.21 and 0.26, respectively. Both the number of valid problems and the recall are indicative of performance.

From: On using nearly-independent feature families for high precision and confidence

Each cell reports (passing, valid, avg. recall) at threshold *τ*.

| Classifier | *τ* ≥ 0.99 | *τ* ≥ 0.95 |
|---|---|---|
| Audio | (0, 0, 0) | (8, 4, 0.32) |
| Visual | (8, 3, 0.653) | (24, 20, 0.56) |
| Append (early fuse) | (3, 1, 0.826) | (26, 16, 0.50) |
| Append^{+} (early fuse) | (7, 3, 0.60) | (23, 20, 0.63) |
| NoisyOR | (24, 18, 0.35) | (29, 22, 0.56) |
| AVG | (0, 0, 0) | (13, 13, 0.19) |
| Calibrated AVG | (17, 12, 0.65) | (30, 26, 0.62) |