Maximizing Recall for Fixed Precision Thresholds (Bachelor's thesis)

Sunita Sarawagi, IIT Bombay
In many classification problems, such as node labelling in social networks, a core hurdle is that many samples carry insufficient evidence for inferring a label, resulting in low overall accuracy. We propose that instead of aiming for high accuracy over all samples, the classifier should make only those predictions that will be correct with high probability. This problem is equivalent to maximizing recall at a fixed precision threshold. Since the objective is non-convex, we explored the Structural SVM framework of Joachims, tested a few non-convex objectives, and concluded with a theoretical analysis of why the framework fails on this objective. After experimenting with simple coordinate ascent, we obtained a 2x improvement over the baseline logistic classifier. We also proposed a new convex formulation which should capture the problem better than Structural SVMs, and explored generalization to higher-dimensional spaces using gradient-based methods for non-convex optimization.
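To make the objective concrete, here is a minimal sketch of the evaluation criterion: given classifier confidence scores, choose a score threshold that maximizes recall subject to a minimum-precision constraint. The function name, inputs, and the simple threshold-scan strategy are illustrative assumptions, not the thesis's actual method.

```python
import numpy as np

def max_recall_at_precision(scores, labels, min_precision):
    """Illustrative sketch: scan score thresholds and return the one
    that maximizes recall while keeping precision >= min_precision.

    scores: confidence scores for the positive class
    labels: binary ground-truth labels (1 = positive)
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    order = np.argsort(-scores)            # sort by descending confidence
    labels = labels[order]
    tp = np.cumsum(labels)                 # true positives if we predict the top-k
    k = np.arange(1, len(labels) + 1)      # number of predictions made
    precision = tp / k
    recall = tp / labels.sum()
    feasible = precision >= min_precision  # operating points meeting the constraint
    if not feasible.any():
        return None, 0.0                   # no threshold satisfies the target
    best = np.argmax(np.where(feasible, recall, -1.0))
    return scores[order][best], recall[best]
```

Predicting only above the returned threshold trades coverage for correctness: samples with weak evidence are simply left unlabelled, which is the behaviour the abstract's objective formalizes.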