Multiple-instance learning

Multiple-instance learning (MIL) is a variation on supervised learning. Instead of receiving a set of instances that are individually labeled positive or negative, the learner receives a set of bags, each of which contains many instances and carries a single positive or negative label. Under the most common assumption, a bag is labeled negative if all the instances in it are negative, and positive if at least one instance in it is positive. From a collection of labeled bags, the learner tries to either (i) induce a concept that will label individual instances correctly or (ii) learn how to label bags without inducing the concept.
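
The bag-labeling assumption described above can be illustrated with a minimal Python sketch. It is only an illustration, not a learning algorithm: the names bag_label_from_instances, lift_to_bags and toy_concept, the representation of a bag as a list of feature vectors, and the threshold of 0.5 are all hypothetical choices made for this example.

```python
from typing import Callable, Sequence

Instance = Sequence[float]   # one feature vector (assumed representation)
Bag = Sequence[Instance]     # a bag is a collection of instances


def bag_label_from_instances(instance_labels: Sequence[bool]) -> bool:
    """Most common MIL assumption: a bag is positive if and only if
    at least one of its instances is positive."""
    return any(instance_labels)


def lift_to_bags(concept: Callable[[Instance], bool]) -> Callable[[Bag], bool]:
    """Turn an instance-level concept into a bag-level classifier
    by applying the same assumption to the concept's predictions."""
    def classify_bag(bag: Bag) -> bool:
        return any(concept(x) for x in bag)
    return classify_bag


def toy_concept(x: Instance) -> bool:
    # Hypothetical concept: positive when the first feature exceeds 0.5.
    return x[0] > 0.5


classify = lift_to_bags(toy_concept)

positive_bag = [[0.1, 0.2], [0.9, 0.3]]  # second instance satisfies the concept
negative_bag = [[0.1, 0.2], [0.4, 0.3]]  # no instance satisfies the concept
```

Here classify(positive_bag) evaluates to True and classify(negative_bag) to False. In an actual MIL setting only the bag labels are observed, and the concept itself has to be learned from them.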

Multiple-instance learning was originally proposed under this name by Dietterich, Lathrop & Lozano-Pérez (1997), but earlier examples of similar research exist, for instance in the work on handwritten digit recognition by Keeler, Rumelhart & Leow (1990). Recent reviews of the MIL literature include Amores (2013), which provides an extensive review and comparative study of the different paradigms, and Foulds & Frank (2010), which provides a thorough review of the different assumptions used by different paradigms in the literature.

Applications of MIL include:

Molecule activity
Predicting function for alternatively spliced isoforms (Li, Menon et al. 2014; Eksi et al. 2013)
Image classification (Maron & Ratan 1998)
Text or document categorization

Numerous researchers have worked on adapting classical classification techniques, such as support vector machines or boosting, to work within the context of multiple-instance learning.

References

Dietterich, Thomas G.; Lathrop, Richard H.; Lozano-Pérez, Tomás (1997), "Solving the multiple instance problem with axis-parallel rectangles", Artificial Intelligence, 89 (1–2): 31–71, doi:10.1016/S0004-3702(96)00034-3.

Amores, Jaume (2013), "Multiple instance classification: Review, taxonomy and comparative study", Artificial Intelligence, 201: 81–105, doi:10.1016/j.artint.2013.06.003.

Foulds, James; Frank, Eibe (2010), "A Review of Multi-Instance Learning Assumptions", Knowledge Engineering Review, 25 (1): 1–25, doi:10.1017/S026988890999035X.

Keeler, James D.; Rumelhart, David E.; Leow, Wee-Kheng (1990), "Integrated segmentation and recognition of hand-printed numerals", Proceedings of the 1990 Conference on Advances in Neural Information Processing Systems (NIPS 3), pp. 557–563.

Li, H.D.; Menon, R.; et al. (2014), "The emerging era of genomic data integration for analyzing splice isoform function", Trends in Genetics, doi:10.1016/j.tig.2014.05.005.

Eksi, R.; Li, H.D.; Menon, R.; et al. (2013), "Systematically differentiating functions for alternatively spliced isoforms through integrating RNA-seq data", PLoS Computational Biology, 9 (11): e1003314, doi:10.1371/journal.pcbi.1003314.

Maron, O.; Ratan, A.L. (1998), "Multiple-instance learning for natural scene classification", Proceedings of the Fifteenth International Conference on Machine Learning, pp. 341–349.