
User:Felix QW/Inductive logic programming

From Wikipedia, the free encyclopedia
This is an old revision of this page, as edited by Felix QW (talk | contribs) at 21:09, 9 December 2023 (Probabilistic ILP). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.


Approaches to ILP

  • Sketches of Golem and the Model Inference System
  • Inverse resolution
  • Least general generalisations
  • Sketch of Progol/Aleph
  • Beam search

Meta-interpretive learning

  • Metarules
  • Constraint-based learning

Common issues

  • Language bias
  • Scoring functions
  • Predicate invention
  • Learning recursive programs

Probabilistic ILP

The problem that probabilistic inductive logic programming (PILP) aims to solve can be expressed as follows:

Given

  1. background knowledge as a probabilistic logic program B,
  2. a set of positive and negative examples E+ and E-

Find a probabilistic logic program P such that the probability of the positive examples according to B ∪ P is maximized and the probability of the negative examples is minimized.
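One way to state the task as a single optimisation problem is the product form below; this is an illustrative formulation, since actual systems typically optimise the log-likelihood or related scores:

```latex
P^{\ast} \;=\; \operatorname*{arg\,max}_{P}\;
  \prod_{e \in E^{+}} \Pr(e \mid B \cup P)\;
  \prod_{e \in E^{-}} \bigl(1 - \Pr(e \mid B \cup P)\bigr)
```

where Pr(e | B ∪ P) denotes the probability assigned to the example e by the background knowledge B together with the candidate program P.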

This problem has two variants: parameter learning and structure learning. In the first, we are given the structure (the rules) of P and we just want to infer the parameters of P, while in the second we want to infer both the structure and the parameters of P. Moreover, the examples can be given in the form of (partial) interpretations, ground atoms, or (partial) proofs.

Parameter learning

Parameter learning for languages following the distribution semantics has been performed using the expectation-maximization (EM) algorithm or gradient descent.

The EM algorithm is used to estimate the probability of models containing random variables that are not observed in the data. This is the case of PLP under the distribution semantics because of the use of combining rules: these imply the presence of unobserved variables. The EM algorithm consists of a cycle in which the steps of Expectation and Maximization are repeatedly performed. In the Expectation step, the distribution of the hidden variables is computed according to the current values of the parameters, while in the Maximization step, the new values of the parameters are computed. Examples of approaches that use EM are PRISM (Sato and Kameya, 2001), LFI-ProbLog (Fierens et al., 2013), and EMBLEM (Bellodi and Riguzzi, 2013). The latter two use knowledge compilation for computing the distribution of the hidden variables. RIB (Riguzzi and Di Mauro, 2012) is a system for parameter learning that uses a special EM algorithm called information bottleneck that was shown to avoid some local maxima of EM.
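The E- and M-steps can be illustrated on a tiny program under the distribution semantics. The sketch below is a toy, not the algorithm of PRISM, LFI-ProbLog, or EMBLEM: two probabilistic facts are hidden, only the derived atom e is observed, and EM estimates the fact probabilities.

```python
import random

random.seed(0)

# Toy program under the distribution semantics:
#     p::a.   q::b.   e :- a.   e :- b.
# Only e is observed; a and b are hidden, which is why EM is needed.
TRUE_P, TRUE_Q = 0.3, 0.6
data = []
for _ in range(5000):
    a, b = random.random() < TRUE_P, random.random() < TRUE_Q
    data.append(a or b)  # observation of e

p, q = 0.2, 0.7  # initial guesses for the two parameters
for _ in range(200):
    # Expectation: expected truth values of the hidden facts a and b
    # given each observation of e, under the current parameters.
    exp_a = exp_b = 0.0
    for e in data:
        if e:
            pe = 1 - (1 - p) * (1 - q)  # P(e) = P(a or b)
            exp_a += p / pe             # P(a | e is true)
            exp_b += q / pe             # P(b | e is true)
        # if e is false, a and b are certainly false: they contribute 0
    # Maximization: parameters become the normalised expected counts.
    p, q = exp_a / len(data), exp_b / len(data)

# The learned parameters reproduce the observed frequency of e (with
# only e observed, p and q are identifiable only up to P(e)).
print(round(1 - (1 - p) * (1 - q), 3), round(sum(data) / len(data), 3))
```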

Gradient descent methods compute the gradient of the target function and iteratively modify the parameters by moving in the direction of the gradient. An example of these methods is LeProbLog (Gutmann et al., 2008), which uses a dynamic programming algorithm exploiting binary decision diagrams (BDDs) to compute the gradient.
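The following hypothetical sketch shows a least-squares fit in the spirit of this setting: a query is annotated with a target probability, and gradient descent adjusts the fact parameters so the computed success probability matches it. The program, targets, and hand-written gradient are illustrative assumptions; the real system derives the gradient from BDDs.

```python
# Toy program:  p::a.  q::b.  e :- a.  e :- b.
targets = [0.72]  # target probability annotated on the query e

def success_prob(p, q):
    return 1 - (1 - p) * (1 - q)  # P(e) for this toy program

def loss(p, q):
    return sum((success_prob(p, q) - t) ** 2 for t in targets)

p, q, lr = 0.2, 0.9, 0.5
for _ in range(1000):
    pe = success_prob(p, q)
    g = 2 * sum(pe - t for t in targets)  # d loss / d pe
    gp, gq = g * (1 - q), g * (1 - p)     # chain rule: d pe/dp = 1-q, d pe/dq = 1-p
    p -= lr * gp
    q -= lr * gq

print(loss(p, q) < 1e-12)
```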

Structure learning

One of the first works on structure learning is that of Koller and Pfeffer (1997), in which the authors learn the structure of first-order rules with associated probabilistic uncertainty parameters. Their approach generates the underlying graphical model using knowledge-based model construction and then applies EM to the resulting graphical model.

De Raedt et al. (2008b) presented an algorithm for performing theory compression on ProbLog programs. Theory compression means removing as many clauses as possible from the theory in order to maximize the probability of the examples; no new clause can be added to the theory.

SEM-CP-logic (Meert et al., 2008) learns both the parameters and the structure of ground CP-logic programs. It performs learning by considering the Bayesian networks equivalent to CP-logic programs and applying techniques for learning Bayesian networks. In particular, it applies the structural expectation-maximization (SEM) algorithm (Friedman, 1998): it iteratively generates refinements of the equivalent Bayesian network and greedily chooses the one that maximizes the BIC score (Schwarz, 1978).
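The BIC score trades fit against model complexity. A minimal sketch, in the "higher is better" form commonly used for Bayesian-network structure search (the numbers below are invented for illustration):

```python
from math import log

def bic(log_likelihood, num_params, num_samples):
    # log-likelihood minus a complexity penalty that grows with the
    # number of free parameters and (logarithmically) the sample size
    return log_likelihood - 0.5 * num_params * log(num_samples)

# A model with a better fit but many more parameters can still lose
# to a simpler one on the same 1000 observations:
print(bic(-1200.0, 5, 1000) > bic(-1190.0, 20, 1000))
```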

ProbFOIL (De Raedt and Thon, 2010) combines the rule learner FOIL (Quinlan and Cameron-Jones, 1993) with ProbLog. Logical rules are learned from probabilistic data, in the sense that both the examples themselves and their classifications can be probabilistic. The learned set of rules has to predict the probability of an example from its description. In this setting, the parameters (the probability values) are fixed and only the structure has to be learned.

SLIPCASE (Bellodi and Riguzzi, 2011) performs a beam search in the space of LPADs by iteratively refining probabilistic theories and optimizing the parameters of each theory with EMBLEM, which is feasible because parameter learning is usually fast. SLIPCOVER (Bellodi and Riguzzi, 2014) is an evolution of SLIPCASE that uses bottom clauses generated as in Progol (Muggleton, 1995) to guide the refinement process, thus reducing the number of revisions and exploring the search space more effectively. Moreover, SLIPCOVER separates the search for promising clauses from the search for theories: the space of clauses is explored with a beam search, while the space of theories is searched greedily. Both phases use the log-likelihood of the data, evaluated by EMBLEM, as the guiding heuristic.[1]
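A clause-space beam search of this general shape can be sketched as follows. The skeleton is generic: `refine` and `score` below are toy stand-ins, not SLIPCOVER's bottom-clause refinement operator or EMBLEM's log-likelihood evaluation.

```python
import heapq

def beam_search(initial, refine, score, beam_width=3, steps=4):
    # beam holds (score, candidate) pairs for the current frontier
    beam = [(score(c), c) for c in initial]
    best = max(beam)
    for _ in range(steps):
        candidates = [(score(r), r) for _, c in beam for r in refine(c)]
        if not candidates:
            break
        # keep only the beam_width highest-scoring refinements
        beam = heapq.nlargest(beam_width, candidates)
        best = max(best, beam[0])
    return best

# Toy instance: "clauses" are tuples of digits, refined by appending a
# digit; the score rewards sums close to 10 and penalises long clauses.
refine = lambda c: [c + (d,) for d in range(3)]
score = lambda c: -abs(10 - sum(c)) - 0.1 * len(c)
print(beam_search([()], refine, score))
```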

This article incorporates text from a scholarly publication published under a copyright license that allows anyone to reuse, revise, remix and redistribute the materials in any form for any purpose: Riguzzi, Fabrizio; Bellodi, Elena; Zese, Riccardo (2014-09-18). "A History of Probabilistic Inductive Logic Programming". Frontiers in Robotics and AI. 1. doi:10.3389/frobt.2014.00006. ISSN 2296-9144. Please check the source for the exact licensing terms.

References

  1. ^ Riguzzi, Fabrizio; Bellodi, Elena; Zese, Riccardo (2014-09-18). "A History of Probabilistic Inductive Logic Programming". Frontiers in Robotics and AI. 1. doi:10.3389/frobt.2014.00006. ISSN 2296-9144.