Then the rabbi said, "Golem, you have not been completely formed, but I am about to finish you now...You will do as I will tell you." Saying these words, Rabbi Leib finished engraving the letter Aleph. Immediately the golem began to rise. From The Golem by Isaac Bashevis Singer, with illustrations by Uri Shulevitz.
The development of Aleph owes much to the advice and research of many people. Special thanks are due to: Michael Bain, Rui Camacho, Vitor Costa, James Cussens, Ross King, Donald Michie, Stephen Moyle, Stephen Muggleton, David Page, Mark Reid, Claude Sammut, Jude Shavlik, Jan Wielemaker, and Filip Zelezny.
This document provides reference information on A Learning Engine for Proposing Hypotheses (Aleph).
Aleph is an Inductive Logic Programming (ILP) system.
This manual is not intended to be a tutorial on ILP. A good
introduction to the theory, implementation and applications of ILP can
be found in S.H. Muggleton and L. De Raedt (1994), Inductive Logic
Programming: Theory and Methods, Jnl. Logic Programming,
19,20:629--679, available at
ftp://ftp.cs.york.ac.uk/pub/ML_GROUP/Papers/lpj.ps.gz.
Aleph is intended to be a prototype for exploring ideas. Earlier incarnations (under the name P-Progol) originated in 1993 as part of a fun project undertaken by Ashwin Srinivasan and Rui Camacho at Oxford University. The main purpose was to understand ideas of inverse entailment which eventually appeared in Stephen Muggleton's 1995 paper: Inverse Entailment and Progol, New Gen. Comput., 13:245-286, available at ftp://ftp.cs.york.ac.uk/pub/ML_GROUP/Papers/InvEnt.ps.gz. Since then, the implementation has evolved to emulate some of the functionality of several other ILP systems. Some of these of relevance to Aleph are: CProgol, FOIL, FORS, Indlog, MIDOS, SRT, Tilde, and WARMR. See section Related versions and programs for more details on obtaining some of these programs.
Aleph is written in Prolog principally for use with the Yap Prolog compiler. It should also run, albeit less efficiently, with SWI Prolog. It is maintained by Ashwin Srinivasan at the University of Oxford, and can be found at:
http://www.comlab.ox.ac.uk/oucl/research/areas/machlearn/Aleph/aleph.pl.
If you obtain this version, and have not already done so, then subscribe to the Aleph mailing list. You can do this by e-mailing majordomo@comlab.ox.ac.uk with the body of the message containing the command: subscribe aleph. This version is free for academic use (research and teaching). If you intend to use it for commercial purposes then please contact Ashwin Srinivasan (ashwin at comlab dot ox dot ac dot uk). NB: Yap is available at:
Aleph requires Yap 4.1.15 or higher, compiled with the DEPTH_LIMIT flag set to 1 (that is, include -DDEPTH_LIMIT=1 in the compiler options). Aleph 5 requires SWI Version 5.1.10 or higher.
SWI Prolog is available at:
The Texinfo source file of this manual is available at:
http://www.comlab.ox.ac.uk/oucl/research/areas/machlearn/Aleph/misc/aleph.tex
A "Makefile" is available for generating a variety of output formats:
http://www.comlab.ox.ac.uk/oucl/research/areas/machlearn/Aleph/misc/Makefile.txt
During routine use, Aleph follows a very simple procedure that can be described in 4 steps:
1. Select example. Select an example to be generalised. If none exist, stop; otherwise proceed to the next step.
2. Build most-specific-clause. Construct the most specific clause that entails the example selected, and is within the language restrictions provided. This is usually a definite clause with many literals, and is called the "bottom clause". This step is sometimes called the saturation step.
3. Search. Find a clause more general than the bottom clause. This is done by searching for some subset of the literals in the bottom clause that has the "best" score.
4. Remove redundant. The clause with the best score is added to the current theory, and all examples made redundant are removed. This step is sometimes called the cover removal step.
A more advanced use of Aleph (see section Advanced use of Aleph) allows alteration to each of these steps. At the core of Aleph is the "reduction" step, presented above as a simple "subset-selection" algorithm. In fact, within Aleph, this is implemented by a (restricted) branch-and-bound algorithm which allows an intelligent enumeration of acceptable clauses under a range of different conditions. More on this can be found in section On how the single-clause search is implemented.
Aleph code is contained in a single file, usually called `alephX.pl' (the X stands for the current version number, for example aleph4.pl refers to Version 4). To load Aleph, you will need to consult this file into your Prolog compiler, with sufficient stack and heap size (the more, the better!). Here is an example of loading Aleph into the Yap compiler, with a stack size of 5000 K bytes and heap size of 20000 K bytes:
yap -s5000 -h20000
[ Restoring file startup ]
yes
?- [aleph4].
Aleph requires 3 files to construct theories. The most straightforward use of Aleph would involve:
1. Constructing the three input files `filestem.b' (background knowledge), `filestem.f' (positive examples) and `filestem.n' (negative examples), described in the sections that follow.
2. Reading these files with the read_all(filestem) command. See section Read all input files.
3. Constructing a theory with the induce command. See section Construct a theory.
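As a minimal sketch (the filestem trains is illustrative, not a built-in name), a session would then be:

?- read_all(trains).
?- induce.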
All background knowledge for Aleph is contained in a file with a .b extension. Background knowledge is in the form of Prolog clauses that encode information relevant to the domain. It can also contain any directives understood by the Prolog compiler being used (for example, :- consult(someotherfile).). This file also contains language and search restrictions for Aleph. The most basic amongst these refer to modes, types and determinations (see section Mode declarations, section Type specifications, and section Determinations).
These declare the mode of call for predicates that can appear in any clause hypothesised by Aleph. They take the form:

mode(RecallNumber,PredicateMode).

where RecallNumber bounds the non-determinacy of a form of predicate call, and PredicateMode specifies a legal form for calling a predicate. RecallNumber can be either (a) a number specifying the number of successful calls to the predicate; or (b) * specifying that the predicate has bounded non-determinacy. It is usually easiest to specify RecallNumber as *. PredicateMode is a template of the form:

p(ModeType, ModeType,...)
Here are some examples of how they appear in a file:
:- mode(1,mem(+number,+list)).
:- mode(1,dec(+integer,-integer)).
:- mode(1,mult(+integer,+integer,-integer)).
:- mode(1,plus(+integer,+integer,-integer)).
:- mode(1,(+integer)=(#integer)).
:- mode(*,has_car(+train,-car)).
Each ModeType is either (a) simple; or (b) structured. A simple ModeType is one of: (a) +T specifying that when a literal with predicate symbol p appears in a hypothesised clause, the corresponding argument should be an "input" variable of type T; (b) -T specifying that the argument is an "output" variable of type T; or (c) #T specifying that it should be a constant of type T. All the examples above have simple modetypes. A structured ModeType is of the form f(..) where f is a function symbol, each argument of which is either a simple or structured ModeType. Here is an example containing a structured ModeType:
:- mode(1,mem(+number,[+number|+list])).
With these directives Aleph ensures that for any hypothesised clause of the form H:- B1, B2, ..., Bc: (1) any input variable of type T in a body literal Bi appears as an output variable of type T in a body literal that appears before Bi, or as an input variable of type T in H; (2) any output variable of type T in H appears as an output variable of type T in some body literal Bi; and (3) any argument denoted by #T in the modes contains only ground terms of type T.
Types have to be specified for every argument of all predicates to be used in constructing a hypothesis. This specification is done within a mode(...,...) statement (see section Mode declarations). For Aleph, types are just names, and no type-checking is done. Variables of different types are treated distinctly, even if one is a sub-type of the other.
Determination statements declare the predicates that can be used to construct a hypothesis. They take the form:
determination(TargetName/Arity,BackgroundName/Arity).
The first argument is the name and arity of the target predicate, that is, the predicate that will appear in the head of hypothesised clauses. The second argument is the name and arity of a predicate that can appear in the body of such clauses. Typically there will be many determination declarations for a target predicate, corresponding to the predicates thought to be relevant in constructing hypotheses. If no determinations are present Aleph does not construct any clauses. Determinations are only allowed for 1 target predicate on any given run of Aleph: if multiple target determinations occur, the first one is chosen.
Here are some examples of how they appear in a file:
:- determination(eastbound/1,has_car/2).
:- determination(mult/3,mult/3).
:- determination(p/1,'='/2).
Positive examples of a concept to be learned with Aleph are written in a file with a .f extension. The filestem should be the same as that used for the background knowledge. The positive examples are simply ground facts. Here are some examples of how they appear in a file:
eastbound(east1).
eastbound(east2).
eastbound(east3).
Code exists for dealing with non-ground positive examples. However, this has never been tested rigorously.
Negative examples of a concept to be learned with Aleph are written in a file with a .n extension. The filestem should be the same as that used for the background knowledge. The negative examples are simply ground facts. Here are some examples of how they appear in a file:
eastbound(west1).
eastbound(west2).
eastbound(west3).
Non-ground constraints can be a more compact way of expressing negative information. Such constraints can be specified in the background knowledge file (see section User-defined constraints). Aleph is capable of learning from positive examples only. This is done using a Bayesian evaluation function (see posonly in section Evaluation functions).
Once the `filestem.b', `filestem.f' and `filestem.n' files are in place, they can be read into Aleph with the command:
read_all(filestem).
Finer-grain specification of the example files can be achieved by setting the train_pos and train_neg flags (see section Setting Aleph parameters).
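As a minimal sketch (the filenames here are illustrative, not defaults), these flags can be set in the background file before reading in the data:

:- set(train_pos,'trains.f').   % file of positive examples
:- set(train_neg,'trains.n').   % file of negative examples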
The basic command for selecting examples and constructing a theory is:
induce.
When issued Aleph does the four steps described earlier (see section The basic Aleph algorithm). The result is usually a trace that lists clauses searched along with their positive and negative example coverage, like:
eastbound(A) :- has_car(A,B). [5/5]
eastbound(A) :- has_car(A,B), short(B). [5/5]
eastbound(A) :- has_car(A,B), open_car(B). [5/5]
eastbound(A) :- has_car(A,B), shape(B,rectangle). [5/5]
and the final result that looks like:
[theory]

[Rule 1] [Pos cover = 5 Neg cover = 0]
eastbound(A) :-
   has_car(A,B), short(B), closed(B).

[pos-neg] [5]
induce also reports the performance on the training data as a confusion matrix that looks like:

[Training set performance]
           Actual
         +      -
     +   5      0      5
Pred
     -   0      5      5

         5      5     10

Accuracy = 100%
Performance on test data is also reported if values for the parameters test_pos and test_neg are set (see section Setting Aleph parameters).
The simplest use of induce implements a simple greedy cover-set algorithm. Aleph allows you to experiment with a number of other ways of searching for answers (see section Advanced use of Aleph).
The final theory constructed by Aleph can be saved in a file FileName using the command:

write_rules(FileName).

Alternatively, the command:

write_rules.

calls write_rules/1 with the current setting for the parameter rulefile.
Besides automatic performance reporting, the theory constructed by Aleph can be evaluated on examples in any data file using the command:
test(File,Flag,Covered,Total)
where File is the name of the data file containing the examples, and Flag is one of show or noshow, specifying whether examples covered are to be displayed. Both File and Flag have to be provided. test/4 then returns the following numbers: Covered is the number of examples in the data file covered by the current theory, and Total is the total number of examples in the data file.
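For example, a hypothetical call that evaluates the current theory on a file of test examples (the filename is illustrative) is:

?- test('test_trains.f',show,Covered,Total).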
Some simple examples of Aleph usage can be found in
http://www.comlab.ox.ac.uk/oucl/research/areas/machlearn/Aleph/misc/examples.zip
In each sub-directory you should find Aleph input files and, usually, a typescript of Aleph running on the data provided to accomplish some task.
Advanced use of Aleph allows modifications to each of the steps of the basic algorithm (see section The basic Aleph algorithm):
1. Select example. A sample of more than 1 example can be selected (see samplesize in section Setting Aleph parameters). The best clause obtained from reducing each corresponding bottom clause is then added to the theory. Alternatively, no sampling need be performed, and every example can be saturated and reduced (see induce in section Altering the search).
2. Build most-specific-clause. The bottom clause may be constructed lazily or not at all (see construct_bottom in section Setting Aleph parameters). Literals in the bottom clause may be evaluated "lazily" (see lazy_evaluate in section Other commands). Individual bottom clauses can be constructed and examined (see sat in section Other commands).
3. Search. The search can be altered and customised in a number of ways (see reduce in section Other commands and section Altering the search).
4. Remove redundant. The way in which examples covered are removed can be altered (see induce in section Altering the search).
There is now some software in place that allows exploration of the following: randomised search methods; incremental theory construction; theory-level search; tree-based theories; constraint learning; mode learning; abductive learning; and feature construction. These are all at very early stages of development and therefore even less reliable than the rest of the code (probably).
The set/2 predicate forms the basis for setting a number of parameter values for Aleph. Parameters are set to values using:
set(Parameter,Value)
The current value of a parameter is obtained using:
setting(Parameter,Value)
A parameter can be un-set by using:
noset(Parameter)
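As a minimal sketch of these predicates in use (the parameter value is illustrative):

?- set(noise,2).        % allow clauses to cover up to 2 negative examples
?- setting(noise,V).    % V is bound to 2
?- noset(noise).        % noise reverts to its default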
Meaningful set/2 statements for Aleph are:
set(abduce,+V)
V is one of: true or false (default false). If V is true then abduction and subsequent generalisation of abduced atoms is performed within the induce loop. Only predicates declared to be abducible by abducible/1 are candidates for abduction. See section Abductive learning for more details.
set(best,+V)
set(cache_clauselength,+V)
set(caching,+V)
V is one of: true or false (default false). If true then clauses and coverage are cached for future use. Only clauses up to the length set by cache_clauselength are stored in the cache.
set(check_redundant,+V)
V is one of: true or false (default false). Specifies whether a call to redundant/2 (see section Other commands) should be made for checking redundant literals in a clause.
set(check_useless,+V)
V is one of: true or false (default false). If set to true, removes literals in the bottom clause that do not contribute to establishing variable chains to output variables in the positive literal, or produce output variables that are not used by any other literal in the bottom clause.
set(classes,+V)
set(clauselength,+V)
set(clauselength_distribution,+V)
set(clauses,+V)
set(condition,+V)
V is one of: true or false (default false). If true then randomly generated examples are obtained after conditioning the stochastic generator with the positive examples.
set(confidence,+V)
set(construct_bottom,+V)
V is one of: saturation, reduction or false (default saturation). Specifies the stage at which the bottom clause is constructed. If reduction then it is constructed lazily during the search. This is useful if the bottom clause is too large to be constructed prior to search. This also sets the flag lazy_bottom to true. The user has to provide a refinement operator definition (using refine/2); if not, the refine parameter is set to auto. If false then no bottom clause is constructed. The user would normally provide a refinement operator definition in this case.
set(dependent,+V)
set(depth,+V)
set(explore,+V)
V is one of: true or false (default false). If true then forces search to continue until all remaining elements in the search space are definitely worse than the current best element (normally, search would stop when it is certain that all remaining elements are no better than the current best; this is a weaker criterion). All internal pruning is turned off (see section Built-in and user-defined pruning).
set(evalfn,+V)
V is one of: coverage, compression, posonly, pbayes, accuracy, laplace, auto_m, mestimate, entropy, gini, sd, wracc, or user (default coverage). Sets the evaluation function for a search. See section Altering the search.
set(good,+V)
V is one of: true or false (default false). If true then stores a Prolog encoding of "good" clauses found in the search. A good clause is any clause with utility above that specified by the setting for minscore. If goodfile is set to some filename then this encoding is stored externally in that file.
set(goodfile,+V)
set(gsamplesize,+V)
set(i,+V)
set(interactive,+V)
V is one of: true or false (default false). If true then constructs theories interactively with induce_rules and induce_tree.
set(language,+V)
V is an integer >= 1 or inf (default inf). Specifies an upper bound on the number of occurrences of a predicate symbol in any clause.
set(lazy_on_contradiction,+V)
V is one of: true or false (default false). Specifies if theorem-proving should proceed if a constraint is violated.
set(lazy_on_cost,+V)
V is one of: true or false (default false). Specifies if user-defined cost-statements require clause coverages to be evaluated. This is normally not user-set, and decided internally.
set(lazy_negs,+V)
V is one of: true or false (default false). If true then theorem-proving on negative examples stops once bounds set by noise or minacc are violated.
set(lookahead,+V)
V is a positive integer. Sets a lookahead value for the automatic refinement operator (obtained by setting refine to auto).
set(m,+V)
set(max_abducibles,+V)
V is a positive integer (default 2). Sets an upper bound on the number of ground atoms within any abductive explanation for an observation. See section Abductive learning.
set(max_features,+V)
V is a positive integer or inf (default inf). Sets an upper bound on the number of boolean features constructed by searching for good clauses. See section Feature Construction.
set(minacc,+V)
set(mingain,+V)
V is a number (default 0.05). Specifies the minimum expected gain from splitting a leaf when constructing trees.
set(minpos,+V)
V is a positive integer (default 1). Sets a lower bound on the number of positive examples to be covered by an acceptable clause. Beware: you can get counter-intuitive results in conjunction with the minscore setting.
set(minposfrac,+V)
V is a number in the interval [0.0,1.0] (default 0.0). Sets a lower bound on the fraction of positive examples that must be covered by an acceptable clause. Beware: you can get counter-intuitive results in conjunction with the minpos setting.
set(minscore,+V)
V is a number (default -inf). Sets a lower bound on the utility of an acceptable clause. When constructing clauses, if the best clause has utility below this number, then it is not added to the current theory. Beware: you can get counter-intuitive results in conjunction with the minpos setting.
set(moves,+V)
V is an integer >= 0. Sets the number of moves allowed in a randomised local search. Only makes sense if search is set to rls and rls_type is set to an appropriate value.
set(newvars,+V)
V is a positive integer or inf (default inf). Sets an upper bound on the number of existential variables that can be introduced in the body of a clause.
set(nodes,+V)
set(noise,+V)
set(nreduce_bottom,+V)
V is one of: true or false (default false). If true then removes literals in the body of the bottom clause using the negative examples. The procedure is as described by S. Muggleton and C. Feng in Efficient induction of logic programs, Inductive Logic Programming, S. Muggleton (ed.), AFP Press.
set(openlist,+V)
V is an integer >= 1 or inf (default inf). Sets an upper bound on the beam-width to be used in a greedy search.
set(optimise_clauses,+V)
V is one of: true or false (default false). If true, performs the query optimisations described by V.S. Costa, A. Srinivasan, and R.C. Camacho in A note on two simple transformations for improving the efficiency of an ILP system.
set(permute_bottom,+V)
V is one of: true or false (default false). If true, randomly permutes literals in the body of the bottom clause, within the constraints imposed by the mode declarations. The utility of this is described by P. Tschorn in Random Local Bottom Clause Permutations for Better Search Space Exploration in Progol-like ILP Systems (short papers, ILP 2006).
set(portray_examples,+V)
V is one of: true or false (default false). If true, executes the goal aleph_portray(Term), where Term is one of train_pos, train_neg, test_pos, or test_neg, when executing the command show(Term).
set(portray_hypothesis,+V)
V is one of: true or false (default false). If true, executes the goal aleph_portray(hypothesis). This is to be written by the user.
set(portray_literals,+V)
V is one of: true or false (default false). If true, executes the goal aleph_portray(Literal), where Literal is some literal. This is to be written by the user.
set(portray_search,+V)
V is one of: true or false (default false). If true, executes the goal aleph_portray(search). This is to be written by the user.
set(print,+V)
set(proof_strategy,+V)
V is one of: restricted_sld, sld, or user (default restricted_sld). If restricted_sld, then examples covered are determined by forcing the current hypothesised clause to be the first parent clause in an SLD resolution proof. If sld then this restriction is not enforced. The former strategy is efficient, but not refutation complete. It is sufficient if all that is needed is to determine how many examples are covered by the current clause, which is what is needed when Aleph is used to construct a set of non-recursive clauses greedily (for example using the induce/0 command: see section Construct a theory). If set to user then Aleph expects a user-defined predicate prove/2, the first argument of which is a clause C, and the second is an example E. prove(C,E) succeeds if example E is provable using clause C and the background knowledge.
set(prooftime,+V)
V is a positive integer or inf (default inf). Sets an upper bound on the time (in seconds) for testing whether an example is covered. Overrides any value set for searchtime.
set(prune_tree,+V)
V is one of: true or false (default false). Determines whether rules constructed by the tree learner are subject to pessimistic pruning (see section Tree-based theories).
set(record,+V)
V is one of: true or false (default false). If true then a trace of Aleph execution is written to a file. The filename is given by recordfile.
set(recordfile,+V)
V is a Prolog atom naming the file to which a trace of execution is written. Only makes sense if record is set to true.
set(refine,+V)
V is one of: user, auto, or false (default false). Specifies the nature of the customised refinement operator. In all cases, the resulting clauses are required to subsume the bottom clause, if one exists. If false then no customisation is assumed and standard operation results. If user then the user specifies a domain-specific refinement operator with refine/2 statements. If auto then an automatic enumeration of all clauses in the mode language (see section Mode declarations) is performed. The result is a breadth-first branch-and-bound search starting from the empty clause. This is useful if a bottom clause is either not constructed or is constructed lazily. No attempt is made to ensure any kind of optimality and the same clauses may result from several different refinement paths. Some rudimentary checking can be achieved by setting caching to true. The user has to ensure the following when refine is set to auto: (1) the setting to auto is done after the modes and determinations commands, as these are used to generate internally a set of clauses that allow enumeration of clauses in the language; (2) all arguments that are annotated as #T in the modes contain generative definitions for type T. These are called by the clauses generated internally to obtain the appropriate constants; and (3) the head mode is clearly specified using the modeh construct.
set(resample,+V)
V is an integer >= 1 or inf (default 1). Sets the number of times an example is resampled when selected by induce/0 or induce_cover/0. That is, if set to some integer N, then the example is repeatedly selected N times by induce/0 or induce_cover/0.
set(rls_type,+V)
V is one of: gsat, wsat, rrr, or anneal. Sets the randomised search method to be one of GSAT, WSAT, RRR or simulated annealing. Requires search to be set to rls, and integer values for tries and moves. See section Randomised search methods.
set(rulefile,+V)
V is a Prolog atom naming the file into which the final theory is written (used by write_rules/0).
set(samplesize,+V)
V is an integer >= 0. Sets the number of examples selected randomly by the induce or induce_cover commands. The best clause from the sample is added to the theory. A value of 0 turns off random sampling, and the next uncovered example in order of appearance in the file of training examples is selected.
set(scs_percentile,+V)
V is a number in the range (0,100]. Specifies that any clause in the top V-percentile of clauses is considered "good" for stochastic clause selection. Only makes sense if search is set to scs.
set(scs_prob,+V)
V is a number in the range [0,1). Specifies the minimum probability of selecting a "good" clause in stochastic clause selection. Only makes sense if search is set to scs.
set(scs_sample,+V)
V is a positive integer. Sets the number of clauses randomly selected from the hypothesis space in stochastic clause selection. Only makes sense if search is set to scs. This overrules any sample sizes calculated from settings for scs_percentile and scs_prob.
set(search,+V)
V is one of: bf, df, heuristic, ibs, ils, rls, scs, id, ic, ar, or false (default bf). Sets the search strategy. If false then no search is performed. See section Altering the search.
set(searchtime,+V)
V is a positive integer or inf (default inf). Sets an upper bound on the time (in seconds) for a search.
set(skolemvars,+V)
set(splitvars,+V)
V is one of: true or false (default false). If set to true before constructing a bottom clause, then variable co-references in the bottom clause are split apart by new variables. The new variables can occur at input or output positions of the head literal, and only at output positions in body literals. Equality literals between new and old variables are inserted into the bottom clause to maintain equivalence. It may also result in variable-renamed versions of other literals being inserted into the bottom clause. All of this increases the search space considerably and can make the search explore redundant clauses. The current version also elects to perform variable splitting whilst constructing the bottom clause (in contrast to doing it dynamically whilst searching). This was to avoid unnecessary checks that could slow down the search when variable splitting was not required. This means the bottom clause can be extremely large, and the whole process is probably not very practical for large numbers of co-references. The procedure has not been rigorously tested to quantify this.
set(stage,+V)
V is one of: saturation, reduction or command (default command). Sets the stage of current execution. This is normally not user-set, and decided internally.
set(store_bottom,+V)
V is one of: true or false (default false). Stores the bottom clause constructed for an example for future re-use.
set(subsample,+V)
V is one of: true or false (default false). If true then uses a sample of the examples (set by the value assigned to subsamplesize) to evaluate the utility of a clause.
set(subsamplesize,+V)
V is a positive integer or inf (default inf). Sets an upper bound on the number of examples sampled to evaluate the utility of a clause.
set(temperature,+V)
V is a positive real number. Sets the temperature for the simulated annealing search. Requires search to be set to rls and rls_type to be set to anneal.
set(test_pos,+V)
set(test_neg,+V)
set(threads,+V)
set(train_pos,-V)
V is the name of the file of positive examples used by the read_all command.
set(train_neg,-V)
V is the name of the file of negative examples used by the read_all command.
set(tree_type,+V)
V is one of: classification, class_probability, regression, or model (see section Tree-based theories).
set(tries,+V)
V is a positive integer. Sets the number of restarts for a randomised local search. Only makes sense if search is set to rls and rls_type is set to an appropriate value.
set(typeoverlap,+V)
V is a number in the interval (0.0,1.0]. Used by induce_modes/0 to determine if a pair of different types should be given the same name. See section Mode learning for more details.
set(uniform_sample,+V)
V is one of: true or false (default false). Used when drawing clauses randomly from the clause-space. If set to true then clauses are drawn by uniform random selection from the space of legal clauses. Since there are usually many more longer clauses than shorter ones, this will mean that clauses drawn randomly are more likely to be long ones. If set to false then a uniform distribution over clause lengths is assumed (up to the maximum length allowed by clauselength). This is not necessarily uniform over legal clauses. If random clause selection is done without a bottom clause for guidance then this parameter is set to false.
set(updateback,+V)
V is one of: true or false (default true). If false then clauses found by the induce family are not incorporated into the background. This is experimental.
set(verbosity,+V)
V is an integer >= 0. Sets the level of verbosity, and also sets the parameter verbose to the same value. A value of 0 shows very little.
set(version,-V)
set(walk,+V)
V is a number between 0 and 1. It represents the random walk probability for the Walksat algorithm.
set(+P,+V)
Sets any user-defined parameter P to value V. This is useful for recording experimental details (usually in conjunction with the record parameter). For example, set(experiment,'Run 1 with background B0').
Aleph allows the basic procedure for theory construction to be altered in a number of ways. Besides the induce command, there are several other commands that can be used to construct a theory. The induce family of commands are:
induce/0. This has already been described in detail previously (see section Construct a theory);
induce_cover/0. This command is very similar to induce. The only difference is that positive examples covered by a clause are not removed prior to seeding on a new (uncovered) example. After a search with induce_cover, Aleph removes only the examples covered by the best clause from a pool of seed examples. A new example or set of examples is then chosen from the seeds left, and the process repeats. The theories returned by induce and induce_cover are dependent on the order in which positive examples are presented;
induce_max/0. The theory returned by this command is unaffected by the ordering of positive examples. This is because it saturates and reduces every example. The search is made more efficient by remembering the coverage of the best clause obtained so far for each example being generalised. Both induce_cover and induce_max are slower than induce, and usually produce clauses with a great deal of overlap in coverage. A separate program will have to be used to find some subset of these clauses that minimises this overlap (see T-Reduce in section Related versions and programs).
induce_incremental/0. This command constructs a theory in an incremental mode: the user is allowed to update the examples and background knowledge. This mode of learning is described further in section Incremental construction of theories.
induce_clauses/0. This command is simply induce/0 or induce_incremental/0, depending on whether the flag interactive is false or true.
induce_theory/0. This command abandons the clause-by-clause approach to theory construction. Instead, search is done at the theory-level. This is untested and the current implementation should not be considered definitive. See section Theory-level search for more details.
induce_tree/0. This command abandons the clause-by-clause approach to theory construction. Instead, search is done by constructing a tree using the standard recursive-partitioning approach. See section Tree-based theories for more details.
induce_constraints/0. This command abandons the search for predictive clauses. Instead, search results in all constraints that hold within the background knowledge provided. See section Constraint learning for more details.
induce_modes/0. This command searches for a mode and type assignment that is consistent with the background knowledge provided. See section Mode learning for more details.
induce_features/0. This command searches for boolean features given the examples and the background knowledge. See section Feature Construction for more details.
The search for individual clauses (when performed) is principally affected by two parameters. One sets the search strategy (search) and the other sets the evaluation function (evalfn). A search strategy is set using set(search,Strategy). The following search strategies apply to the clause-by-clause searches conducted by Aleph:
ar
Implements a simplified form of the type of association rule search conducted by the WARMR system. Here, Aleph simply finds all rules, within the mode language provided, that cover at least a pre-specified fraction of the positive examples. This fraction is specified by the parameter pos_fraction.
bf
Enumerates shorter clauses before longer ones. At a given clauselength, clauses are re-ordered based on their evaluation. This is the default search strategy.
df
Enumerates longer clauses before shorter ones. At a given clauselength, clauses are re-ordered based on their evaluation.
heuristic
Enumerates clauses in a best-first manner.
ibs
Performs an iterative beam search as described by Quinlan and Cameron-Jones, IJCAI-95. The limit set by the value for nodes applies to any 1 iteration.
ic
Performs a search for integrity constraints. Used by induce_constraints (see section Constraint learning).
id
Performs an iterative deepening search up to the maximum clause length specified.
ils
An iterative bf search strategy that, starting from 1, progressively increases the upper-bound on the number of occurrences of a predicate symbol in any clause. The limit set by the value for nodes applies to any 1 iteration. This language-based search was developed by Rui Camacho and is described in his PhD thesis.
rls
A randomised local search. The type of search is specified by the parameter rls_type (see section Setting Aleph parameters). GSAT, RRR, and annealing all employ random multiple restarts, each of which serves as the starting point for local moves in the search space. A limit on the number of restarts is specified by the parameter tries and that on the number of moves by moves. Annealing is currently restricted to using a fixed temperature, making it equivalent to an algorithm due to Metropolis. The temperature is specified by setting the parameter temperature. The implementation of WSAT requires a "random-walk probability", which is specified by the parameter walk. A walk probability of 0 is equivalent to GSAT. More details on randomised search can be found in section Randomised search methods.
scs
A special case of GSAT that results in random sampling from the clause space. The sample size is either set by scs_sample or is calculated from the settings for scs_prob and scs_percentile. These represent: the minimum probability of selecting a "good" clause; and the meaning of a "good" clause, namely, that it is in the top K-percentile of clauses. This invokes GSAT search with tries set to the sample size and moves set to 0. Clause selection can either be blind or informed by some preliminary Monte-Carlo style estimation. This is controlled by scs_type. More details can be found in section Randomised search methods.
false
No search is performed.
An evaluation function is set using set(evalfn,Evalfn). The following clause evaluation functions are recognised by Aleph:
accuracy
Clause accuracy measured as P/(P+N), where P, N are the number of positive and negative examples covered by the clause.
auto_m
The m estimate (see mestimate below) with the value of m automatically set to be the maximum likelihood estimate of the best value of m.
compression
Clause utility measured as P - N - L + 1, where P, N are the number of positive and negative examples covered by the clause, and L is the number of literals in the clause.
coverage
Clause utility measured as P - N, where P, N are the number of positive and negative examples covered by the clause.
entropy
p log p + (1-p) log (1-p), where p = P/(P + N) and P, N are the number of positive and negative examples covered by the clause.
gini
2p(1-p), where p = P/(P + N) and P, N are the number of positive and negative examples covered by the clause.
laplace
(P+1)/(P+N+2), where P, N are the number of positive and negative examples covered by the clause.
mestimate
The m estimate of clause accuracy; the value of m is set by set(m,M).
pbayes
The pseudo-Bayes conditional probability of a clause.
posonly
A Bayesian evaluation function for learning from positive examples only; an explicit modeh declaration is necessary.
sd
Standard deviation; used for regression trees (see section Tree-based theories).
user
-C, where C is the value returned by a user-defined cost function. See section User-defined cost specification.
wracc
The weighted relative accuracy function described by N. Lavrac, P. Flach and B. Zupan.
Two sorts of pruning can be distinguished within Aleph when performing a clause-level search. Internal pruning refers to built-in pruning that performs admissible removal of clauses from a search. This is currently available for the following evaluation functions: auto_m, compression, coverage, laplace, mestimate, posonly, and wracc. User-defined prune statements can be written to specify the conditions under which a user knows for certain that a clause (or its refinements) could not possibly be an acceptable hypothesis. Such clauses are pruned from the search. The prune definitions are written in the background knowledge file (that has extension `.b'), and are distinguished by the fact that they are all rules of the form:
prune((ClauseHead:-ClauseBody)) :- Body.
The following example is from a pharmaceutical application that states that every extension of a clause representing a "pharmacophore" with six "pieces" is unacceptable, and that the search should be pruned at such a clause.
prune((Head:-Body)) :-
    violates_constraints(Body).

violates_constraints(Body) :-
    has_pieces(Body,Pieces),
    violates_constraints(Body,Pieces).

violates_constraints(Body,[_,_,_,_,_,_]).

has_pieces(...) :-
The use of such pruning can greatly improve Aleph's efficiency. Prune statements can be seen as a special case of providing distributional information about the hypothesis space.
The use of a user-specified cost function is a fundamental construct in statistical decision theory, and provides a general method of scoring descriptions. Aleph allows the specification of the cost of a clause. The cost statements are written in the background knowledge file (that has extension `.b'), and are distinguished by the fact that they are all rules of the form:
cost(Clause,ClauseLabel,Cost):- Body.
where ClauseLabel is the list [P,N,L], with P the number of positive examples covered by the clause, N the number of negative examples covered by the clause, and L the number of literals in the clause.
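As a minimal sketch (the particular trade-off below, one unit per literal and ten units per covered negative example, is an arbitrary illustration, not an Aleph default), a cost definition in the `.b' file might look like:

% Hypothetical cost function: prefer short clauses that cover
% few negative examples (with evalfn set to user, utility is -Cost,
% so lower cost is better).
cost(_Clause,[_P,N,L],Cost):-
    Cost is 10*N + L.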
It is usually not possible to devise automatically admissible pruning strategies for an arbitrary cost function. Thus, when using a user-defined cost measure, Aleph places the burden of specifying a pruning strategy on the user.
Aleph accepts integrity constraints that should not be violated by a hypothesis. These are written in the background knowledge file (that has extension `.b') and are similar to the integrity constraints in the ILP programs Clint and Claudien. The constraints are distinguished by the fact that they are all rules of the form:
false:- Body.
where Body is a set of literals that specify the condition(s) that should not be violated by a clause found by Aleph.
It is usual to use the hypothesis/3 command (see section Other commands) to obtain the clause currently being considered by Aleph.
The following example is from a pharmaceutical application that states that hypotheses are unacceptable if they have fewer than three "pieces" or do not specify the distances between all pairs of pieces.
false:-
    hypothesis(Head,Body,_),
    has_pieces(Body,Pieces),
    length(Pieces,N),
    N =< 2.
false:-
    hypothesis(_,Body,_),
    has_pieces(Body,Pieces),
    incomplete_distances(Body,Pieces).
The use of constraints is another way for Aleph to obtain interesting hypotheses without negative examples. Ordinarily, learning from positive examples alone will result in a single clause that classifies every example as positive. Such clauses can be precluded by constraints. Note also that an integrity constraint does not state that a refinement of a clause that violates one or more constraints will also be unacceptable. When constructing clauses in an incremental mode, Aleph can be instructed to add a special type of constraint to prevent the construction of overly general clauses (see section Incremental construction of theories).
Aleph allows a method of specifying the refinement operator to be used in a clause-level search. This is done using a Prolog definition for the predicate refine/2. The definition specifies the transitions in the refinement graph traversed in a search. The refine definition is written in the background knowledge file (that has extension `.b'), and consists of rules of the form:
refine(Clause1,Clause2):- Body.
This specifies that Clause1 is refined to Clause2. The definition can be nondeterministic, and the set of refinements for any one clause is obtained by repeated backtracking. For any refinement Aleph ensures that Clause2 implies the current most specific clause. Clause2 can contain cuts ("!") in its body.
The following example is from a pharmaceutical application that searches for a "pharmacophore" consisting of 4 "pieces" (each piece is some functional group) and associated distances in 3-D space. Auxiliary definitions for predicates like member/2 and dist/5 are not shown.
refine(false,active(A)).
refine(active(A),Clause):-
    member(Pred1,[hacc(A,B),hdonor(A,B),zincsite(A,B)]),
    member(Pred2,[hacc(A,C),hdonor(A,C),zincsite(A,C)]),
    member(Pred3,[hacc(A,D),hdonor(A,D),zincsite(A,D)]),
    member(Pred4,[hacc(A,E),hdonor(A,E),zincsite(A,E)]),
    Clause = (active(A):-
                  Pred1,
                  Pred2, dist(A,B,C,D1,E1),
                  Pred3, dist(A,B,D,D2,E2), dist(A,C,D,D3,E3),
                  Pred4, dist(A,B,E,D4,E4), dist(A,C,E,D5,E5),
                  dist(A,D,E,D6,E6)).
To invoke the use of such statements requires setting refine to user. For other settings of refine, see the entry for refine in section Setting Aleph parameters.
Up to early variants of Version 5, Aleph was never able to perform a specific-to-general search in any satisfactory manner (in the sense, say, of Golem or CIGOL): the only way to do this was to use a user-defined refinement operator of the kind just described that progressively generalises a clause. For example:
refine(false,Clause):-
    !,
    bottom(Clause).
refine(Clause1,Clause2):-
    generalise(Clause1,Clause2).
(The definition for bottom/1 is available within Aleph. The definition for generalise/2 has to be written separately.)
From Version 5 (time stamp Sun Mar 11 03:25:37 UTC 2007), a slightly more interesting approach is possible by setting the values of specific parameters. For example, with the following parameter settings:
:- set(samplesize,4).
:- set(resample,4).
:- set(permute_bottom,true).
:- set(nreduce_bottom,true).
:- set(search,false).
a call to induce/0 will perform a specific-to-general search in the following manner: four examples are chosen at random (samplesize is set to 4). Each example is resampled four times (resample is set to 4), resulting in a sequence of 16 trials in which each of the four examples appears four times. For each entry in the sequence, the following steps are performed: (1) the bottom clause is constructed with body literals shuffled (permute_bottom is set to true); (2) the bottom clause is generalised by using the negative examples (nreduce_bottom is set to true); (3) no further search is performed (search is set to false) and the resulting clause is evaluated. The best clause is added to the theory, the examples covered are removed, and the entire process repeats. The procedure is akin to, but not the same as, that used by Golem. A combination of a specific-to-general and other search strategies can be used if search is not set to false. In this case, a search of the lattice of clauses subsuming the negative-reduced bottom will be performed using the setting for search.
The simplest kind of randomised search is the following: sample N elements (clauses or theories) from the search space. Score these and return the best element. Ordinal optimisation is a technique that investigates the loss in optimality resulting from this form of search. See:
http://hrl.harvard.edu/people/faculty/ho/DEDS/OO/OOTOC.html
A study of the use of this in ILP can be found in: A. Srinivasan, A study of two probabilistic methods for searching large spaces with ILP (under review), available at:
ftp://ftp.comlab.ox.ac.uk/pub/Packages/ILP/Papers/AS/dami99.ps.gz
For a clause-level search, this is invoked by setting the parameter search to scs (to denote "stochastic clause selection"). The number N is either set by assigning a value to scs_sample or calculated automatically from settings for scs_prob and scs_percentile. If these values are denoted "P" and "K" respectively, then the sample size is calculated to be log(1-P)/log(1-K/100), which is the number of clauses that have to be sampled before obtaining, with probability at least P, at least one clause in the top K-percentile of clauses.
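For example, with P = 0.95 and K = 5, the sample size is log(0.05)/log(0.95), which is approximately 58.4, so 59 clauses must be sampled.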
Sampling is further controlled by specifying the setting scs_type to be one of blind or informed. If blind then clauses are uniform random selections from the space of all legal clauses. If informed then they are drawn from a specific distribution over clauselengths. This can either be pre-specified (by setting clauselength_distribution) or obtained automatically by a Monte-Carlo like scheme that attempts to estimate, for each clause length, the probability of obtaining a clause in the top K-percentile. In either case, the resulting distribution over clauselengths is used to first decide on the number of literals "l" in the clause. A legal clause with "l" literals is then constructed.
In fact, this simple randomised search is a degenerate form of a more general algorithm known as GSAT. Originally proposed within the context of determining satisfiability of propositional formulae, the basic algorithm is as follows:
currentbest:= 0 (comment: "0" is a conventional default answer)
for i = 1 to N do
    current:= randomly selected starting point
    if current is better than currentbest
        then currentbest:= current
    for j = 1 to M do
        begin
            next:= best local move from current
            if next is better than currentbest
                then currentbest:= next
            current:= next
        end
return currentbest
N and M represent the number of tries and moves allowed. It is apparent that when searching for clauses, an M value of 0 will result in the algorithm mimicking stochastic clause selection as described above. A variant of this algorithm called Walksat introduces a further random element at the point of selecting next. This time, a biased coin is flipped. If a "head" results then the choice is as per GSAT (that is, the best choice amongst the local neighbours); otherwise next is randomly assigned to one of any "potentially good" neighbours. Potentially good neighbours are those that may lead to a better score than the current best score. This is somewhat like simulated annealing, where the choice is the best element if that improves on the best score. Otherwise, the choice is made according to a function that decays exponentially with the difference in scores. This exponential decay is usually weighted by a "temperature" parameter.
The randomly selected start clause is usually constructed as follows: (1) an example is selected; (2) the bottom clause is constructed for the example; (3) a legal clause is randomly drawn from this bottom clause. The example may be selected by the user (using the sat command). If bottom clauses are not allowed (by setting construct_bottom to false) then legal clauses are constructed directly from the mode declarations. The clause selected is either the result of uniform random selection from all legal clauses, or the result of a specific distribution over clauselengths (specified by setting clauselength_distribution). The latter is the only method permitted when bottom clauses are not allowed. (In that case, if there is no value specified for clauselength_distribution, then a uniform distribution over all allowable lengths is used.)
RRR refers to the `randomised rapid restarts' as described by F. Zelezny, A. Srinivasan, and D. Page in Lattice Search Runtime Distributions May Be Heavy-Tailed available at:
ftp://ftp.comlab.ox.ac.uk/pub/Packages/ILP/Papers/AS/rrr.ps.gz
In the current implementation, RRR stops as soon as a clause with the requisite minimum positive coverage (set using minpos) and acceptable utility (set using minscore) is found. The procedure in the paper above stops as soon as a minimum acceptable accuracy is reached. The same effect can be achieved by setting evalfn to accuracy.
It is intended that the randomised local search methods (GSAT, Walksat, RRR and annealing) can be used either for clause-level search or theory-level search. No equivalent of stochastic clause selection is provided for theory-level search: this has to be mimicked by using the randomised local search, with appropriate settings. At the clause level, local moves involve either adding or deleting a literal from the current clause. Normally, local moves in the clause-space would also involve operations on variables (introducing or removing variable co-references, associating or disassociating variables to constants). These have to be accomplished within Aleph by the inclusion of an equality predicate with appropriate mode declarations, as in the sketch below. Local moves for a theory-level search are described in section Theory-level search.
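As a minimal sketch (the type name car is an illustrative assumption borrowed from the trains example, not a built-in; the second declaration follows the pattern shown in section Mode declarations), mode declarations for an equality predicate might look like:

:- mode(*,(+car)=(-car)).          % allow introducing variable co-references
:- mode(1,(+integer)=(#integer)).  % allow associating a variable with a constant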
Randomised local search is invoked within Aleph by setting the parameter search to rls. In addition, the type of search is specified by setting rls_type to one of gsat, wsat, rrr or anneal. Walksat requires a specification of a biased coin. This is done by setting the parameter walk to a number between 0 and 1. This represents an upper bound on the probability of obtaining a "tail" with the coin. The implementation of simulated annealing is very simple and uses a fixed temperature. This is done by setting the parameter temperature to some real value.
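As a minimal sketch (the numeric values are illustrative, not recommendations), a Walksat-style search could be set up with:

:- set(search,rls).      % randomised local search
:- set(rls_type,wsat).   % use the Walksat variant
:- set(walk,0.5).        % probability of a random (non-greedy) move
:- set(tries,10).        % number of restarts
:- set(moves,20).        % moves allowed per restart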
Most prominent ILP systems are "batch learners": all examples and background knowledge are in place before learning commences. The ILP system then constructs a hypothesis for the examples. A less popular, but nevertheless interesting alternative is that of "incremental learning", where examples, background and hypothesis are incrementally updated during the course of learning. Aleph allows such an incremental construction of clauses by typing:
induce_incremental.
This results in Aleph repeatedly performing a sequence of steps in which an example is selected and generalised, with the user consulted at each stage. The user's responses may update the examples or the background knowledge with any Prolog clause or Aleph command, for example covers or mode(*,has_car(+train,-car)).
Note: the command induce_clauses/0 with the flag interactive set to true simply performs the same function as induce_incremental. The incremental mode does not preclude the use of prior sets of examples or background information. These are provided in the usual way (in files with .b, .f and .n suffixes).
An example of using the incremental learner to construct a program for list
membership can be found in the incremental
sub-directory in:
http://www.comlab.ox.ac.uk/oucl/research/areas/machlearn/Aleph/misc/examples.zip
An adequate explanation for a set of examples typically requires several clauses. Most ILP systems attempt to construct such explanations one clause at a time. The procedure is usually an iterative greedy set-covering algorithm that finds the best single clause (one that explains or "covers" most unexplained examples) on each iteration. While this has been shown to work satisfactorily for most problems, it is nevertheless interesting to consider implementations that attempt to search directly at the "theory-level". In other words, elements of the search space are sets of clauses, each of which can be considered a hypothesis for all the examples. The implementation in Aleph of this idea is currently at a very rudimentary level, and preliminary experiments have not demonstrated great benefits. Nevertheless, the approach, with development, could be promising. The implementation within Aleph is invoked by the command:
induce_theory.
This conducts a search that moves from one set of clauses to another. Given a clause set S, local moves are the result of the following: (1) adding a clause to S; (2) deleting a clause from S; (3) adding a literal to a clause in S; and (4) deleting a literal from a clause in S.
As noted in section Randomised search methods, the use of an equality predicate with appropriate mode declarations may be needed to achieve variable co-references, etc.
Currently, induce_theory starts with an initial set of at most C clauses, where this number is specified by setting the clauses parameter. Each of these is a randomly selected legal clause. induce_theory then performs theory-level search, either using as search strategy a randomised local search method (obtained by setting the search parameter to rls: see section Randomised search methods), or a Markov chain Monte Carlo technique (obtained by setting search to mcmc). The latter is untested. The only evaluation function allowed is accuracy. For theories, this is the number (TP+TN)/(TP+TN+FP+FN), where TP, TN are the numbers of positive and negative examples correctly classified respectively; FP is the number of negative examples incorrectly classified as positive; and FN is the number of positive examples incorrectly classified as negative.
The algorithm embodied in induce can be seen as the first-order equivalent of propositional rule-learning algorithms like Clark and Niblett's CN2. There is now a substantial body of empirical work (done by researchers in Leuven and Freiburg) demonstrating the utility of first-order equivalents of propositional tree-learning procedures. Tree-based learning can be seen as a special case of theory learning, and the implementation in Aleph uses the standard recursive-partitioning approach to construct classification, regression, class probability, or model trees. Tree-based theory construction is invoked by the command:
induce_tree.
The type of tree constructed is determined by setting tree_type to one of: classification, regression, class_probability, or model.
The basic procedure attempts to construct a tree to predict the output argument in the examples. Note that the mode declarations must specify only a single argument as output. Paths from root to leaf constitute clauses. Tree-construction is viewed as a refinement operation: any leaf can currently be refined (converted into a non-leaf) by extending the corresponding clause (resulting in two new leaves). The extension is done using Aleph's automatic refinement operator that extends clauses by a single literal within the mode language. That is, Aleph sets refine to auto. Note that using the automatic refinement operator means that the user has to ensure that all arguments that are annotated as #T in the modes contain generative definitions for type T. The lookahead option allows additions of several literals at once. The impurity function is specified by setting the evalfn parameter.
Currently, for classification and class_probability trees, evalfn must be one of entropy or gini. For regression trees the evaluation function is automatically set to sd (standard deviation). For model trees, evalfn must be one of mse (mean square error) or accuracy.
In all cases, the result is always presented as a set of rules. Rules for class_probability and regression trees make their predictions probabilistically using the random/2 predicate provided within Aleph.
In addition, settings for the following parameters are relevant: classes, the list of classes occurring in examples provided (for classification or class_probability trees only); dependent, for the argument constituting the dependent variable in the examples; prune_tree, for pruning rules from a tree; confidence, for error-based pruning of rules as described by J. R. Quinlan in the C4.5 book; lookahead, specifying the lookahead for the refinement operator to mitigate the horizon effect from zero-gain literals; mingain, specifying the minimum gain required for refinement to proceed; and minpos, specifying the minimum number of examples required in a leaf for refinement to proceed. A sketch of typical settings appears below.
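As a minimal sketch for a classification tree (the class names pos and neg and the dependent-argument position 2 are illustrative assumptions, not defaults):

:- set(tree_type,classification).
:- set(evalfn,entropy).
:- set(classes,[pos,neg]).   % hypothetical class labels
:- set(dependent,2).         % hypothetical: argument 2 holds the class
:- set(minpos,2).
:- set(mingain,0.05).

followed by a call to induce_tree.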
Forward pruning is achieved by the parameters mingain and minpos. The former should be set to some value greater than 0 and the latter to some value greater than 1. Backward pruning uses error pruning of the final clauses in the tree by correcting error estimates obtained from the training data. Automatic error-based pruning is achieved by setting the parameter prune_tree to auto.
For classification trees the resulting procedure is identical to the one for rule pruning described by Quinlan in C4.5: Programs for Machine Learning, Morgan Kauffmann. For regression trees, error-based pruning results in corrections to the sample standard deviation. These corrections assume normality of observed values in a leaf: the method has been studied empirically by L. Torgo in "A Comparative Study of Reliable Error Estimators for Pruning Regression Trees". Following work by F. Provost and P. Domingos, pruning is not employed for class probability prediction. At this stage, there is also no pruning for model trees.
The prediction at each leaf differs for each tree type. For classification trees, the prediction is the majority class as estimated from the examples in the leaf; for regression trees the prediction is a value drawn randomly from a normal distribution with mean and standard deviation estimated from the examples in the leaf; for class_probability trees the prediction is a value drawn randomly from the (Laplace corrected) discrete distribution of classes in the leaf; and for model trees the prediction is achieved by a user-defined background predicate (see following).
Model trees in Aleph are constructed by examining, at each leaf, one or more model construction predicates. These predicates are defined as part of the background knowledge, and can specify different kinds of models. For example, the predicates may be for linear regression, polynomial regression, etc. for predicting a continuous variable; or a decision tree, logistic regression, etc. for predicting a nominal variable. For each kind of model, the user has to provide a definition for a predicate that is able to: (a) construct the model; and (b) predict using the model constructed. The process is the same as that for lazy evaluation. Each such predicate is specified using the model/1 command. If several different predicates are specified, then, at each leaf, each predicate is called to construct a model, and the predicate that constructs the best model (evaluated using the current setting for evalfn) is returned. This can be computationally intensive, but can lead to the construction of fairly complex theories, in which different leaves can contain different kinds of models (for example, linear regression models in one leaf and quadratic regression models in another).
Tree-learning can be performed interactively, with the user specifying the split to be selected. This is done by setting interactive to true before executing the induce_tree command.
An example of using the tree learner
can be found in the tree
sub-directory in:
http://www.comlab.ox.ac.uk/oucl/research/areas/machlearn/Aleph/misc/examples.zip
The basic Aleph algorithm constructs definite clauses normally intended to be components of a predictive model for data. Early ILP work (for example, in the Claudien system) demonstrated the value of discovering all non-Horn constraints that hold in a database. A similar functionality can be obtained within Aleph using the command:
induce_constraints.
The implementation of these ideas in Aleph uses a naive generate-and-test strategy to enumerate all constraints within the background knowledge (for the mode language provided). All constraints are of the form:

false:- ...

and are stored in the user-specified goodfile (the specification of this file is mandatory for induce_constraints to work).
With appropriate mode settings for false and not it is possible to identify non-Horn constraints in the same way as Claudien. For example, given the background knowledge:
male('Fred').
female('Wilma').
human('Fred').
human('Wilma').
and the mode declarations:
:- modeh(1,false).
:- modeb(*,human(-person)).
:- modeb(1,male(+person)).
:- modeb(1,female(+person)).
:- modeb(1,not(male(+person))).
:- modeb(1,not(female(+person))).
Aleph identifies the following constraints:
false :- human(A), male(A), female(A).
false :- human(A), female(A), male(A).
false :- human(A), not male(A), not female(A).
false :- human(A), not female(A), not male(A).
After removing redundant constraints (which Aleph does not do), these are equivalent to the following:
false :- human(A), male(A), female(A).
male(A) ; female(A) :- human(A).
The validity of these constraints can only be guaranteed if the background knowledge is assumed to be complete and correct. To account for incorrect statements in the background knowledge, it may sometimes be useful to alter the noise setting when obtaining constraints: it now specifies the number of falsifying substitutions tolerated. The minacc parameter is ignored.
An example of using the constraints learner
can be found in the constraints
sub-directory in:
http://www.comlab.ox.ac.uk/oucl/research/areas/machlearn/Aleph/misc/examples.zip
The basic Aleph algorithm assumes that modes will be declared by the user which, in the past, has been a source of some difficulty. There has been some work (by E. McCreath and A. Sharma, Proc of the 8th Australian Joint Conf on AI, pages 75-82, 1995) on automatic extraction of mode and type information from the background knowledge provided. The implementation in Aleph follows these ideas fairly closely and can be invoked by the command:
induce_modes.
Given a set of determinations, the procedure works in two parts: (i) finding equivalence classes of types; and (ii) finding an input/output assignment.
Unlike the McCreath and Sharma approach, types in the same equivalence class are given the same name only if they "overlap" significantly (the overlap of type1 with type2 is the proportion of elements of type1 that are also elements of type2). "Significantly" here means an overlap of at least some threshold T (set using typeoverlap, with default 0.95). Values of typeoverlap closer to 1.0 are more conservative, in that they require very strong overlap before the elements are called the same type.
Since this may not be perfect, modes are also produced for equality statements that re-introduce co-referencing amongst differently named types in the same equivalence class. The user does, however, have to explicitly include a determination declaration for the equality predicate.
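For example, a conservative run (the threshold value is illustrative) would be:

:- set(typeoverlap,0.98).
:- induce_modes.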
The i/o assignment is not straightforward, as we may be dealing with non-functional definitions. The assignment sought here is one that maximises the number of input arguments, as this gives the largest bottom clause. This assignment is sought by means of a search procedure over mode sequences.
Suppose we have a mode sequence M = <m1, m2, ..., m(i-1)> that uses the types T. An argument of type t in mode m(i) is an input iff t overlaps significantly (used in the same sense as earlier) with some type in T. Otherwise the argument is an output.
The utility of each mode sequence M is f(M) = g(M) + h(M) where
g(M) is the number of input args in M; and h(M) is a (lower) estimate
of the number of input args in any mode sequence of which M is a prefix.
The search strategy adopted is a simple hill-climbing one. Note
that the procedure as implemented assumes background predicates will
be generative (which holds when the background knowledge is ground).
An example of using the mode learner
can be found in the modes
sub-directory in:
http://www.comlab.ox.ac.uk/oucl/research/areas/machlearn/Aleph/misc/examples.zip
The basic Aleph algorithm assumes that the examples provided are observations of the target predicate to be learned. There is, in fact, nothing within the ILP framework that requires this to be the case. For example, suppose the following was already provided in the background knowledge:
grandfather(X,Y):-
        father(X,Z),
        parent(Z,Y).
parent(X,Y):-
        father(X,Y).
father('Fred','Jane').
mother('Jane','Robert').
mother('Jane','Peter').
then the examples:
grandfather('Fred','Robert').
grandfather('Fred','Peter').
are clearly not entailed by the background knowledge. Aleph would then
simply try to learn another clause for grandfather/2
, perhaps
resulting in something like:
grandfather(X,Y):- father(X,Z), mother(Z,Y).
In fact, the job would have just as easily been done, and the result more useful, if Aleph could learn the following:
parent(X,Y):- mother(X,Y).
This requires Aleph to be able to do two things. First, given observations
of grandfather/2
that are not entailed by the background knowledge,
generate instances of parent/2
that will allow the observations to be
entailed. Second, use the instances of parent/2
that
were generated to obtain the clause for parent/2
above. The first
of these steps requires a form of abduction. The second requires generalisation
in the form of learning. It is the combination of these two steps
that is called "Abductive Learning" here.
The basic procedure used by Aleph is a simplified variant of S. Moyle's Alecto program. Alecto is described in some detail in S. Moyle, "Using Theory Completion to Learn a Navigation Control Program", Proceedings of the Twelfth International Conference on ILP (ILP2002), S. Matwin and C.A. Sammut (Eds), LNAI 2583, pp 182-197, 2003. Alecto does the following: for each positive example, an "abductive explanation" is obtained. This explanation is a set of ground atoms. The union of abductive explanations from all positive examples is formed (this is also a set of ground atoms). These are then generalised to give the final theory. The ground atoms in an abductive explanation are obtained using Yamamoto's SOLD resolution, or SOLDR (Skip Ordered Linear resolution for Definite clauses).
Currently, abductive learning is only incorporated within the
induce
command. If abduce
is set to true
then
Aleph first tries to obtain the best clause for the observed predicate
(for example, the best clause for grandfather/2
). Abductive
explanations are then generated for all predicates marked as being
abducible (see abducible/1
) and generalisations constructed using
these. The best generalisation overall is then selected, and greedy clause identification by induce repeats with the remaining observations. Care has to be taken to ensure that abductive
explanations are indeed ground (this can be achieved by using appropriate
type predicates within the definitions of the abducible
predicates) and limited to some maximum number (this latter
requirement is for reasons of efficiency: see setting for
max_abducibles
).
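For the grandfather example above, a sketch of the relevant declarations might be (the max_abducibles value is illustrative):

:- set(abduce,true).
:- set(max_abducibles,2).
:- abducible(parent/2).
:- induce.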
It should be evident that abductive learning as described here implements a restricted form of theory revision, in which revisions are restricted to completing definitions of background predicates other than those for which observations are provided. This assumes that the background knowledge is correct, but incomplete. In general, if background predicates are both incorrect and incomplete, then a more elaborate procedure would be required.
One promising role for ILP is in the area of feature construction. A good review of the use of ILP for this can be found in S. Kramer, N. Lavrac and P. Flach (2001), Propositionalization Approaches to Relational Data Mining, in Relational Data Mining, S. Dzeroski and N. Lavrac (eds.), Springer.
Aleph uses a simple procedure to construct boolean features. The
procedure is invoked using the induce_features/0
command.
This is almost identical to the induce_cover/0 command. Recall that induce_cover/0 uses a covering strategy to construct rules that explain the examples (the slight twist being that all positive examples are retained when evaluating clauses at any given stage).
The difference with induce_features/0
is that
all good clauses that are found during the course of constructing such rules
are stored as new features. A feature stored by Aleph contains two
bits of information: (1) a number, that acts as a feature identifier; and (2)
a clause (Head:-Body)
. Here Head
is a literal that
unifies with any of the examples with the same name and arity as Head
and
Body
is a conjunction
of literals. The intent is that the feature is true for an example if and only if the example unifies with Head and Body is true. For classification problems, the user has to specify the dependent variable. This is done using set(dependent,...).
The process of finding rules (and the corresponding features) continues until all examples
are covered by the rules found or the number of features exceeds a pre-defined upper limit
(controlled by set(max_features,...)
).
What constitutes a "good clause" is dictated by settings for various Aleph parameters. The following settings are an example of some parameters that are relevant:
:- set(clauselength,10).
:- set(minacc,0.6).
:- set(minscore,3).
:- set(minpos,3).
:- set(noise,50).
:- set(nodes,5000).
:- set(explore,true).
:- set(max_features,20000).
Features found by Aleph can be shown by the show(features)
command.
Aleph can be used to show the boolean vectors for the train and test examples using a combination of set(portray_examples,...), features/2, appropriate definitions for aleph_portray/1, and show(train_pos), show(train_neg), etc. Here is an example of the use of aleph_portray/1 for examples in the training set:
aleph_portray(train_pos):-
        setting(train_pos,File),
        show_features(File,positive).
aleph_portray(train_neg):-
        setting(train_neg,File),
        show_features(File,negative).

show_features(File,Class):-
        open(File,read,Stream),
        repeat,
        read(Stream,Example),
        (Example = end_of_file -> close(Stream);
                write_features(Example,Class),
                fail).

write_features(Example,_):-
        features(_,(Example:- Body)),
        (Body -> write(1), write(' '); write(0), write(' ')),
        fail.
write_features(_,Class):-
        writeq(Class), nl.
If portray_examples
is set to true
, Aleph will
call aleph_portray(Term)
, when the command
show(Term)
is executed (with Term
being one of
train_pos
, train_neg
, test_pos
or test_neg
).
There are a number of other useful commands and predicates defined in Aleph. These are:

rdhyp: read a hypothesised clause from the user.
addhyp: add the current best clause to the theory.
sphyp: specialise the current hypothesis using the GCWS procedure.
addgcws: add the clauses constructed by the GCWS procedure to the theory.
covers: show the positive examples covered by the current hypothesis.
coversn: show the negative examples covered by the current hypothesis.
reduce: search for an acceptable clause using the current bottom clause.
sat/1: construct the bottom clause for the example number given.
abducible(+V): V, of the form Name/Arity, specifies a predicate whose ground atoms may be abduced.
bottom(-V): V is the current bottom clause.
commutative(+V): the predicate V (of the form Name/Arity) is declared commutative.
man(-V): V is the location of the on-line manual.
symmetric(+V): the predicate V is declared symmetric.
lazy_evaluate(+V): the predicate V is evaluated lazily during the search.
model(+V): the predicate V is a model-construction predicate for model trees (the conventions are as for lazy_evaluate/1).
positive_only(+V): only positive examples are used when lazily evaluating the predicate V.
random(V,+D): V is a random variable with distribution D.
sat(+V): construct the bottom clause for example number V.
example_saturated(-V): V is the example whose bottom clause is currently held.
show(+V): displays Aleph constructs, where V is one of:

bottom: the current bottom clause.
constraints: the constraints found by induce_constraints.
determinations: the determination declarations.
features: the features constructed by induce_features.
gcws: the hypothesis constructed by the gcws procedure.
good: the good clauses found so far (those with utility above minscore).
hypothesis: the current hypothesised clause.
modes: all mode declarations.
modehs: the modeh declarations.
modebs: the modeb declarations.
neg: the negative examples.
pos: the positive examples.
posleft: the positive examples not yet covered.
rand: the randomly generated examples (when evalfn is posonly).
search: the search seen so far (requires portray(search)).
settings: the current parameter settings.
sizes: the sizes of the positive and negative example sets.
theory: the theory constructed so far.
test_neg: the examples in the file associated with the setting test_neg.
test_pos: the examples in the file associated with the setting test_pos.
train_neg: the examples in the file associated with the setting train_neg.
train_pos: the examples in the file associated with the setting train_pos.
Name/Arity: the definition (if known) of the predicate Name/Arity.
prove(+Clause,+Example): Example is provable using a clause Clause. Clause can be the special term bottom, in which case it refers to the current bottom clause. Calls to this predicate are only made if the flag proof_strategy is set to user. Settings to flags depth and prooftime are ignored.
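As a minimal sketch (assuming the clause arrives as a Head:-Body term, and ignoring the special term bottom), a user-defined prove/2 could be:

% Example is provable if it unifies with the clause head and the body
% then succeeds against the background knowledge; the double negation
% undoes the bindings afterwards.
prove((Head:-Body),Example) :-
        \+ \+ (Example = Head, call(Body)).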
redundant(+Clause,+Lit): Lit is redundant in a clause Clause. Clause can be the special term bottom, in which case it refers to the current bottom clause. Calls to this predicate are only made if the flag check_redundant is set to true.
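As a minimal sketch (again ignoring the special term bottom), a user-defined redundant/2 that marks a literal as redundant if an identical literal already occurs in the clause could be:

redundant((Head:-Body),Lit) :-
        occurs_in((Head,Body),Lit).

% occurs_in(+Conj,+Lit): Lit is identical (==) to some conjunct
occurs_in((A,B),Lit) :- !,
        ( occurs_in(A,Lit) ; occurs_in(B,Lit) ).
occurs_in(L,Lit) :-
        L == Lit.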
modeh(+Recall,+Mode): Recall is a positive integer or *. Mode is a mode template as in a mode/2 declaration. Declares a mode for the head of a hypothesised clause. Required when evalfn is posonly.
modeb(+Recall,+Mode): Recall is a positive integer or *. Mode is a mode template as in a mode/2 declaration. Declares a mode for a literal in the body of a hypothesised clause.
text(+L,+T): L is a literal that can appear in the head or body of a clause; T is a list of terms specifying the text to be printed for L. For example, given text(active(X),[X,'is active']), the clause active(d1) will be written as d1 is active.
hypothesis(-Head,-Body,-Label): Head and Body are the head and body of the current hypothesised clause. Label is the list [P,N,L], where P is the positive examples covered by the hypothesised clause, N is the negative examples covered by the hypothesised clause, and L is the number of literals in the hypothesised clause.
feature(+Id,+(Head:-Body)): declares a feature with identifier Id and definition (Head:-Body).
features(?Id,?(Head:-Body)): retrieves a stored feature with identifier Id and definition (Head:-Body).
With appropriate settings, Aleph can emulate some of the functionality of the following programs: P-Progol, CProgol, FOIL, FORS, Indlog, MIDOS, SRT, Tilde and WARMR. Descriptions of and pointers to these programs are available at:
http://www-ai.ijs.si/~ilpnet2/systems/
In addition, the following programs and scripts are relevant.

T-Reduce: a companion program to Aleph, useful for processing the clauses found by the commands induce_cover and induce_max. It finds a subset of these clauses that explain the examples adequately and have lesser overlap in coverage. T-Reduce uses the Yap Prolog compiler. A copy of this program is available (without support) at:
http://www.comlab.ox.ac.uk/oucl/research/areas/machlearn/Aleph/misc/treduce.pl
This has not been used for several years and is vulnerable to the usual forces of decay that afflict old programs.

GUI
Scripts
This section contains ideas and suggestions that have surfaced during the development of Aleph and its predecessor programs. The topics themselves are in no particular order. They are written in a somewhat stylised manner and reflect various personal biases. They should not, therefore, be considered normative in any way.
With search set to heuristic, evalfn to compression, construct_bottom to saturation, and samplesize to 0, the command induce will construct a theory along the lines of the Progol algorithm described by S. Muggleton. This is, however, no substitute for the original. If you want an implementation of S. Muggleton's Progol algorithm exactly as described in his paper, then Aleph is not suitable for you; try CProgol instead. The same comment applies to other programs listed in section Related versions and programs.
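That is, the following settings reproduce this behaviour:

:- set(search,heuristic).
:- set(evalfn,compression).
:- set(construct_bottom,saturation).
:- set(samplesize,0).
:- induce.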
Predicate names already in use, either by Aleph or by the background knowledge, can be checked with the current_predicate(X) goal provided by the Prolog engine.
The bottom clause can be constructed lazily, during the search, by setting the construct_bottom flag to reduction.
Theories can be constructed interactively. The commands sat/1 or rsat/0, followed by the command reduce/0, are useful for this. sat(N) constructs the bottom clause for example number N. rsat constructs a bottom clause for a randomly selected example. reduce does a search for an acceptable clause. addhyp/0 then adds the current best clause to the theory. This has the flavour of anytime-learning.
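A typical interactive sequence is then:

:- sat(1).      % construct the bottom clause for example 1
:- reduce.      % search for an acceptable clause
:- addhyp.      % add the best clause found to the theory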
The induce_incremental command is highly interactive. It requires the user to provide examples, and also to categorise the result of searches. This may prove quite demanding on the user, but it has the flavour of the kind of search done by a version-space algorithm.
Setting interactive to true and calling induce_clauses has the same effect as calling induce_incremental. Trees can also be constructed interactively by setting interactive to true and calling induce_tree.
For routine theory construction, induce/0 is often sufficient.
induce/0
, induce_cover/0
, induce_max/0
, induce_clauses/0
and
induce_incremental/0
encode
control strategies for clause-level search.
They will use any user-defined refinement operators, search and evaluation functions, beam-width restrictions, etc. that are set. In terms of speed, induce/0 is usually faster
is usually faster
than induce_cover/0
, which in turn is faster than induce_max/0
.
The time taken by induce_incremental/0
is not as easily
characterisable. induce_clauses/0
is simply induce/0
or induce_incremental/0
depending on whether the flag
interactive
is false
or true
respectively.
induce_max/0
results in a set of clauses that is invariant of
example ordering. None of induce_cover/0, induce/0 or induce_incremental/0 has this property.
Use the T-Reduce program after induce_max/0 or induce_cover/0 to obtain a compact theory for prediction.
Clauses can be added to the theory one at a time using sat/1 (or rsat/0), reduce/0 and addhyp/0.
Sampling of examples is controlled by the samplesize flag. This sets the number of examples to be selected randomly by the induce or induce_cover commands. Each example seeds a different search, and the best clause is added to the theory. Setting samplesize to 0 means examples will be selected in their order of appearance in the positive examples file. This allows replication of results without worrying about variations due to sampling.
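For example (the sample size is illustrative):

:- set(samplesize,4).
:- induce.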
The induce_tree command will construct tree-structured theories.
The induce_theory command is to be used at your own peril.
Parameters affecting the size of the search space: i, clauselength, nodes, minpos, minacc, noise, explore, best, openlist, splitvars.
Parameters affecting the type of search: search, evalfn, refine, samplesize.
Parameters affecting the efficiency of the search: caching, lazy_negs, proof_strategy, depth, lazy_on_cost, lazy_on_contradiction, searchtime, prooftime.
Parameters affecting the presentation of results: print, record, portray_hypothesis, portray_search, portray_literals, verbosity, test_pos, test_neg, train_pos, train_neg.
begin
    active := {0};  (comment: ``0'' is a conventional starting point)
    C := inf;
    currentbest := anything;
    while active is not empty do
    begin
        remove first node k from active;  (comment: k is a branching node)
        generate the children i = 1,...,Nk of node k, and
            compute corresponding costs Ci and lower bounds on costs Li;
        for i = 1,...,Nk do
            if Li >= C then prune child i
            else
            begin
                if child i is a complete solution and Ci < C then
                begin
                    C := Ci; currentbest := child i;
                    prune nodes in active with lower bounds more than Ci
                end
                add child i to active
            end
    end
end
(a) Prioritisation of the active set. With search set to bf and evalfn set to coverage (the default for Aleph), the primary and secondary keys are -L and P-N respectively. Here L is the number of literals in the clause, and P,N are the positive and negative examples covered by the clause. This ensures clauses with fewer literals will be chosen first. They will further be ordered on difference in coverage;
(b) Branch set. Children are generated by refinement steps that are either
built-in (add 1 literal at a time) or user-specified. With built-in
refinement, loop detection is performed to prevent duplicate
addition of literals;
(c) Lower bounds. This
represents the lowest cost that can be achieved at this node and the sub-tree
below it. This calculation is dependent on the search method and evaluation function.
In cases where no easy lower bound is obtainable, it is taken as 0
resulting
in minimal pruning;
(d) Restrictions. The search need not proceed until the active set is empty. It may be terminated prematurely by setting the nodes parameter. Complete solutions are taken to be ones that satisfy the language restrictions and any other hypothesis-related constraints.
Use a smaller i setting, or a smaller clauselength or nodes setting. Avoid setting splitvars to true (it is not even clear whether this works correctly anyway). Try relaxing minacc or noise to allow clauses with lower accuracy. Set minpos to some value larger than the default. Set a different value for best.
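For example, a restrictive set of values (illustrative only) would be:

:- set(i,2).
:- set(clauselength,4).
:- set(nodes,2000).
:- set(minpos,2).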
Other options include: restricting the beam width (setting openlist); using an iterative beam-width search (setting search to ibs); using randomised local search (setting search to rls, with appropriate settings for associated parameters); or using Camacho's language search (using parameter language, or setting search to ils).
A time-bounded search can be obtained by setting searchtime to some small value.
The posonly evaluation function will allow construction of theories using positive examples only (thus, some savings can be made by ignoring negative examples).
User definitions of portray/1 provide a general mechanism for altering the view of the hypotheses and search seen by the user. Three flags are relevant: portray_hypothesis, portray_search and portray_literals.
If the first is set to true
then the command show(hypothesis)
will execute portray(hypothesis)
. This has to be user-defined.
If the second flag is set to true
then the command show(search)
will execute portray(search)
. This has to be user-defined.
If the third flag is set to true
then any literal L
in a clause
constructed during the search will be shown on screen by executing
portray(L)
. This has to be user-defined.
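For example, with portray_literals set to true, a user-defined portray/1 (echoing the text/2 example given earlier) might be:

:- set(portray_literals,true).

portray(active(X)) :-
        write(X), write(' is active').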
An example of the use of portray/1 can be found in the portray sub-directory in:
http://www.comlab.ox.ac.uk/oucl/research/areas/machlearn/Aleph/misc/examples.zip
One example of numerical reasoning is provided by the files guess.b, guess.f, and guess.n in the numbers sub-directory in:
http://www.comlab.ox.ac.uk/oucl/research/areas/machlearn/Aleph/misc/examples.zip
Further examples are provided by the files ineq.b, ineq.f, and ineq.n in the same numbers sub-directory in:
http://www.comlab.ox.ac.uk/oucl/research/areas/machlearn/Aleph/misc/examples.zip
normal(Y):- not(ab0(Y)).
ab0(Y):- divisible(4,Y), not(ab1(Y)).
ab1(Y):- divisible(100,Y), not(ab2(Y)).
ab2(Y):- divisible(400,Y).

where normal is a year that does not contain an intercalary day. With background knowledge of divisible/2, GCWS would automatically specialise the clause:

normal(Y).

by constructing the more elaborate theory shown earlier. This involves the invention of the ab0, ab1 and ab2 predicates.
The way to do this in Aleph is as follows. First find a clause using the standard procedure (the sat and reduce commands). If no acceptable clause is found, decrease the minimum accuracy of acceptable clauses (by setting minacc or noise). Now do the search again. You will probably get an overgeneral clause (that is, one that covers more negative examples than preferable). Now use the sphyp command to specialise this hypothesis. Aleph will repeatedly create examples for new abnormality predicates and generalise them until the original overgeneral clause does not cover any negative examples. You can then elect to add this theory by using the addgcws command.
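In outline (the example number and noise value are illustrative):

:- sat(1).          % find a clause by the standard procedure
:- reduce.
:- set(noise,10).   % no acceptable clause: relax and search again
:- reduce.
:- sphyp.           % specialise the overgeneral clause found
:- addgcws.         % add the specialised theory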
Changed record_testclause to add a depth-bound call to body literals.
Fixed bug in continue_search for user-defined cost function; fixed bug in stochastic clause selection that attempts to select more literals than present in the bottom clause.
Added support for the flag proof_strategy having the value user.
The flag search can now have the value false, which results in the bottom clause being added to the theory without further search.
Changed the treatment of the minacc and noise settings when evalfn is set to user. Incorporated bug fixes to get_max_negs reported by Daniel Fredouille.
Fixed bug in the induce loop; fixed bug in gen_nlitnum.
Fixed posonly bug by correcting a typo for gsamplesize.
Changed best_value/4 after discussions with James Cussens, Mark Reid and Jude Shavlik.
Added depth_bound_call to the cover test in test_file/2 (reported by James Cussens).
Changed the treatment of noise and minacc when evalfn is user.
discontinue_search now fails if evalfn is user.
Added the interactive flag to control interactive construction of clauses and trees. Added the induce_clauses command.