Re: [eclipse-clp-users] Hunting for bottlenecks: Is factbase stored on-disk?

From: Kish Shen <kisshen_at_...5...>
Date: Wed, 12 Sep 2012 17:16:22 +0100
Hi Edwin,

On 12/09/2012 04:59, Edwin Wirawan wrote:
>
> Hi Kish,
>
> I am a colleague of David's and would like to post some follow-up questions. We would appreciate your help with the following 3 questions..
>
> 1) Kish Shen wrote:
>> ECLiPSe performs indexing on one argument during the matching of your goal
>> against clauses in your program. That is, in addition to the clause name,
>> one of the argument is used to filter out clauses that will not match.
>
> Do you mean that for both static and dynamic facts, the index is comprised of the functor and the first parameter (but only if bound)?
>

Static and dynamic codes are implemented differently. I had forgotten to 
say in my last post that the indexing I described was for static code.

Static code is compiled into instructions for ECLiPSe's abstract 
machine, which includes instructions for indexing. When you call
a static goal, the abstract machine will execute the abstract machine
instructions compiled for that clause, so the indexing for the "functor" 
(if I understand you correctly) is implicit -- this is rather like a 
conventional language, i.e. when you call the goal foo(A,B,C), you will 
only execute the compiled code for the predicate foo/3.

The compiled code for a predicate starts with indexing instructions, 
which filters the clauses that will be tried for matching your calling
goal base on *one* argument in your calling goal. This does not have to 
be the first argument (I assume that is what you mean by first 
parameter). The compiler uses some heuristics to decide which argument 
it will index on for the predicate.

The indexing that first "switches" (filter if you like) on the type of 
the chosen argument (so if you call with that argument uninstantiated, 
all the clauses will be tried), and then further indexing depending on 
the type of the argument -- in general this indexing is on the functor 
(name and arity) of the argument. Note that there is no further indexing 
if the argument is a string, and as the functor for lists is ./2. there 
is also no further filtering if the argument is a list (although it is 
treated as a different type than other structures).

Dynamic code is not compiled, but is instead stored in source form in 
the recorded database (using recorded/2 and friends). Each predicate has 
its own key in the database, so only the clauses for the called 
predicate will be accessed. Some further filtering is done based on
the arguments in your calling goal -- this is done by recorded/2 (see 
the documentation for recorded/2), but I don't know the details of what
some of filtering is done in this case.

> 2) What does compilation do with static facts that allows them to be accessed faster than dynamically-allocated ones or ones kept in a store?

Dynamic code is essentially interpreted, so their execution is much 
slower. I suspect the indexing that is done with dynamic code is less 
sophisticated than the static code, but I don't know for sure (as I 
don't know the details of the filtering done by recorded/2).

I am not sure what you mean by "ones kept in a store".

Some further points: the implementation of dynamic code was changed in 
ECLiPSe 6.0. It was simplified considerably and implemented using 
recorded/3. Use of dynamic code is not recommended for ECLiPSe, and some
of the traditional usage of dynamic code in Prolog, such as non-logical 
storage, can probably be done better in many cases using other ECLiPSe 
specific features (the user manual has a chapter on non-logical storage).

> 3) We are in the midst of profiling, but we believe the biggest time cost is in
> a predicate that checks every instance of a fact (which are dynamically
> allocated) to determine all potential matches, and must not stop at the first
> match. We considered using hashed storage, but because of this exhaustive
> looping, we don't see any advantage to it in this case. Can you recommend a
> retrieval strategy?
>

I am not sure I understand what exactly you are doing here. Are you 
saying you are using something like findall to collect all the potential 
matches to a goal? This will be very expensive, as findall (and related 
builtins) uses copying to collect the answers...
It also sounds like you don't want indexing, because you *do* want to 
check every single clause/fact you have. If you have many facts for the 
same predicate, and you do this repeatedly, this will not be quick, and 
I can't see how you can improve this without much more detail knowledge 
of what you are trying to do, but I think the most general suggestion I 
can make is: if you can avoid doing this checking for potential matches, 
then do so. [for example, if you must use dynamic predicates, and you 
want to know how many of the clauses you are asserting will match 
certain patterns, you might be able to do so by keeping track of them as 
you assert the clauses (e.g. counting clauses that has "foo" in argument 
1 etc.), rather than doing the check later as (I think) you are doing]

In any case, it sounds like you are trying to use your "facts" as some 
sort of database. As I said, although Prolog terminology calls the code 
as a "database", if you want to do database-like operations, then using 
ECLiPSe's code storage area (dynamic or static) is probably not the way 
to do it. You would be better off using a real database to do this, and 
ECLiPSe has a library lib(dbi) that interfaces to MySQL which you might 
consider using.

Note also that if you are using ECLiPSe's time profiler (which does not 
work on Windows as I said), then the profiling will not profile the 
execution time of the dynamic code correctly, because the time profiler 
basically interrupt the execution of your program repeatedly and check 
which compiled predicate you are running at that point, so for 
interpreted code, this will probably catch the built-in code running the 
interpreter instead.

Cheers,

Kish
  	   		
Received on Wed Sep 12 2012 - 16:16:36 CEST

This archive was generated by hypermail 2.2.0 : Mon Jul 09 2018 - 02:05:29 CEST