Hi David, On 04/09/2012 02:45, -dp- wrote: > We have a Java-embedded ECLiPSe 6.1 process running on Win7 Pro 64-bit > 2.67GHz that takes more than two weeks to run, but all the while the CPU > and RAM utilization is quite low. We are looking for ways of increasing CPU > utilization in order to shorten the run time. It is difficult to be very specific without more information.... However, how are you measuring CPU utilisation? If you are doing so using what Windows Task manager is giving you under 'processes' tab, the figure is based on the total number of cores you have, and as ECLiPSe does not run in parallel, you can only ever use 1 core (1 hyperthread), so even if that core is 100% busy, the utilisation % will still be low if you have many cores, as in many modern CPUs? For example, I just tried running a program on ECLiPSe that I know will maximise the CPU usage, and on my 4 core laptop (8 hyperthreads or whatever Intel calls it), I got a utilisation of about 16%. > One of our colleagues speculated that "you are doing lots of read or write > to your database on the disk and that is causing the IO bottleneck." Is any > of the ECLiPSe database kept on disk instead of RAM? No, your program code ("database") is kept in memory. This is virtual memory, so bits of it may be swapped to disc at the control of the OS. If this happens, you should see a lot of I/O to your disc, but I don't think this is easy to see with the Windows task manager. If you total memory usage is small, then this is also a little unlikely (but it does depend on what else you are running on your machine at the same time). > Any suggestions on where our bottleneck might be? > > Regards, > David > > btw, my own speculation is that, because we have 250,000+ facts with the > same functor, that the long runtime is due to serially attempting matches > across most or all of those facts. We are looking into storing few facts or > using more functors, but I've wondered if Google's parallelization of graph > search algorithms in BigQuery might someday come to ECLiPSe to allow > parallel search for backchaining. ECLiPSe performs indexing on one argument during the matching of your goal against clauses in your program. That is, in addition to the clause name, one of the argument is used to filter out clauses that will not match. If your program is performing search, have you looked at the search space your program is exploring? It is very easy for your program to be exploring a very large search space that will take very long to explore if it is poorly constrained. If you are using constraints, do you know how effective those constraints are in reducing your search space (you also need to take into account the work done in the constraint reasoning (propagation), as this can also be expensive). If you are running constraint program, you can reduce the search space by using the "right" constraint (either a pre-existing one or writing your own specialised constraint), and/or using the right heuristics for labelling your variables (if you are looking for a feasible solution only) -- again you might be able to develop your own labelling heuristics. This is one reason why we recommend that you develop your ECLiPSe code directly, as there are profiling and visualising tools that can help you. Note that the time-profiling tool is not available for Windows, so if you want to use this, you will need to run your program on a different platform. Another point: although the program space of Prolog is sometimes call the "database", this is not what it is really designed for. That is, if you really want to use database functions, i.e. to store and retrieve many items of information quickly and efficiently, you might be better off using a real modern database. ECLiPSe does have an interface to MySQL that may be useful for such situations. Finally, as I started my research on parallel Prolog, I think I will also comment a little on parallelisation of ECLiPSe: it is quite a complex task, and will involve some significant changes to the implementation of the engine. In fact, a lot of the early research at ECRC (where ECLiPSe originate from) was in parallising Prolog, and ECLiPSe still have some commented out code in the source for an or-parallel version that was distributed with ECLiPSe many years ago. You probably need a lot of work to get this code working again. This work on parallelising ECLiPSe was dropped before I joined IC-Parc, where the research concentrated on constraint and hybrid solving. I think one reason for this is that parallelising will mostly give you linear speedups (and probably at the expense of increased memory usage and overheads), while the right algorithm/heuristics can produce orders of magnitudes of speedup in solving your problem. [another consideration was that at the time (about 15-20 years ago), parallel processors were uncommon and expensive, which is of course no longer true] Cheers, KishReceived on Tue Sep 04 2012 - 19:37:03 CEST
This archive was generated by hypermail 2.2.0 : Thu Sep 06 2012 - 06:18:21 CEST