Re: [eclipse-clp-users] Large graph analysis

From: Kish Shen <kisshen_at_cisco.com> Date: Fri, 09 Nov 2012 19:04:04 +0000 · This archive was generated by hypermail 2.2.0 : Tue Nov 13 2012 - 06:13:51 CET

Hi Dave,

On 08/11/2012 14:38, David Dreisigmeyer wrote:
> I have a large graph of about 200,000,000 nodes/edges, essentially RDFs
> like:
>
> some_predicate(subject,object).
>

200 million items of anything is very large. A fact like the one above 
will take about 5 words minimum to represent in the compiled form 
(without considering the indexing code), and that's if the arguments are 
simple. Additionally, if they are all the same predicate (i.e. with the 
same name and arity), this will be a very big problem for the compiler, 
as a predicate (i.e. all its clauses) are compiled as one, with the 
compiler trying to generate indexing for the individual clauses (facts 
in your case).

In general, if you have a big problem, it is generally not a good idea 
to have a static (i.e. code-based) representation of it, because this 
cannot be used directly by the program at run-time, which will need some 
dynamic representation of the problem to work on, but the code 
representing your problem will still take up (virtual) memory -- in your 
case this is likely to be several Gigabytes. So it is better to read in 
the data at runtime and construct the dynamic representation of your 
problem. However, again, a dynamic representation of your problem will 
be huge -- a simple item like an atom or a (simple) variable is 
represented by 2 words -- for 64 bit this is 16 bytes. IC domain 
variable is considerably larger than this.

> stating the facts associated with the graph.  Some of the nodes can have
> values within a finite domain but only one (potentially none) of those
> values is the correct one.  I need to find which of the values is correct
> using the available information contained in the graph as well as auxiliary
> information that is not contained in the graph itself.

Using finite domain variable is only useful if you can state some 
constraints between the variables. So you need to be able to generate 
constraints from the 'available and auxiliary information' you 
mentioned. With the numbers you are talking about, it would probably be 
an issue to process the information.

Another issue is the size of the domain -- is this also in the order of 
the number of nodes you have? If so, you can't really use IC as your 
finite domain solver, because it uses a bitmap to represent a domain as 
soon as there are holes in the domain, and this is done for every 
variable, so any domain size bigger than a few thousand is problematic.

So if you can determine the correct values for your variables without 
search (i.e. just from processing the information), you probably should 
not use a finite domain solver for your problem.

> Does anyone have suggestions about how to analyze this using ECLiPSe or
> would I have to use something else (e.g., AllegroGraph or AllegroCache)?
>

I don't know anything about these other packages, so I can't comment on 
them.

Cheers,

Kish
> Thanks,
>
> -Dave
>
>
>
> ------------------------------------------------------------------------------
> Everyone hates slow websites. So do we.
> Make your web apps faster with AppDynamics
> Download AppDynamics Lite for free today:
> http://p.sf.net/sfu/appdyn_d2d_nov
>
>
>
> _______________________________________________
> ECLiPSe-CLP-Users mailing list
> ECLiPSe-CLP-Users_at_lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/eclipse-clp-users
>