Hi Dave, On 08/11/2012 14:38, David Dreisigmeyer wrote: > I have a large graph of about 200,000,000 nodes/edges, essentially RDFs > like: > > some_predicate(subject,object). > 200 million items of anything is very large. A fact like the one above will take about 5 words minimum to represent in the compiled form (without considering the indexing code), and that's if the arguments are simple. Additionally, if they are all the same predicate (i.e. with the same name and arity), this will be a very big problem for the compiler, as a predicate (i.e. all its clauses) are compiled as one, with the compiler trying to generate indexing for the individual clauses (facts in your case). In general, if you have a big problem, it is generally not a good idea to have a static (i.e. code-based) representation of it, because this cannot be used directly by the program at run-time, which will need some dynamic representation of the problem to work on, but the code representing your problem will still take up (virtual) memory -- in your case this is likely to be several Gigabytes. So it is better to read in the data at runtime and construct the dynamic representation of your problem. However, again, a dynamic representation of your problem will be huge -- a simple item like an atom or a (simple) variable is represented by 2 words -- for 64 bit this is 16 bytes. IC domain variable is considerably larger than this. > stating the facts associated with the graph. Some of the nodes can have > values within a finite domain but only one (potentially none) of those > values is the correct one. I need to find which of the values is correct > using the available information contained in the graph as well as auxiliary > information that is not contained in the graph itself. Using finite domain variable is only useful if you can state some constraints between the variables. So you need to be able to generate constraints from the 'available and auxiliary information' you mentioned. With the numbers you are talking about, it would probably be an issue to process the information. Another issue is the size of the domain -- is this also in the order of the number of nodes you have? If so, you can't really use IC as your finite domain solver, because it uses a bitmap to represent a domain as soon as there are holes in the domain, and this is done for every variable, so any domain size bigger than a few thousand is problematic. So if you can determine the correct values for your variables without search (i.e. just from processing the information), you probably should not use a finite domain solver for your problem. > Does anyone have suggestions about how to analyze this using ECLiPSe or > would I have to use something else (e.g., AllegroGraph or AllegroCache)? > I don't know anything about these other packages, so I can't comment on them. Cheers, Kish > Thanks, > > -Dave > > > > ------------------------------------------------------------------------------ > Everyone hates slow websites. So do we. > Make your web apps faster with AppDynamics > Download AppDynamics Lite for free today: > http://p.sf.net/sfu/appdyn_d2d_nov > > > > _______________________________________________ > ECLiPSe-CLP-Users mailing list > ECLiPSe-CLP-Users_at_lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/eclipse-clp-users >Received on Fri Nov 09 2012 - 19:04:13 CET
This archive was generated by hypermail 2.2.0 : Tue Nov 13 2012 - 06:13:51 CET