>From ok@quintus Wed Jul 27 03:16:56 1988 Relay-Version: version Notes 2.8 87/9/11; site otter.hple.hp.com From: ok@quintus Date: Wed, 27 Jul 1988 02:16:56 GMT Date-Received: Wed, 27 Jul 1988 18:23:05 GMT Subject: Grammar rule translator. Message-ID: <196@quintus.UUCP> Organization: Quintus Computer Systems, Inc. Path: otter!hplabs!sri-unix!quintus!ok Newsgroups: comp.lang.prolog Sender: news@quintus.UUCP Reply-To: ok@quintus () Lines: 273 In his recent posting about the Oxford Prolog Library, Jocelyn Ireson-Paine said that one of the components was a grammar rule translator. The description says 'The translator is essentially the same as that published in "Programming in Prolog", by Clocksin & Mellish'. The grammar rule translator which appeared in the 1st and 2nd editions of Clocksin & Mellish had the following flaws: -- there was a variable name spelling mistake in one of the clauses, which broke the translation of rules with pushback (fixed in the 3rd edition, I think); -- the translator did not expand (If->Then;Else)s (still not handled in the 3rd edition); -- the translator did not use the 'C'/3 expansion, but tried to do the terminal matching in the translator, which meant that code with cuts and ``meta-logical'' operations did not work as expected (still a problem in the 3rd edition); -- the translator does nothing useful with variables appearing as non-terminals (in the 3rd edition it goes into a loop); -- the definition of phrase/2 is wrong--it only accepts nonterminals but should accept grammar rule bodies (i.e. phrase((p,q), S) will call ','(p,q,S,[]) when it should call (p(S,S1),q(S1,[]))) (still the case in the 3rd edition). This is reasonable because Clocksin & Mellish do not purport to provide a specification of the Prolog language, only instruction in how to use it. Their grammar rule translator is presented only as the answer to an exercise. The lack of a reliable specification means that some vendors have left grammar rules out entirely, and others have botched the job. Others have added "features". For example, ALS decided to make calls to certain built-in predicates are quietly translated as themselves. The trouble with this is that this is done _quietly_ (so you never know when it has happened without looking at the code in the data base) and that there is no apparent pattern to the choice of which builtins are left alone and which are changed (so that you, or at any rate I, can never recall which are which). In the absence of an agreed standard, however, nobody can say that this was wrong. There is a group of people who purport to be developing a Prolog standard at the taxpayer's expense. You would expect that group to have addressed the grammar rule translation issue, ideally by distributing a draft specification in the form of usable Prolog code. The BSI Prolog substandard committee, however, in their great wisdom, have preferred to do other things. There's only one thing to do, then, and that's to publish source code. The code which follows in this message was rewritten from scratch for this purpose. The purpose of this code is to serve as a workable specification of phrase/3 phrase/2 'C'/3 and the translation from Lhs-->Rhs to Head:-Body form. Error checking and reporting has quite deliberately been left out, and a couple of cases have been over-generalised to accomodate this. I have supplied the dcg rule translation as a separate predicate 'dcg rule'/2 to decouple the question of what DCG rules are translated to from the question of how DCG rule translation is integrated into the rest of a Prolog system; I am not proposing that 'dcg rule'/2 should be directly available under that name. phrase/[2,3] and 'C'/3 _should_ be directly available. There may be mistakes in the code that follows. There may be things it does that it shouldn't do. There may be things it doesn't do that you think it should. Let's hear about them. There are only 66 lines of Prolog code; if we can't get something this size satisfactory, we'll _deserve_ the BSI substandard. % File : DCG.PL % Author : Richard A. O'Keefe % Updated: Tuesday July 26th, 1988. % Purpose: Definite Clause Grammar rule to Prolog clause translator. /* This file is written in the ISO 8859/1 character set. The "Copyright" line contains after the right parenthesis a character which when transmitted was character 169, the international copyright symbol. Copyright (C)) 1988, Quintus Computer Systems, Inc. This file is distributed in the hope that it will be useful, but without any warrantee. You are expressly warned that it is meant to serve as a model, not for direct use: all error checking and reporting has been omitted, and mistakes are almost surely present. Permission is granted to anyone to distribute verbatim copies of this source code as received, in any medium, provided that the copyright notice, the nonwarrantee warning, and this permission notice are preserved. Permission is granted to distribute modified versions of this source code, or of portions of it, under the above conditions, plus the conditions that all changed files carry prominent notices stating who last changed them and that the derived material is subject to this same permission notice. Permission is granted to include this material in products for which money is charged, provided that the customer is given written notice that the code is (or is derived from) material provided by Quintus Computer Systems, Inc., and that the customer is given this source code on request. ---------------------------------------------------------------- Now that we've got that (adapted from the GNU copyright notice) out of the way, here are the technical comments. The predicates are all named 'dcg '/ in order to keep out of the way, with the exception of phrase/2 and phrase/3 which bear their proper names. Only phrase/[2,3] and 'dcg rule'/2 are meant to be called directly, and 'dcg rule'/2 is meant to be called from expand_term/2. You need to keep 'dcg body'/4 and its dependents around at run time so that variables as nonterminals in DCG rule bodies will work correctly. So that Quintus have _something_ left to sell, this code has been rewritten from scratch with no error checking or reporting code at all, and a couple of places accept general grammar rule bodies where they are really supposed to demand lists of terminals. However, any rule which is valid according to the Quintus Prolog manual will be translated correctly, except that this code makes no attempt to handle module: prefixes. (The change is trivial.) Note that 'dcg rule'/2 and phrase/[2,3] are steadfast. */ % dcg rule(+Grammar Rule, -Equivalent Clause) 'dcg rule'(-->(Head0,Body0), Clause) :- 'dcg head'(Head0, Head, PushBack, S0, S), 'dcg body'(Body0, Body1, S0, S), 'dcg conj'(Body1, PushBack, Body), Clause = :-(Head,Body). % dcg head(+Head0, -Head, -PushBack, -S0, -S) % recognises both % NonTerminal, [PushBackList] --> % and % NonTerminal --> % It returns the difference pair S0\S which the body is to parse. % To avoid error checking, it will accept an arbitrary body in place % of a pushback list, but it should demand a proper list. 'dcg head'((Head0,PushBack0), Head, PushBack, S0, S1) :- !, 'dcg goal'(Head0, Head, S0, S), 'dcg body'(PushBack0, PushBack, S, S1). 'dcg head'(Head0, Head, true, S0, S) :- 'dcg goal'(Head0, Head, S0, S). % dcg goal(+Goal0, -Goal, +S0, +S) % adds the arguments S0, S at the end of Goal0, giving Goal. % It should check that Goal0 is a callable term. 'dcg goal'(Goal0, Goal, S0, S) :- functor(Goal0, F, N), N1 is N+1, N2 is N+2, functor(Goal, F, N2), arg(N2, Goal, S), arg(N1, Goal, S0), 'dcg args'(N, Goal0, Goal). % dcg args(+N, +Goal0, +Goal) % copies the first N arguments of Goal0 to Goal. 'dcg args'(N, Goal0, Goal) :- ( N =:= 0 -> true ; arg(N, Goal0, Arg), arg(N, Goal, Arg), M is N-1, 'dcg args'(M, Goal0, Goal) ). % dcg body(+Body0, -Body, +S0, +S) % translates Body0 to Body, adding arguments as needed to parse S0\S. % It should complain about bodies (such as 2) which are not callable % terms, and about lists of terminals which are not proper lists. % To avoid error checking, [a|foo] is accepted as [a],foo, but it % really should complain. ONLY the forms lists here should be treated; % other non-terminals which look like calls to built-ins could well be % commented on (no error reporting here) but should be expanded even % so. Thus X=Y as a nonterminal is to be rewritten as =(X,Y,S0,S), % perhaps with a warning. If you want the translation X=Y, use {X=Y}. 'dcg body'(Var, Body, S0, S) :- var(Var), !, Body = phrase(Var,S0,S). 'dcg body'((A0,B0), Body, S0, S) :- !, 'dcg body'(A0, A, S0, S1), 'dcg body'(B0, B, S1, S), 'dcg conj'(A, B, Body). 'dcg body'(->(A0,B0), ->(A,B), S0, S) :- !, 'dcg body'(A0, A, S0, S1), 'dcg body'(B0, B, S1, S). 'dcg body'(;(A0,B0), ;(A,B), S0, S) :- !, 'dcg disj'(A0, A, S0, S), 'dcg disj'(B0, B, S0, S). 'dcg body'({}(A), A, S, S) :- !. 'dcg body'(!, !, S, S) :- !. 'dcg body'([], true, S, S) :- !. 'dcg body'([H0|T0], Body, S0, S) :- !, 'dcg term'(H0, H, S0, S1), 'dcg body'(T0, T, S1, S), 'dcg conj'(H, T, Body). 'dcg body'(NT0, NT, S0, S) :- 'dcg goal'(NT0, NT, S0, S). % dcg term(+T0, -T, +S0, +S) % generates code (T) which succeeds when there is a terminal T0 % between S0 and S. This version uses the DEC-10 Prolog predicate % 'C'/3 for compatibility with DEC-10 Prolog, C Prolog, Quintus Prolog. % This is the only place that knows how terminals are translated, so % you could supply instead the definition % 'dcg term'(T0, S0=[T0|S], S0, S). % and reap the same benefits. The one thing you must not do is % NO! 'dcg term'(T0, true, [T0|S], S). DON'T DO THAT! 'dcg term'(T0, 'C'(S0,T0,S), S0, S). % To see why dcg disj/4 is needed, consider the translation of % ( [] | [a] ). We have to insert S1=S0 somewhere, but we do it at % "compile"-time if we can. 'dcg disj'(Body0, Body, S0, S) :- 'dcg body'(Body0, Body1, S1, S), ( S1 == S -> 'dcg conj'(S1=S0, Body1, Body) ; S1 = S0, Body = Body1 ). % dcg conj(+A, +B, -C) % combines two conjunctions A, B, giving C. Basically, we want to % ensure that there aren't any excess 'true's kicking around (in a % compiled system, that shouldn't matter). There is room for some % leeway here: I have chosen to flatten A completely. 'dcg conj'(A, true, A) :- !. 'dcg conj'(A, B, C) :- 'dcg CONJ'(A, B, C). 'dcg CONJ'(true, C, C) :- !. 'dcg CONJ'((A,As), C0, (A,C)) :- !, 'dcg CONJ'(As, C0, C). 'dcg CONJ'(A, C, (A,C)). % 'C'(S0, T, S) % is true when the terminal T "Connects" the "string positions" S0 and S. 'C'([T|S], T, S). % phrase(+NT0, ?S0) % is true when the list S0 is in the language defined by the % grammar rule body NT0. E.g. phrase(([a],[b]), [a,b]). phrase(NT0, S0) :- phrase(NT0, S0, []). % phrase(+NT0, ?S0, ?S) % is true when the list S0\S is in the language defined by the % grammar rule body NT0. E.g. phrase(([a],[b]), [a,b|X], X). phrase(NT0, S0, S) :- 'dcg body'(NT0, NT, T0, T), T0 = S0, T = S, call(NT).