Previous Up

12.3  Definite Clause Grammars — DCGs

Grammar rules are described in many standard Prolog texts ([2]). In ECLiPSe they are provided by a predefined global2 macro for -->/2. When the parser reads a clause whose main functor is -->/2, it transforms it according to the standard rules. The syntax for DCGs is as follows:
grammar_rule --> grammar_head, ['-->'], grammar_body.

grammar_head --> non_terminal.
grammar_head --> non_terminal, [','], terminal.

grammar_body --> grammar_body, [','], grammar_body.
grammar_body --> grammar_body, [';'], grammar_body.
grammar_body --> grammar_body, ['->'], grammar_body.
grammar_body --> grammar_body, ['|'], grammar_body.
grammar_body --> iteration_spec, ['do'], grammar_body.
grammar_body --> ['-?->'], grammar_body.
grammar_body --> grammar_body_item.

grammar_body_item --> ['!'].
grammar_body_item --> ['{'], Prolog_goals, ['}'].
grammar_body_item --> non_terminal.
grammar_body_item --> terminal.
The non-terminals are syntactically identical to prolog goals (atom, compound term or variable), the terminals are lists of prolog terms (typically characters or tokens). Every term is transformed, unless it is enclosed in curly brackets. The control constructs like conjunction ,/2, disjunction (;/2 or |/2), the cut (!/0), the condition (->/1) and do-loops do not need to be enclosed in curly brackets.

The grammar can be accessed with the built-in phrase/3. The first argument of phrase/3 is the name of the grammar to be used, the second argument one is a list containing the input to be parsed. If the parsing is successful the built-in will succeed. For instance with the grammar
a --> [] | [z], a.
phrase(a, X, []) will give on backtracking: X = [z] ; X = [z, z] ; X = [z, z, z] ; ....

12.3.1  Simple DCG example

The following example illustrates a simple grammar declared using the DCGs.
sentence --> imperative, noun_phrase(Number), verb_phrase(Number).

imperative, [you] --> [].
imperative --> [].

noun_phrase(Number) --> determiner, noun(Number).
noun_phrase(Number) --> pronom(Number).

verb_phrase(Number) --> verb(Number).
verb_phrase(Number) --> verb(Number), noun_phrase(_).

determiner --> [the].

noun(singular) --> [man].
noun(singular) --> [apple].
noun(plural) --> [men].
noun(plural) --> [apples].

verb(singular) --> [eats].
verb(singular) --> [sings].
verb(plural) --> [eat].
verb(plural) --> [sing].

pronom(plural) --> [you].
The above grammar may be successfully parsed using phrase/3. If the predicate succeeds then the input has been parsed successfully.
[eclipse 1]: phrase(sentence, [the,man,eats,the,apple], []).

[eclipse 2]: phrase(sentence, [the,men,eat], []).

[eclipse 3]: phrase(sentence, [the,men,eats], []).

[eclipse 4]: phrase(sentence, [eat,the,apples], []).

[eclipse 5]: phrase(sentence, [you,eat,the,man], []). 

The predicate phrase/3 may be used to return the point at which parsing of a grammar fails — if the returned list is empty then the input has been successfully parsed.
[eclipse 1]: phrase(sentence, [the,man,eats,something,nasty],X).

X = [something, nasty]     More? (;) 

no (more) solution.
[eclipse 2]: phrase(sentence, [eat,the,apples],X).

X = [the, apples]     More? (;) 

X = []     More? (;) 

no (more) solution.
[eclipse 3]: phrase(sentence, [hello,there],X).

no (more) solution.

12.3.2  Mapping to Prolog Clauses

Grammar rule are translated to Prolog clauses by adding two arguments which represent the input before and after the nonterminal which is represented by the rule. The effect of the transformation can be observed, e.g. by switching on the all_dynamic flag so that the compiled clauses can be listed:
[eclipse 1]: set_flag(all_dynamic, on), [user].
 p(X) --> q(X).
 p(X) --> [a].
user       compiled traceable 296 bytes in 0.25 seconds

[eclipse 2]: listing.
p(_g212, _g214, _g216) :-
        q(_g212, _g214, _g216).
p(_g212, _g214, _g216) :-
        _g214 = [a|_g216].


12.3.3  Parsing other Data Structures

DCGs are in principle not limited to the parsing of lists. The predicate 'C'/3 is responsible for reading resp. generating the input tokens. The default definition is
'C'([Token|Rest], Token, Rest).
The first argument represents the parsing input before consuming Token and Rest is the input after consuming Token.

By redefining 'C'/3, it is possible to apply a DCG to other input sources than a list, e.g. to parse directly from an I/O stream:
:- local 'C'/3.
'C'(Stream-Pos0, Token, Stream-Pos1) :-
        seek(Stream, Pos0),
        read_string(Stream, " ", _, TokenString),
        atom_string(Token, TokenString),
        at(Stream, Pos1).

 sentence --> noun, [is], adjective.
 noun --> [prolog] ; [lisp].
 adjective --> [boring] ; [great].
This can then be applied to a string as follows:
[eclipse 1]: String = "prolog is great", open(String, string, S),
             phrase(sentence, S-0, S-End).
End = 15
Here is another redefinition of 'C'/3, using a similar idea, which allows the direct parsing of ECLiPSe strings as sequences of characters:
:- local 'C'/3.
'C'(String-Pos0, Char, String-Pos1) :-
        Pos0 =< string_length(String),
        string_code(String, Pos0, Char),
        Pos1 is Pos0+1.

anagram --> [].
anagram --> [_].
anagram --> [C], anagram, [C].
This can then be applied to a string as follows:
[eclipse 1]: phrase(anagram, "abba"-1, "abba"-5).
[eclipse 2]: phrase(anagram, "abca"-1, "abca"-5).
no (more) solution.
Unlike the default definition, these redefinitions of 'C'/3 are not bi-directional. Consequently, the grammar rules using them can only be used for parsing, not for generating sentences.

Note that every grammar rule uses that definition of 'C'/3 which is visible in the module where the grammar rule itself is defined.

Previous Up