Previous Up

12.3  Definite Clause Grammars — DCGs

Grammar rules are described in many standard Prolog texts ([3]). In ECLiPSe they are provided by a predefined global3 macro for -->/2. When the parser reads a clause whose main functor is -->/2, it transforms it according to the standard rules. The syntax for DCGs is as follows:

grammar_rule --> grammar_head, ['-->'], grammar_body.

grammar_head --> non_terminal.
grammar_head --> non_terminal, [','], terminal.

grammar_body --> grammar_body, [','], grammar_body.
grammar_body --> grammar_body, [';'], grammar_body.
grammar_body --> grammar_body, ['->'], grammar_body.
grammar_body --> grammar_body, ['|'], grammar_body.
grammar_body --> iteration_spec, ['do'], grammar_body.
grammar_body --> ['-?->'], grammar_body.
grammar_body --> grammar_body_item.

grammar_body_item --> ['!'].
grammar_body_item --> ['{'], Prolog_goals, ['}'].
grammar_body_item --> non_terminal.
grammar_body_item --> terminal.

The non-terminals are syntactically identical to prolog goals (atom, compound term or variable), the terminals are lists of prolog terms (typically characters or tokens). Every term is transformed, unless it is enclosed in curly brackets. The control constructs like conjunction ,/2, disjunction (;/2 or |/2), the cut (!/0), the condition (->/1) and do-loops need not be enclosed in curly brackets.

The grammar can be accessed with the built-in phrase/3. The first argument of phrase/3 is the name of the grammar to be used, the second argument is a list containing the input to be parsed. If the parsing is successful the built-in will succeed. For instance, with the grammar

a --> [] | [z], a.

phrase(a, X, []) will give on backtracking

X = [z] ; X = [z, z] ; X = [z, z, z] ; ....

12.3.1  Simple DCG example

The following example illustrates a simple grammar declared using the DCGs.

sentence --> imperative, noun_phrase(Number), verb_phrase(Number).

imperative, [you] --> [].
imperative --> [].

noun_phrase(Number) --> determiner, noun(Number).
noun_phrase(Number) --> pronoun(Number).

verb_phrase(Number) --> verb(Number).
verb_phrase(Number) --> verb(Number), noun_phrase(_).

determiner --> [the].

noun(singular) --> [man].
noun(singular) --> [apple].
noun(plural) --> [men].
noun(plural) --> [apples].

verb(singular) --> [eats].
verb(singular) --> [sings].
verb(plural) --> [eat].
verb(plural) --> [sing].

pronoun(plural) --> [you].

The above grammar may be applied by using phrase/3. If the predicate succeeds then the input has been parsed successfully.

[eclipse 1]: phrase(sentence, [the,man,eats,the,apple], []).

[eclipse 2]: phrase(sentence, [the,men,eat], []).

[eclipse 3]: phrase(sentence, [the,men,eats], []).

[eclipse 4]: phrase(sentence, [eat,the,apples], []).

[eclipse 5]: phrase(sentence, [you,eat,the,man], []).


The predicate phrase/3 may be used to return the point at which parsing of input fails—if the returned list is empty then the input has been successfully parsed.

[eclipse 1]: phrase(sentence, [the,man,eats,something,nasty],X).

X = [something, nasty]     More? (;)

no (more) solution.
[eclipse 2]: phrase(sentence, [eat,the,apples],X).

X = [the, apples]     More? (;)

X = []     More? (;)

no (more) solution.
[eclipse 3]: phrase(sentence, [hello,there],X).

no (more) solution.

12.3.2  Mapping to Prolog clauses

A grammar rule is translated to a Prolog clause by adding two arguments which represent the input before and after the nonterminal which is represented by the rule. The effect of the transformation can be observed, e.g., by calling expand_clause/2:

[eclipse 1]: expand_clause(p(X) --> q(X), Expanded).

X = X
Expanded = p(X, _250, _243) :- q(X, _250, _243)
Yes (0.00s cpu)
[eclipse 2]: expand_clause(p(X) --> [a], Expanded).

X = X
Expanded = p(X, _251, _244) :- 'C'(_251, a, _244)
Yes (0.00s cpu)

12.3.3  Parsing other data structures

DCGs are in principle not limited to the parsing of lists. The predicate ’C’/3 is responsible for reading resp. generating the input tokens. The default definition is

'C'([Token|Rest], Token, Rest).

The first argument represents the parsing input before consuming Token and Rest is the input after consuming Token.

By redefining ’C’/3, it is possible to apply a DCG to input sources other than a list, e.g., to parse directly from an I/O stream:

:- local 'C'/3.
'C'(Stream-Pos0, Token, Stream-Pos1) :-
        seek(Stream, Pos0),
        read_string(Stream, " ", _, TokenString),
        atom_string(Token, TokenString),
        at(Stream, Pos1).

 sentence --> noun, [is], adjective.
 noun --> [prolog] ; [lisp].
 adjective --> [boring] ; [great].

This can then be applied to a string as follows:

[eclipse 1]: String = "prolog is great", open(String, string, S),
             phrase(sentence, S-0, S-End).
End = 15

Here is another redefinition of ’C’/3, using a similar idea, which allows direct parsing of ECLiPSe strings as sequences of characters:

:- local 'C'/3.
'C'(String-Pos0, Char, String-Pos1) :-
        Pos0 =< string_length(String),
        string_code(String, Pos0, Char),
        Pos1 is Pos0+1.

anagram --> [].
anagram --> [_].
anagram --> [C], anagram, [C].

This can then be applied to a string as follows:

[eclipse 1]: phrase(anagram, "abba"-1, "abba"-5).
[eclipse 2]: phrase(anagram, "abca"-1, "abca"-5).
no (more) solution.

Unlike the default definition, these redefinitions of ’C’/3 are not bi-directional. Consequently, the grammar rules using them can only be used for parsing, not for generating sentences.

Note that every grammar rule uses that definition of ’C’/3 which is visible in the module where the grammar rule itself is defined.

So that the user can redefine it with a local one.

Previous Up