Vertical Bar Syntax

Author:: Joachim Schimpf
Date:: 26 June 2010
Modified:: 9 July 2010
This note responds to the three-step proposal at
http://www.complang.tuwien.ac.at/ulrich/iso-prolog/3steps
and also tries to address the concerns raised by Richard O'Keefe.

Comment on "step 1"
-------------------
The proposal is to introduce a standard operator op(1105,xfy,'|').

PRO: Saves 1 keystroke in one type of program (DCG translator),
by way of allowing (X'|'Y) to be written instead of '|'(X,Y).

CON: Conflicts with the historical behaviour of infix | as a de facto
xfy operator of priority 1100 (e.g. SICStus, probably Quintus).  The
original CHR implementation in ECLiPSe also used priority 1100.

CON: Standardising an operator for a functor that has no particular
semantics within the Prolog core would set an unfortunate precedent.
Various other language extensions and libraries could reasonably
request to have their important operators standardised in the core
(loops, lambdas, finite domain solvers etc. come to mind).  Admittedly,
--> already sets such a precedent, which is equally unfortunate.

SUMMARY: "step 1" is undesirable and unnecessary.



Comment on "step 2"
-------------------
The suggestion is to allow unquoted infix |, but to leave its interpretation
implementation-defined.

PRO: Allows legacy DCG code (which may use | instead of ; to separate
alternatives) to work in a standard Prolog with a DCG translator.

CON: The DEC-10 tradition reliably maps A|B to A;B.  Current ISO reliably
rejects A|B.  Allowing A|B with ambiguous meaning is worse than either.

CON: There does not seem to be any reason other than historical accident
for the | to ; mapping.  It is a good thing that ISO did not perpetuate it.

CON: Would presumably require a Prolog flag such as bar_maps_to_semicolon.

CON: DCGs can (and always could) be written with ; for alternatives.
The argument that | resembles BNF is somewhat unconvincing, given
that we use --> and not ::= to denote grammar rules.

SUMMARY: "step 2" is worse than the current situation.



Comment on "step 3"
-------------------
Step 3 is a useful proposal, as it makes a nice-looking syntax available.



Minimalist concrete proposal
----------------------------

1. Change 6.3.4.3 to additionally allow the ht_sep token, but only
   if '|' has been defined as an infix operator (analogous to comma):

      op = ht sep
      Abstract: |
      Priority and associativity according to op declaration.
      Condition: '|' is an infix operator.

      If accepted, | shall be equivalent to '|'.

   Rationale: allow infix unquoted bar without committing to an
   actual operator declaration in the core standard, and without
   allowing | to have different effective associativity and precedence
   from '|'.

2. In 8.14.3, disallow infix op declarations for '|' with a precedence
   lower than 1000.

   Rationale: avoid ambiguity in list contexts, but do not go as far
   as fixing the operator completely (as is done with the comma).

3. Nothing else.

   Rationale: avoid discussions about the actual operator details
   (and leave them to the corresponding language extensions).
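The interaction of items 1 and 2 can be sketched as a small Python model. It is only an illustration under stated assumptions: the operator table is a dict keyed by (name, kind), and `declare_op`, `bar_token_as_operator` and `OpDeclarationError` are hypothetical names, not part of any standard or Prolog system. The actual error terms op/3 would raise are not modelled, and the table starts without any '|' operator, as the core standard would under this proposal.

```python
# Hypothetical model of the minimalist proposal, items 1 and 2.
# The operator table maps (name, kind) -> (priority, specifier).

KIND = {"xfx": "infix", "xfy": "infix", "yfx": "infix",
        "fy": "prefix", "fx": "prefix", "xf": "postfix", "yf": "postfix"}


class OpDeclarationError(ValueError):
    """Stands in for the error op/3 would raise (error terms not modelled)."""


def declare_op(table, priority, specifier, name):
    """Toy op/3 with the extra check proposed for 8.14.3."""
    if not 0 <= priority <= 1200 or specifier not in KIND:
        raise OpDeclarationError(f"invalid op({priority},{specifier},{name})")
    kind = KIND[specifier]
    # Item 2: no infix bar binding tighter than the comma, because it
    # could then be confused with the head-tail separator inside lists.
    if name == "|" and kind == "infix" and 0 < priority < 1000:
        raise OpDeclarationError(f"op({priority},{specifier},'|') is not allowed")
    if priority == 0:
        table.pop((name, kind), None)  # priority 0 removes a definition
    else:
        table[(name, kind)] = (priority, specifier)


def bar_token_as_operator(table):
    """Item 1: an unquoted | in infix position is read as the atom '|',
    with whatever priority and associativity '|' currently has; it is a
    syntax error if '|' is not an infix operator."""
    if ("|", "infix") not in table:
        raise SyntaxError("unquoted | used, but '|' is not an infix operator")
    return "|", table[("|", "infix")]


if __name__ == "__main__":
    ops = {}  # core standard: no '|' operator declared
    try:
        bar_token_as_operator(ops)
    except SyntaxError as err:
        print("a|b rejected:", err)
    declare_op(ops, 1100, "xfy", "|")  # e.g. done by a language extension
    print("a|b reads with", bar_token_as_operator(ops))
    try:
        declare_op(ops, 700, "xfy", "|")
    except OpDeclarationError as err:
        print("rejected:", err)
```

A language extension that wants the bar would issue its own declaration (the 1100 in the usage lines is just an example). Only infix declarations are checked, because item 2 says nothing about the other kinds.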


Variant proposal
----------------

1. Change 6.3.4.3 to additionally allow the ht_sep token:

      op = ht sep
      Abstract: |
      Priority and associativity according to op declaration.

      If accepted, | shall be equivalent to '|'.

   Rationale: allow infix unquoted bar as an alternative syntax for
   a quoted infix bar.

2. In 8.14.3, disallow infix op declarations for '|' with a precedence
   lower than 1000.

   Rationale: avoid ambiguity in list contexts, but do not go as far
   as fixing the operator completely (as is done with the comma).

3. Add a default operator declaration of op(1100,xfy,'|').

   Rationale: conform to Edinburgh tradition but allow language
   extensions to make the operator more weakly binding (>1100).



Comments on standardising operators in general
----------------------------------------------
The best solution is probably not to standardise the operator at all,
because the '|'/2 functor has no special meaning within the language core.
Standardising operators "for use in language extensions" would be
setting an unfortunate precedent.

The advantage of Prolog's user-defined operators is precisely that one
does _not_ have to change the language definition in order to be able
to have new syntax for a new purpose.

Libraries and language extension packages should generally come with
their own operator declarations.  Several modern module systems allow
operators to be imported from libraries.  It has been argued that such
operator declarations may conflict: this is true, but global legislation
is a poor way of addressing this problem.  Rather, it is up to the
application writer who wants to mix such extensions to either decide on
one of the conflicting declarations (and possibly use extra parentheses),
or to provide their own per-application compromise declaration,
or not to use operator notation at all in such situations.


Comments on suggested op(1105,xfy,'|')
---------------------------------------
We suggest that the priorities of standard operators should be spaced
more than 5 apart (the current minimum spacing is 50).  So instead of
1105, 1150 or at least 1110 might be more appropriate.
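The spacing argument is simple arithmetic, so a few lines of Python suffice to check it. The priority list below is the set of distinct priorities in the ISO standard's initial operator table; the function names are illustrative.

```python
# Distinct priorities in the ISO standard's initial operator table.
ISO_PRIORITIES = [1200, 1100, 1050, 1000, 900, 700, 500, 400, 200]


def min_spacing(priorities):
    """Smallest gap between two distinct operator priorities."""
    distinct = sorted(set(priorities))
    return min(high - low for low, high in zip(distinct, distinct[1:]))


def spacing_with(candidate):
    """Minimum spacing after adding a bar operator at `candidate`."""
    return min_spacing(ISO_PRIORITIES + [candidate])


if __name__ == "__main__":
    print("current minimum spacing:", min_spacing(ISO_PRIORITIES))
    for candidate in (1105, 1110, 1150):
        print(f"with op({candidate},xfy,'|'):", spacing_with(candidate))
```

Adding 1105 reduces the minimum gap from 50 to 5 and 1110 reduces it to 10, while 1150 keeps the existing minimum of 50.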

The traditional behaviour is 1100 xfy (SICStus, ECLiPSe).  This works
reasonably well for CHR (the original CHR implementation was in ECLiPSe),
as it allows
   foo <-> guard, guard | body1 ; body2.
and only requires extra parentheses for disjunctions in the guard
   foo <-> (guard ; guard) | body1 ; body2.
which are probably a good idea anyway.
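To make the CHR example concrete, here is a toy operator-precedence reader in Python that handles only atoms, parentheses and infix operators. It is a sketch, not a Prolog reader: the function names are hypothetical, and `<->` is assumed, purely for illustration, to be declared as op(1180, xfx, <->). The same three clause bodies are read under op(1100,xfy,'|') and under the op(1105,xfy,'|') of "step 1".

```python
import re

# Toy tokenizer: lower-case identifiers, the arrow <->, and ( ) , | ;
TOKEN_RE = re.compile(r"\s*([a-z][A-Za-z0-9_]*|<->|[(),|;])")

# Assumption for illustration only: <-> behaves like op(1180, xfx, <->).
BASE_OPS = {"<->": (1180, "xfx"), ";": (1100, "xfy"), ",": (1000, "xfy")}
OPS_1100 = {**BASE_OPS, "|": (1100, "xfy")}  # traditional setting
OPS_1105 = {**BASE_OPS, "|": (1105, "xfy")}  # "step 1" proposal


def tokenize(text):
    text, tokens, pos = text.rstrip(), [], 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected input at offset {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


def primary(tokens, pos, ops):
    """An atom or a bracketed term; both have priority 0."""
    if pos >= len(tokens):
        raise SyntaxError("unexpected end of input")
    if tokens[pos] == "(":
        inner, pos = operand(tokens, pos + 1, 1200, ops)
        if pos >= len(tokens) or tokens[pos] != ")":
            raise SyntaxError("missing )")
        return inner, pos + 1
    if not tokens[pos][0].isalpha():
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return tokens[pos], pos + 1


def operand(tokens, pos, max_prec, ops):
    """Operator-precedence parse of a term whose priority is <= max_prec."""
    left, pos = primary(tokens, pos, ops)
    left_prec = 0
    while pos < len(tokens) and tokens[pos] in ops:
        name = tokens[pos]
        prec, spec = ops[name]
        left_max = prec if spec[0] == "y" else prec - 1
        right_max = prec if spec[2] == "y" else prec - 1
        if prec > max_prec or left_prec > left_max:
            break  # leave this operator to an enclosing call
        right, pos = operand(tokens, pos + 1, right_max, ops)
        left, left_prec = (name, left, right), prec
    return left, pos


def parse(text, ops):
    tokens = tokenize(text)
    term, pos = operand(tokens, 0, 1200, ops)
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return term


def canonical(term):
    """Functional notation; symbolic functors are always quoted."""
    if isinstance(term, str):
        return term
    name, left, right = term
    functor = name if name.isidentifier() else f"'{name}'"
    return f"{functor}({canonical(left)},{canonical(right)})"


if __name__ == "__main__":
    for text in ("foo <-> guard, guard | body1 ; body2",
                 "foo <-> guard ; guard | body1 ; body2",
                 "foo <-> (guard ; guard) | body1 ; body2"):
        print(text)
        print("  1100:", canonical(parse(text, OPS_1100)))
        print("  1105:", canonical(parse(text, OPS_1105)))
```

In this model the first and third inputs read identically under both settings. The second input shows why the parentheses are needed with 1100: the unbracketed disjunction takes the bar into its right argument, giving ';'(guard,'|'(guard,...)), whereas with 1105 the disjunction becomes the left argument of the bar.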



Alternative ways of allowing the unquoted bar
---------------------------------------------
There are two main alternatives for how to extend the syntax:

1. Treat the bar like the comma

   Change 6.3.4.3 to allow the ht_sep token in addition to atom and comma.
   As with the comma, we would have to impose an operator property here:
   the bar must be an infix operator of priority at least 1000 (for the
   reason given below).  This means that only infix use is allowed, and
   there is no conflict with the list tail separator.

2. Treat the bar like the semicolon (SWI/ECLiPSe style)

   Change 6.4.2 to add "head tail separator token" as a "name token".
   Do not change 6.3.4.3.  This would allow unquoted bars to be used
   as arbitrary operators, and also as atoms and functors, so |(a,b) and
   foo(|) would be valid.

Additional constructs valid in both (1) and (2):
a|b		(assuming infix operator)

Additional constructs valid in (2) only:
|a		(assuming prefix operator)
a|		(assuming postfix operator)
|		plain atom
|(a,b)		plain functor

Recall that the following are anyway valid, without any extension:
'|'a		(assuming prefix operator)
a'|'b		(assuming infix operator)
a'|'		(assuming postfix operator)
'|'(a,b)
'|'

Recent versions of SWI-Prolog seem to implement a restricted form of (2).


Interpretation in list context
------------------------------
In a list context, we need to prevent ambiguities between infix-bars
and the head-tail separator bar.  This is analogous to preventing
ambiguities between infix commas and argument separator commas.

There are (at least) two ways of achieving this:

1. The standard prevents infix commas from appearing unbracketed in
   structure and list arguments by requiring the argument precedence
   to be less than 1000.  This also happens to disambiguate the unquoted
   bar, as long as it is only allowed as an infix operator with a
   precedence of at least 1000.

2. Treat comma/bar as a term terminator when in a list context.
   This is how it is done in systems that natively do not impose
   a precedence limit on arguments (e.g. SWI, ECLiPSe; these systems
   also tend to allow bars as general atoms and functors).
   This isn't a problem implementation-wise, but it is a bit awkward
   to express in the standard's syntactic framework.  It is also
   different from how the standard avoids ambiguity with commas.
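The two approaches can be compared in a toy Python reader. `Reader(ops, way)` with way 1 reads list elements at priority 999; with way 2 it reads them at 1200 but lets a comma or bar end the element. The class, the `BAR_OK` and `BAR_LOW` tables, the restriction to lower-case atoms and infix operators, and the representation of lists as '.'/2 terms are all assumptions made for this sketch; `BAR_LOW` models a declaration, op(700,xfy,'|'), that the lower bound of 1000 proposed above would reject.

```python
import re

# Toy reader: lower-case atoms, ( ) [ ] , | ; and infix operators only.
TOKEN_RE = re.compile(r"\s*([a-z][A-Za-z0-9_]*|[()\[\],|;])")
BAR_OK = {",": (1000, "xfy"), ";": (1100, "xfy"), "|": (1100, "xfy")}
BAR_LOW = {**BAR_OK, "|": (700, "xfy")}  # hypothetical: excluded by the >= 1000 rule


def tokenize(text):
    if not re.fullmatch(r"(?:" + TOKEN_RE.pattern + r")*\s*", text):
        raise SyntaxError("unexpected input")
    return TOKEN_RE.findall(text)


def canonical(term):
    """Functional notation; lists are '.'/2 terms, symbolic functors quoted."""
    if isinstance(term, str):
        return term
    name, left, right = term
    functor = name if name.isidentifier() else f"'{name}'"
    return f"{functor}({canonical(left)},{canonical(right)})"


class Reader:
    def __init__(self, ops, way):
        self.ops = ops  # name -> (priority, "xfx" | "xfy" | "yfx")
        self.way = way  # 1: elements read at 999; 2: , and | end an element

    def read(self, text):
        self.tokens, self.pos = tokenize(text), 0
        term = self.term(1200, stop=())
        if self.peek() is not None:
            raise SyntaxError(f"unexpected token {self.peek()!r}")
        return term

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def expect(self, tok):
        if self.peek() != tok:
            raise SyntaxError(f"expected {tok!r}, found {self.peek()!r}")
        self.pos += 1

    def term(self, max_prec, stop):
        """Operator-precedence parsing; a token in `stop` ends the term."""
        left, left_prec = self.primary(), 0
        while self.peek() in self.ops and self.peek() not in stop:
            name = self.peek()
            prec, spec = self.ops[name]
            left_max = prec if spec[0] == "y" else prec - 1
            right_max = prec if spec[2] == "y" else prec - 1
            if prec > max_prec or left_prec > left_max:
                break  # leave the operator to an enclosing call
            self.pos += 1
            left, left_prec = (name, left, self.term(right_max, stop)), prec
        return left

    def primary(self):
        tok = self.peek()
        self.pos += 1
        if tok == "(":
            inner = self.term(1200, stop=())  # brackets reset the context
            self.expect(")")
            return inner
        if tok == "[":
            return self.list_rest()
        if tok is not None and tok[0].isalpha():
            return tok
        raise SyntaxError(f"unexpected token {tok!r}")

    def element(self):
        if self.way == 1:
            return self.term(999, stop=())  # way 1: argument priority limit
        return self.term(1200, stop={",", "|"})  # way 2: terminator tokens

    def list_rest(self):
        if self.peek() == "]":
            self.pos += 1
            return "[]"
        items = [self.element()]
        while self.peek() == ",":
            self.pos += 1
            items.append(self.element())
        tail = "[]"
        if self.peek() == "|":  # a bar still unconsumed here is the ht_sep
            self.pos += 1
            tail = self.element()
        self.expect("]")
        for item in reversed(items):
            tail = (".", item, tail)
        return tail


if __name__ == "__main__":
    for ops, way, text in ((BAR_OK, 1, "[a|b]"), (BAR_LOW, 1, "[a|b]"),
                           (BAR_LOW, 2, "[a|b]"), (BAR_OK, 2, "[a;b|c]")):
        print("way", way, text, canonical(Reader(ops, way).read(text)))
```

With '|' at 1100, both ways read [a|b] as a head-tail pair, and a|b must be bracketed to become a list element. With the low-priority bar, way 1 in this toy reader silently absorbs the bar into the first element, which is exactly the ambiguity the lower bound of 1000 avoids, whereas way 2 still treats it as the separator. Way 2 also accepts [a;b|c] without brackets, which way 1 rejects.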