Strings in ECLiPSe 6.2, SWI-7 and YAP

Joachim Schimpf, 2013-11-28, 2013-12-07, 2013-12-26, 2014-07-11

History

ECLiPSe (and its predecessor Sepia) has always had the string data type (which was part of early BSI standard drafts) with double-quote syntax. SWI also had strings, but up to version 6 not with double-quote syntax. With SWI-7 and ECLiPSe 6.2, string support has been harmonized, and YAP is expected to agree as well. The following is a summary of the common functionality, and a record of the related discussion.

Agreed Common Functionality

Syntax
Term order

Strings fall between numbers and atoms:

    ?- sort([1,1.2,a,"a",X,f(a)], S).
    S = [X, 1.2, 1, "a", a, f(a)]

Intuition: strings have a "more compound" flavour than numbers, but atoms and compounds must remain consecutive because atoms may be considered as compound terms with arity 0.

String-related builtins

string(?Term) is semidet
    Succeeds iff Term is a string.

string_length(+String, -Length) is det
    String must be of type string.

string_code(?Index, +String, ?Code) is nondet
    Index ranges from 1 to the length of String. Domain errors on Index and Code if negative. Character codes as in ISO char_code/2.

get_string_code(+Index, +String, -Code) is det
    Like string_code/3, but deterministic and with strict 1..N domain checking on Index.

string_char(?Index, +String, ?Char) is nondet
    Analogous to string_code/3.

string_codes(?String, ?Codes)
    Analogous to ISO atom_codes/2.

string_chars(?String, ?Chars)
    Analogous to ISO atom_chars/2.

string_lower(+String, -Lower) is det
string_upper(+String, -Upper) is det
    Convert String to all lower or all upper case.

atom_string(+Atom, -String) is det
    Atom is of type atom, and String of type string.

number_string(+Number, -String) is det
    Conversion between any type of number and a string. Fails if String can't be parsed as a number. The number syntax does not allow leading or trailing spaces, nor spaces between sign and digits. Both + and - are allowed as signs. Comments etc. are not allowed.

string_concat(?String1, ?String2, ?String3) is nondet
    Analogous to ISO atom_concat/3 and previous ECLiPSe append_strings/3.

sub_string(+String, ?Before, ?Length, ?After, ?Sub) is nondet
    Analogous to ISO sub_atom/5, and identical to ECLiPSe substring/5.

atomics_to_string(+Atomics, -String) is det
    Concatenate a list of atomic terms. Identical to previous ECLiPSe concat_string/2.

atomics_to_string(+Atomics, +Glue, -String) is det
    Concatenate a list of atomic terms, with Glue between them. Identical to previous ECLiPSe join_string/3.

split_string(+String, +SepChars, +PadChars, -SubStrings) is det
    As in ECLiPSe.

term_string(+Term, -String) is det
    If String was uninstantiated, it is bound to a string representation of Term as produced by writeq/2. If String was instantiated, it is parsed as with read/2 and the resulting term unified with Term.

term_string(+Term, -String, +Options) is det
    If String was uninstantiated, it is bound to a string representation of Term as produced by write_term/3 (with options corresponding to writeq/2, and in addition, and potentially overridden by, the given options). If String was instantiated, it is parsed as with read_term/3 with the given options, and the resulting term unified with Term. Inapplicable options are ignored.

text_to_string(+Text, -String) is det
    Converts different textual representations into a string. Text is either an atom, a string, a list of character codes (codes), or a list of single-character atoms (chars). Text==[] gives String="".

read_string(+Stream, +Length, -String)
    If Length is given, read Length characters from Stream into String. Otherwise, read until end of stream, and bind Length to the number of characters read.

read_string(+Stream, +SepChars, +PadChars, -Sep, -String)
    Read a string from Stream, providing functionality similar to split_string/4. The predicate performs the following steps:
    * Skip all characters that match PadChars
    * Read up to a character that matches SepChars or end of file
    * Discard trailing characters that match PadChars from the collected input
    * Unify String with a string created from the input and Sep with the separator character read. If input was terminated by the end of the input, Sep is unified with -1.
    Calling read_string/5 repeatedly on an input until Sep is -1 (end of file) is equivalent to reading the entire file into a string and calling split_string/4, provided that SepChars and PadChars are not partially overlapping (which would require lookahead and could cause an unexpected blocking read).
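The equivalence between repeated read_string/5 calls and split_string/4 can be illustrated with a small sketch. The predicate name read_fields/2 and the choice of separator and padding characters are assumptions made for this example only; they are not part of the agreed interface. The stream is assumed to be already open for reading.

```prolog
% read_fields(+Stream, -Fields)
% Hypothetical helper (not part of the agreed builtins): collects the
% strings returned by repeated read_string/5 calls until end of input.
read_fields(Stream, Fields) :-
    % Illustrative choice: comma and newline separate fields,
    % space and tab are padding. The two sets do not overlap.
    read_string(Stream, ",\n", " \t", Sep, Field),
    (   Sep == -1                  % end of input: Field is the last item
    ->  Fields = [Field]
    ;   Fields = [Field|Rest],     % a separator was consumed: keep reading
        read_fields(Stream, Rest)
    ).
```

For a stream whose entire content is the text `a, b`, a newline, and `c`, this should yield the same list as split_string("a, b\nc", ",\n", " \t", Fields), namely ["a", "b", "c"].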
Note regarding mode notation: where mode '-' is specified, mode '+' is also allowed and affects the determinism class accordingly.

Situation before December 2013

ECL Syntax
Builtins previously in both ECL and SWI

string(?Term)
    Term is a string.

atom_string(?Atom, ?String)
    But SWI allows numbers as Atom in (+,-) mode, and numbers as String in (-,+) mode.

string_length(+String, -Length)
    But SWI allows atoms and numbers as String.

string_code(+String, +Index, ?Code) [ECL]
    This is very unfortunate! Different argument order, 0-based Index in SWI vs. 1-based Index in ECL, and nondeterministic reverse mode in SWI... In ECL, this is supposed to be a very fast primitive (like arg/3); it could even be implemented as an abstract machine instruction. What about renaming the nondet version to string_member/3 or the like?

append_strings(?String1, ?String2, ?String3) [ECL]
    Name is historical in ECL, could add alias.

substring(+String, ?Before, ?Length, ?After, ?Sub) [ECL]
    ECL ready to add underscore variant, in analogy to sub_atom/5. However, the Quintus precedent is without underscore (and with a different argument order...).

number_string(?Number, ?String)
    Conversion between any number and a string. Fails if String can't be parsed as a number.

Builtins previously in SWI only

string_codes(?String, ?Codes)
    ECL could add this, but it is subsumed by string_list/3.

string_chars(?String, ?Chars)
    ECL could add this, but it is subsumed by string_list/3.

Builtins previously in ECL only (ignoring deprecated ones)

concat_strings(+String1, +String2, ?String3)
    Deterministic version of concatenation.

concat_string(++List, -Dest) [redundant]
    Succeeds if Dest is the concatenation of the atomic terms contained in List.

join_string(++List, +Glue, -String)
    String is the string formed by concatenating the elements of List with an instance of Glue between each of them (subsumes concat_string/2).

split_string(+String, +SepChars, +PadChars, -SubStrings)
    Decompose String into SubStrings according to separators SepChars and padding characters PadChars.

string_list(?String, ?List, +Format)
    Conversion between a string in different encodings and a list (subsumes string_codes, string_chars, string_list). Format is bytes, codes, chars, or utf8.

string_list(?String, ?List) [redundant]
    Same as string_list(String,List,bytes).

substring(+String1, +String2, ?Position) [redundant]
    Quick semidet check for substring presence. 1-based position.

term_string(?Term, ?String)
    In the (+,-) direction, String is like the output of writeq. In the (?,+) direction, String is parsed with read.

sprintf(-String, +Format, ?ArgList)
    printf with output to string.

Other Suggestions

Other inspiration can be found in Quintus's lib(string), in particular the span family (but cf. split_string/4 above).

Suggestions by Richard O'Keefe

The following is based on as of 2013-12-27.

substring(String,Sub,Before,Length,After)
    With Quintus argument order, enabling omission of arguments for substring/2,3,4, and no underscore.

string_codes/2,3,4,5
    Like substring, but Sub is of type codes.

string_chars/2,3,4,5
    Like substring, but Sub is of type chars.

integer_string(Integer,String,Base,Zero) and /2,3
    Taking an integer Base 2..36 and a character to be treated as zero. This allows alternative isomorphic 0-9 Unicode sequences to be used.

float_codes(Float,String,Format,Zero,Decimal,Exponent) and /3,4,5
    Taking a format descriptor term, Zero character, Decimal point character, and exponent marker character (characters all as strings).

number_string(Number,String,Zero) and /2
    Combination of the preceding two.

string_append/3
    Well, string_concat/3 was chosen because of ISO.

atomics_to_string/2,3
    As above.

string_leading_count(String,Set,[LengthOut,]LengthIn)
    Determines the maximal leading sequence of characters out of the Set, followed by the maximal sequence of characters in the Set.

string_trailing_count(String,Set,[LengthOut,]LengthIn)
    Same from the other end.

Tentative suggestion for:

skip_input(Stream,Set)
    Skip characters in Set.

read_string(+Stream,+Set,-String,+Bound,-BaseCount)
    Reads all available characters in Set up to a limit of Bound. Also reads any diacriticals that may follow the characters. Bound and BaseCount count base characters only.