Yes, unfortunately Unicode support is still somewhat limited in ECLiPSe.
Non-ISO-8859-1 characters can be used in quoted tokens like "タ" or 'タ',
and can be read and written. However, as you have observed, the generic
atom/string predicates still always treat them as byte sequences, assuming
a fixed encoding and not recognising multi-byte characters.

When you are working at the character level, the best solution currently
is to do all your computation with lists of character codes (i.e. integers).
As I suggested in my other mail, this seems to fit your application better
anyway. You should then

- get your input strings
- convert the strings to lists (using string_list/3)
- do all computation with lists
- convert the result lists to strings (using string_list/3 in reverse)
- return the result strings

which means you don't need any new predicates, and the encoding/decoding
problem is limited to the input/output phases.

In the longer term, of course, we want to improve Unicode support, and your
input is welcome.

Cheers,
Joachim

On 03/07/15 04:44, Edgaonkar, Shrirang wrote:
> Dear CLP users,
>
> The following predicate returns the Length variable as 12, since the length
> of each Unicode character is counted as 3 instead of 1. Since there are 3
> such characters, it gets 9, plus 3 ASCII characters, which equals 12.
>
> string_length("ABCターデ", Length),
>
> Whereas the following clauses would return N as 6 for the same string, since
> it supports utf8.
>
> string_list("ABCターデ", List, utf8),
>
> length(List, N),
>
> I have written a list of predicates for string manipulation. They use the
> existing predicates from library Strings and Atoms, like
> append_strings(?String1, ?String2, ?String3) etc. If I have to support utf8
> such that string_length("ABCターデ", Length, utf8), gives me 6, I have to
> write my own version, for example:
>
> string_length(STR, Length, utf8):-
>
> string_list(STR, List, utf8),
>
> length(List, Length).
>
> This is just a prototype for illustration. Please let me know if my
> understanding is right. Replacing all the Strings and Atoms with utf8 support
> is a task for me, given they are from sepia-kernel.
>
> Thanks and Regards,
>
> Shrirang Edgaonkar

Received on Sun Jul 05 2015 - 15:40:24 CEST
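
For illustration, the workflow recommended in the reply (decode once at input, compute on character-code lists, encode once at output) might look roughly like the sketch below. It assumes only what the thread states: that string_list/3 accepts the utf8 format and can be used in reverse. The predicate names utf8_string_length/2 and utf8_reverse/2 are hypothetical helpers invented for this example, not ECLiPSe built-ins, and string reversal stands in for whatever list computation an application actually needs.

```prolog
% Sketch only. utf8_string_length/2 and utf8_reverse/2 are hypothetical
% helper names made up for this example; they are not part of ECLiPSe.
:- lib(lists).   % for reverse/2

% Length in characters rather than bytes: decode to a list of
% character codes, then count the list elements.
utf8_string_length(String, Length) :-
    string_list(String, Codes, utf8),
    length(Codes, Length).

% Decode / compute on lists / encode, shown with reversal as the
% "computation" step.
utf8_reverse(String, Reversed) :-
    string_list(String, Codes, utf8),       % decode at input
    reverse(Codes, RevCodes),               % all work happens on code lists
    string_list(Reversed, RevCodes, utf8).  % encode at output

% Example queries:
% ?- utf8_string_length("ABCターデ", N).   % the thread reports 6 for this string
% ?- utf8_reverse("ABCターデ", R).
```

Keeping the encoding steps at the two ends means every intermediate predicate sees one list element per character, so ordinary list predicates such as length/2 and append/3 can be used without any encoding-aware variants.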