[mkgmap-dev] [PATCH] Alpha code for Highway Symbols

From Marko Mäkelä marko.makela at iki.fi on Mon Apr 6 13:57:57 BST 2009

On Mon, Apr 06, 2009 at 02:38:15PM +0200, Johann Gail wrote:
> \u Syntax is java Syntax, and is *NOT* UTF8-Encoding!

Correct.  For example, \u2020 (the dagger symbol, †) would be
\xe2\x80\xa0 or \342\200\240 in the UTF-8 encoding and
\x20\x20 or \40\40 in UTF-16 (no matter if big or little endian,
in this case).  The octal and hex notations are 8-bit byte codes.
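
As a rough illustration of the byte sequences above, here is a minimal
Java sketch that prints the encoded bytes of U+2020.  The class name
DaggerBytes and the helper hex() are made up for this example and are
not part of mkgmap.

```java
import java.nio.charset.StandardCharsets;

class DaggerBytes {
    // Hypothetical helper: formats each byte as a \xNN escape,
    // matching the hex byte notation used in this message.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("\\x%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The Java compiler turns this escape into the single character U+2020.
        String dagger = "\u2020";

        // UTF-8 needs three bytes for this code point.
        System.out.println(hex(dagger.getBytes(StandardCharsets.UTF_8)));    // \xe2\x80\xa0

        // UTF_16BE and UTF_16LE write no byte order mark; both bytes are 0x20,
        // so the byte order makes no visible difference here.
        System.out.println(hex(dagger.getBytes(StandardCharsets.UTF_16BE))); // \x20\x20
        System.out.println(hex(dagger.getBytes(StandardCharsets.UTF_16LE))); // \x20\x20
    }
}
```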

I think that it is much more readable to write \u2020 for U+2020 than
\xe2\x80\xa0.  The \u notation will apparently also be in the next
C and C++ syntax.

> Both of them are unicode, but the encoding scheme is different. At the  
> moment it works fine, if you use an editor, which can handle unicode  
> properly.

I'm not sure I understand your comment.  My understanding is that
java.lang.String uses something like UTF-16 internally.  I have never
seen a text file containing Unicode characters that was encoded
in anything other than UTF-8.  As far as I understand, the MySQL database
(which I develop for a living) accepts UTF-16 string literals (called
"ucs2"), but the bug reports I've seen have always been in ASCII,
ISO 8859-1, or UTF-8.

> But it is good idea, instead of introducing a new proprietary ~[xx]  
> style, use an existing standard, e.g. the \u4 notation.

That was exactly my point.  It should be trivial to implement all
three notations (\x hex bytes, \ octal bytes, \u hex Unicode).
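
To give an idea of how little code the three notations need, here is a
minimal Java sketch.  It is not the mkgmap implementation: the class
EscapeSketch and its methods are hypothetical, and it assumes that runs
of hex and octal byte escapes are meant to be decoded as UTF-8.  It also
does no validation, so malformed digits raise NumberFormatException.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

class EscapeSketch {
    // Hypothetical helper: expands hex byte, octal byte and Unicode escapes.
    static String unescape(String in) {
        StringBuilder out = new StringBuilder();
        // Byte escapes are collected here until a non-byte token is seen.
        ByteArrayOutputStream pending = new ByteArrayOutputStream();
        int i = 0;
        while (i < in.length()) {
            char c = in.charAt(i);
            if (c == '\\' && i + 1 < in.length()) {
                char kind = in.charAt(i + 1);
                if (kind == 'x' && i + 4 <= in.length()) {
                    // Backslash, 'x', then exactly two hex digits: one byte.
                    pending.write(Integer.parseInt(in.substring(i + 2, i + 4), 16));
                    i += 4;
                    continue;
                }
                if (kind >= '0' && kind <= '7') {
                    // Backslash, then one to three octal digits: one byte.
                    int end = i + 1;
                    while (end < in.length() && end < i + 4
                            && in.charAt(end) >= '0' && in.charAt(end) <= '7') {
                        end++;
                    }
                    pending.write(Integer.parseInt(in.substring(i + 1, end), 8));
                    i = end;
                    continue;
                }
                if (kind == 'u' && i + 6 <= in.length()) {
                    // Backslash, 'u', then exactly four hex digits: one UTF-16 unit.
                    flush(pending, out);
                    out.append((char) Integer.parseInt(in.substring(i + 2, i + 6), 16));
                    i += 6;
                    continue;
                }
            }
            // Ordinary character (or an unrecognised escape): copy it through.
            flush(pending, out);
            out.append(c);
            i++;
        }
        flush(pending, out);
        return out.toString();
    }

    // Decodes the collected bytes as UTF-8 (an assumption of this sketch).
    static void flush(ByteArrayOutputStream pending, StringBuilder out) {
        if (pending.size() > 0) {
            out.append(new String(pending.toByteArray(), StandardCharsets.UTF_8));
            pending.reset();
        }
    }
}
```

With this sketch, the hex byte, octal byte and Unicode spellings of the
dagger from earlier in this message all expand to the same one-character
string.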

	Marko


