
[mkgmap-dev] TYP files and character encoding

From Gerd Petermann gpetermann_muenchen at hotmail.com on Wed Dec 18 19:08:33 GMT 2019

Hi Ticker,

I think I understand now why we didn't have a default typ file ;)
If I got that right, I should revert the changes in r4395, and mkgmap should either refuse or warn loudly when a typ file with a different codepage is merged?
Or should we force the usage of unicode codepage?
Or is it possible to compile mapnik.txt with cp 1252 (or any other) in a way that only those lines which contain non-matching characters are ignored?

Gerd


________________________________________
From: mkgmap-dev <mkgmap-dev-bounces at lists.mkgmap.org.uk> on behalf of Ticker Berkin <rwb-mkgmap at jagit.co.uk>
Sent: Wednesday, 18 December 2019 19:46
To: mkgmap development
Subject: [mkgmap-dev] TYP files and character encoding

Hi

A couple of problems with typ-files and unicode.

With 'Codepage=65001', the labels in the mapnik.typ that is included
with the composite map end up encoded as unicode (utf-8), but if the map
itself is codepage 1252, the bytes of any unicode character with the top
bit set are simply displayed as if they were 1252.
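
The effect can be reproduced outside mkgmap. The following is only a minimal sketch, not mkgmap code: the class and method names are made up for illustration, and it assumes the JVM provides the windows-1252 charset. It encodes a label as utf-8 and then decodes the same bytes as 1252, which is what the device effectively does.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of mkgmap.
public class MojibakeDemo {
    /** Encodes text as UTF-8, then decodes those bytes as if they were windows-1252. */
    static String utf8ReadAs1252(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, Charset.forName("windows-1252"));
    }

    public static void main(String[] args) {
        // e-acute (U+00E9) is the two bytes C3 A9 in UTF-8. Read as windows-1252,
        // they become two characters: U+00C3 (A-tilde) and U+00A9 (copyright sign).
        String garbled = utf8ReadAs1252("Caf\u00e9");
        System.out.println(garbled.length()); // one character longer than the input
    }
}
```

Plain ASCII labels are unaffected, because utf-8 and 1252 agree on the bytes below 0x80; only characters with the top bit set are displayed wrongly.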

After removing the codepage statement from mapnik.txt, making fixes
elsewhere to ensure that the file is read correctly as utf-8, and then
generating a map with --code-page=1252, I get the error:

SEVE: uk.me.parabola.imgfmt.MapFailedException
 ../svn/trunk/resources/typ-files/mapnik.txt:
 (thrown in TypCompiler.makeMap())
 TYP file cannot be written in code page 1252

Changing the exception handling in imgfmt/app/typ/TypElement.java, so
that makeLabelBlock() reads as
...
    CharBuffer cb = CharBuffer.wrap(tl.getText());
    try {
        ByteBuffer buffer = encoder.encode(cb);
        out.put((byte) tl.getLang());
        out.put(buffer);
        out.put((byte) 0);
    } catch (CharacterCodingException ignore) {
        // ignore.printStackTrace();
        String name = encoder.charset().name();
        System.out.println("Cannot represent String=" +
            tl.getLang() + "," + tl.getText() +
            " in CodePage=" + name);
        // throw newTypLabelException(name);
    }
...
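
The same idea can be tried in isolation. This is a self-contained sketch under stated assumptions, not the mkgmap implementation: the class and method names are invented for illustration, and labels are plain strings rather than mkgmap's label objects. It relies on the fact that a freshly created CharsetEncoder reports unmappable characters by throwing CharacterCodingException.

```java
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of mkgmap.
public class LabelEncodingSketch {
    /** Returns the labels that the charset can encode; reports and skips the others. */
    static List<String> encodableLabels(List<String> labels, Charset charset) {
        // A new encoder's default action for unmappable characters is REPORT,
        // so encode() throws instead of silently substituting '?'.
        CharsetEncoder encoder = charset.newEncoder();
        List<String> kept = new ArrayList<>();
        for (String label : labels) {
            try {
                // encode(CharBuffer) resets the encoder first, so reuse after a failure is safe.
                encoder.encode(CharBuffer.wrap(label));
                kept.add(label);
            } catch (CharacterCodingException e) {
                System.out.println("Cannot represent String=" + label
                    + " in CodePage=" + charset.name());
            }
        }
        return kept;
    }
}
```

Because the encode call happens before anything is stored, a label that fails is dropped as a whole and never leaves a partial entry behind, which mirrors the ordering in the fragment above.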

It gives output like:
Cannot represent String=21,Garaże in CodePage=windows-1252
Cannot represent String=21,Obszar przemysłowy in CodePage=windows-1252
Cannot represent String=21,Zieleń in CodePage=windows-1252
Cannot represent String=21,Zarośla in CodePage=windows-1252
Cannot represent String=21,Mokradła in CodePage=windows-1252
Cannot represent String=21,Droga wojewódzka (łącznik) in CodePage=windows-1252
Cannot represent String=21,Droga szybkiego ruchu  (łącznik) in CodePage=windows-1252
Cannot represent String=21,Droga szybkiego ruchu  (łącznik) in CodePage=windows-1252
Cannot represent String=21,Ścieżka rowerowa in CodePage=windows-1252
Cannot represent String=21,Wybrzeże in CodePage=windows-1252
Cannot represent String=21,Ścieżka in CodePage=windows-1252
Cannot represent String=21,Strumień in CodePage=windows-1252
Cannot represent String=21,Granica państwa in CodePage=windows-1252
Cannot represent String=21,Rzeka, Kanał in CodePage=windows-1252
Cannot represent String=21,Strumień in CodePage=windows-1252
Cannot represent String=21,Rurociąg in CodePage=windows-1252
Cannot represent String=21,Kabel wysokiego napięcia in CodePage=windows-1252
Cannot represent String=21,Tor wyścigowy in CodePage=windows-1252
Cannot represent String=21,Droga szybkiego ruchu  (łącznik) in CodePage=windows-1252
Cannot represent String=21,Droga krajowa (łącznik) in CodePage=windows-1252
Cannot represent String=21,Droga wojewódzka (łącznik) in CodePage=windows-1252
Cannot represent String=21,Wieś (>5 tys.) in CodePage=windows-1252
Cannot represent String=21,Wieś (>5 tys.) in CodePage=windows-1252
Cannot represent String=21,Restauracja (Amerykańska) in CodePage=windows-1252
Cannot represent String=21,Restauracja (Chińska) in CodePage=windows-1252
Cannot represent String=21,Restauracja (Międzynarodowa) in CodePage=windows-1252
Cannot represent String=21,Restauracja (Włoska) in CodePage=windows-1252
Cannot represent String=21,Restauracja (Meksykańska) in CodePage=windows-1252
Cannot represent String=21,Restauracja (Pączki) in CodePage=windows-1252
Cannot represent String=21,Restauracja (Wegetariańska) in CodePage=windows-1252
Cannot represent String=21,Kręgle in CodePage=windows-1252
Cannot represent String=21,Sklep odzieżowy in CodePage=windows-1252
Cannot represent String=21,Wypożyczalnia samochodów in CodePage=windows-1252
Cannot represent String=21,Garaż in CodePage=windows-1252
Cannot represent String=21,Sprzedaż samochodów in CodePage=windows-1252
Cannot represent String=21,Sklep żeglarski in CodePage=windows-1252
Cannot represent String=21,Sąd in CodePage=windows-1252
Cannot represent String=21,Ośrodek kultury in CodePage=windows-1252
Cannot represent String=21,Więzienie in CodePage=windows-1252
Cannot represent String=21,Straż pożarna in CodePage=windows-1252
Cannot represent String=21,Słupek in CodePage=windows-1252
Cannot represent String=21,Przystań in CodePage=windows-1252
Cannot represent String=21,Lądowisko helikopterowe in CodePage=windows-1252
Cannot represent String=21,Wieża in CodePage=windows-1252
Cannot represent String=21,Źródło in CodePage=windows-1252
Cannot represent String=21,Plaża in CodePage=windows-1252
Cannot represent String=21,Przylądek in CodePage=windows-1252
Cannot represent String=21,Skała in CodePage=windows-1252

Which makes sense if codepage 1252 doesn't handle Polish (hex 0x15,
decimal 21).

Checking the French, on my Garmin device, the type descriptions now display accents correctly.
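
The difference between the French and the Polish labels can be checked directly against the charset. The following is a small sketch with made-up class and method names, again assuming the JVM provides windows-1252; it lists the characters of a string that 1252 cannot encode.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

// Hypothetical helper, not part of mkgmap.
public class Cp1252Coverage {
    /** Returns, in order, the characters of text that windows-1252 cannot encode. */
    static String unencodable(String text) {
        CharsetEncoder encoder = Charset.forName("windows-1252").newEncoder();
        StringBuilder missing = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (!encoder.canEncode(c)) {
                missing.append(c);
            }
        }
        return missing.toString();
    }

    public static void main(String[] args) {
        // French accented letters (e-acute, e-grave, c-cedilla): all present in 1252.
        System.out.println(unencodable("\u00e9\u00e8\u00e7").isEmpty());
        // Polish z-with-dot (U+017C) is not in 1252, so it is reported.
        System.out.println(unencodable("Pla\u017ca").length());
    }
}
```

Note that o-acute (U+00F3) is present in 1252, so a Polish label is rejected only when it also contains a letter such as U+0142, U+0105, U+0119 or U+017C.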

Ticker

_______________________________________________
mkgmap-dev mailing list
mkgmap-dev at lists.mkgmap.org.uk
http://www.mkgmap.org.uk/mailman/listinfo/mkgmap-dev

