
[mkgmap-dev] TYP files and character encoding

From Ticker Berkin rwb-mkgmap at jagit.co.uk on Wed Dec 18 19:54:03 GMT 2019

Hi Gerd

I think it is best to continue with the ideas for typ-files that:

1/ they can be in any character set and we just need a better way of
working out the correct one - see my posting earlier today.

2/ it can include as many languages as anyone can be bothered to add,
and so has to be in a character set that can represent all of them,
implying Unicode (more particularly, UTF-8) for a common one.

3/ the codepage= statement should become redundant and be ignored when
choosing the output character set, which should be taken from the
map, but its use for determining the input encoding might need to be
kept for a while for compatibility.

4/ the messages my hack generates should be reduced to one warning or
information message per language, or maybe suppressed altogether. If
someone is generating a map in a character set that doesn't support a
particular language, they really won't care that the labels for that
language are missing.
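Point 1 needs some way of working out the input encoding. As a minimal sketch of one common heuristic (not necessarily the approach from the earlier posting), assuming the typ-file is available as raw bytes: try a strict UTF-8 decode first, where malformed input is reported rather than replaced, and only fall back to a legacy charset, for example one named by an old codepage= line as point 3 suggests, if that fails. TypInputCharset and detect() are hypothetical names, not mkgmap's API. The heuristic relies on the fact that single-byte code page text containing accented letters is rarely also valid UTF-8.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of mkgmap: guesses the encoding of a typ-file.
final class TypInputCharset {
    private TypInputCharset() {}

    /**
     * Returns UTF-8 if the whole file decodes as strict UTF-8, otherwise the
     * fallback (for example a charset named by a legacy codepage= line).
     * Pure ASCII input is valid UTF-8, so it is reported as UTF-8 too.
     */
    static Charset detect(byte[] content, Charset fallback) {
        CharsetDecoder utf8 = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            utf8.decode(ByteBuffer.wrap(content)); // throws on any invalid byte sequence
            return StandardCharsets.UTF_8;
        } catch (CharacterCodingException e) {
            return fallback;
        }
    }

    public static void main(String[] args) {
        Charset cp1252 = Charset.forName("windows-1252");
        String label = "R\u00e9servoir"; // "Réservoir"
        System.out.println(detect(label.getBytes(StandardCharsets.UTF_8), cp1252)); // UTF-8
        System.out.println(detect(label.getBytes(cp1252), cp1252));                 // windows-1252
    }
}
```

In the cp1252 case the byte 0xE9 for 'é' starts a three-byte UTF-8 sequence but is followed by an ordinary ASCII byte, so the strict decoder rejects it and the fallback is used.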

Ticker

On Wed, 2019-12-18 at 19:08 +0000, Gerd Petermann wrote:
> Hi Ticker,
> 
> I think I understand now why we didn't have a default typ file ;)
> If I got that right, I should revert the changes in r4395, and mkgmap
> should either refuse, or at least warn loudly, when a typ file with a
> different codepage is merged?
> Or should we force the use of the unicode codepage?
> Or is it possible to compile mapnik.txt with cp 1252 (or any other)
> in a way that only those lines containing characters the code page
> cannot represent are ignored?
> 
> Gerd
> 
> 
> ________________________________________
> From: mkgmap-dev <mkgmap-dev-bounces at lists.mkgmap.org.uk> on behalf
> of Ticker Berkin <rwb-mkgmap at jagit.co.uk>
> Sent: Wednesday, 18 December 2019 19:46
> To: mkgmap development
> Subject: [mkgmap-dev] TYP files and character encoding
> 
> Hi
> 
> A couple of problems with typ-files and unicode.
> 
> With 'Codepage=65001' (UTF-8), the label text of mapnik.typ as included
> in the composite map is encoded as UTF-8, but if the map itself uses
> codepage 1252, the bytes of the non-ASCII characters (those with the
> top bit set) are simply displayed as if they were codepage 1252 text.
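The display problem can be reproduced outside mkgmap. A minimal illustration, assuming the device simply treats each label byte as one windows-1252 character: every accented letter is stored as two UTF-8 bytes with the top bit set, so it shows up as two unrelated characters. Mojibake and misread() are made-up names for the example.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Illustration only: what happens when UTF-8 label bytes are interpreted
// by something that assumes a single-byte code page.
final class Mojibake {
    private Mojibake() {}

    static String misread(String label, Charset assumed) {
        byte[] utf8 = label.getBytes(StandardCharsets.UTF_8); // bytes stored in the TYP label
        return new String(utf8, assumed);                     // same bytes read back as code page text
    }

    public static void main(String[] args) {
        Charset cp1252 = Charset.forName("windows-1252");
        // 'é' is U+00E9, encoded in UTF-8 as 0xC3 0xA9; in cp1252 those bytes
        // are 'Ã' and '©', so misread returns "CafÃ©".
        System.out.println(misread("Caf\u00e9", cp1252));
    }
}
```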
> 
> If I remove the codepage statement from mapnik.txt, make fixes
> elsewhere to ensure that the file is read correctly as UTF-8, and then
> generate a map with --code-page=1252, I get the error:
> 
> SEVE: uk.me.parabola.imgfmt.MapFailedException
>  ../svn/trunk/resources/typ-files/mapnik.txt:
>  (thrown in TypCompiler.makeMap())
>  TYP file cannot be written in code page 1252
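The error can be reproduced in isolation. A small sketch, assuming (as the makeLabelBlock() excerpt below suggests) that the TYP writer encodes each label with a CharsetEncoder left at Java's default REPORT action: a label containing a character outside the target code page makes encode() throw, which the original code appears to turn into a failure for the whole file. StrictEncode is a made-up name.

```java
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

// Illustration only, not mkgmap code: a freshly created CharsetEncoder
// reports unmappable characters by throwing instead of substituting.
final class StrictEncode {
    private StrictEncode() {}

    static boolean encodes(String text, Charset target) {
        CharsetEncoder encoder = target.newEncoder(); // default action: REPORT
        try {
            encoder.encode(CharBuffer.wrap(text));
            return true;
        } catch (CharacterCodingException e) { // UnmappableCharacterException here
            return false;
        }
    }

    public static void main(String[] args) {
        Charset cp1252 = Charset.forName("windows-1252");
        System.out.println(encodes("R\u00e9servoir", cp1252));                 // true:  French fits cp1252
        System.out.println(encodes("Mokrad\u0142a", cp1252));                  // false: 'ł' is not in cp1252
        System.out.println(encodes("Mokrad\u0142a", StandardCharsets.UTF_8)); // true:  UTF-8 can encode it
    }
}
```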
> 
> If I change the exception handling in imgfmt/app/typ/TypElement.java
> so that makeLabelBlock() reads as follows:
> ...
>     CharBuffer cb = CharBuffer.wrap(tl.getText());
>     try {
>         ByteBuffer buffer = encoder.encode(cb);
>         out.put((byte) tl.getLang());
>         out.put(buffer);
>         out.put((byte) 0);
>     } catch (CharacterCodingException ignore) {
>         // Report the unrepresentable label and skip it instead of failing.
> //      ignore.printStackTrace();
>         String name = encoder.charset().name();
>         System.out.println("Cannot represent String=" +
>             tl.getLang() + "," + tl.getText() +
>             " in CodePage=" + name);
> //      throw new TypLabelException(name);
>     }
> ...
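Point 4 and Gerd's last question come down to the same thing: skip the labels the map's code page can't hold, and say so once per language rather than once per label. A sketch under stated assumptions: Label is a stand-in for mkgmap's own label class, the [language byte][encoded text][0] layout is copied from the makeLabelBlock() excerpt above, and LabelBlockSketch, build() and summarise() are hypothetical names. Language id 21 (Polish) is taken from the log; id 1 is only a placeholder.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch, not mkgmap's TypElement.
final class LabelBlockSketch {
    private LabelBlockSketch() {}

    /** Stand-in for a TYP label: a Garmin language id plus its text. */
    static final class Label {
        final int lang;
        final String text;

        Label(int lang, String text) {
            this.lang = lang;
            this.text = text;
        }
    }

    /**
     * Writes [lang][encoded text][0] for every label the target charset can
     * represent. Labels that cannot be encoded are skipped and counted per
     * language in {@code skipped}.
     */
    static byte[] build(List<Label> labels, Charset target, Map<Integer, Integer> skipped) {
        CharsetEncoder encoder = target.newEncoder(); // reports unmappable characters
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (Label tl : labels) {
            try {
                ByteBuffer buffer = encoder.encode(CharBuffer.wrap(tl.text));
                byte[] bytes = new byte[buffer.remaining()];
                buffer.get(bytes);
                out.write(tl.lang);                // language byte
                out.write(bytes, 0, bytes.length); // encoded text
                out.write(0);                      // terminator
            } catch (CharacterCodingException e) {
                skipped.merge(tl.lang, 1, Integer::sum); // count instead of printing per label
            }
        }
        return out.toByteArray();
    }

    /** One message per language rather than one per label. */
    static List<String> summarise(Map<Integer, Integer> skipped, Charset target) {
        List<String> messages = new ArrayList<>();
        for (Map.Entry<Integer, Integer> e : skipped.entrySet()) {
            messages.add("Warning: " + e.getValue() + " label(s) for language " + e.getKey()
                    + " cannot be represented in " + target.name() + " and were left out");
        }
        return messages;
    }

    public static void main(String[] args) {
        Charset cp1252 = Charset.forName("windows-1252");
        List<Label> labels = List.of(
                new Label(21, "Gara\u017ce"),    // Polish "Garaże": 'ż' is not in cp1252
                new Label(21, "Mokrad\u0142a"),  // Polish "Mokradła": 'ł' is not in cp1252
                new Label(1, "R\u00e9servoir")); // placeholder language id; fits cp1252
        Map<Integer, Integer> skipped = new TreeMap<>();
        byte[] block = build(labels, cp1252, skipped);
        System.out.println("label block: " + block.length + " bytes");
        summarise(skipped, cp1252).forEach(System.out::println);
    }
}
```

Running main() should report an 11-byte block (only the French label survives) followed by a single warning for language 21. Whether that warning should be kept, demoted to an information message or dropped entirely is the open question in point 4.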
> 
> It gives output like:
> Cannot represent String=21,Garaże in CodePage=windows-1252
> Cannot represent String=21,Obszar przemysłowy in CodePage=windows-1252
> Cannot represent String=21,Zieleń in CodePage=windows-1252
> Cannot represent String=21,Zarośla in CodePage=windows-1252
> Cannot represent String=21,Mokradła in CodePage=windows-1252
> Cannot represent String=21,Droga wojewódzka (łącznik) in CodePage=windows-1252
> Cannot represent String=21,Droga szybkiego ruchu  (łącznik) in CodePage=windows-1252
> Cannot represent String=21,Droga szybkiego ruchu  (łącznik) in CodePage=windows-1252
> Cannot represent String=21,Ścieżka rowerowa in CodePage=windows-1252
> Cannot represent String=21,Wybrzeże in CodePage=windows-1252
> Cannot represent String=21,Ścieżka in CodePage=windows-1252
> Cannot represent String=21,Strumień in CodePage=windows-1252
> Cannot represent String=21,Granica państwa in CodePage=windows-1252
> Cannot represent String=21,Rzeka, Kanał in CodePage=windows-1252
> Cannot represent String=21,Strumień in CodePage=windows-1252
> Cannot represent String=21,Rurociąg in CodePage=windows-1252
> Cannot represent String=21,Kabel wysokiego napięcia in CodePage=windows-1252
> Cannot represent String=21,Tor wyścigowy in CodePage=windows-1252
> Cannot represent String=21,Droga szybkiego ruchu  (łącznik) in CodePage=windows-1252
> Cannot represent String=21,Droga krajowa (łącznik) in CodePage=windows-1252
> Cannot represent String=21,Droga wojewódzka (łącznik) in CodePage=windows-1252
> Cannot represent String=21,Wieś (>5 tys.) in CodePage=windows-1252
> Cannot represent String=21,Wieś (>5 tys.) in CodePage=windows-1252
> Cannot represent String=21,Restauracja (Amerykańska) in CodePage=windows-1252
> Cannot represent String=21,Restauracja (Chińska) in CodePage=windows-1252
> Cannot represent String=21,Restauracja (Międzynarodowa) in CodePage=windows-1252
> Cannot represent String=21,Restauracja (Włoska) in CodePage=windows-1252
> Cannot represent String=21,Restauracja (Meksykańska) in CodePage=windows-1252
> Cannot represent String=21,Restauracja (Pączki) in CodePage=windows-1252
> Cannot represent String=21,Restauracja (Wegetariańska) in CodePage=windows-1252
> Cannot represent String=21,Kręgle in CodePage=windows-1252
> Cannot represent String=21,Sklep odzieżowy in CodePage=windows-1252
> Cannot represent String=21,Wypożyczalnia samochodów in CodePage=windows-1252
> Cannot represent String=21,Garaż in CodePage=windows-1252
> Cannot represent String=21,Sprzedaż samochodów in CodePage=windows-1252
> Cannot represent String=21,Sklep żeglarski in CodePage=windows-1252
> Cannot represent String=21,Sąd in CodePage=windows-1252
> Cannot represent String=21,Ośrodek kultury in CodePage=windows-1252
> Cannot represent String=21,Więzienie in CodePage=windows-1252
> Cannot represent String=21,Straż pożarna in CodePage=windows-1252
> Cannot represent String=21,Słupek in CodePage=windows-1252
> Cannot represent String=21,Przystań in CodePage=windows-1252
> Cannot represent String=21,Lądowisko helikopterowe in CodePage=windows-1252
> Cannot represent String=21,Wieża in CodePage=windows-1252
> Cannot represent String=21,Źródło in CodePage=windows-1252
> Cannot represent String=21,Plaża in CodePage=windows-1252
> Cannot represent String=21,Przylądek in CodePage=windows-1252
> Cannot represent String=21,Skała in CodePage=windows-1252
>
> This makes sense, as codepage 1252 doesn't cover Polish (language 21,
> i.e. hex 0x15).
> 
> Checking the French labels on my Garmin device, I can see that the
> type descriptions now display accents correctly.
> 
> Ticker
> 
> _______________________________________________
> mkgmap-dev mailing list
> mkgmap-dev at lists.mkgmap.org.uk
> http://www.mkgmap.org.uk/mailman/listinfo/mkgmap-dev

