
[mkgmap-dev] TYP files and character encoding

From Ticker Berkin rwb-mkgmap at jagit.co.uk on Sat Dec 21 16:11:32 GMT 2019

Hi Gerd

Attached is a patch that:

No longer uses the 'CodePage=' command in the typ-file to determine the
output character encoding of the typ-file; instead it uses the main map
encoding from the --code-page argument.

Issues a log.warn for any typ labels that can't be encoded in the
--code-page, rather than just giving up with a message like:
> TYP file cannot be written in code page 1252

Changes the message:
> WARNING: SortCode in TYP txt file different from command line setting
which was written directly to System.out, into a log.warn. It shouldn't
happen now anyway.
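
As a rough illustration of the first two points (this is not the patch
itself), here is a minimal Java sketch that encodes labels with the
charset derived from the map's code page and warns about any label that
cannot be represented instead of aborting. The class name, method names
and the code-page-to-charset mapping are assumptions made for this
example, not mkgmap's API.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Logger;

// Hypothetical helper for illustration only; not part of mkgmap.
class LabelEncodingSketch {
    private static final Logger log = Logger.getLogger("LabelEncodingSketch");

    // Assumed mapping: 65001 is UTF-8, anything else is treated as windows-NNNN.
    // This only covers the windows-125x family and is a simplification.
    static Charset charsetForCodePage(int codePage) {
        if (codePage == 65001) {
            return StandardCharsets.UTF_8;
        }
        return Charset.forName("windows-" + codePage);
    }

    // Encode every label that fits the code page; warn about and skip the rest.
    static List<byte[]> encodeLabels(List<String> labels, int codePage) {
        Charset cs = charsetForCodePage(codePage);
        List<byte[]> encoded = new ArrayList<>();
        for (String label : labels) {
            // A fresh encoder reports unmappable characters by throwing.
            CharsetEncoder encoder = cs.newEncoder();
            try {
                ByteBuffer bb = encoder.encode(CharBuffer.wrap(label));
                byte[] bytes = new byte[bb.remaining()];
                bb.get(bytes);
                encoded.add(bytes);
            } catch (CharacterCodingException e) {
                log.warning("Cannot represent label '" + label
                        + "' in code page " + codePage);
            }
        }
        return encoded;
    }
}
```

With code page 1252, a label such as "Café" is kept, while one containing
a Polish "ł" is skipped with a warning.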

For the moment, the 'CodePage=' command in the typ-file is, under some
circumstances, still used to determine the encoding of the typ-file
itself, and I've left this alone for compatibility with existing usage.
Sometime in January I'll provide a better method for this.
 
Ticker


On Wed, 2019-12-18 at 19:54 +0000, Ticker Berkin wrote:
> Hi Gerd
> 
> I think it is best to continue with the ideas for typ-files that:
> 
> 1/ they can be in any character set and we just need a better way of
> working out the correct one - see my posting earlier today.
> 
> 2/ it can include as many languages as anyone can be bothered to add,
> and so has to be in a character set that allows the languages to be
> added, implying unicode for a common one (more particularly, UTF-8)
> 
> 3/ the codepage= statement should be redundant and ignored for
> controlling the output character set, which should be taken from the
> map, but its use for determining the input coding might need to be
> kept for a while for compatibility.
> 
> 4/ the messages my hack generates should be turned into 1 warning or
> information message per language or maybe suppressed altogether. If
> someone is generating a map with a character set that doesn't support
> a particular language, they really won't care that the data for other
> languages that have an incompatible representation with their
> language won't be there.
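
Point 4/ above could be sketched roughly as follows: count the
unrepresentable labels per language code and produce a single summary
message for each language. The Label class, the method name and the
message wording are all hypothetical and chosen only for this example;
the language code 0x15 for Polish is taken from the original posting.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch; not mkgmap code.
class PerLanguageWarningSketch {
    // Minimal stand-in for a typ label: a language code and its text.
    static final class Label {
        final int lang;
        final String text;

        Label(int lang, String text) {
            this.lang = lang;
            this.text = text;
        }
    }

    // Return one summary message per language that has unrepresentable labels.
    static List<String> summarise(List<Label> labels, Charset cs) {
        CharsetEncoder encoder = cs.newEncoder();
        Map<Integer, Integer> dropped = new TreeMap<>();
        for (Label label : labels) {
            if (!encoder.canEncode(label.text)) {
                dropped.merge(label.lang, 1, Integer::sum);
            }
        }
        List<String> messages = new ArrayList<>();
        for (Map.Entry<Integer, Integer> e : dropped.entrySet()) {
            messages.add(String.format(Locale.ROOT,
                    "%d label(s) for language 0x%02x cannot be represented in %s",
                    e.getValue(), e.getKey(), cs.name()));
        }
        return messages;
    }
}
```

Whether such a summary should be a warning, an information message or
suppressed entirely is the open question raised in point 4/.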
> 
> Ticker
> 
> On Wed, 2019-12-18 at 19:08 +0000, Gerd Petermann wrote:
> > Hi Ticker,
> > 
> > I think I understand now why we didn't have a default typ file ;)
> > If I got that right, I should revert the changes in r4395, and mkgmap
> > should either not allow it or warn loudly when a typ file with a
> > different codepage is merged?
> > Or should we force the usage of a unicode codepage?
> > Or is it possible to compile mapnik.txt with cp 1252 (or any other)
> > in a way that only those lines which contain non-matching characters
> > are ignored?
> > 
> > Gerd
> > 
> > 
> > ________________________________________
> > From: mkgmap-dev <mkgmap-dev-bounces at lists.mkgmap.org.uk> on behalf
> > of Ticker Berkin <rwb-mkgmap at jagit.co.uk>
> > Sent: Wednesday, 18 December 2019 19:46
> > To: mkgmap development
> > Subject: [mkgmap-dev] TYP files and character encoding
> > 
> > Hi
> > 
> > A couple of problems with typ-files and unicode.
> > 
> > With 'Codepage=65001' the final contents of the labels in mapnik.typ
> > that is included with the composite map is unicode, but if the map is
> > codepage 1252, the unicode characters with the top bit set are simply
> > displayed as if in 1252.
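
The effect described above can be reproduced in a few lines of Java:
encode a label as UTF-8, then decode the same bytes as windows-1252.
This only models the byte-level mismatch; how a particular device
actually renders the bytes is an assumption, and the class and method
names are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demonstration; not mkgmap code.
class MojibakeSketch {
    // Interpret the UTF-8 bytes of a label as if they were windows-1252.
    static String utf8ShownAs1252(String label) {
        byte[] utf8 = label.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, Charset.forName("windows-1252"));
    }

    public static void main(String[] args) {
        // U+00E9 (e-acute) is the two bytes C3 A9 in UTF-8, which
        // windows-1252 shows as two separate characters (U+00C3, U+00A9).
        System.out.println(utf8ShownAs1252("Caf\u00e9"));
        // Pure ASCII text is unaffected because no byte has the top bit set.
        System.out.println(utf8ShownAs1252("Road"));
    }
}
```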
> > 
> > Removing the codepage statement from mapnik.txt, making fixes
> > elsewhere to ensure that the file is read correctly as utf-8, and then
> > generating a map with --code-page=1252 gives the error:
> > 
> > SEVE: uk.me.parabola.imgfmt.MapFailedException
> >  ../svn/trunk/resources/typ-files/mapnik.txt:
> >  (thrown in TypCompiler.makeMap())
> >  TYP file cannot be written in code page 1252
> > 
> > Changing the exception handling in imgfmt/app/typ/TypElement.java, so
> > that makeLabelBlock() reads as
> > ...
> >     CharBuffer cb = CharBuffer.wrap(tl.getText());
> >     try {
> >         ByteBuffer buffer = encoder.encode(cb);
> >         out.put((byte) tl.getLang());
> >         out.put(buffer);
> >         out.put((byte) 0);
> >     } catch (CharacterCodingException ignore) {
> >         // Skip this label and report it, rather than aborting the TYP file.
> > //        ignore.printStackTrace();
> >         String name = encoder.charset().name();
> >         System.out.println("Cannot represent String=" +
> >             tl.getLang() + "," + tl.getText() +
> >             " in CodePage=" + name);
> > //        throw new TypLabelException(name);
> >     }
> > ...
> > 
> > It gives output like:
> > Cannot represent String=21,Gara|e in CodePage=windows-1252
> > Cannot represent String=21,Obszar przemysBowy in CodePage=windows-1252
> > Cannot represent String=21,ZieleD in CodePage=windows-1252
> > Cannot represent String=21,Zaro[la in CodePage=windows-1252
> > Cannot represent String=21,MokradBa in CodePage=windows-1252
> > Cannot represent String=21,Droga wojew\363dzka (B^Ecznik) in CodePage=windows-1252
> > Cannot represent String=21,Droga szybkiego ruchu  (B^Ecznik) in CodePage=windows-1252
> > Cannot represent String=21,Droga szybkiego ruchu  (B^Ecznik) in CodePage=windows-1252
> > Cannot represent String=21,Zcie|ka rowerowa in CodePage=windows-1252
> > Cannot represent String=21,Wybrze|e in CodePage=windows-1252
> > Cannot represent String=21,Zcie|ka in CodePage=windows-1252
> > Cannot represent String=21,StrumieD in CodePage=windows-1252
> > Cannot represent String=21,Granica paDstwa in CodePage=windows-1252
> > Cannot represent String=21,Rzeka, KanaB in CodePage=windows-1252
> > Cannot represent String=21,StrumieD in CodePage=windows-1252
> > Cannot represent String=21,Ruroci^Eg in CodePage=windows-1252
> > Cannot represent String=21,Kabel wysokiego napi^Ycia in CodePage=windows-1252
> > Cannot represent String=21,Tor wy[cigowy in CodePage=windows-1252
> > Cannot represent String=21,Droga szybkiego ruchu  (B^Ecznik) in CodePage=windows-1252
> > Cannot represent String=21,Droga krajowa (B^Ecznik) in CodePage=windows-1252
> > Cannot represent String=21,Droga wojew\363dzka (B^Ecznik) in CodePage=windows-1252
> > Cannot represent String=21,Wie[ (>5 tys.) in CodePage=windows-1252
> > Cannot represent String=21,Wie[ (>5 tys.) in CodePage=windows-1252
> > Cannot represent String=21,Restauracja (AmerykaDska) in CodePage=windows-1252
> > Cannot represent String=21,Restauracja (ChiDska) in CodePage=windows-1252
> > Cannot represent String=21,Restauracja (Mi^Ydzynarodowa) in CodePage=windows-1252
> > Cannot represent String=21,Restauracja (WBoska) in CodePage=windows-1252
> > Cannot represent String=21,Restauracja (MeksykaDska) in CodePage=windows-1252
> > Cannot represent String=21,Restauracja (P^Eczki) in CodePage=windows-1252
> > Cannot represent String=21,Restauracja (WegetariaDska) in CodePage=windows-1252
> > Cannot represent String=21,Kr^Ygle in CodePage=windows-1252
> > Cannot represent String=21,Sklep odzie|owy in CodePage=windows-1252
> > Cannot represent String=21,Wypo|yczalnia samochod\363w in CodePage=windows-1252
> > Cannot represent String=21,Gara| in CodePage=windows-1252
> > Cannot represent String=21,Sprzeda| samochod\363w in CodePage=windows-1252
> > Cannot represent String=21,Sklep |eglarski in CodePage=windows-1252
> > Cannot represent String=21,S^Ed in CodePage=windows-1252
> > Cannot represent String=21,O[rodek kultury in CodePage=windows-1252
> > Cannot represent String=21,Wi^Yzienie in CodePage=windows-1252
> > Cannot represent String=21,Stra| po|arna in CodePage=windows-1252
> > Cannot represent String=21,SBupek in CodePage=windows-1252
> > Cannot represent String=21,PrzystaD in CodePage=windows-1252
> > Cannot represent String=21,L^Edowisko helikopterowe in CodePage=windows-1252
> > Cannot represent String=21,Wie|a in CodePage=windows-1252
> > Cannot represent String=21,yr\363dBo in CodePage=windows-1252
> > Cannot represent String=21,Pla|a in CodePage=windows-1252
> > Cannot represent String=21,Przyl^Edek in CodePage=windows-1252
> > Cannot represent String=21,SkaBa in CodePage=windows-1252
> > 
> > Which makes sense if codepage 1252 doesn't handle Polish (hex 0x15,
> > decimal 21).
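
To see which characters make a given label fail, the encoder can be
asked about each character in turn. The sketch below is only an
illustration with hypothetical names; it suggests that windows-1252
covers "ó" but not Polish letters such as "ł" or "ą", which would explain
why every label in the list above contains at least one other Polish
letter.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

// Hypothetical diagnostic helper; not mkgmap code.
class UnmappableCharsSketch {
    // Return the distinct characters of text that cs cannot encode,
    // in order of first appearance.
    static String unmappable(String text, Charset cs) {
        CharsetEncoder encoder = cs.newEncoder();
        StringBuilder missing = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (!encoder.canEncode(c) && missing.indexOf(String.valueOf(c)) < 0) {
                missing.append(c);
            }
        }
        return missing.toString();
    }

    public static void main(String[] args) {
        Charset cp1252 = Charset.forName("windows-1252");
        // "Łącznik": the first two letters are not in windows-1252.
        System.out.println(unmappable("\u0141\u0105cznik", cp1252).length());
        // "wojewódzka": o-acute is in windows-1252, so nothing is reported.
        System.out.println(unmappable("wojew\u00f3dzka", cp1252).isEmpty());
    }
}
```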
> > 
> > NB the non-ASCII characters in the above are messed up by my cutting
> > and pasting.
> > 
> > Checking the French, on my Garmin device, the type descriptions now
> > display accents correctly.
> > 
> > Ticker
> > 
> > _______________________________________________
> > mkgmap-dev mailing list
> > mkgmap-dev at lists.mkgmap.org.uk
> > http://www.mkgmap.org.uk/mailman/listinfo/mkgmap-dev
-------------- next part --------------
A non-text attachment was scrubbed...
Name: typCodePage.patch
Type: text/x-patch
Size: 4774 bytes
Desc: not available
URL: <http://www.mkgmap.org.uk/pipermail/mkgmap-dev/attachments/20191221/fc56a67f/attachment.bin>

