
[mkgmap-dev] New assertion, now with code-page=632 and Japan tile

From Gerd Petermann gpetermann_muenchen at hotmail.com on Wed Nov 17 11:16:21 GMT 2021

Hi Ticker,

remember that cs932 is a double-byte character set.
With your code, only the few unmappable UTF-16 characters whose low byte is zero are replaced; every other unmappable character is still written out as an arbitrary cs932 byte for no good reason. The result is typically garbage.

I've modified the patch to replace any unmappable character that was not transliterated by '?'.
I've also attached a debug version that shows what goes on.
A possible change in SparseTransliterator would be to add a mapping for the mathematical MINUS SIGN (U+2212); the FULLWIDTH digits are already supported in cs932.
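
A minimal Java sketch of the idea described above: any character the target charset cannot encode is transliterated if a mapping exists, and otherwise replaced by '?'. This is not the attached patch. The class UnmappableReplacer, the method makeEncodable and the one-entry TRANSLIT table are hypothetical names for illustration; the table only stands in for mkgmap's SparseTransliterator.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.util.Map;

final class UnmappableReplacer {
    // Hypothetical stand-in for a sparse transliteration table
    // (assumption: maps MINUS SIGN U+2212 to an ASCII hyphen-minus).
    static final Map<Character, String> TRANSLIT = Map.of('\u2212', "-");

    /**
     * Returns a copy of text in which every char that the target charset
     * cannot encode is either transliterated or replaced by '?'.
     * Works on single UTF-16 code units, so each half of a surrogate
     * pair is treated as unmappable and becomes '?'.
     */
    static String makeEncodable(String text, Charset target) {
        CharsetEncoder enc = target.newEncoder();
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (enc.canEncode(c)) {
                out.append(c);
                continue;
            }
            // Use the transliteration only if it is itself encodable.
            String repl = TRANSLIT.get(c);
            out.append(repl != null && enc.canEncode(repl) ? repl : "?");
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // "windows-31j" is Java's name for cp932; fall back to US-ASCII
        // if this runtime does not provide it.
        String csName = Charset.isSupported("windows-31j") ? "windows-31j" : "US-ASCII";
        Charset cs = Charset.forName(csName);
        // Node name from the thread, written with escapes.
        String name = "\uD0A4\uD0C0\uAC00\uD0A4 \uACE0\uB85C\uCF00";
        System.out.println(csName + ": " + makeEncodable(name, cs));
    }
}
```

After this step the returned string can be passed to the normal charset encoder, since every remaining char is known to be encodable in the target charset.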

Gerd









________________________________________
From: mkgmap-dev <mkgmap-dev-bounces at lists.mkgmap.org.uk> on behalf of Ticker Berkin <rwb-mkgmap at jagit.co.uk>
Sent: Tuesday, 16 November 2021 17:33
To: Development list for mkgmap
Subject: Re: [mkgmap-dev] New assertion, now with code-page=632 and Japan tile

Hi

wouldn't:
        if ((c & 0xff) == 0)
                c = '?';
be safer?
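
To see which characters the test (c & 0xff) == 0 would catch, the following sketch prints the low byte of every char in the node name quoted further down in this thread. LowByteDemo and its methods are hypothetical names for illustration, not mkgmap code.

```java
final class LowByteDemo {
    // Node name from the thread, written with escapes to avoid
    // source-encoding problems.
    static final String NAME = "\uD0A4\uD0C0\uAC00\uD0A4 \uACE0\uB85C\uCF00";

    /** Low byte that a (byte) cast keeps from a UTF-16 code unit. */
    static int lowByte(char c) {
        return c & 0xff;
    }

    /** True if the proposed test (c & 0xff) == 0 would replace this char. */
    static boolean caughtByLowByteTest(char c) {
        return (c & 0xff) == 0;
    }

    public static void main(String[] args) {
        for (char c : NAME.toCharArray()) {
            System.out.printf("U+%04X -> 0x%02X caught=%b%n",
                    (int) c, lowByte(c), caughtByLowByteTest(c));
        }
    }
}
```

In this name only U+AC00 and U+CF00 have a zero low byte. U+D0A4 and U+D0C0, for example, keep the low bytes 0xA4 and 0xC0 and pass the test unchanged; cp932 reads those as single-byte half-width katakana, which may explain the wrong label reported in the thread.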

I don't understand the point of SparseTransliterator and why it is only
used for cp932 (Japanese), unless this charset includes quite a few
European accented characters.

If this is the case, then wouldn't it be much better to do as I
described? The essence is not to transliterate the complete
string into the small ASCII/Latin-1 set just because some chars can't be
mapped. The TableTransliterator tables (ascii & latin1) map these FULLWIDTH
digits (and letters). MINUS SIGN (U+2212) isn't defined but is easy to add.

Handling a char at a time might allow removal of the 'ascii' table: if
transliteration changes a char to another char [or a string of chars],
check each of these and, if it can't be represented, transliterate it in turn.

Ticker


On Tue, 2021-11-16 at 15:48 +0000, Gerd Petermann wrote:
> Hi all,
>
> this small patch would be my approach. It replaces those characters
> which don't fit into a byte by '?'.
> This fixes the problems with Japanese code page 932.
>
> Gerd
> BTW: SparseTransliterator is very sparse. We could add a few more
> character mappings, for example there is a housenumber that contains
> "1237−1" instead of "1237-1".
> https://www.fontspace.com/unicode/analyzer#e=77yR77yS77yT77yX4oiS77yR
>
> ________________________________________
> From: mkgmap-dev <mkgmap-dev-bounces at lists.mkgmap.org.uk> on behalf
> of Ticker Berkin <rwb-mkgmap at jagit.co.uk>
> Sent: Monday, 15 November 2021 15:59
> To: Development list for mkgmap
> Subject: Re: [mkgmap-dev] New assertion, now with code-page=632 and
> Japan tile
>
> Hi
>
> How about something like:
>
> If the full string fails to encode in the target charset, process it a
> char at a time.
>
> If a char can't be represented, try transliteration on it and, if none is
> defined, use "?". Then go through the resultant string a char at a time,
> and if a char can't be represented, drop it.
>
> Maybe a final warning at the end if there was no transliteration for a
> char or the transliteration couldn't be represented.
>
> Ticker
>
> On Mon, 2021-11-15 at 13:04 +0000, Gerd Petermann wrote:
> > Hi all,
> >
> > > Maybe we should simply stop transliteration when this happens and
> > > return an empty string for the label?
> >
> > any thoughts on this?
> >
> > Gerd
> >
> > ________________________________________
> > From: mkgmap-dev <mkgmap-dev-bounces at lists.mkgmap.org.uk> on behalf
> > of Gerd Petermann <gpetermann_muenchen at hotmail.com>
> > Sent: Wednesday, 10 November 2021 11:17
> > To: Development list for mkgmap
> > Subject: Re: [mkgmap-dev] New assertion, now with code-page=632 and
> > Japan tile
> >
> > Hi devs,
> >
> > the problem occurs with node https://www.osm.org/node/5692472121
> > name=키타가키 고로케
> > Google Translate says the name is Korean. The (UTF-8) name cannot be
> > encoded in code-page 932 (Japanese), and thus mkgmap converts the
> > internal UTF-16 representation of the name to bytes. This happens in
> > method AnyCharsetEncoder.encodeText(String text) in this loop:
> >         for (int i = 0; i < s.length(); i++)
> >                 outBuf.put((byte) s.charAt(i));
> > The name 키타가키 고로케 ends with 케, whose char value is \ucf00, so it is
> > converted to the byte 0x00.
> > Maybe we should simply stop transliteration when this happens and
> > return an empty string for the label?
> >
> > If mkgmap is executed without the -ea run time option the map shows
> > name 、タ for the restaurant which is just wrong.
> > Gerd
> >
> > ________________________________________
> > From: mkgmap-dev <mkgmap-dev-bounces at lists.mkgmap.org.uk> on behalf
> > of Gerd Petermann <gpetermann_muenchen at hotmail.com>
> > Sent: Wednesday, 10 November 2021 09:43
> > To: Development list for mkgmap
> > Subject: Re: [mkgmap-dev] New assertion, now with code-page=632 and
> > Japan tile
> >
> > Hi Carlos,
> >
> > I'll try to debug this.
> >
> > BTW: I see you use *.o5m for the tiles (output from splitter). I
> > think this is no longer a good choice, pbf is a lot smaller and
> > almost as fast. Esp. when it comes to the goal of reducing disk I/O
> > (as with --gmapi-minimal)
> >
> > Gerd
> >
> > ________________________________________
> > From: mkgmap-dev <mkgmap-dev-bounces at lists.mkgmap.org.uk> on behalf
> > of Carlos Dávila <carlos at alternativaslibres.org>
> > Sent: Tuesday, 9 November 2021 22:54
> > To: mkgmap-dev at lists.mkgmap.org.uk
> > Subject: Re: [mkgmap-dev] New assertion, now with code-page=632 and
> > Japan tile
> >
> > Hi Ticker
> >
> > Not sure if relevant, but note that in this case the assertion occurs
> > while compiling the tile, not the index. In fact, --index is not
> > included in the command.
> >
> > On 9/11/21 at 21:55, Ticker Berkin wrote:
> > > Hi
> > >
> > > I think this assertion could be removed from the code.
> > >
> > > Looking through the definition of Shift-JIS, I read it as saying the
> > > second byte shouldn't be zero, so I don't know why this happens.
> > >
> > > As with the Chinese code-pages, mkgmap has places where multi-byte
> > > encodings are not handled correctly in the --index generation, and
> > > there are flags whose meaning to the Garmin software is unknown.
> > >
> > > Ticker
> > >
> > >
> > >
> > > On 09/11/2021 19:43, Carlos Dávila wrote:
> > > > code-page=932, sorry for the typo.
> > > >
> > > > On 9/11/21 at 20:36, Carlos Dávila wrote:
> > > > > The command below produces an assertion while compiling this tile
> > > > > <https://files.mkgmap.org.uk/download/526/31191025.o5m> from Japan.
> > > > > Process continues with remaining tiles and finishes without
> > > > > "Number of MapFailedExceptions: 1" as expected. This is with r4813,
> > > > > but I also tried with an old version of mkgmap with the same result.
> > > > >
> > > > > java -Xmx27G -ea -jar mkgmap.jar --code-page=632 31191025.o5m
> > > > > Mkgmap version 4813
> > > > > Time started: Tue Nov 09 20:18:16 CET 2021
> > > > > WARNING (global): Setting max-jobs to 8
> > > > > Exception in thread "main" java.lang.AssertionError: found trailing 0 in chars
> > > > >         at uk.me.parabola.imgfmt.app.labelenc.EncodedText.<init>(EncodedText.java:39)
> > > > >         at uk.me.parabola.imgfmt.app.labelenc.AnyCharsetEncoder.encodeText(AnyCharsetEncoder.java:112)
> > > > >         at uk.me.parabola.imgfmt.app.lbl.LBLFile.newLabel(LBLFile.java:132)
> > > > >         at uk.me.parabola.imgfmt.app.lbl.PlacesFile.createPOI(PlacesFile.java:253)
> > > > >         at uk.me.parabola.imgfmt.app.lbl.LBLFile.createPOI(LBLFile.java:172)
> > > > >         at uk.me.parabola.mkgmap.build.MapBuilder.processPOIs(MapBuilder.java:670)
> > > > >         at uk.me.parabola.mkgmap.build.MapBuilder.makeMap(MapBuilder.java:325)
> > > > >         at uk.me.parabola.mkgmap.main.MapMaker.makeMap(MapMaker.java:114)
> > > > >         at uk.me.parabola.mkgmap.main.MapMaker.makeMap(MapMaker.java:62)
> > > > >         at uk.me.parabola.mkgmap.main.Main.lambda$processFilename$1(Main.java:291)
> > > > >         at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
> > > > >         at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
> > > > >         at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
> > > > >         at java.base/java.lang.Thread.run(Thread.java:829)
> > > > >
> > > > >
> > > > > _______________________________________________
> > > > > mkgmap-dev mailing list
> > > > > mkgmap-dev at lists.mkgmap.org.uk
> > > > > https://www.mkgmap.org.uk/mailman/listinfo/mkgmap-dev


-------------- next part --------------
A non-text attachment was scrubbed...
Name: cs932-v2.patch
Type: application/octet-stream
Size: 2313 bytes
Desc: cs932-v2.patch
URL: <http://www.mkgmap.org.uk/pipermail/mkgmap-dev/attachments/20211117/e25fe1f8/attachment.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: cs932-v2-debug.patch
Type: application/octet-stream
Size: 2756 bytes
Desc: cs932-v2-debug.patch
URL: <http://www.mkgmap.org.uk/pipermail/mkgmap-dev/attachments/20211117/e25fe1f8/attachment-0001.obj>

