
[mkgmap-dev] New assertion, now with code-page=632 and Japan tile

From Gerd Petermann gpetermann_muenchen at hotmail.com on Wed Nov 17 18:00:50 GMT 2021

Hi Ticker,

> For some other character sets the result could be invalid or garbage.
OK, I assumed that '?' is always at the same position in every character set; I might be wrong about that.
SparseTransliterator is only used for cs932.

Gerd
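
Editor's illustration (not part of the original message): a minimal Java sketch of the point about '?' not necessarily having the same byte representation everywhere. It only uses charsets that every JDK guarantees; the class and method names are hypothetical and are not taken from mkgmap.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of mkgmap: shows the bytes that "?" becomes
// when it is passed through the target charset's own encoder.
final class QuestionMarkBytes {
    static String hex(String s, Charset cs) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(cs)) {
            sb.append(String.format("%02X ", b));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        Charset[] charsets = {
            StandardCharsets.US_ASCII,
            StandardCharsets.ISO_8859_1,
            StandardCharsets.UTF_8,
            StandardCharsets.UTF_16BE
        };
        for (Charset cs : charsets) {
            // The single-byte charsets give 3F; UTF-16BE gives 00 3F.
            System.out.println(cs.name() + ": " + hex("?", cs));
        }
    }
}
```

Encoding the replacement through the target charset, rather than writing a fixed byte, avoids relying on that assumption.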

________________________________________
From: mkgmap-dev <mkgmap-dev-bounces at lists.mkgmap.org.uk> on behalf of Ticker Berkin <rwb-mkgmap at jagit.co.uk>
Sent: Wednesday, 17 November 2021 18:23
To: Development list for mkgmap
Subject: Re: [mkgmap-dev] New assertion, now with code-page=632 and Japan tile

Hi Gerd

It makes a big assumption that transliterated chars can have their low
byte written as a sequence that will be taken as some other charset.

For cp932, because so few chars are transliterated, and, by chance,
these have the same representation, it won't crash or give an invalid
sequence, but it will be strange that only macron chars, with the macron
removed, will show amongst the "?"s.

For some other character sets the result could be invalid or garbage.

Ticker
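
Editor's illustration (not part of the original message): a hedged Java sketch of the approach discussed in this thread. The main encoder runs until it reports a failure, the failed input is handled code point by code point, each one is transliterated or replaced by "?", and the replacement is encoded through the target charset itself. This is not the mkgmap patch. The class name, the `transliterate` callback and the 64-byte buffer size are assumptions, and the sketch assumes a stateless target charset.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.function.IntFunction;

// Hypothetical sketch, not mkgmap code.
final class FallbackEncoder {
    // transliterate returns a replacement string, or null if it has none.
    static byte[] encode(String text, Charset target, IntFunction<String> transliterate) {
        CharsetEncoder main = target.newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        CharBuffer in = CharBuffer.wrap(text);
        ByteBuffer out = ByteBuffer.allocate(64);
        ByteArrayOutputStream sink = new ByteArrayOutputStream();

        while (true) {
            CoderResult result = main.encode(in, out, true);
            drain(out, sink);
            if (result.isUnderflow()) {
                break;              // all input consumed
            }
            if (result.isOverflow()) {
                continue;           // buffer was full and is now drained
            }
            // Error: result.length() UTF-16 chars at the current input position
            // failed. This is 2 for an unmappable surrogate pair.
            char[] bad = new char[result.length()];
            in.get(bad);
            String failed = new String(bad);
            for (int i = 0; i < failed.length(); ) {
                int cp = failed.codePointAt(i);
                i += Character.charCount(cp);
                String replacement = transliterate.apply(cp);
                if (replacement == null) {
                    replacement = "?";
                }
                // String.getBytes(Charset) encodes via the target charset and
                // substitutes the charset's own replacement for anything unmappable.
                byte[] bytes = replacement.getBytes(target);
                sink.write(bytes, 0, bytes.length);
            }
        }
        main.flush(out);
        drain(out, sink);
        return sink.toByteArray();
    }

    private static void drain(ByteBuffer out, ByteArrayOutputStream sink) {
        out.flip();
        sink.write(out.array(), 0, out.limit());
        out.clear();
    }

    public static void main(String[] args) {
        // ISO-8859-1 stands in for the target code page because every JDK has it.
        byte[] bytes = encode("T\u014Dky\u014D \u6771\u4EAC", StandardCharsets.ISO_8859_1,
                cp -> cp == 0x014D ? "o" : null);
        System.out.println(new String(bytes, StandardCharsets.ISO_8859_1)); // prints "Tokyo ??"
    }
}
```

Because the replacement goes through the target charset, the variable-length nature of the output is not a concern; the care needed is on the UTF-16 input side, which is why the failed span is walked by code point rather than by char.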

On Wed, 2021-11-17 at 16:40 +0000, Gerd Petermann wrote:
> Hi Ticker,
>
> yes, sure, a lot of unicode characters cannot be represented in
> cs932. SparseTransliterator only handles 5 of them.
> All others are now translated to ? instead of a more or less
> unpredictable character.
> My patch doesn't try to implement a good transliteration, just a
> better handling of unmapped chars.
>
> Gerd
>
> ________________________________________
> From: mkgmap-dev <mkgmap-dev-bounces at lists.mkgmap.org.uk> on behalf
> of Ticker Berkin <rwb-mkgmap at jagit.co.uk>
> Sent: Wednesday, 17 November 2021 17:35
> To: Development list for mkgmap
> Subject: Re: [mkgmap-dev] New assertion, now with code-page=632 and
> Japan tile
>
> Hi Gerd
>
> Not quite - The transliteration / "?" doesn't get encoded into the
> target charset.
>
> In this case with cp932, there seems to be an assumption that
> SparseTransliterator will convert all Unicode chars that are not in
> CP932. There must be lots of these.
>
> Ticker
>
> On Wed, 2021-11-17 at 16:00 +0000, Gerd Petermann wrote:
> > Hi Ticker,
> >
> > result.length() works and most times returns 1, sometimes higher
> > values for Unicode characters which cannot be represented by a
> > single char.
> >
> > OK to commit v2?
> > Gerd
> >
> > ________________________________________
> > From: mkgmap-dev <mkgmap-dev-bounces at lists.mkgmap.org.uk> on behalf
> > of Ticker Berkin <rwb-mkgmap at jagit.co.uk>
> > Sent: Wednesday, 17 November 2021 15:37
> > To: Development list for mkgmap
> > Subject: Re: [mkgmap-dev] New assertion, now with code-page=632 and
> > Japan tile
> >
> > Hi Gerd
> >
> > My description didn't quite mean what I hoped it did - sorry. I was
> > thinking that there would be a single attempt at encoding the whole
> > string, and if that fails, start again but char-by-char.
> >
> > But, assuming result.length() works and charBuffer.get() and
> > outBuff.put() maintain positions used by main encoder, within the
> > loop the failed component needs to be processed input char-by-char,
> > transliterated (if no change replaced by "?") and encoded with
> > another encoder.
> >
> > Any variable length nature of the output charset shouldn't be a
> > problem. The variable length input UTF-16 will need care.
> >
> > Ticker
> >
> >
> > On Wed, 2021-11-17 at 11:16 +0000, Gerd Petermann wrote:
> > > Hi Ticker,
> > >
> > > remember that cs932 is a double-byte character set.
> > > With your code only a few unmappable utf-16 characters are
> > > replaced, for the rest one of cs932 is used, but without any good
> > > reason. The result is typically garbage.
> > >
> > > I've modified the patch to replace any unmappable character that
> > > was not transliterated by '?'.
> > > I've also attached a debug version that shows what goes on.
> > > A possible change in SparseTransliterator would be to add a
> > > mapping for the MATH MINUS, the other FULLWIDTH digits are
> > > supported in cs932.
> > >
> > > Gerd
> >
> >
> > _______________________________________________
> > mkgmap-dev mailing list
> > mkgmap-dev at lists.mkgmap.org.uk
> > https://www.mkgmap.org.uk/mailman/listinfo/mkgmap-dev
>
>
> _______________________________________________
> mkgmap-dev mailing list
> mkgmap-dev at lists.mkgmap.org.uk
> https://www.mkgmap.org.uk/mailman/listinfo/mkgmap-dev


_______________________________________________
mkgmap-dev mailing list
mkgmap-dev at lists.mkgmap.org.uk
https://www.mkgmap.org.uk/mailman/listinfo/mkgmap-dev

