
[mkgmap-dev] New assertion, now with code-page=632 and Japan tile

From Ticker Berkin rwb-mkgmap at jagit.co.uk on Thu Nov 18 17:47:53 GMT 2021

Hi Gerd
 
For any code-page except Japanese/cp932, AnyCharSetEncoder takes
anything that can't be represented, tries to find a reasonable ASCII
representation or "?", then writes this to the output without checking
that the replacement is itself valid in the target charset. This is a
big assumption for far-eastern charsets, most likely generating garbage
with possible invalid shift-in/out requests...
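
As a rough illustration of that fallback idea (not the mkgmap code; the
class name, method name and replacement table below are all made up for
the example), here is a minimal Java sketch that also checks the
replacement can be encoded before using it. It assumes "?" itself is
encodable in the target charset, which holds for the ASCII-compatible
charsets used here but is exactly the kind of thing that needs checking
elsewhere.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Map;

public class FallbackEncodeSketch {
    // Hypothetical replacement table; a real transliterator covers far more.
    static final Map<Integer, String> ASCII_FALLBACK = Map.of(
            0x014D, "o",   // o with macron
            0x016B, "u",   // u with macron
            0x00E9, "e");  // e with acute

    // Returns a copy of text in which every code point the target charset
    // cannot encode is replaced by a table entry, or by "?" if there is no
    // entry or the entry itself cannot be encoded.
    static String makeEncodable(String text, Charset target) {
        CharsetEncoder enc = target.newEncoder();
        StringBuilder out = new StringBuilder();
        text.codePoints().forEach(cp -> {
            String s = new String(Character.toChars(cp));
            if (enc.canEncode(s)) {
                out.append(s);          // representable as-is
                return;
            }
            String repl = ASCII_FALLBACK.getOrDefault(cp, "?");
            // Assumption: "?" is encodable in the target charset.
            out.append(enc.canEncode(repl) ? repl : "?");
        });
        return out.toString();
    }

    public static void main(String[] args) {
        // U+014D has a table entry, U+00E9 too, U+4E2D has none.
        String fixed = makeEncodable("T\u014Dky\u014D \u00E9 \u4E2D",
                StandardCharsets.US_ASCII);
        System.out.println(fixed);
    }
}
```

With ISO-8859-1 as the target, U+00E9 would pass through unchanged
because that charset can encode it; only the unencodable characters go
through the table.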

SparseTransliterator is a very strange special case, without any
explanation. Doing a bit of searching, it was submitted as a change
because a user had a map that needed to be in Japanese/cp932 and it
also contained Latin characters. The characters with macrons couldn't
be encoded. Many others could. The rest of Unicode that can't be
encoded resulted in garbage.

Your patch fixes the "rest of Unicode" problem for cp932. However, it
loses the ability of the 'latin1' transliterator to provide reasonable
replacement chars that can be encoded, and it doesn't deal with
possible problems for other (non-European) charsets.

I've attached cs932-v3.patch that addresses both of these issues.

SparseTransliterator.java can then be removed.

Ticker

On Wed, 2021-11-17 at 18:00 +0000, Gerd Petermann wrote:
> Hi Ticker,
> 
> > For some other character sets the result could be invalid or
> > garbage.
> OK, I assumed that '?' is always at the same position, might be wrong
> with that.
> SparseTransliterator is only used for cs932.
> 
> Gerd

-------------- next part --------------
A non-text attachment was scrubbed...
Name: cs932-v3.patch
Type: text/x-patch
Size: 3753 bytes
Desc: not available
URL: <http://www.mkgmap.org.uk/pipermail/mkgmap-dev/attachments/20211118/9b2abae1/attachment.bin>

