[mkgmap-dev] Small problem with global index

From Ticker Berkin rwb-mkgmap at jagit.co.uk on Sat Nov 27 12:05:38 GMT 2021

Hi Gerd

The drastic case which mdrUnicode_v9b fixes is the index byte size
crash, and this is most easily demonstrated by having enough ignorable
characters, eg shields, or Chinese & Unicode without the original Sort
fix.

Until this crash point is reached, I've no idea if there is a problem
that can be demonstrated. However, the data structure whereby Mdr25
shares the same byte-size pointer to Mdr5 strongly indicates that these
should be kept in step and anything that allows Mdr25 to be bigger than
Mdr5 must be wrong.
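
To illustrate the size argument, here is a minimal sketch. It assumes the
pointer width is derived from the Mdr5 record count and reused for Mdr25;
the class name, the method and the record counts are hypothetical and are
not taken from the mkgmap source.

```java
class PointerSizeSketch {
    /** Smallest number of bytes able to hold every record number from 1 to count. */
    static int bytesNeeded(int count) {
        if (count <= 0xFF) return 1;
        if (count <= 0xFFFF) return 2;
        if (count <= 0xFFFFFF) return 3;
        return 4;
    }

    public static void main(String[] args) {
        int mdr5Records = 250;   // hypothetical: cities after a TERTIARY dedup
        int mdr25Records = 300;  // hypothetical: same cities deduped with equals()

        int sharedWidth = bytesNeeded(mdr5Records);
        boolean fits = bytesNeeded(mdr25Records) <= sharedWidth;

        System.out.println("shared pointer width: " + sharedWidth + " byte(s)");
        System.out.println("Mdr25 record numbers fit: " + fits);
    }
}
```

With these made-up counts the shared width is 1 byte, which cannot address
300 records, so the two sections only stay consistent if Mdr25 never has
more records than Mdr5.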

The new version of Sort for Unicode assumes that if ordering has been
defined for any characters in a [256] page, then any characters in this
page but not defined will get a zero/ignore sortOrder. If nothing has
been defined for the page, the code will invent a sortOrder. So some
diag code in Sort should be able to list some other characters that
will get a zero sortOrder; hopefully there might be some nice name-like
chars amongst them.
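
A rough sketch of what such diag code could look like, using a plain map
as a stand-in for the sort-order tables. The class, the method and the
sample data are hypothetical; the real Sort class stores its tables
differently.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

class ZeroSortOrderDiag {
    /**
     * Lists the characters in [from, to] that share a 256-entry page with at
     * least one defined character but have no sort order of their own.
     * Under the page rule described above these would get a zero sortOrder.
     */
    static List<Character> implicitlyIgnored(Map<Character, Integer> defined, char from, char to) {
        Set<Integer> pagesInUse = new HashSet<>();
        for (char c : defined.keySet()) {
            pagesInUse.add(c >> 8); // page number = high byte of the code unit
        }
        List<Character> ignored = new ArrayList<>();
        for (int c = from; c <= to; c++) {
            if (pagesInUse.contains(c >> 8) && !defined.containsKey((char) c)) {
                ignored.add((char) c);
            }
        }
        return ignored;
    }

    public static void main(String[] args) {
        // Hypothetical sort-order data: only 'a' and 'c' are defined in page 0.
        Map<Character, Integer> defined = new TreeMap<>();
        defined.put('a', 1);
        defined.put('c', 2);

        // Page 0 has definitions, so the undefined 'b' and 'd' are listed.
        System.out.println(implicitlyIgnored(defined, 'a', 'd'));                   // [b, d]
        // Page 0x04 has no definitions at all, so nothing there is listed.
        System.out.println(implicitlyIgnored(defined, (char) 0x400, (char) 0x402)); // []
    }
}
```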

Apart from the ignored sortOrder chars making a difference between
TERTIARY and .equals():
A significant consideration is that Sort.java doesn't sort higher than
the TERTIARY level, so it is possible to end up with a section of
TERTIARY-same records in which adjacent records might be .equals(), or
there might be a non-equal record between equal ones. Again, no
idea if this will matter in areas like the repeat flag setting, but it
indicates strongly that we should use collator.compare rather than
.equals() for dedup.
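
A toy illustration of the dedup point. The comparator below is only a
stand-in for a TERTIARY-level compare: it treats '*' as a character with
zero sortOrder. It is not the mkgmap collator, and the names and data are
made up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class DedupSketch {
    // Stand-in for a TERTIARY compare: '*' plays the role of an ignored character.
    static final Comparator<String> TERTIARY =
            Comparator.comparing((String s) -> s.replace("*", ""));

    /** Drops a record only when it is equals() to the previously kept one. */
    static List<String> dedupByEquals(List<String> sorted) {
        List<String> out = new ArrayList<>();
        for (String s : sorted) {
            if (out.isEmpty() || !out.get(out.size() - 1).equals(s)) {
                out.add(s);
            }
        }
        return out;
    }

    /** Drops a record when it compares as 0 against the previously kept one. */
    static List<String> dedupByCompare(List<String> sorted, Comparator<String> cmp) {
        List<String> out = new ArrayList<>();
        for (String s : sorted) {
            if (out.isEmpty() || cmp.compare(out.get(out.size() - 1), s) != 0) {
                out.add(s);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> names = new ArrayList<>(Arrays.asList("Alma", "Al*ma", "Alma", "Bath"));
        names.sort(TERTIARY); // stable sort: the three TERTIARY-same records keep their order

        System.out.println(TERTIARY.compare("Alma", "Al*ma") == 0); // true
        System.out.println("Alma".equals("Al*ma"));                 // false
        System.out.println(dedupByEquals(names));                   // [Alma, Al*ma, Alma, Bath]
        System.out.println(dedupByCompare(names, TERTIARY));        // [Alma, Bath]
    }
}
```

Because the sort does not order records beyond the TERTIARY level, the two
"Alma" records end up separated by "Al*ma", so the equals()-based pass keeps
all three while the compare-based pass keeps one per group.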

Ticker

On Sat, 2021-11-27 at 10:54 +0000, Gerd Petermann wrote:
> Hi Ticker,
> 
> running in circles, aren't we?
> I ask for sample data to show that mdrUnicode_v9b.patch makes a
> difference in some special case.
> I totally agree that either Mdr5 or Mdr25 should be changed, and
> probably other places, too.
> 
> What I really like to have is a (small) example that shows a
> difference between TERTIARY and EQUAL so that I am able to compare
> mkgmap results with what Garmin does.
> The highway shield codes may not be a good idea in case Garmin treats
> them special, but I also would like to understand that special
> handling if it exists.
> 
> I think all I need is a way to find a String that gives TERTIARY == 0
> and String.equals() returns false for a given codepage. Maybe this is
> totally clear to you but it is not for me.
> 
> Gerd
> 
> 
> ________________________________________
> From: mkgmap-dev <mkgmap-dev-bounces at lists.mkgmap.org.uk> on behalf
> of Ticker Berkin <rwb-mkgmap at jagit.co.uk>
> Sent: Saturday, 27 November 2021 11:23
> To: Development list for mkgmap
> Subject: Re: [mkgmap-dev] Small problem with global index
> 
> Hi Gerd
> 
> mdrUnicode_v9b.patch isn't related to the issue of case-variants; it is
> about keeping consistency between Mdr5 and Mdr25 indexes. This will go
> wrong when there is a difference between TERTIARY and EQUAL in Country,
> Region and City names. It may be that this doesn't matter to Garmin
> software, or, more likely, will introduce slight errors in what is
> findable.
> 
> If you don't want to accept this patch, I think changes would be needed
> to Mdr5 to replace TERTIARY collator use with .equals().
> 
> Ticker
> 
> On Fri, 2021-11-26 at 18:04 +0000, Gerd Petermann wrote:
> > Hi Ticker,
> > 
> > sorry, meant r4718 instead of 4717 before.
> > 
> > Gerd
> > 
> > ________________________________________
> > From: mkgmap-dev <mkgmap-dev-bounces at lists.mkgmap.org.uk> on behalf
> > of Gerd Petermann <gpetermann_muenchen at hotmail.com>
> > Sent: Friday, 26 November 2021 18:06
> > To: Development list for mkgmap
> > Subject: Re: [mkgmap-dev] Small problem with global index
> > 
> > Hi Ticker,
> > 
> > I tried this: use your command to build a gmapi and gmapsupp, but
> > replace r4810 by a binary compiled from mkgmap r4717 +
> > mdrUnicode_v9b.patch
> > (I still see no difference in the output compared to unpatched r4717)
> > 
> > I then use MapSource to create another gmapsupp.
> > I run MdrDisplay and MdrCheck on both gmapsupp.img and see different
> > repeat flags.
> > MdrCheck + my patch display-no-secondary.patch complains a lot about
> > the gmapsupp with your patch but reports only 1 problem about a city
> > without name.
> > When I try this with the unpatched display tool it complains a lot
> > about the gmapsupp from MapSource but not about the one from mkgmap.
> > I think that shows that unpatched MdrCheck is wrong.
> > 
> > I tried this also with a binary from r4817 with attached
> > mkgmap-no-secondary-v2.patch with your command.
> > The two outputs from MdrCheck are identical, and I think the outputs
> > for MdrDisplay differ only in offsets. I consider this very good.
> > In MapSource the search for "Baybride Lane" and "Alma Lane" both
> > return wherWell, so that's also good.
> > 
> > I prefer a patch that changes mkgmap to produce the same index as
> > MapSource.
> > 
> > I hope I've done nothing wrong during testing. Do you get other
> > results?
> > 
> > Gerd
> > 
> > ________________________________________
> > From: mkgmap-dev <mkgmap-dev-bounces at lists.mkgmap.org.uk> on behalf
> > of Ticker Berkin <rwb-mkgmap at jagit.co.uk>
> > Sent: Friday, 26 November 2021 16:27
> > To: Development list for mkgmap
> > Subject: Re: [mkgmap-dev] Small problem with global index
> > 
> > Hi Gerd
> > 
> > I was sort of thinking the opposite. PlacesFile using some method (eg
> > what it does) to dedupe city/region/country and this is extended to the
> > POI/MDRBuilder logic, such that combinations of these 3 have unique
> > sets of index values regardless of case.
> > 
> > Then the relevant MdrX sections should be able to do a SECONDARY dedup
> > on these, to cope with case-variants coming from different tiles.
> > 
> > Then checking that this does actually work with Garmin software (ie
> > hope nothing cares that the index entries might not match the LBL data
> > in some of the tiles)
> > 
> > If this works, there should only be one city presented in the find
> > options - eg, from the original problem data, it might be "De Wijk" or
> > "de Wijk"
> > 
> > Then making MdrCheck tolerant of this as well.
> > 
> > An alternative is just to ignore the whole issue - no one else has ever
> > noticed and complained.
> > 
> > I was hoping to get mdrUnicode_v9b.patch accepted before tackling this.
> > Its purpose is to fix the crash when pathological city / region /
> > country names or incomplete sortorder codepage data causes enough
> > difference between TERTIARY & EQUAL to make Mdr25 index size too big.
> > 
> > Ticker
> > 
> > On Fri, 2021-11-26 at 10:56 +0000, Gerd Petermann wrote:
> > > Hi Ticker,
> > > 
> > > reg. --lower-case and city/region/country names with different
> > > capitalization:
> > > I think it would be good to keep the different capitalization within
> > > a single tile, so yes, the .toUpperCase() in PlacesFile is probably
> > > not a good idea. Results seem better when this is not done.
> > > When the global index is created we can log warnings for those cases,
> > > but I don't see yet how we can create a valid index which doesn't
> > > require the user to decide whether wherWell or Wherwell should be
> > > searched.
> > > 
> > > Gerd
> > 
> > 
> > _______________________________________________
> > mkgmap-dev mailing list
> > mkgmap-dev at lists.mkgmap.org.uk
> > https://www.mkgmap.org.uk/mailman/listinfo/mkgmap-dev
> 
> 