Compatibility IDs

Hello all! I thought I’d make a quick blog post to talk about one of the design decisions within Wikijump to maintain compatibility with existing Wikidot data: IDs.

Like many other pieces of software, both Wikidot and Wikijump use numerical IDs to uniquely identify various data objects. To name a few, all users have an ID, as do all sites, all pages, all revisions within those pages, etc.

The most common way to uniquely assign such IDs is to do so sequentially. You start with, say, 1, and then every time a new object (a user, for example) is created, you take the current value of the counter, assign it as the ID for the new object, and increment the counter. This is how we know that Michał Frąckowiak was the first Wikidot user, since his user ID is 1.

However, this poses a problem for the Wikijump project. If we simply do the same, then we will start issuing conflicting IDs. For instance, the ID of SCP-001 on SCP-JP is 1959889. If we start issuing page IDs at 1 and simply increment, we will eventually produce IDs that are already in use on Wikidot, like the one above. So the question is how we ensure our unique IDs are actually unique.

One’s initial response may be to add checking on a newly-incremented ID to ensure it’s not already in use. But this approach has two issues:

  1. Most obviously, this adds a fair bit of overhead that did not previously exist. We can no longer use plain BIGSERIAL as a Postgres type to assign IDs, and now need code logic to query for and check ID uniqueness.
  2. This can potentially collide with Wikidot objects imported later. Let’s say we use this approach, and we have Wikidot data migrated for SCP-EN. We happily create new pages, skipping IDs that exist in EN, until we get to 1959889 and issue it to a new page. Then later we import SCP-JP data from Wikidot. Now we have a problem, because that page ID belongs to that site’s SCP-001 page.
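To make the second issue concrete, here is a minimal Python sketch of the check-and-skip approach. The `SkippingAllocator` class and the tiny ID values are hypothetical, made up purely for illustration; this is not Wikijump code.

```python
class SkippingAllocator:
    """Hypothetical allocator: increments a counter, skipping IDs already in use."""

    def __init__(self, imported_ids):
        self.imported = set(imported_ids)  # IDs that came from Wikidot
        self.issued = set()                # IDs we handed out ourselves
        self.counter = 0

    def next_id(self):
        # This loop is the extra uniqueness check described in issue 1.
        self.counter += 1
        while self.counter in self.imported or self.counter in self.issued:
            self.counter += 1
        self.issued.add(self.counter)
        return self.counter

    def import_ids(self, ids):
        # Returns the incoming IDs that we already issued locally (issue 2).
        incoming = set(ids)
        conflicts = sorted(self.issued & incoming)
        self.imported |= incoming
        return conflicts


# Pretend these IDs were imported from a first Wikidot site.
allocator = SkippingAllocator(imported_ids={1, 2, 4})
new_ids = [allocator.next_id() for _ in range(3)]  # skips 1, 2 and 4

# A later import from a second site brings IDs we could not have known about.
conflicts = allocator.import_ids({5, 7})
print(new_ids, conflicts)
```

The skipping logic only protects against IDs that are already known at the time of allocation, which is why the later import still conflicts.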

To avoid these issues, Wikijump uses compatibility IDs (WJ-964).

The system works like this. First, I sampled the current ID value in Wikidot for several different classes of objects. Then, for each, I determined an ID value much higher than Wikidot will realistically reach, basically by doubling the sampled value and then increasing the highest digit by one. This is the new starting value for any Wikijump-created objects of that class. (In some cases this is bigger than the 32-bit limit that Wikidot has for IDs, making it impossible for Wikidot to even emit such a value as a new ID.)
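As a rough illustration of that arithmetic, here is one literal reading of "double it, then bump the highest digit" in Python. The function name `compat_start` is hypothetical, the inputs are not the values actually sampled from Wikidot, and `INT32_MAX` assumes a signed 32-bit integer, which the post does not specify.

```python
INT32_MAX = 2**31 - 1  # assumption: Wikidot's 32-bit limit is a signed one


def compat_start(sampled_id: int) -> int:
    """Double the sampled ID, then add one to its most significant digit."""
    if sampled_id <= 0:
        raise ValueError("sampled_id must be positive")
    doubled = sampled_id * 2
    # Place value of the leading digit, e.g. 1_000_000 for 3_919_778.
    top_place = 10 ** (len(str(doubled)) - 1)
    return doubled + top_place


# Using the SCP-JP page ID from earlier purely as an example input:
# 1959889 doubles to 3919778, and bumping the leading 3 to 4 gives 4919778.
example = compat_start(1959889)

# A made-up larger sample lands beyond the signed 32-bit range.
large = compat_start(1_500_000_000)
print(example, large, large > INT32_MAX)
```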

This way, even as Wikidot continues to accumulate new objects with time, any entities created within Wikijump will be independent, and the two spaces can coexist. Put another way, all Wikidot IDs are valid in Wikijump. (Of course, whether that data is actually present in Wikijump is another matter — we certainly aren’t going to import the hundreds of spam sites that Wikidot has.)
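A small sketch of what that separation looks like in practice, again in Python. `COMPAT_START` is a placeholder value and `intruding_ids` is a hypothetical helper, neither taken from Wikijump itself.

```python
import itertools

COMPAT_START = 10_000_000  # placeholder, not a real Wikijump starting value

# New objects draw from a plain counter that begins at the compatibility
# start, so no per-ID uniqueness check is needed.
new_page_ids = itertools.count(COMPAT_START)


def intruding_ids(imported_ids, start=COMPAT_START):
    """Return imported IDs that fall inside the locally issued range."""
    return sorted(i for i in imported_ids if i >= start)


issued = [next(new_page_ids) for _ in range(3)]

# Importing Wikidot data later is safe as long as its IDs sit below the start.
intruders = intruding_ids({1959889, 42})
print(issued, intruders)
```

Because the counter is still a simple increment, a plain sequence that starts at the higher value should be enough on the database side, with no lookup before each insert.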

Author: aismallard
