New site listing workflow and search as a service improvements
This is a quick post summarising the simplified site listing workflow and search as a service improvements.
Why the site listing workflow needed simplifying
The old site listing workflow was surprisingly complicated, with several different routes through the process, and the ability to restart and take a different route at a later date. Unfortunately, this caused a number of issues, for example:
- Unexpected combinations of routes leading to unusual bugs like “A site submitted via Quick Add but awaiting approval, then submitted again via Verified Add, won’t be indexed until moderator approval”.
- Difficulties adding new features like “Search as a service: Free trial mode”.
- Users not entirely sure whether they should use the “Quick Add”, “Verified Add (IndieAuth)” or “Verified Add (DCV)” option.
The new site listing workflow
All submissions in the new workflow start from the same Add Site page, and the listing types have been renamed to “Basic” and “Full” (plus the new “Free Trial”), which is hopefully clearer. The second step for the Full listing asks for “Login and domain ownership validation method”, which again is hopefully clearer than the existing “Domain Control Validation” or “IndieAuth” options.
The new workflow diagram with the new terminology is:
The new database schema
This is the second major change to the database schema. The first design had three tables for domains: pending, approved, and rejected. That turned out to be a bit of a pain to maintain, because records had to be moved between tables as they changed state, so I replaced it with a simplified single-table schema in Oct 2021. Unfortunately that turned out to be a bit of an oversimplification: with one row per domain, each column could only hold a single value per domain (as per one of the fundamentals of database design), while there were at least two features that would benefit from more than one value per domain:
- A site should be able to have more than one state, so someone can have it indexing on the Basic listing while they (for example) try out the Free Trial or sort out the Full listing.
- A site should be able to have more than one paid subscription, i.e. users should be able to renew a subscription while the current one is still active. As it was, users needed to let their subscription expire before they could renew, which was a poor user experience.
So the main schema changes are:
- A new listing status table to track signup state. This will hopefully reduce the chance of inconsistent states, e.g. if someone tries to change tier mid-way through signup. The primary key is a combination of domain and tier, so there will only be one instance of each tier for each domain.
- A new subscriptions table. Crucially, this allows for more than one subscription for a site, so you can renew your subscription before the old one expires. It was also pretty useful for adding support for the new Free Trial listing.
This should make the whole site listing workflow much more robust and extensible in the long term (although admittedly the introduction of a lot of new code might lead to a few new bugs in the short term).
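As a rough illustration of the two schema changes described above, here is a minimal sketch using Python's built-in sqlite3 module with an in-memory database. The table names, column names, and helper functions are all hypothetical and chosen for illustration only; they are not the actual schema, which may use a different database and layout. The sketch shows the composite primary key on domain and tier, and that a domain can hold more than one subscription.

```python
import sqlite3


def create_schema(conn):
    """Create the two illustrative tables (all names are assumptions)."""
    conn.executescript(
        """
        CREATE TABLE listing_status (
            domain TEXT NOT NULL,
            tier   INTEGER NOT NULL,
            status TEXT NOT NULL,
            -- Composite key: at most one row per tier for each domain.
            PRIMARY KEY (domain, tier)
        );
        CREATE TABLE subscriptions (
            subscription_id INTEGER PRIMARY KEY,
            domain    TEXT NOT NULL,
            tier      INTEGER NOT NULL,
            starts_on TEXT NOT NULL,
            ends_on   TEXT NOT NULL
        );
        """
    )


def add_listing(conn, domain, tier, status):
    """Insert a listing status row; return False if that tier already exists."""
    try:
        conn.execute(
            "INSERT INTO listing_status (domain, tier, status) VALUES (?, ?, ?)",
            (domain, tier, status),
        )
        return True
    except sqlite3.IntegrityError:
        return False


def add_subscription(conn, domain, tier, starts_on, ends_on):
    """Insert a subscription; several rows per domain are allowed."""
    conn.execute(
        "INSERT INTO subscriptions (domain, tier, starts_on, ends_on) "
        "VALUES (?, ?, ?, ?)",
        (domain, tier, starts_on, ends_on),
    )


def count_subscriptions(conn, domain):
    """Return how many subscriptions exist for a domain."""
    row = conn.execute(
        "SELECT COUNT(*) FROM subscriptions WHERE domain = ?", (domain,)
    ).fetchone()
    return row[0]
```

With this layout a site can be active on one tier while a signup for another tier is in progress, and a renewal can be recorded as a second subscription row before the first one has expired.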
What data has been migrated to the new version
Given past experience of odd issues when migrating potentially inconsistent states to a new schema, I’ve decided to only migrate fully approved sites to the new schema, and also only migrate them with their current status. There are 1400 such sites.
This means the following is not migrated:
- Previous verification details, i.e. if sites were initially Verified Add but lapsed to Quick Add (old terminology), they will need to reverify if moving from Basic back to Full listing (new terminology). This applies to 30 of the 1400 migrated sites.
- Unlisted sites where the listing is “in progress”. A total of 33 such sites haven’t been migrated (noting that some have been “in progress” for over 2 years).
- The blocked sites list. There are 591 blocked sites which haven’t been migrated.
- Sites which have had indexing disabled because indexing has failed twice in a row, e.g. because the site is down, or blocking indexing due to robots.txt or Cloudflare. There are 70 of these.
This should however mean starting with a clean slate, and preventing issues caused by the accumulation of potentially inconsistent data over the years.
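The selection rule described above could be sketched roughly as follows. This is only an illustration under assumed names: the OldSite class, its fields, and the state values are hypothetical, and the real migration may work quite differently. The point it shows is that only approved sites with indexing enabled are carried over, and only with their current tier, so previous verification details are deliberately dropped.

```python
from dataclasses import dataclass


@dataclass
class OldSite:
    """Hypothetical shape of a record in the old single-table schema."""
    domain: str
    state: str                  # assumed values: "approved", "in_progress", "blocked"
    current_tier: int
    indexing_enabled: bool = True
    previously_verified: bool = False  # not carried over to the new schema


def select_for_migration(sites):
    """Keep only approved, indexable sites, with just their current tier."""
    return [
        {"domain": site.domain, "tier": site.current_tier}
        for site in sites
        if site.state == "approved" and site.indexing_enabled
    ]
```

A site that was once verified but has since lapsed would come through with only its current tier, which is why it would need to reverify to move back to a Full listing.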
Improvements to the search as a service
In addition to the new site listing workflow, the ability to resubscribe, and the introduction of a Free Trial, other improvements to the search as a service include:
- Checking a site can be indexed before completing the submission process, to reduce the chance of people paying for a Full listing before realising that their site can’t be indexed for one of the reasons described in “Some of the challenges of building an internet search”. By far the most common reason is sites going offline, which isn’t an issue at the point of submission (and is less likely to affect sites where the owners have paid for search as a service), but other reasons like robots.txt and Cloudflare could still apply.
- Auto-expiring tier 2 and 3 (formerly known as Verified Add) sites - previously just tier 1 / Quick Add sites were auto-expired.
- Sending an email to admin if a paid-for (i.e. tier 3) listing has indexing disabled due to indexing failing twice in a row. If the site owner has already remedied the issue, this reduces the time a paid-for listing stays unindexed (previously this was only checked manually from time to time).
So I think that will make it a much more useful and attractive search as a service.
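The "disable indexing after two failures in a row, and email admin for paid-for listings" behaviour mentioned in the list above could look something like the sketch below. All names here (SiteIndexingState, record_indexing_result, the constants) are hypothetical, and the actual email sending is left out; the function just reports whether a notification would be due.

```python
from dataclasses import dataclass

MAX_CONSECUTIVE_FAILURES = 2  # "failing twice in a row"
PAID_TIER = 3                 # the paid-for listing tier


@dataclass
class SiteIndexingState:
    """Hypothetical per-site indexing state."""
    domain: str
    tier: int
    consecutive_failures: int = 0
    indexing_enabled: bool = True


def record_indexing_result(site, succeeded):
    """Update the site's state after an indexing attempt.

    Returns True if admin should be emailed, i.e. indexing has just been
    disabled for a paid-for listing.
    """
    if succeeded:
        # Any success resets the run of failures.
        site.consecutive_failures = 0
        return False
    site.consecutive_failures += 1
    if site.indexing_enabled and site.consecutive_failures >= MAX_CONSECUTIVE_FAILURES:
        site.indexing_enabled = False
        return site.tier == PAID_TIER
    return False
```

Returning a flag rather than sending the email directly keeps the state change easy to test separately from the mail delivery.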
For the next major release, I’ve got a number of improvements I’d like to make to the public search.