Web analytics on searchmysite.net
It hopefully goes without saying that some form of web analytics is useful even for a privacy-aware site like searchmysite.net: you need to know how the site is being used when investigating issues, prioritising enhancements, scaling the infrastructure, and so on.
- The official httpd docker image (which searchmysite.net uses) defaults to sending its logs to /proc/self/fd/1 (STDOUT) rather than an actual log file, for better integration with
- Housekeeping activities such as log file rotation can complicate things.
- If the service becomes popular enough to warrant spreading the load over multiple web servers, there is the additional challenge of aggregating all the log files (unless of course the logs were captured in a single SSL terminating reverse proxy).
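To give a feel for what the aggregation in the last point would involve, here is a minimal sketch (not something searchmysite.net actually runs) that interleaves access logs from several web servers into one stream ordered by timestamp. It assumes every line is in the Apache Common or Combined Log Format, with a timestamp such as `[10/Oct/2020:13:55:36 +0000]`, and that each individual log is already in chronological order. The function names are hypothetical.

```python
import heapq
import re
from datetime import datetime

# Assumption: each line carries an Apache-style timestamp in square brackets,
# e.g. [10/Oct/2020:13:55:36 +0000]
TIMESTAMP_RE = re.compile(r"\[([^\]]+)\]")
TIMESTAMP_FORMAT = "%d/%b/%Y:%H:%M:%S %z"


def timestamp(line):
    """Return the timezone-aware timestamp of one access log line."""
    match = TIMESTAMP_RE.search(line)
    if match is None:
        raise ValueError(f"no timestamp found in line: {line!r}")
    return datetime.strptime(match.group(1), TIMESTAMP_FORMAT)


def merge_logs(*sources):
    """Lazily merge several already-sorted iterables of log lines by timestamp."""
    return heapq.merge(*sources, key=timestamp)
```

Because `heapq.merge` is lazy, the sources can be open file objects, so the logs never need to be loaded into memory in full. Rotated files, malformed lines and clock differences between servers are all ignored here, which is part of why this is more work than it first appears.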
I know there are solutions like ELK Stack (Elasticsearch, Logstash, Kibana) for managing data collection, log parsing and visualisation, but implementing those is a project in itself, and more than the simple web analytics required at this stage.
Fortunately there is a growing number of privacy-aware web analytics solutions, perhaps helped by the spread of GDPR and PECR cookie consent popups [2].
I decided to look at Plausible for a number of reasons:
- It is completely cookie-free.
- It is open source.
- It has a self-hosted option.
I also really like Plausible’s business model: it is to be a sustainable open-source project. Although there is a free self-hosted solution, they aim to pay their running costs and salaries via the premium managed solution, i.e. they have a concrete plan to remain a viable company without having to depend on advertising, investor cash, or charity. This sounds just like the sort of thing I wrote about in What went wrong with the internet (and how can it be fixed)? which is of course what inspired the creation of searchmysite.net.
Anyway, I’m using the self-hosted Plausible solution for now, primarily for reasons of cost (searchmysite.net is self-funded and hasn’t switched on the listing fee yet). Setup was relatively straightforward, and there is a good and growing community of users. I’ve been running it for about a week and it seems to be working well so far.
In the interests of transparency, I am still keeping the web server log files for a short period. One of the reasons for this is to get details of the most popular search queries, which are missing from many analytics solutions. Information on popular searches could be useful for relevancy tuning, which will be the topic of my next post.
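As a rough illustration of how popular search queries could be pulled out of the web server logs, here is a minimal sketch. It assumes the logs are in the Apache Common or Combined Log Format, and that searches are requests to a `/search/` path with the query in a `q` parameter. Both the path and the parameter name are assumptions for the sake of the example rather than a description of how searchmysite.net is actually set up, and the function name is hypothetical.

```python
import re
from collections import Counter
from urllib.parse import parse_qs, urlsplit

# Matches the request part of a log line, e.g. "GET /search/?q=foo HTTP/1.1"
REQUEST_RE = re.compile(r'"(?:GET|POST) (\S+) HTTP/[\d.]+"')

SEARCH_PATH = "/search/"  # assumption: path of the search results page
QUERY_PARAM = "q"         # assumption: name of the query string parameter


def popular_queries(lines, top_n=10):
    """Count search terms in access log lines and return the most common ones."""
    counts = Counter()
    for line in lines:
        match = REQUEST_RE.search(line)
        if match is None:
            continue  # skip lines that do not look like a request
        url = urlsplit(match.group(1))
        if url.path != SEARCH_PATH:
            continue  # not a search request
        # parse_qs decodes both "+" and percent-encoding in the query string
        for term in parse_qs(url.query).get(QUERY_PARAM, []):
            term = term.strip().lower()
            if term:
                counts[term] += 1
    return counts.most_common(top_n)
```

It could be run against a log file by passing an open file object, for example `popular_queries(open("logs/access.log"))`. Lower-casing the terms means "Static Site" and "static site" are counted as the same query, which seems a reasonable choice when the goal is relevancy tuning rather than exact reporting.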
See View logs for a container or service. My workaround was to reconfigure the logging to send to an actual file (logs/access.log) and set up a volume mount to be able to access the logs from the host. ↩︎
I sometimes wonder if the point behind the General Data Protection Regulation (GDPR) and Privacy and Electronic Communications Regulations (PECR) legislation was to encourage sites to cut back on the number of tracking cookies, but what seems to have happened is that the big sites have put their effort into developing the consent functionality instead, and most users have got used to simply clicking “accept all”. ↩︎