The Story So Far

I thought it would be worthwhile to provide a more-or-less complete history of the Wikijump project up to this point. While I’d been playing with the gabrys Wikidot release for years prior, this project began in earnest just over a year ago. I stumbled upon a long-abandoned site by the Wikidot team that, at some point, had offered a virtual appliance to spin up a self-contained Wikidot install in a box. Thankfully, there was still someone around who possessed a copy, and they were willing to send it to me.

The good news was that it did in fact run and persist data. The bad news was that it was badly outdated: designed for PHP 5.2, running on Ubuntu 8.04 with PostgreSQL 8, and generally shaping up to be a lot of work to bring forward. The codebase itself was also a bit scattered, and site-specific settings had simply been edited into the default configuration files. If you wish to play with that codebase, it’s kept in our Legacy branch.

So the scope of work was daunting, but nothing here was impossible. We were able to create database dumps and rip out the proprietary configuration, leaving a plain database that could run on the latest stable Postgres. After that began the long, slow process of bringing the codebase up to modern standards. That meant getting the code into acceptable shape to be upgraded through all nine major PHP releases, from 5.3 to 7.4. It also meant introducing namespaces to code that had no concept of them, which was further complicated by the codebase dynamically instantiating classes under the assumption that everything lived in the same namespace (or lack of one). Work on that ran from December of last year to this month.

At the same time, the infrastructure needed to be rethought. This portable installation would be fine for a single user, but to run it at scale you need to break the services apart and make sure each can scale as needed. We decided to run the full infrastructure in AWS and to deploy it exclusively with Terraform; the concept of Infrastructure as Code is incredibly powerful. We extended that concept by planning to use AWS Elastic Container Service and containerizing the pieces of the puzzle. That meant storing the Dockerfiles in the repo, and we added Docker Compose support for local development. Thus, everything someone needs to deploy, whether for local work or in the AWS cloud, is provided right in the repository.
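As a rough sketch of what that local Docker Compose setup can look like (the service names, images, and versions below are my own illustrative assumptions, not the project’s actual file):

```yaml
# Illustrative only — services mirror the stack described in this post.
version: "3.8"

services:
  reverse-proxy:
    image: traefik:v2.4
    ports:
      - "443:443"
    volumes:
      # Lets Traefik discover containers and their routing labels
      - /var/run/docker.sock:/var/run/docker.sock:ro

  web:
    image: nginx:stable        # serves static assets, proxies PHP requests

  php-fpm:
    build: .                   # the application Dockerfile in the repo

  cache:
    image: memcached:1.6-alpine

  database:
    image: postgres:13
    environment:
      POSTGRES_PASSWORD: local-dev-only
    volumes:
      - pgdata:/var/lib/postgresql/data

volumes:
  pgdata:
```

The same container definitions then map naturally onto ECS task definitions for the cloud deployment.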

The infrastructure at this point starts at the edge with an AWS Elastic Load Balancer, listening on port 443 but not terminating the connection. Instead, it passes the encrypted traffic through to our Traefik reverse proxy. We chose this route instead of terminating on the load balancer primarily because of SSL certificate deployment: if you have an ELB handle SSL termination, you’re limited to 25 certificates in its store. Since this software could theoretically be used with an arbitrarily large number of custom domains (whitelabel wikis and the like), that was not practical. Traefik lets us generate certificates on the fly from Let’s Encrypt, and designating targets for traffic is handled with simple Docker labels.
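Concretely, Traefik v2 routing via Docker labels looks something like this (the router name, hostname, and certificate resolver name here are illustrative assumptions):

```yaml
services:
  web:
    labels:
      - "traefik.enable=true"
      # Route requests for this host to the container...
      - "traefik.http.routers.wiki.rule=Host(`example.wikijump.com`)"
      # ...and have Traefik obtain and renew its certificate from Let's Encrypt.
      - "traefik.http.routers.wiki.tls.certresolver=letsencrypt"
      - "traefik.http.services.wiki.loadbalancer.server.port=80"
```

Adding a new custom domain is then just a matter of adding a label, with no fixed certificate store to exhaust.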

Behind the reverse proxy is Nginx running as a web server, replacing Wikidot’s Lighttpd installation. It has a very small workload, serving a few static assets and otherwise sending requests back to the PHP-FPM container, which processes the request and returns the response. Also present in the configuration are a Memcached server and a Postgres database server. For our dev environment, this all fits on a single t3a.small instance, which we run as a spot request to bring the cost down to about $4 a month.
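A minimal sketch of that Nginx role, assuming hypothetical paths and an upstream container named `php-fpm` (none of this is the project’s actual config):

```nginx
server {
    listen 80;
    root /var/www/web;
    index index.php;

    # The few static assets Nginx serves itself
    location /static/ {
        expires 7d;
    }

    # Everything else goes back to the PHP-FPM container
    location ~ \.php$ {
        include fastcgi_params;
        fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
        fastcgi_pass php-fpm:9000;
    }
}
```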

The production environment is not yet built, but it will differ in a few ways. The Memcached container will be replaced by an ElastiCache cluster, Postgres will be handled by RDS, and the container fleet will run under the Fargate serverless paradigm instead of on EC2 instances. This allows much more granular scaling of the needed services, and since multiple containers can pull configuration or files from a common EFS volume, scaling can be nearly instantaneous. Production will also make greater use of CloudFront and S3 for caching and serving static assets.

Meanwhile, work has been steadily progressing on FTML, a library written in Rust to parse Wikidot markup with an actual parser rather than the Text_Wiki package, which is just a series of regex statements that pass the markup around by reference. We’ve observed catastrophic backtracking in the Text_Wiki implementation, which explains a lot of what we’ve seen in the unmaintained public implementation. This library gives us the opportunity to parse and render pages much, much faster than the current implementation allows, as well as more consistently and with fewer quirks from chained regex passes.

Work is also steadily progressing on a much-enhanced editor to replace the current one, on the TypeScript code that shuttles requests between the user and the backend, and on the toolchain that bundles the assets.

On the PHP side, we decided fairly early on to adopt the Laravel framework for future development and to touch the existing codebase as little as possible, except to deprecate and remove it. One of the reasons we chose Laravel is its robust ecosystem for things like OAuth-compatible social account logins, WebSockets, and the deployment and management of API keys. As of today, the legacy codebase (known as Ozone) is simply a subset of the Laravel codebase, set up as a controller to be invoked as needed. This lets new code and classes for tricky things like authentication and sessions be used seamlessly from the legacy codebase, and vice versa: we can call existing legacy classes and methods until we’re ready to replace them.
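That wrapper pattern can be sketched roughly like this — the route, controller, and `LegacyRunner` class are hypothetical names for illustration, not Wikijump’s actual code:

```php
<?php

// routes/web.php — any request Laravel doesn't handle itself falls
// through to a catch-all route that invokes the legacy entry point.
use App\Http\Controllers\OzoneController;
use Illuminate\Support\Facades\Route;

Route::any('{path}', [OzoneController::class, 'handle'])->where('path', '.*');

// app/Http/Controllers/OzoneController.php — a thin controller that
// boots the legacy code and returns its output as a Laravel response,
// so new session and auth middleware still apply to legacy pages.
class OzoneController extends Controller
{
    public function handle(string $path)
    {
        $legacy = new \Ozone\LegacyRunner(); // hypothetical wrapper class
        return response($legacy->run($path));
    }
}
```

The key benefit of this shape is that the boundary is a single controller, so legacy routes can be retired one at a time as native Laravel replacements land.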

We have ripped out several libraries that served minimal purpose, such as an antiquated copy of Zend Framework used only for pingbacks and search, both of which we’re replacing. We continue to remove dead code, and we’ve already shrunk the repo size by an order of magnitude.

Exploration has begun on an API, and the first draft of a full OpenAPI spec has been published. We’re currently evaluating Dingo as an API package for its many quality-of-life improvements, which would let us iterate much faster and more safely.

We have also begun the preliminary steps of Datadog integration; Datadog has very graciously agreed to support the project with access to their most excellent software. A more complete writeup will come once we’ve had the chance to really implement their integrations and performance monitoring, but suffice it to say that there’s a native Datadog integration for literally every technology in our stack. We plan to extend the Terraform configuration to deploy Datadog dashboards and monitors just by providing an API key.
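With the Datadog Terraform provider, that idea could look something like the sketch below — the monitor, its query, and the variable names are illustrative assumptions, not anything we have actually deployed:

```hcl
provider "datadog" {
  api_key = var.datadog_api_key
  app_key = var.datadog_app_key
}

# Example monitor — alert if the PHP-FPM containers stop reporting.
resource "datadog_monitor" "php_fpm_up" {
  name    = "PHP-FPM processes running"
  type    = "metric alert"
  message = "PHP-FPM looks down. @ops"
  query   = "avg(last_5m):avg:php_fpm.processes.total{service:wikijump} < 1"
}
```

Because monitors live alongside the rest of the infrastructure code, they get reviewed, versioned, and deployed the same way everything else does.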

Truth be told, there are probably many more things that have transpired that are slipping my mind. We plan on publishing tech blogs and updates roughly every two weeks, going over what we’ve been up to and showing off anything fun we found.

Author: bluesoul
I'm an AWS Certified Professional Solutions Architect and hobbyist PHP developer. I'm currently the project lead for Wikijump.
