This is the second article in a series on Drupal 8 migrations which started with An Overview for Migrating Drupal Sites to 8. In this article, you will see a sample setup of a Drupal 7 to 8 migration where we provide the front and back-end teams with a daily database that has the latest configuration and content changes, plus a means for the migration team to test migrations.
The other two articles referenced are Achieve Rocketship-Fast Jobs in CircleCI by Preinstalling the Database and Continuous Integration for Drupal 8 with CircleCI.
If you want to jump straight into the code, here are the resources that we will use:
- The source code lives in GitHub at juampynr/drupal8_migrate. It is a Drupal 8 site built with the Composer Template for Drupal projects.
- The repository is connected to CircleCI, which runs a nightly migration. Here is the list of its past jobs.
- Docker MariaDB images containing the database are hosted by Quay.io. Here you can see the tag history.
Background
Large migrations to Drupal 8 can be planned as two teams working in parallel, where one team migrates configuration and content and the other team ports the front and back ends. This plan requires the migration team to set up a scheduled process that creates a database dump with the latest configuration and content, which the front and back-end teams can use for their work.
The approach that we took was creating versioned copies of the database via a mixture of pull requests, CircleCI jobs, and Docker image tags. In this setup, the Docker tag at quay.io/juampynr/drupal8_migrate:master points to the resulting image from the latest migration done by CircleCI. Other tags such as my-pull-request point to images where the migration team is experimenting with the migration via a pull request. Here is the tag history listing at Quay.io:
And here is a screenshot of a pull request that created one of the above Docker image tags:
Let’s start by describing the three databases that we will use.
Source database, configuration database, and full database
The repository at Quay.io hosts Docker images, each of them containing three databases:
- The source database is the Drupal 7 database from the existing site that we want to migrate. For this example, we created a Dropbox app that hosts the database dump, which an imaginary client would replace every 24 hours with a fresh one from production.
- The configuration database contains a Drupal 8 installation with database updates up to date, imported configuration, configuration migrations executed, and no content.
- The full database is the result of running content migrations against the configuration database.
Here are the relationships between the three databases and the teams:
The migration team uses the source and configuration databases to make tweaks in the migration which, once merged into the master branch, will be picked up by CircleCI in the nightly migration, resulting in a new full database that the front and back-end teams can use the next day.
💡 The main reason for using a configuration database when working on the migration is that Drupal won’t let you modify the content model if there is content in it. On top of that, working with a leaner database is much faster than having to roll back content in order to test a change. We could have used drush site-install --existing-config every time we needed to work on migrations, but this approach is slower for databases with a large content model.
In the following section, you will see how we configured CircleCI to run scheduled migrations.
Nightly migrations
We set up the CircleCI workflow so it runs a migration at midnight. It does the following:
- Fetches the latest code in the master branch from GitHub
- Downloads the latest master Docker tag from Quay.io, where the three databases are installed
- Downloads the latest Drupal 7 database dump via the Dropbox API
- Runs the content migration
- Dumps the source, configuration, and full databases
- Builds a new Docker image, tags it as master, and pushes it to Quay.io
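To make the sequence above concrete, here is a minimal shell sketch of the same steps. It is not the actual CircleCI job from the sample repository: the two helper scripts under scripts/ are hypothetical placeholders, and the drush call assumes the Migrate Tools module and that content migrations carry the Content tag. Only the image name and the robo command come from this article. By default the script just prints what it would run.

```shell
#!/bin/sh
# Sketch of the nightly job as a script. Paths marked "hypothetical" are
# placeholders, not files from the sample repository.

IMAGE="quay.io/juampynr/drupal8_migrate"  # image named in this article
BRANCH="${CIRCLE_BRANCH:-master}"          # CircleCI exports CIRCLE_BRANCH
DRY_RUN="${DRY_RUN:-1}"                    # 1 prints commands, 0 runs them

# Print the command in dry-run mode; otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

nightly_migration() {
  # Fetching the code is left to CircleCI's own checkout step.
  run docker pull "$IMAGE:master" || return 1
  run ./scripts/download-source-dump.sh || return 1   # hypothetical helper
  # Assumption: Migrate Tools is installed and content migrations are
  # tagged "Content".
  run vendor/bin/drush migrate:import --tag=Content || return 1
  run ./scripts/dump-databases.sh || return 1          # hypothetical helper
  run vendor/bin/robo database:build-image "$BRANCH"
}

nightly_migration
```

Running it with DRY_RUN=0 would execute the commands instead of printing them, stopping at the first step that fails.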
Here is a diagram that illustrates the above:
CircleCI is the director of the above orchestra. Let’s look at how it does it. The .circleci/config.yml is too large to comment on in one go, so we will do it in chunks. Here is the top configuration:
Next is the first half of the steps section, where we configure the environment and its databases:
The second half of the steps section is what runs the migration and builds an image:
Finally, the workflows section is where we define when the migration should run:
The above setup will give the front and back-end teams up-to-date configuration and fresh content. All they need to do to obtain the latest Docker image containing the database is run these commands:
docker pull quay.io/juampynr/drupal8_migrate:master
docker run -d --name drupal8_migrate -p 3306:3306 quay.io/juampynr/drupal8_migrate:master
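Since a new image is published every night, these two commands end up being run repeatedly, and docker run refuses to reuse a container name that is already taken. The following is a small convenience sketch, not part of the sample repository, that wraps the commands above and removes the previous container first. The function name is our own.

```shell
IMAGE="quay.io/juampynr/drupal8_migrate:master"
CONTAINER="drupal8_migrate"

# Pull the latest image and replace the running database container with it.
refresh_database() {
  docker pull "$IMAGE" || return 1
  # Remove yesterday's container if it exists; ignore the error if it does not.
  docker rm -f "$CONTAINER" >/dev/null 2>&1 || true
  docker run -d --name "$CONTAINER" -p 3306:3306 "$IMAGE"
}

# Usage: refresh_database
```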
In the next section, we will see how we allow the migration team to test migrations without disturbing the front and back-end teams.
Triggering migrations in pull requests
If you look again at the end of the job, you will see that it builds and pushes the resulting Docker image using an environment variable that contains the Git branch that was used. Like this:
vendor/bin/robo database:build-image ${CIRCLE_BRANCH}
Therefore, by creating a pull request that comments out the scheduling section, the migration team can trigger a migration which, if successful, will build and push a Docker image using the branch name of the pull request as the tag. The migration team can use this process to delegate running migrations to CircleCI while they work on other areas of the migration. When the migration is completed, they can download the resulting Docker image and verify it.
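One thing to watch when using branch names as tags: Docker tags may only contain letters, digits, underscores, periods, and dashes, cannot start with a period or a dash, and are limited to 128 characters. A branch such as feature/new-migration would therefore not be a valid tag. We have not checked whether the robo command in the sample repository already handles this, so the helper below is only a sketch with a name of our own choosing.

```shell
# Turn a Git branch name into a string that is valid as a Docker tag.
branch_to_tag() {
  printf '%s' "$1" \
    | tr -c 'A-Za-z0-9_.-' '-' \
    | sed -e 's/^[.-]*//' \
    | cut -c1-128
}

# Usage:
#   vendor/bin/robo database:build-image "$(branch_to_tag "$CIRCLE_BRANCH")"
```

Every character outside the allowed set is replaced with a dash, leading periods and dashes are dropped, and the result is truncated to 128 characters.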
See the following screenshot of a pull request where I was tweaking the migration job. I waited until I got a working migration before I restored scheduling and merged it:
When working with configuration migrations, it is common to use the configuration database, tweak the migration files, run configuration migrations, export the resulting configuration, and create a pull request. However, when working with content migrations on large projects, the process described in this article offers a means to test the content migration, verify that the resulting full database is correct, and merge the pull request safely. Here is a drawing with the workflow:
Try it out
If you're migrating a large site within a team, consider adjusting this approach to your development workflow. Here are a few final considerations to take into account:
Creating the first image
This is a chicken-and-egg problem because CircleCI pulls the Docker image from Quay, updates its full database, and pushes a new image. Creating the first image involves dumping the databases locally, building the Docker image, and pushing it to Quay. You can find the steps to accomplish it in the sample repository.
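As a rough illustration of those three steps (dump, build, push), here is a sketch. The sample repository documents the real procedure, so treat everything here as an assumption: the three database names are hypothetical, mysqldump is expected to find its credentials in your local client configuration, and a Dockerfile that loads the dumps is assumed to exist in the current directory.

```shell
IMAGE="quay.io/juampynr/drupal8_migrate"
DUMP_DIR="${DUMP_DIR:-./dumps}"   # hypothetical location for the dumps

# Dump one database to a file and compress it.
dump_database() {
  mysqldump --single-transaction --result-file="$DUMP_DIR/$1.sql" "$1" \
    && gzip -f "$DUMP_DIR/$1.sql"
}

create_first_image() {
  mkdir -p "$DUMP_DIR" || return 1
  # Hypothetical names for the source, configuration, and full databases.
  for db in drupal7_source drupal8_config drupal8_full; do
    dump_database "$db" || return 1
  done
  # Assumes a Dockerfile in the current directory that loads the dumps.
  docker build -t "$IMAGE:master" . || return 1
  docker push "$IMAGE:master"
}

# Usage: create_first_image
```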
Restoring scheduling before merging a pull request
The CircleCI workflow that we defined in the sample repository requires commenting out the scheduling section to trigger a migration. We wanted to be very specific about when migrations run, which is why we did not use other CircleCI workflow features, such as holding a workflow for manual approval, that would avoid tweaking the job. It is up to you to do it one way or the other.
Keep an eye on the locally available disk storage
Docker images can take up a lot of space on your hard disk. Therefore, if you find yourself pulling the latest image frequently, have a look at the docker system prune command.
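If docker system prune feels too broad, a gentler option is to remove only dangling images. Each time the master tag moves to a new image, the previously pulled one is left untagged, and those leftovers are what accumulate. The sketch below, with a function name of our own, reports disk usage before and after removing them.

```shell
# Show Docker disk usage, remove untagged (dangling) images, and show it again.
reclaim_docker_space() {
  docker system df
  docker image prune --force   # --force skips the confirmation prompt
  docker system df
}

# Usage: reclaim_docker_space
```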
Keeping databases elsewhere
The sample repository uses Quay.io to host the databases in a MariaDB image, but you are free to use a different approach. For example, if you have a server, you could store dumps of the three databases in a directory there. The downsides of such an approach are that the migration job will take longer because it needs to import the databases, and that you won’t be able to browse a history of database tags.
Showcasing the latest migration
In a client project, we connected Tugboat.qa to the repository so it would use the latest Docker master tag available in pull requests. That let everyone, even stakeholders, test the heartbeat of the project every day.
Thanks to Salvador Molina Moreno, James Sansbury, Andrew Berry, Matthew Tift, and Ellie Fanning for their feedback.
Hero image by Andreas Weith