Development environment for huge WP network
-
Hello there,
I work with a large WP Multisite installation, with more than half a million posts, some hard-coded customizations (that we managed to take out of the WP core and are now plugins) and a lot of plugins. This installation was quite outdated and receiving little development attention for a few years. Since the last time we touched its codebase, we greatly changed (mostly for the better) our development practices.
We recently updated it from WPMU 2.4 to WP 3.0.4, and I was amazed at how hard the process was. We didn’t manage to fit it into our workflow. I guess the biggest issue was:
How can we keep multiple, isolated replicas of a running WP Network installation for development and testing purposes?
The developers weren’t able to keep their local WP installations as a reliable testing environment (there were simply too many plugins and configurations to replicate on a fresh WP install, and trying to use the existing installation simply didn’t work or was too unstable), so we had to use our integration server as a development environment. That made everything a mess and our version control system useless: to get to this server, code had to be in our VCS, so we had to break the assumption that all code there was stable (since no one could guarantee anything without testing, and testing was impossible locally).
The result is that everything is quite messy; after the update, a lot of problems appeared (mainly weird plugin behavior) that we really think we should have been able to prevent; and developing new things for this installation (e.g. a plugin that integrates it with our external searching framework) is hell, since we can’t test it properly.
So, while I try to sort out more carefully what our real problems were, I’d like to hear some stories: how do you manage and develop for your WP Network? How do you test & validate changes on this Network in a reliable way?
Thank you very much.
-
How can we keep multiple, isolated replicas of a running WP Network installation for development and testing purposes?
I have to ask … how did you do it before?
My first thought would be to have a DB replication that processes a couple sql scripts to tweak the domain names to dev-domain.com or something similar. But a lot of this depends on how YOUR company handles development in general and what process you have to fit this into?
I have to ask … how did you do it before?
We’ll never know for sure, most people that worked on setting the thing up are no longer with us, and they didn’t document much. However, they didn’t work on some problems that we know are fixable (eg. the domain names), so I guess they believed that there was no way to make things a little more reliable, and just got used to it.
…how YOUR company handles development in general and what process you have to fit this into?
We use a VCS (Git) and a staging server that receives work. When we’re done with whatever we’re doing, we mark a release, and the code goes to production servers. We work mainly with Rails/Ruby applications, so we got used to having a lot of tools that make code more reliable and overall just make our life easier. A lot of the struggle is to make things that we take for granted (continuous integration, integration tests, etc) also work with WP.
About our processes, I think there are two major sources of conflict (every mention of “code leaving the developer’s machine” means it entering the version control system):
* Code shouldn’t leave the developer’s machine unless it’s stable. So, we presume that each developer should have a local environment that enables him to reliably validate the code’s stability. Since we weren’t able to do that, we had to send code to the integration server to test it.
* All code that leaves the developer’s machine should be environment-agnostic – that is, it should not contain any information that locks it down to a specific server. If that cannot be avoided, code should be provided to adapt the application to each environment.
That means, for example, that the wp-config.php (which contains database information pertinent to each environment) should not leave the developer’s machine (which is fine).
That also means that, since we cannot avoid the fact that WP stores the domain name (an environment-specific thing) in the database, we should provide code that easily changes it (which I can’t do right now, but I’m pretty sure it’s possible).
There’s still a problem with plugins & themes: we have a lot of them, of varying quality, and some of them don’t handle the different environments very well (sorry if that sounds vague, but it’s exactly because we are having trouble testing everything). Initially, we left the plugins out of the application itself (i.e. they were environment-specific) so it would be easier to set up the environment; but the plugins are a very important part of the application, and testing without them just isn’t realistic.
a couple sql scripts to tweak the domain names to dev-domain.com or something similar
Yes, thank you – I managed to do that with normal WP installations (just changed two wp_options and all menus/attachments), but that doesn’t seem to be enough on multisite. I’m guessing that I’ll also need to change the wp_blogs and wp_site tables… I’ll try that.
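A rough sketch of which rows a targeted multisite domain swap would have to touch, assuming the default wp_ table prefix and the stock WP 3.0 multisite schema (the domain column in wp_blogs and wp_site, the siteurl row in wp_sitemeta, and the siteurl/home rows in each blog’s options table). The function name, the example domains and the blog IDs are made up for illustration, and on an install upgraded from WPMU the main blog’s options may live in wp_1_options rather than wp_options. It only builds SQL text; it does not touch post content, menus or serialized options.

```python
def multisite_domain_sql(old, new, blog_ids, prefix="wp_", main_blog_id=1):
    """Return SQL statements that swap `old` for `new` in the core multisite tables.

    Assumption: default WP 3.0 table layout; adjust `prefix` and
    `main_blog_id` handling to match the real install.
    """
    def q(value):
        # Minimal quoting for a SQL string literal (doubles single quotes).
        return "'" + value.replace("'", "''") + "'"

    stmts = [
        f"UPDATE {prefix}blogs SET domain = REPLACE(domain, {q(old)}, {q(new)});",
        f"UPDATE {prefix}site SET domain = REPLACE(domain, {q(old)}, {q(new)});",
        f"UPDATE {prefix}sitemeta SET meta_value = REPLACE(meta_value, {q(old)}, {q(new)}) "
        f"WHERE meta_key = 'siteurl';",
    ]
    for blog_id in blog_ids:
        # The main blog normally uses wp_options; other blogs use wp_<id>_options.
        if blog_id == main_blog_id:
            table = f"{prefix}options"
        else:
            table = f"{prefix}{blog_id}_options"
        stmts.append(
            f"UPDATE {table} SET option_value = REPLACE(option_value, {q(old)}, {q(new)}) "
            f"WHERE option_name IN ('siteurl', 'home');"
        )
    return stmts
```

Printing the result of multisite_domain_sql("my-domain.com", "my-domain.loc", [1, 2]) gives a script that can be reviewed before it is run against a dev copy of the database.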
I asked Boone to put two cents in this thread, as he helps manage a very large network with a dev environment. (and with git too)
Yeah, at this point, I have a lot of ideas, but … oddly, that’s the stuff people pay me to do. Not that I’m disinclined to help, but more it’s a LOT of work and a lot of customization to what YOU have and something you’ll want to hire for.
I do it (personally) with SVN, shell scripts, and a DB copy tool that edits the domain from domain.com to domain-dev.com so I can copy DOWN the live data from my live DB to a test DB. The other option is to use a sample DB with a selection of your posts that are the important types. But this one’s really complicated :/
The network I run isn’t nearly as big as a half-million posts (we’ve got about 700 blogs alongside a very active BuddyPress network). But maybe there are some things you can take away from our development procedures.
– We’ve got a production environment and a staging/acceptance environment running on the same VM, so we are guaranteed identical environments. Development is done on local machines. About 8 people can push to our repository on Github. When it’s time to migrate code to staging, or to release a stable version on production, I shell into that server and git pull from our central repository. Except in emergencies or when I’m tweaking a release, no one touches code on the live servers. This ensures that commit trees don’t get out of sync.
– We don’t really attempt to do any kinds of database syncing between dev environments. We do nightly backups, of course, and those dumps are available for developers if they need to refresh their local installations. That means that, every month or two (less if I don’t think about it) I do a manual mysql import of last night’s backup into my dev environment or into the staging environment. On occasion it can be a pain to have the databases out of sync, but my thought is this: if you are developing in such a way that it requires a particular piece of data in the database, you are probably developing wrong. This is of a piece with your point about code being environment-agnostic.
– When we DO decide to import a database dump into a new machine, we have two options. One is to do the import wholesale and to edit the hosts file on the local machine so that the production domain points to localhost. This has the advantage of being easy and relatively foolproof in the execution. On the other hand, it freaks me out, because I often get distracted and forget whether I’m editing a local or a remote copy of the website (since the URLs are the same). The other option is to run a script on the local copy of the database before or after importing. We have one that does a pretty thorough job, looking through every field in every table, unserializing if necessary, replacing the old domain with the one you specify, and resaving. Obviously, when you are working with a huge database, this can take a while, but since we don’t do it very often, it’s not a big deal. I’d be happy to share this script with you if you’re interested. I launch it manually, but it’d be easy to hook it as part of an automated chain.
– There are occasional exceptions to environment-agnosticism, where the particular data in the database really is of paramount importance. One kind of scenario is where you have to do something simple like activate a plugin – simply pushing it up is not enough, but you actually have to change a setting. The other kind of case is where you have bugs in data. I recently had a situation where I migrated a few tens of thousands of email subscription records to a totally different format, and I found out after migrating and launching that in a few edge cases my migration script had the wrong logic. In that case, I did a fresh import of the database and wrote the script to fix the problem all on my local installation. Then, when I committed, I made a note in my commit message that the script would have to be triggered after the site got upgraded (we have a convention of putting ACTION_REQUIRED in the commit message – that way I can easily git log | grep ACTION_REQUIRED to get a sense of what has to be done at release time before taking the site out of maintenance mode). Same for plugins that need to be activated, themes that need to be made available, settings that need to be set, etc.
– Because we have a fair amount of configuration data in our wp-config and other config files, we abstracted the environment-specific data (which really boils down to dbname, dbuser, and db password), defined those constants in a separate file, and then included it at the top of wp-config. That way we get to keep the main config file in the repo.
– We do not have a good system in place for quality assurance. Especially with BuddyPress, there are so many different kinds of content that it’s not possible for our small team to check every little thing every time we do a release (time between releases probably averages 1-2 weeks, sometimes much less). I try to mitigate this by having multiple instances of the site on my local machine, in a configuration borrowed from the way that WP versions itself: a master branch where new development goes, and a stable branch for bugfixes. All of the developers but me develop on the master branch, toward the next feature release. When they commit something that I think should go into the stable branch, I use git cherry-pick. We generally run the master branch in the staging environment, because that’s the place where we can get the most eyeballs on it (especially from non-coding members of the team, who don’t maintain local dev environments) and generally, the features in the master/dev branch are the ones that need the most testing anyway. If WordPress were a different kind of software, we would be hardcore about having unit tests, and maybe it’s something we’ll move toward in the future – but at the moment it’s all human powered.
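The ACTION_REQUIRED convention from the list above could be turned into a release checklist roughly like this. It is only a sketch, not the tooling Boone’s team uses: the function names are made up, it assumes git is on the PATH, and it relies on the %h, %B, %x00 and %x1e placeholders of git log’s format option to separate commits cleanly.

```python
import subprocess

TAG = "ACTION_REQUIRED"


def commits_needing_action(log_text, tag=TAG):
    """Parse `git log --format=%h%x00%B%x1e` output.

    Returns (short_hash, subject) pairs for commits whose message contains `tag`.
    """
    found = []
    for record in log_text.split("\x1e"):
        record = record.strip()
        if not record or "\x00" not in record:
            continue  # trailing newline or malformed record
        short_hash, message = record.split("\x00", 1)
        if tag in message:
            subject = message.strip().splitlines()[0]
            found.append((short_hash, subject))
    return found


def release_checklist(rev_range, repo="."):
    """Run git log over `rev_range` in `repo` and list commits that need manual steps."""
    out = subprocess.run(
        ["git", "log", "--format=%h%x00%B%x1e", rev_range],
        cwd=repo, check=True, capture_output=True, text=True,
    ).stdout
    return commits_needing_action(out)
```

Calling release_checklist with a range such as "stable..master" (hypothetical branch names) lists the commits whose manual steps still have to be carried out before the site leaves maintenance mode.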
Hope some of that helps.
I am hugely impressed with Boone.
Speaking directly to QA, I have two scripts used for testing. Here’s the logic behind them:
Script #1 has everything the app (this is for any app, web or desktop actually) MUST be able to do. In the case of WP, it’s something like ‘Make posts, edit posts, schedule posts. Add user, edit user, delete user’ etc etc. The basics of what the app does specifically customized to HOW I use it. So if I use WP to post videos, it has to be able to play them. This script is mostly generic, and hasn’t really changed. It’s based entirely on the scope of what the original design doc had.
Script #2 has everything we’ve CHANGED. Added a sidebar thing? That has to be tested. That script is written based on the change-log of every change checked in to (in your case) GitHub. It lives and dies entirely on how good the devs are at accurately recording what’s in each change (it’s a job requirement, actually, and a fireable offense for making a major change without documenting it in your check-in log).
I should mention I work for a Fortune 100 bank, with thousands of people in a dozen countries. We take this stuff really seriously.
What I do for my own dev environments might be an additional tip here. If the live install is my-domain.com, then my dev environment is my-domain.loc (or .tld). There is a corresponding entry in the hosts file of anyone who needs to access the site.
The main advantage of that hostname is that the backup from the live environment can be passed through a command-line editor (e.g. awk) to change all the domain references, and the result can be passed directly to MySQL to restore it to the dev environment.
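The same pipe-through-a-filter idea, sketched in Python rather than awk. The function name and the domains are placeholders. The sketch assumes the dev hostname is exactly as long as the live one (my-domain.com vs my-domain.loc), so that the byte counts stored inside PHP-serialized values in the dump stay valid, and it refuses to run otherwise.

```python
def swap_domain(lines, old, new):
    """Yield dump lines (bytes) with every occurrence of `old` replaced by `new`.

    Raises ValueError when the two names differ in length, because a plain
    replace would then corrupt the lengths recorded in serialized values.
    """
    if len(old) != len(new):
        raise ValueError("old and new domain must have the same byte length")
    for line in lines:
        yield line.replace(old, new)
```

Fed with sys.stdin.buffer and written to sys.stdout.buffer, this can sit between mysqldump and mysql in a pipeline; working on bytes avoids decoding errors on dumps that contain binary data.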
That’s some awesome feedback! Thank you all.
…it’s a LOT of work and a lot of customization to what YOU have and something you’ll want to hire for.
Indeed.
…use a sample DB with a selection of your posts that are the important types. But this one’s really complicated :/
Yeah, and the resulting DB may end up not being a faithful representation of the real one. Much like Boone mentioned, I had a few edge-case issues that didn’t show up in the data slice I took.
* About your scripting approach:
Your approach seems especially interesting. I’m used to writing unit/integration tests for my other projects, so it pains me greatly to not do the same thing with WP. Can you please share more information about how you do that (like what tools you use)?
I should mention I work for a Fortune 100 bank, with thousands of people in a dozen countries. We take this stuff really seriously.
That’s impressive, and makes me even more interested to know how you do those tests; if it’s good enough for you, it’s also going to be good enough for the vast majority of people.
We’re a religious non-profit organization here; which makes implementing good development practices a pain – since our income isn’t very dependent on the quality of the services we deliver (most of it comes from donations and the like), it’s really hard to get resources to improve things. It’s amazing how stale an enterprise can get when it doesn’t need to be competitive.
@boone:
Now that’s a lot of info – let’s talk about it:
* About the server setup and repositories: that’s very similar to how we already have it running here (or to how we’d like things to run).
* About the database syncing: we tried the local hosts file solution here, and observed the same points – easy and foolproof, but leads to a lot of mistakes and confusion. I then tried the script approach, trying to change the hostname on a few tables/fields that I knew mattered, but it didn’t work very well – maybe I’m missing some important stuff; or I should just give up on trying to change some specific places and scan every table/field instead. Since it’s exactly what you do, I would indeed like very much if you could share the script that you use.
* I really like your suggestion of using “action tags” on the commit messages. That’s going to be really helpful on non-WP projects too.
* We also took the same approach in regard to the wp-config.
* About QA, it also pains me to not have unit tests on it. Your descriptions of your workflow and how you branch stuff were very useful.
@rennick:
Interesting. Here, along with the hosts file entry, we’re experimenting with a browser extension (Fireproxy for Firefox) and our own proxy server to allow us to easily switch between the development/production environment.
The thought of “piping” the DB backup through a command-line tool in order to change the domain references didn’t occur to me – thanks, I’ll try that out.
Script 1 is easy. You write down what you use WP for and how you use it.
Script 2, we use TFS and SVN and this other POS tool I can’t stand and yet am stuck supporting. Every time you check in code, you have to say why you’re making the edit. Every morning, an automated report is run, listing the changes (a trace change log, basically), and every other time you push code to the next test environment, the logs are pushed with it and you have to sign off on each change.
The other option is to run a script on the local copy of the database before or after importing. We have one that does a pretty thorough job, looking through every field in every table, unserializing if necessary, replacing the old domain with the one you specify, and resaving. Obviously, when you are working with a huge database, this can take a while, but since we don’t do it very often, it’s not a big deal. I’d be happy to share this script with you if you’re interested. I launch it manually, but it’d be easy to hook it as part of an automated chain.
I’d love to see your script if you wouldn’t mind?
I currently do a search/replace in the mysqldump .sql file to change the production domain name to a dev one, but that has caused problems with the serialized data (presumably because the string length changes with the domain name change).
Ron’s idea of changing my-domain.com to my-domain.loc (i.e. keeping the domain name string the same length) is a good one that I think I’ll try.
James
Here’s the script we use to switch domains in a database: https://pastebin.com/p3fydKGd
Keep in mind that it also tests for the existence of an external bbPress db (for legacy reasons in our case) as well as a MediaWiki database. The script is configured to run as a command-line utility (saved with a .phpsh extension) but you can abstract out the guts however you’d like.
Thanks for sharing your script Boone.
Have you had any problems with your script breaking wp_option records that contain serialized data?
For example, the cForms settings are stored in the cforms_settings wp_option record. The settings are a PHP array, which is serialized before saving into the database.
If your script just does a search/replace on this, it will break the array if the string length changes. So if your new domain is a different string length to the old domain, then it will cause problems.
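To make the failure mode concrete: PHP’s serialize() stores each string as s:N:"value"; where N is the byte length of the value, so a replacement that changes the length has to rewrite N as well. The sketch below shows one way that could be done on a raw stored value. It is not the script linked above; the function name is made up, and it operates on the unescaped column value (not on an SQL dump, where quotes are escaped).

```python
import re

# Matches the header of a PHP-serialized string token: s:<length>:"
_STR_HEADER = re.compile(rb's:(\d+):"')


def replace_in_serialized(data, old, new):
    """Replace `old` with `new` in `data` (bytes), fixing serialized string lengths."""
    out = []
    pos = 0
    while True:
        m = _STR_HEADER.search(data, pos)
        if m is None:
            # No more string tokens: the rest is plain text or structure.
            out.append(data[pos:].replace(old, new))
            break
        start = m.end()
        end = start + int(m.group(1))
        if data[end:end + 2] != b'";':
            # Looked like a header but the declared length does not line up;
            # treat it as ordinary text and keep scanning.
            out.append(data[pos:start].replace(old, new))
            pos = start
            continue
        out.append(data[pos:m.start()].replace(old, new))
        # Recurse so that serialized data stored inside a string is handled too.
        value = replace_in_serialized(data[start:end], old, new)
        out.append(b's:%d:"%s";' % (len(value), value))
        pos = end + 2
    return b"".join(out)
```

Because the declared length is used to find the end of each string, values that themselves contain quotes or semicolons are not cut short, which is where a purely regex-based replace tends to go wrong.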
Ron’s idea of changing my-domain.com to my-domain.loc (ie keeping the domain name string the same length), seems like a good way to get around that though.
James
I just came across this search/replace script, which looks like it takes care of serialized wp_option records too:
I just came across this search/replace script, which looks like it takes care of serialized wp_option records too:
This works very well for what it’s supposed to do.