Numerous set_default_options() calls cause site to crash
-
Hi,
We’ve been experiencing website crashes daily (Internal Error 500 [nginx]), and I believe I’ve identified the cause as numerous calls to Options:set_default_options(). To test this, I added return statement to the first line of that method, and the page response time has improved significantly. Is it possible for you to limit the number of times this method is called?
For reference:
Authorizer 3.8.3
WordPress 6.5.5
Multisite with 192 subsitesRegards,
Mark Pemburn
Clark UniversityThe page I need help with: [log in to see the link]
-
Aloha Mark, can you provide any more information on the HTTP 500 errors? Such as what specific pages it is happening on. wp-login.php? Admin pages? Front end? What shows up in the nginx error logs when this happens?
Also, what specific pages is “page response time” improving after stubbing the function?
set_default_options()
is really only intended to run on plugin activation, and during database migrations, and I think we determined in your last thread that no database migrations were pending: https://www.ads-software.com/support/topic/authorizer-seems-to-be-slowing-us-down/It’s possible there’s been a regression somewhere, or another plugin is triggering the excess calls. It does seem to be multisite-specific, and probably made worse by the number of subsites in your network. We run many multisite installations ourselves, but typically they have less than 100 subsites, so the fact that we haven’t seen this issue indicates it scales with number of sites.
Hi Paul,
The HTTP 500 errors were happening on all pages, front end and back. The response time is the overall page loading time, and I track it by watching the New Relic dashboard for the site. I wanted to show you a screenshot of New Relic that illustrates what I mean but there doesn’t seem to be a way here. When I added the “return false;” to the set_default_options method, the average page load times dropped from around 7 seconds to less than a second.This is from the New Relic “slowest transactions” list from one of our worst times:
#Authorizer\Options::set_default_options
Execution count: 969
Duration 16.16 seconds
I can only present the evidence that’s available to me, and it seems to point at this specific method call as the culprit. Let’s work together to figure this out. I’d be happy to do a Zoom call with you if that works best.Mark
Hi Paul,
I believe I’ve found the root of the problem, and it’s very much tied to the fact the we use networks on the Clark University website. The networks organize the subsites into groups for “departments”, “offices”, “schools”, etc.
The problem occurs in Updates::auth_update_check() method when it checks for the value of auth_version for the root subsite. ?In simple multisites, this would be “1”, and it would pull the value from the wp_options table. ?This is currently 20230222, so the code that tests whether updates are needed fails, and processing fall through to the end of the method without taking any action.
In some cases, however, the auth_version check sees the root site as “3” or “5”—the IDs of the “offices” and “departments” network—and returns null, since auth_version isn’t set in wp_3_options or wp_5_options. ?Now, the code thinks that updates are needed, and it runs every one of the 20 blocks of code that test for $update_if_older_than. ?Most of this are harmless as far as we’re concerned, but the one that follows $update_if_older_than = 20221101 contains a foreach that loops over every subsite and runs the Options::set_default_options() method for each one. ?This is where the CPU usage goes through the roof and we get slow performance or, in the worst cases, crashes with fatal errors.
The fix for this is pretty simple, and I don’t think that it will adversely affect the plugin for others. If you add the following after line 34 of class-update.php, it will run the code properly for the root subsite, but bail for ones that don’t return a date value for auth_version:if (empty($auth_version)) {
return false;
}
Can I request that you incorporate this into your next release of Authorizer?
Regards,
MarkThank you so much Mark for narrowing that down! We’ll definitely get this fix included in the next release; in the meantime, your hotfix can remain in place to prevent the issue from happening.
Thank you!
Aloha Mark, if you’re up for a little more testing, at your convenience can you please try removing your hotfix and replacing the line that fetches auth_version: https://github.com/uhm-coe/authorizer/blob/master/src/authorizer/class-updates.php#L31
Old/Existing:
$auth_version = get_blog_option( get_network()->blog_id, 'auth_version' );
Replacement:
$auth_version = get_blog_option( get_main_site_id( get_main_network_id() ), 'auth_version' );
Once you’ve done this please just confirm that this also fixes the issue.
We have numerous places in the codebase where we get/set Authorizer multisite settings options, so we’ll need to roll out this fix more comprehensively than just in the update check. We’ve always assumed that
get_network()->blog_id
returns the main blog in the multisite network, and you discovered that doesn’t handle multisite multi-network installations. We think thatget_main_site_id( get_main_network_id() )
will always get us the correct blog ID and not fail if the request came from one of the other networks in the multi-network.Thanks and no rush, have a great weekend.
Hi Paul,
Thanks, that works. I realized that I’ve used the same technique for one of my own plugins.
Sad to relate, we discovered another couple of problems with Authorizer this morning. These showed up when we attempted to create a new subsite and got fatal errors. The first of these was in class-options.php on line 45.if ( ! array_key_exists( $option, $auth_settings ) ) {
return null;
}
…threw the error “Fatal error: Uncaught TypeError: array_key_exists(): Argument #2 ($array) must be of type array, bool given”. I changed it to this:if ( ! is_array( $auth_settings ) || ! array_key_exists( $option, $auth_settings ) ) {
return null;
}The second issue was in class-authorization on line 817.
$auth_settings_access_users_approved = array_merge(
$options->get( 'access_users_approved', Helper::SINGLE_CONTEXT ),
$options->get( 'access_users_approved', Helper::NETWORK_CONTEXT )
);
…threw the error: “PHP Fatal error: Uncaught TypeError: array_merge(): Argument #1 must be of type array, bool given”. I changed it to this:$single = $options->get( 'access_users_approved', Helper::SINGLE_CONTEXT );
$network = $options->get( 'access_users_approved', Helper::NETWORK_CONTEXT );
if ( is_array( $single ) && is_array( $network ) ) {
// Get all site users and all multisite users.
$auth_settings_access_users_approved = array_merge(
$options->get( 'access_users_approved', Helper::SINGLE_CONTEXT ),
$options->get( 'access_users_approved', Helper::NETWORK_CONTEXT )
);
}
You may have a better way of doing this. If you could squeeze this into you next release, I’d be most grateful.
Mark- This reply was modified 4 months, 4 weeks ago by mpemburn.
We have released version 3.8.5 with a comprehensive fix for looking up the main site in the main network in a multi-network installation:
https://github.com/uhm-coe/authorizer/commit/16c2a42cd938164a0f718fbba994b4cafb7f637b
This should also resolve the other errors you encountered when creating a new site, but please let us know if you continue to run into errors.
Thanks again for your diligence getting this one identified and fixed!
Also as a point of clarification: Authorizer Multisite Settings are currently implemented only on the main network in a multi-network setup, and shared across all networks, instead of having separate Authorizer Multisite Settings in each network in the multi-network. Hopefully this works with your setup, and if individual sites in any sub-network need different config, they can be configured in Advanced settings to override the multisite settings (and be configured independently).
Hi Paul,
Thank you! I’ve installed 3.8.5 on our test instance, and it looks good so far. I’ve asked my team to give it a workout, and I’ll be monitoring New Relic to make sure no further spikes show up.
I really appreciate your help with this! Through our travails with server crashes over these past few weeks, you are the only plugin vendor out of a handful to respond quickly and appropriately.
Best regards,
MarkGreat! Thanks for the kind words.
Please reach out if your team finds any additional problems. Cheers
- You must be logged in to reply to this topic.