A lot of entries in bpges_queued_items
-
Hi, I’m following up on this issue – https://www.ads-software.com/support/topic/queue-not-being-processed-with-http-auth/ – where I mentioned that there’s a large amount of items in the bpges_queued_items table. On that thread I thought it might be related to the issue I was having with BPGES not sending on our staging server with HTTP Auth. That particular issue is resolved, but as stated, it didn’t appear to relate, as there’s a lot of items in the queue table on the live server, which has no HTTP Auth.
So on the live server:
- There’s about 21,000 items in bpges_queued_items, going back to December 2021.
- Only ~300 aren’t ‘immediate’ (i.e. ‘dig’ or ‘sum’), and all of these are from the past week.
- There’s been no reports from the client of notifications not being received (though this isn’t a firm indicator, as site members may not be noticing missing notifications, or not reporting it to the client).
- To try to debug this, I have access to WP SMTP on-site logs of sent emails going back 30 days, and DotDigital logs.
- Going back 30 days, I’ve taken notifications related to a post in a group (for my ref: group ID 30, activity ID 48696) on 3/4/23, and examined a few entries in bpges_queued_items. By recipient user ID for my ref:
- 7372 – WP SMTP logged as failed, no details.
- 4090, 7411, 3836, 7381, 2923, 637, 5077, 7497 – No WP SMTP log, no DD log.
I stopped checking after these. Apart from the 1st oddity where there’s a log of the sending failing, most seem to not even have a failure in the WP SMTP log.
This all suggests that rather than there being an issue with the queued items being purged after sending, there’s a real issue with sending.
-
Something I’ve just noticed, perhaps relevant. The old queued items go back to 9/12/21. Looking at my commit log, just before this (3/12/21) was when I updated my own forked version of BPGES to 4.0.0, but including my own customisations. You might recall I’d forked BPGES in order to include these customisations, and missed the 4.0.0 update, which I think was when you improved the queueing (which was causing issues for us). You eventually included my customisations, but until those were included in the main plugin, I updated my forked version manually to 4.0.0.
There’s a 6-day gap between the shift to 4.0.0 and the first apparently unsent items – but it seems significant.Thanks for the continued debugging @gyrus
This all suggests that rather than there being an issue with the queued items being purged after sending, there’s a real issue with sending.
Yes, I agree. To help narrow it down, here are some thoughts and questions:
- Are you running the BPGES log?
define( 'BPGES_DEBUG', true );
, then look for bpges-debug.log in your uploads directory. - The send process works first by recording queued items in the database. Obviously, this part of the process is working for you, since the items are there.
- Then, the server triggers the
dispatch()
method on the queue object. See https://github.com/boonebgorges/buddypress-group-email-subscription/blob/c39a883cd2444638158683a008ef85fbbe1b8356/bp-activity-subscription-functions.php#L271. This creates a request toadmin-ajax.php?action=wp_bpges_send_queue...
So the first place to look may be in your access logs, to see whether this request is hitting the server in the problematic cases, and whether it’s responding with a 200 or some error code. - Assuming a 200, the send-queue handler is here: https://github.com/boonebgorges/buddypress-group-email-subscription/blob/c39a883cd2444638158683a008ef85fbbe1b8356/classes/class-bpges-async-request-send-queue.php#L47 There are a few ways it could go wrong: the ‘type’ payload could be empty; the ‘type’ payload could be invalid.
- It seems as if your problem is specifically with ‘immediate’ sends, so the next place to look is at https://github.com/boonebgorges/buddypress-group-email-subscription/blob/c39a883cd2444638158683a008ef85fbbe1b8356/classes/class-bpges-async-request-send-queue.php#L93 If we get to this point, then an entry will be put into the debug log. Is this happening?
- Is the query at https://github.com/boonebgorges/buddypress-group-email-subscription/blob/c39a883cd2444638158683a008ef85fbbe1b8356/classes/class-bpges-async-request-send-queue.php#L101 fetching the items that you would expect?
For some of these items, you may have to set up some custom PHP-based logging.
Hopefully this helps us get closer to the root of the issue.
Hi, thanks for the response. As soon as I got this reply I set BPGES_DEBUG before I came back to this. The log now has details going back to the beginning of yesterday. There are entries there saying, ‘Beginning batch of immediate notifications for…’.
First off I’m wondering how old still-queued items should be before we can deem them as having failed. For now, if I take the oldest activity for which immediate notifications are being sent (activity ID 49470, 03-May-2023 09:34:11), there are no entries in the queue table, so I assume these got sent. The WP SMTP log confirms this.
So, looking at the queue table for items still remaining… There’s some for activity 49472. The beginning of this batch is indicated in the log file, but unlike 49470, there’s no subsequent ‘Finished sending…’. The next line in the log is ‘Beginning batch…’ for another activity. Looking in the PHP error log, there seem to be a few warnings around that time, but no fatal errors, nothing that looks relevant.
I’ll now try to add some custom debug logging in there to catch the next issue better. As you’ve mentioned, it looks like the main thing to check is $items. Let me know if you think there’s anything else worth logging.
First off I’m wondering how old still-queued items should be before we can deem them as having failed.?
The plugin never tries to resend immediate items. See https://github.com/boonebgorges/buddypress-group-email-subscription/issues/163 For digests, items will be abandoned after 3 weeks (for weekly digests) or 3 days (for daily digests). https://github.com/boonebgorges/buddypress-group-email-subscription/blob/master/bp-activity-subscription-digest.php#L984
There’s some for activity 49472. The beginning of this batch is indicated in the log file, but unlike 49470, there’s no subsequent ‘Finished sending…’.?
This is very helpful, as it indicates that the batch is beginning https://github.com/boonebgorges/buddypress-group-email-subscription/blob/master/classes/class-bpges-async-request-send-queue.php#L94 but is never being completed https://github.com/boonebgorges/buddypress-group-email-subscription/blob/master/classes/class-bpges-async-request-send-queue.php#L111. Two things to cross-reference:
- Check your mail logs. Did any notifications go out for 49472? Perhaps some of them succeeded, but not all? Can you collect counts (how many should have been sent vs how many were sent)?
- Check your server access logs for
admin-ajax.php?action=wp_bpges_send_queue
around this time. It could be that two batches were required, and the first one sent, but the request for the second one was blocked by the server.
OK so the activity I picked turned out to be tricky because a member cross-posted the same update to multiple groups around the same time! But it looks like there are actually sent email logs for the activity 49472 – about 322. There’s about 480 members in the group. This seems feasible, 322 being the number of members with immediate subscriptions. Is it worth checking that – how would I find the count of members with an immediate subscription?
Anyway, this makes debugging a little trickier, if lack of ‘Finished sending…’ in the BPGES debug log doesn’t necessarily indicate a failed batch. I guess this does mean that there’s a problem with purging the queue table items – perhaps in addition to failed sending? :-/
I’ll come back and check some other logs later.
-
This reply was modified 1 year, 10 months ago by
Steve Taylor. Reason: Typo
This seems feasible, 322 being the number of members with immediate subscriptions. Is it worth checking that – how would I find the count of members with an immediate subscription?
SELECT COUNT(*) FROM wp_bp_subscriptions WHERE group_id = 123 AND type = 'immediate'
Anyway, this makes debugging a little trickier, if lack of ‘Finished sending…’ in the BPGES debug log doesn’t necessarily indicate a failed batch.?
Lack of ‘Finished Sending’ indicates that the entire process didn’t finish running for a given send event. A “send event” is one of two things:
- A scheduled digest run event, where BPGES loops through all members who have pending digest items and sends them
- An individual activity item event, where BPGES sends out ‘immediate’ notifictainos to all subscribed members
Each one of these “events” can take place in multiple batches. In case 2 (which is the current case we’re talking about), BPGES will send to as many subscribed ‘immediate’ users as it can before meeting either the timeout or memory threshhold. See https://github.com/boonebgorges/buddypress-group-email-subscription/blob/master/classes/class-bpges-async-request-send-queue.php#L125. Then it will trigger another batch, which will be represented by another
admin-ajax.php?action=wp_bpges_send_queue
entry in the webserver access log. https://github.com/boonebgorges/buddypress-group-email-subscription/blob/master/classes/class-bpges-async-request-send-queue.php#L134 It’ll keep going until there’s no more subscribed users to process. https://github.com/boonebgorges/buddypress-group-email-subscription/blob/master/classes/class-bpges-async-request-send-queue.php#L115So it could be that the first batch of immediate sends is working properly – in this case, it includes 322 users – before a second batch is triggered, but that second batch is failing for whatever reason. In that case, you’d see an entry in the log like this https://github.com/boonebgorges/buddypress-group-email-subscription/blob/master/classes/class-bpges-async-request-send-queue.php#L130
In contrast, your hypothesis about items being sent but not purged from the queue is possible, though it doesn’t seem likely to me. See https://github.com/boonebgorges/buddypress-group-email-subscription/blob/master/classes/class-bpges-async-request-send-queue.php#L120 – the delete for each item happens immediately after the send.
Running that SQL to check on the number of immediate subscriptions in the group where activity 49472 was posted (group ID 60) gives 0. Looking directly in the table for that group, there’s 5 ‘dig’, 30 ‘sum’ and 432 ‘supersub’ (and 35 ‘no’). I’m not sure where that leaves things for that activity.
Anyway, while my bpges-debug.log goes back to 3/5, I didn’t get my own additional custom logging in til 4/5. So, looking at my own custom logging for the earliest apparent issue, and cross-referencing with other sources of info:
- In bpges-debug.log there’s this line: ‘[04-May-2023 11:02:26] Beginning batch of immediate notifications for 49530.’
- For activity 49530 in wp_bpges_queued_items there’s 139 records from this time, all of type ‘immediate’. The IDs of these records go from 320135 to 320280.
- The associated group ID is 60 (the same as the group detailed at the top of this post).
- In my custom log, which outputs $items from \BPGES_Async_Request_Send_Queue::handle_immediate_queue(), just after $items = $query->get_results() – the first BPGES_Queued_item is ID 319816, type immediate (this record no longer seems to be in the queued items table).
- The last queued item for this activity is ID 320135. I assume it’s no coincidence that this is the first queued item still in the table (before the remaining records up to 320280 stack up).
So it looks like something’s failed at this point, and it’s not returned to send the rest of the items?
But – what about the ‘supersub’ subscription types? As mentioned, there don’t appear to be any ‘immediate’ subscriptions for group 60 in wp_bpges_subscriptions. There’s 292 items in my custom log being sent for this activity, all type ‘immediate’. Then there’s 139 items still in the queued items table. Allowing for the one overlap, that’s 430. The closest match for that seems to be the 432 ‘supersub’ subscriptions for this group.
I’m just getting the access logs, I’ll report back.
So yes, in the access log at 11:02:34 (8 seconds after the batch beginning noted above), there is: “POST /wp-admin/admin-ajax.php?action=wp_bpges_send_queue&nonce=xxxxxx HTTP/1.1”
Based on what you’ve said, this would be the attempt to trigger another round of sending after the first lot stopped with queued item 320135 – but since records after that remain in the queued items table, this attempt failed? Anything else to look for as to why? It looks like the PHP error logs have been purged from that far back, but I never found anything checking previously.
Thanks for the detailed debugging. I think we’re getting closer to what’s happening.
Looking at activity 49530, here’s what I glean from your research:
- 422 users should have immediate notifications
- 292 users actually received the notification (you mention that this is the number that is listed in your custom log, and I assume this means that the emails actually went out – you may want to cross-reference with your mail logs)
- 139 items remain in the queue table. This roughly corresponds to the difference between 422 and 292 – one or two may have been lost in whatever happened.
From this it seems likely that the following is happening:
- When 49530 was created, the request was successfully sent to begin the asynchronous sending of the first batch. There should be an entry in the access log
POST /wp-admin/admin-ajax.php?action=wp_bpges_send_queue
corresponding to this first send, which should appear very soon after the timestamp on 49530. I can’t tell from your report whether the 11:02:34 entry corresponds to this request, or whether there was one before it. - The server successfully received and handled this request, by beginning to send notifications to ‘supersub’ users.
- At this point, one of two things happened, and which one depends on what’s in your logs. Either (a) BPGES successfully determined that it was running short on memory/time, so it stopped processing notifications, and sent a request to trigger the next batch. This would be represented by a *second*
POST /wp-admin/admin-ajax.php?action=wp_bpges_send_queue
entry in your log, shortly after the first one. Or (b) BPGES is failing to determine that it’s running out of memory or time, so the process is unexpectedly crashing. The next batch request is never sent.
Given what I see in your explanation – namely, that you mention only one
wp_bpges_send_queue
entry in your access logs; that you don’t say whether it is recorded as a 200, which suggests to me that it probably is; and that bpges-debug.log never records an entry “Sent 292 immediate notifications…” https://github.com/boonebgorges/buddypress-group-email-subscription/blob/fa241d53aadddfee29a3670b94baacdba81bf822/classes/class-bpges-async-request-send-queue.php#L130 – I suspect that 3(b) is what’s happening.Given that there’s nothing in your PHP error log indicating that this is the case, we’re going to have to do a bit of guessing. The plugin doesn’t currently have a way to force a ceiling on batch size. I will consider adding it in the future, but for the time doing I think we can work around it with something like this:
function bpges_batch_count( $add = null ) { static $count; if ( ! $count ) { $count = 0; } if ( $add ) { $count += (int) $add; } return $count; } add_filter( 'wp_bpges_send_queue_memory_exceeded', function( $retval ) { if ( $retval ) { return $retval; } $batch_count = bpges_batch_count( 1 ); $batch_max = 100; // This seems reasonable for our tests if ( $batch_count > $batch_max ) { return true; } return $retval; } );
So, in other words, we’re hijacking the ‘memory_exceeded’ check to implement a ceiling on batch size. In my example, I’m using 100, which seems like it’s well under the threshhold where you’re experiencing problems.
Thanks for reminding me to cross-reference the email logs! As far as I can see, for the activity 49530 around 04-May-2023 11:02, there were 284 successfully sent emails, 4 failures. Three of the failures appear to be ones legitimately blocked by our code (because reasons). One failure seems to be fair, probably an issue with the recipient? Anyway this (very roughly) tallies with my custom log of $items.
To clarify on the occurrence of POST /wp-admin/admin-ajax.php?action=wp_bpges_send_queue in the access log, this does appear about 8 seconds after the beginning of sending for that activity appears in bpges-debug.log. The previous occurrence is about 7 minutes before, then there’s one 2 hours after – so it looks like this is the one.
Anyway, I misunderstood, I thought this AJAX request indicated the end of the first round of sending and the triggering of the next. Sounds like this is the start of the first round, and the second never happened.
Also I missed the crucial detail out of the access log line, the status code, probably because it’s not familiar to me! 499, which seems to be related to an NGINX timeout. Perhaps something to ask our hosts about? Of course they’re usually reluctant to ‘support 3rd party plugins’, so please let me know what exactly we might need from them, a lot in this kind of thing depends on phrasing.
Just to be clear on something, is ‘supersub’ a kind of synonym for ‘immediate’?
Thanks for the additional info. All of this, especially the 499 code, means that my hypothesis is almost certainly correct.
Perhaps something to ask our hosts about?
I wouldn’t bother. You could bump up the timeout a bit, but you’re bound to hit the ceiling again as the site grows. Underlying all of this, there’s a problem in BPGES – it ought to be detecting that the script is running out of time, but it’s not doing it properly. However, this is a tricky thing to attempt to debug without direct access, and from a practical point of view it’s unlikely that I could make any changes in BPGES that would account for all possible hosting environments.
As such, I think the best way forward is to use the technique I described above to limit the batch size. I’ve created a ticket https://github.com/boonebgorges/buddypress-group-email-subscription/issues/237 to create a simpler method for setting a max batch size, and when I’ve built it, perhaps you can help to test it.
Thanks – I’ll have to come back to this next week but seems like we’re getting there!
The bpges_batch_count() code you’ve provided, is there any particular file it should go in? Does it need integrating into the BPGES_Async_Request_Send_Queue class?
It’s written as a plugin, so you can put it into an mu-plugins file, or into a standalone plugin file. Or it would probably work in functions.php of your theme. No modification of BPGES is necessary.
Of course, thanks for the clarification! It’s in place in our custom theme. I guess I’ll leave my debugging stuff in place for now and check back in a few days to see how batches are doing.
- Are you running the BPGES log?
- The topic ‘A lot of entries in bpges_queued_items’ is closed to new replies.