Forum Replies Created

Viewing 15 replies - 1 through 15 (of 305 total)
  • Thread Starter Steve Taylor

    (@gyrus)

    I’ve just checked the logs on this. Given that I added your patch on Wednesday evening, the immediately obvious signs of change are:

    1. From that time, items like this begin appearing in the bpges-debug.log: ‘[17-May-2023 17:51:07] Sent 101 immediate notifications for 49830 this batch. Launching another batch….’
    2. In the wp_bpges_queued_items table, the latest remaining ‘immediate’ item is from Wednesday morning.
    3. There are ‘sum’ items from today going back to Wednesday – I think it’s normal for them to not be sent for up to a week?

    Looking at the most recent batched ‘immediate’ notifications for an activity, the debug log says 208 notifications sent, which matches the number of ‘supersub’ subscriptions to that group. There seem to be 206 matching emails in the send log – which seems fine?

    Let me know if you can think of anything else worth verifying before I remove the debug logging, but it looks like we’re in a much better place now.

    Thread Starter Steve Taylor

    (@gyrus)

    Of course, thanks for the clarification! It’s in place in our custom theme. I guess I’ll leave my debugging stuff in place for now and check back in a few days to see how batches are doing.

    Thread Starter Steve Taylor

    (@gyrus)

    The bpges_batch_count() code you’ve provided, is there any particular file it should go in? Does it need integrating into the BPGES_Async_Request_Send_Queue class?

    Thread Starter Steve Taylor

    (@gyrus)

    Thanks – I’ll have to come back to this next week but seems like we’re getting there!

    Thread Starter Steve Taylor

    (@gyrus)

    Thanks for reminding me to cross-reference the email logs! As far as I can see, for the activity 49530 around 04-May-2023 11:02, there were 284 successfully sent emails, 4 failures. Three of the failures appear to be ones legitimately blocked by our code (because reasons). One failure seems to be fair, probably an issue with the recipient? Anyway this (very roughly) tallies with my custom log of $items.

    To clarify on the occurrence of POST /wp-admin/admin-ajax.php?action=wp_bpges_send_queue in the access log, this does appear about 8 seconds after the beginning of sending for that activity appears in bpges-debug.log. The previous occurrence is about 7 minutes before, then there’s one 2 hours after – so it looks like this is the one.

    Anyway, I misunderstood, I thought this AJAX request indicated the end of the first round of sending and the triggering of the next. Sounds like this is the start of the first round, and the second never happened.

    Also I missed the crucial detail out of the access log line, the status code, probably because it’s not familiar to me! 499, which seems to be related to an NGINX timeout. Perhaps something to ask our hosts about? Of course they’re usually reluctant to ‘support 3rd party plugins’, so please let me know what exactly we might need from them, a lot in this kind of thing depends on phrasing.

    Just to be clear on something, is ‘supersub’ a kind of synonym for ‘immediate’?

    Thread Starter Steve Taylor

    (@gyrus)

    So yes, in the access log at 11:02:34 (8 seconds after the batch beginning noted above), there is: “POST /wp-admin/admin-ajax.php?action=wp_bpges_send_queue&nonce=xxxxxx HTTP/1.1”

    Based on what you’ve said, this would be the attempt to trigger another round of sending after the first lot stopped with queued item 320135 – but since records after that remain in the queued items table, this attempt failed? Anything else to look for as to why? It looks like the PHP error logs have been purged from that far back, but I never found anything checking previously.

    Thread Starter Steve Taylor

    (@gyrus)

    Running that SQL to check on the number of immediate subscriptions in the group where activity 49472 was posted (group ID 60) gives 0. Looking directly in the table for that group, there’s 5 ‘dig’, 30 ‘sum’ and 432 ‘supersub’ (and 35 ‘no’). I’m not sure where that leaves things for that activity.

    Anyway, while my bpges-debug.log goes back to 3/5, I didn’t get my own additional custom logging in til 4/5. So, looking at my own custom logging for the earliest apparent issue, and cross-referencing with other sources of info:

    • In bpges-debug.log there’s this line: ‘[04-May-2023 11:02:26] Beginning batch of immediate notifications for 49530.’
    • For activity 49530 in wp_bpges_queued_items there’s 139 records from this time, all of type ‘immediate’. The IDs of these records go from 320135 to 320280.
    • The associated group ID is 60 (the same as the group detailed at the top of this post).
    • In my custom log, which outputs $items from \BPGES_Async_Request_Send_Queue::handle_immediate_queue(), just after $items = $query->get_results() – the first BPGES_Queued_item is ID 319816, type immediate (this record no longer seems to be in the queued items table).
    • The last queued item for this activity is ID 320135. I assume it’s no coincidence that this is the first queued item still in the table (before the remaining records up to 320280 stack up).

    So it looks like something’s failed at this point, and it’s not returned to send the rest of the items?

    But – what about the ‘supersub’ subscription types? As mentioned, there don’t appear to be any ‘immediate’ subscriptions for group 60 in wp_bpges_subscriptions. There’s 292 items in my custom log being sent for this activity, all type ‘immediate’. Then there’s 139 items still in the queued items table. Allowing for the one overlap, that’s 430. The closest match for that seems to be the 432 ‘supersub’ subscriptions for this group.

    I’m just getting the access logs, I’ll report back.

    Thread Starter Steve Taylor

    (@gyrus)

    OK so the activity I picked turned out to be tricky because a member cross-posted the same update to multiple groups around the same time! But it looks like there are actually sent email logs for the activity 49472 – about 322. There’s about 480 members in the group. This seems feasible, 322 being the number of members with immediate subscriptions. Is it worth checking that – how would I find the count of members with an immediate subscription?

    Anyway, this makes debugging a little trickier, if lack of ‘Finished sending…’ in the BPGES debug log doesn’t necessarily indicate a failed batch. I guess this does mean that there’s a problem with purging the queue table items – perhaps in addition to failed sending? :-/

    I’ll come back and check some other logs later.

    • This reply was modified 1 year, 10 months ago by Steve Taylor. Reason: Typo
    Thread Starter Steve Taylor

    (@gyrus)

    Hi, thanks for the response. As soon as I got this reply I set BPGES_DEBUG before I came back to this. The log now has details going back to the beginning of yesterday. There are entries there saying, ‘Beginning batch of immediate notifications for…’.

    First off I’m wondering how old still-queued items should be before we can deem them as having failed. For now, if I take the oldest activity for which immediate notifications are being sent (activity ID 49470, 03-May-2023 09:34:11), there are no entries in the queue table, so I assume these got sent. The WP SMTP log confirms this.

    So, looking at the queue table for items still remaining… There’s some for activity 49472. The beginning of this batch is indicated in the log file, but unlike 49470, there’s no subsequent ‘Finished sending…’. The next line in the log is ‘Beginning batch…’ for another activity. Looking in the PHP error log, there seem to be a few warnings around that time, but no fatal errors, nothing that looks relevant.

    I’ll now try to add some custom debug logging in there to catch the next issue better. As you’ve mentioned, it looks like the main thing to check is $items. Let me know if you think there’s anything else worth logging.

    Thread Starter Steve Taylor

    (@gyrus)

    Something I’ve just noticed, perhaps relevant. The old queued items go back to 9/12/21. Looking at my commit log, just before this (3/12/21) was when I updated my own forked version of BPGES to 4.0.0, but including my own customisations. You might recall I’d forked BPGES in order to include these customisations, and missed the 4.0.0 update, which I think was when you improved the queueing (which was causing issues for us). You eventually included my customisations, but until those were included in the main plugin, I updated my forked version manually to 4.0.0.

    There’s a 6-day gap between the shift to 4.0.0 and the first apparently unsent items – but it seems significant.

    Thread Starter Steve Taylor

    (@gyrus)

    FYI the HTTP Auth fix works for staging – email notifications are now being sent from there.

    I’ve looked closely at one of the notifications on production which is still in the queue table. It’s from 3/2/23. The WP Mail SMTP log shows it as ‘Not sent’, and indeed it doesn’t show in the DotDigital email long. There’s nothing in the on-site WP Mail log to give further clues as to why it’s not sent. Also, there are many successful sends in the same batch, and successful sends from other batches on the same day going from the same address and to the same address (just notifications for updates in different groups).

    Anyway, I’ll close this issue as solved, many thanks. Will get back to you if I manage to get more details on the other issue.

    Thread Starter Steve Taylor

    (@gyrus)

    Thanks. To be clear, there’s a ‘pile-up’ of items in the DB queue on both the staging and production environments.

    Most of the old ones on staging will be copied from production as we periodically copy production data to staging. Staging is where we have HTTP Auth, so it’s also expected that nothing will have been sent out – so on top of old ‘immediate’ items copied from production, there are also a number of ‘dig/sum’ items piling up from failed sends on staging (queue processing blocked by HTTP Auth).

    However, there’s no HTTP Auth on production, so that’s possibly another issue. The client hasn’t reported missing notifications, so it seems likely that they’re being sent but not cleared from the queue. However, I’ll need to confirm that through monitoring.

    Thread Starter Steve Taylor

    (@gyrus)

    There’s a bunch of dig/sum that are today, presumably about to be sent. Anything older is immediate.

    Thread Starter Steve Taylor

    (@gyrus)

    Looking at the live site, it has over 15,000 queued items dating back to a year ago. But AFAIK notifications are going out. I know it’s a slight tangent but this strikes me as odd if deleting them is how to ‘de-queue’ them (there being no flag for each record to keep it and mark it as sent).

    Thread Starter Steve Taylor

    (@gyrus)

    Many thanks, just giving this a go. BTW if I want to purge (as in delete) the queued items that have piled up while this issue has been active, do I just manually delete all entries in the bpges_queued_items table?

Viewing 15 replies - 1 through 15 (of 305 total)