Post missing, search results’ order
-
Hi,
After updating to v1.65.225, I’ve noticed some things changed in search results after having to re-index our whole database.
The most worrying part is that some posts aren’t showing up in results at all. Here’s one example: https://macmagazine.com.br/post/2023/12/06/apple-tv-recebe-5-indicacoes-aos-spirit-awards-com-shrinking-the-changeling-e-mais/
Here’s the test for its ID: https://d.pr/i/nLWK9U
And a test search for “Spirit Awards”: https://d.pr/i/1DQofL
Besides that, I’ve also noticed that the search results’ order has changed a bit from the previous to the new version. Did it change the results’ weights or anything else I can change to make it more or less how it was before?
Thank you!
-
Okay, so we located the problem, that’s great. It seems WPFTS does not extract post content correctly for OLD posts (ones created in WP Classic Editor) when Block Editor is active.
Let me make some checks on my dev machine, I hope to repeat the same problem here and then I will be able to fix that.
Thanks!
Awesome, thanks a lot @epsiloncool! I hope it’s indeed that. ??
I was thinking a bit more about it, @epsiloncool, and maybe — unfortunately — it’s not that simple. I just ran some test searches on ever older posts than that one, all made with the Classic Editor, and they’re correctly indexed by WP Fast Total Search.
Also, that Spirit Awards one that wasn’t indexed at all is from December 2023, made of course with the Gutenberg editor. ??
Thank you for this idea, @rfischmann !
I also found some other posts in your CSV report. Could you please check if those records have some data in Sandbox tester? Could you say were these posts created with Classic editor or Block?
217165, 336337, 452385, 491877
Also I would ask you to NOT update those post, because we would need them to test fixed WPFTS version later.Thanks!
None of those 4 posts have any data in Sandbox tester besides the title, like this one: https://d.pr/i/dV6by0
However, it’s weird that, even like that, it’s showing up in the search results: https://d.pr/i/hAbBeL — I’ve also tested with the other three and all of them show up (unlike that Spirit Awards one).
And yes, all 4 posts are old, created with the Classic Editor. I haven’t updated any of them.
Hi @rfischmann
It’s okay that those posts are searchable by title, since the title was indexed fine. But they should be not searchable by content.
I have one more question. Could you please check the options in the Indexing Engine Settings / Indexing Defaults?
Do you have all the options ON? At least first 3 options. Please let me know.
Thanks!
Yeah, but once again going back to that Spirit Awards issue, wasn’t its title also indexed? https://d.pr/i/nLWK9U
Here are my settings: https://d.pr/i/1cqnD0
I’ve disabled shortcodes because we have one that generates a full “widget” of apps from Apple’s App Store, and although it has a cache, when it’s expired, it needs to contact Apple’s server to update it. When it was on, the indexation would never finish because we have hundreds of posts with a few app widgets in each of them [one example here], so that was “blocking” it.
Nice, thank you.
This fact localizes the problem even more, because for disabled “Index Shortcodes Content” WPFTS uses different piece of the code and I have changed this part of the code recently. Now I know where to dig deeper.
In relation to “Spirit Awards issue” it’s still an open question. But I hope we will find an answer soon.
Also I would ask you to run that MySQL query again, but I made some changes: added “oembed_cache” to the exclusions, because we really not interested in them. Also I’ve increased limit to 10000 to see more info. Also I’ve grouped records with the same warn_level and post_type to simplier find them.
May be we can find something interesting then.
select * from (select p.ID post_id, wi.id windex_id, p.post_title, p.post_type, p.post_status, wlog.
status
, wd_t.n wdt_n, dd_t.n_total wdt_total, dd_t.n_nw wdt_nw, length(p.post_title) title_len, (length(p.post_title) / wd_t.n) av_t, wd_c.n wdc_n, dd_c.n_total wdc_total, dd_c.n_nw wdc_nw, length(p.post_content) content_len, (length(p.post_content) / wd_c.n) av_c, (if(isnull(wi.id),128,0) + if(wlog.status
!= 3,64,0) + if((dd_t.n_nw != dd_t.n_total) or (dd_t.n_total != wd_t.n) or (dd_c.n_nw != dd_c.n_total) or (dd_c.n_total != wd_c.n), 32, 0) + if(((length(p.post_title) > 0) and (wd_t.n = 0)) or ((length(p.post_content) > 0) and (wd_c.n = 0)), 16, 0) + if(((isnull(wd_t.n) or (wd_t.n < 1)) and (isnull(wd_c.n) or (wd_c.n < 1)) and (p.post_type in ("post", "page"))), 8, 0)) warn_level from wp_posts p left join wpftsi_index wi on p.ID = wi.tid and wi.tsrc = "wp_posts" left join wpftsi_docs wd_t on wd_t.token = "post_title" and wd_t.index_id = wi.id left join wpftsi_docs wd_c on wd_c.token = "post_content" and wd_c.index_id = wi.id left join wpftsi_ilog wlog on wlog.index_id = wi.id left join ( select v.did, sum(if(isnull(w.word),0,1)) n_nw, count(*) n_total from wpftsi_vectors v left join wpftsi_words w on w.id = v.wid group by v.did ) dd_t on dd_t.did = wd_t.id left join ( select v.did, sum(if(isnull(w.word),0,1)) n_nw, count(*) n_total from wpftsi_vectors v left join wpftsi_words w on w.id = v.wid group by v.did ) dd_c on dd_c.did = wd_c.id ) tt where tt.warn_level > 0 and (tt.post_type not in ("oembed_cache")) order by tt.warn_level desc, tt.post_type desc limit 10000Thank you.
Of course, @epsiloncool, I’m glad to help!
Here’s the new CSV for that query: https://d.pr/f/Ttriua
It seems that excluding “oembed_cache” made it much smaller, 170 lines total.
On an unrelated note (or not), @epsiloncool, I’ve just noticed that my error_log file is being flooded by this error:
[22-May-2024 09:20:03 UTC] PHP Warning: Undefined variable $finish_ts in /home/macmagazine/public_html/wp-content/plugins/fulltext-search/includes/wpfts_core.php on line 3161
If you could please fix it, thanks a lot!
Hi @epsiloncool, I’m back with a new, different example than the ones we’ve seen so far.
At this very moment, this article isn’t showing up in our search results: https://macmagazine.com.br/post/2021/06/20/applecare-protection-plan-services-e-applecare-o-que-muda/
The difference from the others is that it’s not old, it was written with Gutenberg and the worst part is that it was edited/updated just yesterday.
It seems to be ok in the Sandbox Area: https://d.pr/i/UidYWD
But it really doesn’t show up in searches: https://d.pr/i/dhf25d
Hi, @rfischmann
It’s very good that you found this new case. I would ask you to execute MySQL query to get some information about this post and index state.
Could you please execute this and then show me the result? Thanks
select * from (select p.ID post_id, wi.id windex_id, p.post_title, p.post_type, p.post_status, wlog.
status
, wd_t.n wdt_n, dd_t.n_total wdt_total, dd_t.n_nw wdt_nw, length(p.post_title) title_len, (length(p.post_title) / wd_t.n) av_t, wd_c.n wdc_n, dd_c.n_total wdc_total, dd_c.n_nw wdc_nw, length(p.post_content) content_len, (length(p.post_content) / wd_c.n) av_c, (if(isnull(wi.id),128,0) + if(wlog.status
!= 3,64,0) + if((dd_t.n_nw != dd_t.n_total) or (dd_t.n_total != wd_t.n) or (dd_c.n_nw != dd_c.n_total) or (dd_c.n_total != wd_c.n), 32, 0) + if(((length(p.post_title) > 0) and (wd_t.n = 0)) or ((length(p.post_content) > 0) and (wd_c.n = 0)), 16, 0) + if(((isnull(wd_t.n) or (wd_t.n < 1)) and (isnull(wd_c.n) or (wd_c.n < 1)) and (p.post_type in ("post", "page"))), 8, 0)) warn_level from wp_posts p left join wpftsi_index wi on p.ID = wi.tid and wi.tsrc = "wp_posts" left join wpftsi_docs wd_t on wd_t.token = "post_title" and wd_t.index_id = wi.id left join wpftsi_docs wd_c on wd_c.token = "post_content" and wd_c.index_id = wi.id left join wpftsi_ilog wlog on wlog.index_id = wi.id left join ( select v.did, sum(if(isnull(w.word),0,1)) n_nw, count(*) n_total from wpftsi_vectors v left join wpftsi_words w on w.id = v.wid group by v.did ) dd_t on dd_t.did = wd_t.id left join ( select v.did, sum(if(isnull(w.word),0,1)) n_nw, count(*) n_total from wpftsi_vectors v left join wpftsi_words w on w.id = v.wid group by v.did ) dd_c on dd_c.did = wd_c.id ) tt where tt.post_id = 795909Sure, here you go: https://d.pr/f/uKAGgq
Hi @rfischmann
I believe we are as close as we can get to the cause of the problem. My initial MySQL query did not detect this situation; it is that the number of words in the document was correctly counted and stored, but in fact the words themselves were not included in the index vector lists.
Below I am attaching an improved version of the query, could you run it and attach the result as before – it will identify all unindexed publications of the first type (like Spirit Awards and 795909).All I have to do is find out why exactly these publications were not processed properly. I’m already analyzing my code and trying to replicate the problem on my side.
I’m wondering if there are any publications where only the titles were not indexed, but not the content, or vice versa. And also, is the problem related to the length of publications or the presence of special characters in the text (probably this can break queries if the encoding of the database tables does not match).
Thank you!
select * from (select p.ID post_id, wi.id windex_id, p.post_title, p.post_type, p.post_status, wlog.
status
, wd_t.n wdt_n, dd_t.n_total wdt_total, dd_t.n_nw wdt_nw, length(p.post_title) title_len, (length(p.post_title) / wd_t.n) av_t, wd_c.n wdc_n, dd_c.n_total wdc_total, dd_c.n_nw wdc_nw, length(p.post_content) content_len, (length(p.post_content) / wd_c.n) av_c, (if(isnull(wi.id),128,0) + if(wlog.status
!= 3,64,0) + if((ifnull(dd_t.n_nw, 0) != ifnull(dd_t.n_total, 0)) or (ifnull(dd_t.n_total, 0) != ifnull(wd_t.n, 0)) or (ifnull(dd_c.n_nw, 0) != ifnull(dd_c.n_total, 0)) or (ifnull(dd_c.n_total, 0) != ifnull(wd_c.n, 0)), 32, 0) + if(((length(p.post_title) > 0) and (wd_t.n = 0)) or ((length(p.post_content) > 0) and (wd_c.n = 0)), 16, 0) + if(((isnull(wd_t.n) or (wd_t.n < 1)) and (isnull(wd_c.n) or (wd_c.n < 1)) and (p.post_type in ("post", "page"))), 8, 0)) warn_level from wp_posts p left join wpftsi_index wi on p.ID = wi.tid and wi.tsrc = "wp_posts" left join wpftsi_docs wd_t on wd_t.token = "post_title" and wd_t.index_id = wi.id left join wpftsi_docs wd_c on wd_c.token = "post_content" and wd_c.index_id = wi.id left join wpftsi_ilog wlog on wlog.index_id = wi.id left join ( select v.did, sum(if(isnull(w.word),0,1)) n_nw, count(*) n_total from wpftsi_vectors v left join wpftsi_words w on w.id = v.wid group by v.did ) dd_t on dd_t.did = wd_t.id left join ( select v.did, sum(if(isnull(w.word),0,1)) n_nw, count(*) n_total from wpftsi_vectors v left join wpftsi_words w on w.id = v.wid group by v.did ) dd_c on dd_c.did = wd_c.id ) tt where tt.warn_level > 0 and (tt.post_type not in ("oembed_cache")) order by tt.warn_level desc, tt.post_type desc limit 10000I’m glad to be able to help you figure out what’s going on, @epsiloncool! I hope you’ll get there. ??
Here’s the new CSV: https://d.pr/f/i9iGib
- The topic ‘Post missing, search results’ order’ is closed to new replies.