Viewing 15 replies - 16 through 30 (of 52 total)
  • Plugin Author Epsiloncool

    (@epsiloncool)

    Okay, so we located the problem, that’s great. It seems WPFTS does not extract post content correctly for OLD posts (ones created in WP Classic Editor) when Block Editor is active.

    Let me make some checks on my dev machine, I hope to repeat the same problem here and then I will be able to fix that.

    Thanks!

    Thread Starter Rafael Fischmann

    (@rfischmann)

    Awesome, thanks a lot @epsiloncool! I hope it’s indeed that. ??

    Thread Starter Rafael Fischmann

    (@rfischmann)

    I was thinking a bit more about it, @epsiloncool, and maybe — unfortunately — it’s not that simple. I just ran some test searches on ever older posts than that one, all made with the Classic Editor, and they’re correctly indexed by WP Fast Total Search.

    Also, that Spirit Awards one that wasn’t indexed at all is from December 2023, made of course with the Gutenberg editor. ??

    Plugin Author Epsiloncool

    (@epsiloncool)

    Thank you for this idea, @rfischmann !

    I also found some other posts in your CSV report. Could you please check if those records have some data in Sandbox tester? Could you say were these posts created with Classic editor or Block?

    217165, 336337, 452385, 491877

    Also I would ask you to NOT update those post, because we would need them to test fixed WPFTS version later.

    Thanks!

    Thread Starter Rafael Fischmann

    (@rfischmann)

    None of those 4 posts have any data in Sandbox tester besides the title, like this one: https://d.pr/i/dV6by0

    However, it’s weird that, even like that, it’s showing up in the search results: https://d.pr/i/hAbBeL — I’ve also tested with the other three and all of them show up (unlike that Spirit Awards one).

    And yes, all 4 posts are old, created with the Classic Editor. I haven’t updated any of them.

    Plugin Author Epsiloncool

    (@epsiloncool)

    Hi @rfischmann

    It’s okay that those posts are searchable by title, since the title was indexed fine. But they should be not searchable by content.

    I have one more question. Could you please check the options in the Indexing Engine Settings / Indexing Defaults?

    Do you have all the options ON? At least first 3 options. Please let me know.

    Thanks!

    Thread Starter Rafael Fischmann

    (@rfischmann)

    Yeah, but once again going back to that Spirit Awards issue, wasn’t its title also indexed? https://d.pr/i/nLWK9U

    Here are my settings: https://d.pr/i/1cqnD0

    I’ve disabled shortcodes because we have one that generates a full “widget” of apps from Apple’s App Store, and although it has a cache, when it’s expired, it needs to contact Apple’s server to update it. When it was on, the indexation would never finish because we have hundreds of posts with a few app widgets in each of them [one example here], so that was “blocking” it.

    Plugin Author Epsiloncool

    (@epsiloncool)

    Nice, thank you.

    This fact localizes the problem even more, because for disabled “Index Shortcodes Content” WPFTS uses different piece of the code and I have changed this part of the code recently. Now I know where to dig deeper.

    In relation to “Spirit Awards issue” it’s still an open question. But I hope we will find an answer soon.

    Also I would ask you to run that MySQL query again, but I made some changes: added “oembed_cache” to the exclusions, because we really not interested in them. Also I’ve increased limit to 10000 to see more info. Also I’ve grouped records with the same warn_level and post_type to simplier find them.

    May be we can find something interesting then.

    select * from
    (select 
    	p.ID post_id,
    	wi.id windex_id,
    	p.post_title,
    	p.post_type,
    	p.post_status,
    	wlog.status,
    	
    	wd_t.n wdt_n,
    	dd_t.n_total wdt_total,
    	dd_t.n_nw wdt_nw,
    	length(p.post_title) title_len,
    	(length(p.post_title) / wd_t.n) av_t,
    	
    	wd_c.n wdc_n,
    	dd_c.n_total wdc_total,
    	dd_c.n_nw wdc_nw,
    	length(p.post_content) content_len,
    	(length(p.post_content) / wd_c.n) av_c,
    	
    	(if(isnull(wi.id),128,0) +
    	if(wlog.status != 3,64,0) +
    	if((dd_t.n_nw != dd_t.n_total) or (dd_t.n_total != wd_t.n) or (dd_c.n_nw != dd_c.n_total) or (dd_c.n_total != wd_c.n), 32, 0) +
    	if(((length(p.post_title) > 0) and (wd_t.n = 0)) or ((length(p.post_content) > 0) and (wd_c.n = 0)), 16, 0) +
    	if(((isnull(wd_t.n) or (wd_t.n < 1)) and (isnull(wd_c.n) or (wd_c.n < 1)) and (p.post_type in ("post", "page"))), 8, 0)) warn_level
    	
    from wp_posts p
    left join wpftsi_index wi
    	on p.ID = wi.tid and wi.tsrc = "wp_posts"
    left join wpftsi_docs wd_t
    	on wd_t.token = "post_title" and wd_t.index_id = wi.id
    left join wpftsi_docs wd_c
    	on wd_c.token = "post_content" and wd_c.index_id = wi.id
    left join wpftsi_ilog wlog
    	on wlog.index_id = wi.id
    left join (
    	select
    		v.did,
    		sum(if(isnull(w.word),0,1)) n_nw,
    		count(*) n_total
    	from wpftsi_vectors v
    	left join wpftsi_words w
    		on w.id = v.wid
    	group by v.did
    ) dd_t
    	on dd_t.did = wd_t.id
    	
    left join (
    	select
    		v.did,
    		sum(if(isnull(w.word),0,1)) n_nw,
    		count(*) n_total
    	from wpftsi_vectors v
    	left join wpftsi_words w
    		on w.id = v.wid
    	group by v.did
    ) dd_c
    	on dd_c.did = wd_c.id
    ) tt	
    where
    	tt.warn_level > 0
    	and
    	(tt.post_type not in ("oembed_cache"))
    order by tt.warn_level desc, tt.post_type desc
    	
    limit 10000

    Thank you.

    Thread Starter Rafael Fischmann

    (@rfischmann)

    Of course, @epsiloncool, I’m glad to help!

    Here’s the new CSV for that query: https://d.pr/f/Ttriua

    It seems that excluding “oembed_cache” made it much smaller, 170 lines total.

    Thread Starter Rafael Fischmann

    (@rfischmann)

    On an unrelated note (or not), @epsiloncool, I’ve just noticed that my error_log file is being flooded by this error:

    [22-May-2024 09:20:03 UTC] PHP Warning: Undefined variable $finish_ts in /home/macmagazine/public_html/wp-content/plugins/fulltext-search/includes/wpfts_core.php on line 3161

    If you could please fix it, thanks a lot!

    Thread Starter Rafael Fischmann

    (@rfischmann)

    Hi @epsiloncool, I’m back with a new, different example than the ones we’ve seen so far.

    At this very moment, this article isn’t showing up in our search results: https://macmagazine.com.br/post/2021/06/20/applecare-protection-plan-services-e-applecare-o-que-muda/

    The difference from the others is that it’s not old, it was written with Gutenberg and the worst part is that it was edited/updated just yesterday.

    It seems to be ok in the Sandbox Area: https://d.pr/i/UidYWD

    But it really doesn’t show up in searches: https://d.pr/i/dhf25d

    Plugin Author Epsiloncool

    (@epsiloncool)

    Hi, @rfischmann

    It’s very good that you found this new case. I would ask you to execute MySQL query to get some information about this post and index state.

    Could you please execute this and then show me the result? Thanks

    select * from
    (select 
    	p.ID post_id,
    	wi.id windex_id,
    	p.post_title,
    	p.post_type,
    	p.post_status,
    	wlog.status,
    	
    	wd_t.n wdt_n,
    	dd_t.n_total wdt_total,
    	dd_t.n_nw wdt_nw,
    	length(p.post_title) title_len,
    	(length(p.post_title) / wd_t.n) av_t,
    	
    	wd_c.n wdc_n,
    	dd_c.n_total wdc_total,
    	dd_c.n_nw wdc_nw,
    	length(p.post_content) content_len,
    	(length(p.post_content) / wd_c.n) av_c,
    	
    	(if(isnull(wi.id),128,0) +
    	if(wlog.status != 3,64,0) +
    	if((dd_t.n_nw != dd_t.n_total) or (dd_t.n_total != wd_t.n) or (dd_c.n_nw != dd_c.n_total) or (dd_c.n_total != wd_c.n), 32, 0) +
    	if(((length(p.post_title) > 0) and (wd_t.n = 0)) or ((length(p.post_content) > 0) and (wd_c.n = 0)), 16, 0) +
    	if(((isnull(wd_t.n) or (wd_t.n < 1)) and (isnull(wd_c.n) or (wd_c.n < 1)) and (p.post_type in ("post", "page"))), 8, 0)) warn_level
    	
    from wp_posts p
    left join wpftsi_index wi
    	on p.ID = wi.tid and wi.tsrc = "wp_posts"
    left join wpftsi_docs wd_t
    	on wd_t.token = "post_title" and wd_t.index_id = wi.id
    left join wpftsi_docs wd_c
    	on wd_c.token = "post_content" and wd_c.index_id = wi.id
    left join wpftsi_ilog wlog
    	on wlog.index_id = wi.id
    left join (
    	select
    		v.did,
    		sum(if(isnull(w.word),0,1)) n_nw,
    		count(*) n_total
    	from wpftsi_vectors v
    	left join wpftsi_words w
    		on w.id = v.wid
    	group by v.did
    ) dd_t
    	on dd_t.did = wd_t.id
    	
    left join (
    	select
    		v.did,
    		sum(if(isnull(w.word),0,1)) n_nw,
    		count(*) n_total
    	from wpftsi_vectors v
    	left join wpftsi_words w
    		on w.id = v.wid
    	group by v.did
    ) dd_c
    	on dd_c.did = wd_c.id
    ) tt	
    where
    	tt.post_id = 795909
    Thread Starter Rafael Fischmann

    (@rfischmann)

    Sure, here you go: https://d.pr/f/uKAGgq

    Plugin Author Epsiloncool

    (@epsiloncool)

    Hi @rfischmann

    I believe we are as close as we can get to the cause of the problem. My initial MySQL query did not detect this situation; it is that the number of words in the document was correctly counted and stored, but in fact the words themselves were not included in the index vector lists.
    Below I am attaching an improved version of the query, could you run it and attach the result as before – it will identify all unindexed publications of the first type (like Spirit Awards and 795909).

    All I have to do is find out why exactly these publications were not processed properly. I’m already analyzing my code and trying to replicate the problem on my side.

    I’m wondering if there are any publications where only the titles were not indexed, but not the content, or vice versa. And also, is the problem related to the length of publications or the presence of special characters in the text (probably this can break queries if the encoding of the database tables does not match).

    Thank you!

    select * from
    (select 
    	p.ID post_id,
    	wi.id windex_id,
    	p.post_title,
    	p.post_type,
    	p.post_status,
    	wlog.status,
    	
    	wd_t.n wdt_n,
    	dd_t.n_total wdt_total,
    	dd_t.n_nw wdt_nw,
    	length(p.post_title) title_len,
    	(length(p.post_title) / wd_t.n) av_t,
    	
    	wd_c.n wdc_n,
    	dd_c.n_total wdc_total,
    	dd_c.n_nw wdc_nw,
    	length(p.post_content) content_len,
    	(length(p.post_content) / wd_c.n) av_c,
    	
    	(if(isnull(wi.id),128,0) +
    	if(wlog.status != 3,64,0) +
    	if((ifnull(dd_t.n_nw, 0) != ifnull(dd_t.n_total, 0)) or (ifnull(dd_t.n_total, 0) != ifnull(wd_t.n, 0)) or (ifnull(dd_c.n_nw, 0) != ifnull(dd_c.n_total, 0)) or (ifnull(dd_c.n_total, 0) != ifnull(wd_c.n, 0)), 32, 0) +
    	if(((length(p.post_title) > 0) and (wd_t.n = 0)) or ((length(p.post_content) > 0) and (wd_c.n = 0)), 16, 0) +
    	if(((isnull(wd_t.n) or (wd_t.n < 1)) and (isnull(wd_c.n) or (wd_c.n < 1)) and (p.post_type in ("post", "page"))), 8, 0)) warn_level
    	
    from wp_posts p
    left join wpftsi_index wi
    	on p.ID = wi.tid and wi.tsrc = "wp_posts"
    left join wpftsi_docs wd_t
    	on wd_t.token = "post_title" and wd_t.index_id = wi.id
    left join wpftsi_docs wd_c
    	on wd_c.token = "post_content" and wd_c.index_id = wi.id
    left join wpftsi_ilog wlog
    	on wlog.index_id = wi.id
    left join (
    	select
    		v.did,
    		sum(if(isnull(w.word),0,1)) n_nw,
    		count(*) n_total
    	from wpftsi_vectors v
    	left join wpftsi_words w
    		on w.id = v.wid
    	group by v.did
    ) dd_t
    	on dd_t.did = wd_t.id
    	
    left join (
    	select
    		v.did,
    		sum(if(isnull(w.word),0,1)) n_nw,
    		count(*) n_total
    	from wpftsi_vectors v
    	left join wpftsi_words w
    		on w.id = v.wid
    	group by v.did
    ) dd_c
    	on dd_c.did = wd_c.id
    ) tt	
    where
    	tt.warn_level > 0
    	and
    	(tt.post_type not in ("oembed_cache"))
    order by tt.warn_level desc, tt.post_type desc
    	
    limit 10000
    Thread Starter Rafael Fischmann

    (@rfischmann)

    I’m glad to be able to help you figure out what’s going on, @epsiloncool! I hope you’ll get there. ??

    Here’s the new CSV: https://d.pr/f/i9iGib

Viewing 15 replies - 16 through 30 (of 52 total)
  • The topic ‘Post missing, search results’ order’ is closed to new replies.