• Resolved sgrx

    (@sgrx)


    A WordPress post seems to get split by this plugin and those multiple records are then stored in Algolia. Our avg. record size is 2.65KB. I think 10 KB is the limit?

    Can this split size be reduced so that we store less records for each post in Algolia?

    Also, when we use the instantsearch display page with the suggested code and instantiate the widgets, the count shown there is the number of records (including the split ones), not the number of search results. Any suggestions on how to fix this?

    For example, the facet for post types shows Posts 9,505 (we have around 1600 posts). Clicking on it returns 1677 results found, which is accurate. Those 1677 posts have 9505 records because they’re being split.

  • Plugin Contributor Michael Beckwith

    (@tw2113)

    The BenchPresser

    Regarding the content splitting part, it looks like we handle that at https://github.com/WebDevStudios/wp-search-with-algolia/blob/2.8.1/includes/class-algolia-utils.php#L232-L261

    Most specifically, with the ALGOLIA_CONTENT_MAX_SIZE constant. We default the record-splitting max size to 2000 bytes, and we strongly recommend only reducing that value with the constant, not increasing it; otherwise your queue will break.

    That said, based on what I’m seeing, leaving it at that amount would help keep each record’s total size as small as possible. If you’re averaging 2.65 KB, it makes sense that most of those posts would need two records instead of one. However, it’d be very easy to start getting longer post content in the future, forget that ALGOLIA_CONTENT_MAX_SIZE was changed, and end up going over that 10 KB limit.
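    For reference, WordPress constants like that are typically defined in wp-config.php, above the “That’s all, stop editing!” line. A minimal sketch, assuming the plugin falls back to its 2000 byte default whenever the constant isn’t defined:

        // wp-config.php
        // Only reduce this value; raising it past the default is what the
        // warning about breaking the indexing queue refers to.
        if ( ! defined( 'ALGOLIA_CONTENT_MAX_SIZE' ) ) {
            define( 'ALGOLIA_CONTENT_MAX_SIZE', 1500 ); // bytes of post content per record
        }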

    Regarding the facet, it’s hard to say offhand, as we’re using pretty basic settings for that instantsearch.js menu widget: https://www.algolia.com/doc/api-reference/widgets/menu/js/

    That’s feeling more like behavior from Algolia themselves. I am seeing the same behavior on my local install as well, so you’re definitely not alone on that part.

    Based on this specific spot from the documentation, https://www.algolia.com/doc/api-reference/widgets/menu/js/#widget-param-attribute, it is saying that it’s meant to show records.

    That said, I’m checking on something with that and will try to circle back soon.

    Thread Starter sgrx

    (@sgrx)

    Thanks for the quick response and sorry about my late reply.

    With the 2.65 KB average record size we have, ~1600 posts require ~9000 records, which works out to around 5.6 records per post. If the recommended size per record is within 10 KB, can’t I set ALGOLIA_CONTENT_MAX_SIZE to 6000, which would reduce the number of records by a lot? The attributes apart from the post content shouldn’t be enough to push a record above 10 KB.

    We default the record-splitting max size to 2000 bytes, and we strongly recommend only reducing that value with the constant, not increasing it; otherwise your queue will break.

    The queue would break if I set it to 6000 bytes, even if the total record size stays below 10 KB?

    Thanks for confirming the facet behaviour. I will hide it until I find a way to show the number of posts/search results instead of the number of records.
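    One thing that might be worth experimenting with is Algolia’s facetingAfterDistinct search parameter, which makes facet counts respect de-duplication. Whether it helps here depends on the index actually using distinct on a per-post attribute, so treat the following as a sketch only; the index name and facet attribute are assumptions, and on the instantsearch page the same parameter would need to be passed as a search parameter (e.g. through a configure widget) rather than via the PHP client:

        use Algolia\AlgoliaSearch\SearchClient;

        $client = SearchClient::create( 'YOUR_APP_ID', 'YOUR_SEARCH_API_KEY' );
        $index  = $client->initIndex( 'wp_searchable_posts' ); // assumed index name

        // With facetingAfterDistinct enabled, facet counts are computed after
        // de-duplication, so they reflect posts rather than split records.
        $results = $index->search( 'algolia', array(
            'facets'                => array( 'post_type_label' ), // assumed facet attribute
            'facetingAfterDistinct' => true,
        ) );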

    Plugin Contributor Michael Beckwith

    (@tw2113)

    The BenchPresser

    The queue would break if I set it to 6000 bytes, even if the total record size stays below 10 KB?

    I have to assume so.

    For some context, our plugin is a fork of a plugin that Algolia themselves originally created, and there are a good number of parts that we haven’t changed from their original work. That includes the explode_content() function. The original is in Algolia’s code base: https://github.com/algolia/algoliasearch-wordpress/blob/master/includes/class-algolia-utils.php#L162-L191

    So I have to believe that if they coded it this way, it was intentional. That said, it could be worth increasing it to see what happens. Perhaps try it on a test/dev install and indices first, though, so that you don’t “wreck” production usage.
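    For anyone following along, the general idea behind that splitting is roughly the following; this is a simplified sketch, not the plugin’s exact implementation, which handles a few more details:

        /**
         * Split post content into chunks of at most $max_size bytes,
         * breaking on spaces so words stay intact. Simplified illustration
         * of why one post becomes several Algolia records.
         */
        function split_content_into_records( $content, $max_size = 2000 ) {
            $parts  = array();
            $buffer = '';
            foreach ( explode( ' ', $content ) as $word ) {
                $candidate = ( '' === $buffer ) ? $word : $buffer . ' ' . $word;
                if ( strlen( $candidate ) > $max_size && '' !== $buffer ) {
                    $parts[] = $buffer; // close the current chunk
                    $buffer  = $word;   // start the next chunk with this word
                } else {
                    $buffer = $candidate;
                }
            }
            if ( '' !== $buffer ) {
                $parts[] = $buffer;
            }
            return $parts;
        }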

    Thread Starter sgrx

    (@sgrx)

    I tried it on a dev install and it didn’t wreck, so I just pushed it live and so far it is OK. Will be monitoring it though.

    I changed $max_size from 2000 to 6000. I’m down to ~4k records from ~10k records, with an avg. record size of 5.3 KB now compared to 2.6 KB earlier.

    Plugin Contributor Michael Beckwith

    (@tw2113)

    The BenchPresser

    Rooting for you. It’s probably not something we’re going to change the defaults for, but it’s definitely good to know that this may be workable for certain users’ situations and needs.

    golden_g73

    (@graham73may)

    I’m indexing PDF content and stumbled upon this thread (I’ve got really long PDFs and they’re being split into a really large number of parts).

    @sgrx How is the 6000 setting working out for you? Do you have many other attributes being indexed that contribute to your record size?


    @tw2113 Do you have any idea what the behaviour is if the content is chunked as such:

    Chunk 1:

    Algolia is a French proprietary search-as-a-service platform, with its headquarters in San

    Chunk 2:

    Francisco and offices in Paris and London. Its main product is a web search platform for individual websites.

    If a user comes along and searches for a string that spans chunks, e.g. “with its headquarters in San Francisco and offices in Paris” how does it behave with split chunks? Would it find two items as fairly low partial matches and then de-duplicate them to return one item?

    Plugin Contributor Michael Beckwith

    (@tw2113)

    The BenchPresser

    Based on my experiences, I believe it’d still return 1 result.
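    For context on why that happens: the split records typically share an attribute identifying the parent post, and the index uses Algolia’s distinct feature on that attribute, so a query matching several chunks of the same post surfaces only the single best-matching chunk. A hedged sketch of those index settings with the Algolia PHP client; the index name and attribute below are assumptions:

        use Algolia\AlgoliaSearch\SearchClient;

        $client = SearchClient::create( 'YOUR_APP_ID', 'YOUR_ADMIN_API_KEY' );
        $index  = $client->initIndex( 'wp_searchable_posts' ); // assumed index name

        // De-duplicate search results on the shared post identifier, so a
        // query spanning two chunks still returns the post only once.
        $index->setSettings( array(
            'attributeForDistinct' => 'post_id', // assumed shared attribute
            'distinct'             => true,
        ) );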
