• Resolved Renners

    (@renners)


    How do I prevent searches being carried out on Algolia for non-coherent terms such as “cfi</a><li><a class=page-numbers href=”

    It’s costing me a fortune as I have a commercial subscription with Algolia and searches like this are about 60% of the searches. I suspect there is an issue with the theme; however, no errors are thrown, so it is really difficult to track down the cause, and I thought I would explore an alternative solution.

    I asked AI and they suggested I use the following code:

    const whitelist = /^[a-zA-Z0-9\s]+$/;
    
    const search = instantsearch({
      ...
      searchFunction(helper) {
        helper.setQueryHook((query, search) => {
          if (!whitelist.test(query)) {
            return 'Invalid search terms';
          }
          return search(query);
        });
        helper.search();
      }
    });
    

    I’m not great with javascript so don’t know if this would work or where to put it.

    Thanks,

    Renners.


Viewing 11 replies - 1 through 11 (of 11 total)
  • Plugin Contributor Michael Beckwith

    (@tw2113)

    The BenchPresser

    Not the end-all be-all solution but this page offers some ideas https://support.algolia.com/hc/en-us/articles/4406981900433-Monitoring-Search-Operations

    Depending on the results that you’re seeing in your Dashboard, this may be reassuring as well https://support.algolia.com/hc/en-us/articles/9577399780497-Am-I-billed-for-user-queries-that-return-404-errors-

    Checking on other things in the meantime, as I don’t have a great answer just yet for the javascript example you have above, but adding this would require some template customization with the plugin, I know that much.

    Plugin Contributor Michael Beckwith

    (@tw2113)

    The BenchPresser

    Since I have seen similar issues with some client work, I re-responded to an original thread I made over at https://discourse.algolia.com/t/spam-queries-prevention/15111/3

    Providing it here in case you want to follow along with that, I’ll cross post back to here regardless.

    Thread Starter Renners

    (@renners)

    Thank you for your reply, I continued to press AI on how to do it and it said this:

    This code should be placed in the JavaScript file(s) that are responsible for initializing and configuring the instantsearch.js library on your WordPress site.

    It is common to have a separate JavaScript file for instantsearch.js configuration, where you would place the above code snippet. You can then include this JavaScript file in your WordPress theme or plugin using the wp_enqueue_script() function, which is used to load JavaScript files on the frontend of a WordPress site.

    You could also include the code directly in the template file where you are using instantsearch.js, but it’s generally recommended to keep your JavaScript code in separate files for better maintainability and organization.

    It’s also important to note that, you need to make sure that the instantsearch.js library is properly loaded and configured before the above code snippet is executed, otherwise it would throw an error.

    and:

    If you are using the ‘WP Search with Algolia’ plugin, you will need to place the code snippet in a custom JavaScript file that you create and include it to the plugin’s settings.

    You can create a new JavaScript file, for example algolia-search.js, and place the code snippet there.

    Then you can include this file to the plugin’s settings, you should look for the option to include custom javascript files in the plugin’s settings page or documentation.

    It’s important to note that, you need to make sure that the instantsearch.js library is properly loaded and configured by the plugin before the above code snippet is executed, otherwise it would throw an error.

    You should also make sure that the code is only loaded on the search page, and not on every page of your website.

    —-

    I got the distinct feeling the AI was generalising for WordPress plugins, and the last comment worried me as it implies there will be issues.

    Regards,

    Renners

    Plugin Contributor Michael Beckwith

    (@tw2113)

    The BenchPresser

    I’m not going to claim the AI is completely wrong, but it definitely is generalized.

    We do have template customization, and you can see how to get that started over at https://github.com/WebDevStudios/wp-search-with-algolia/wiki/Customize-Templates

    You can see both templates, one for autocomplete, and one for instantsearch, at this spot in our GitHub Repo and in the same spot in your downloaded copy of the plugin:

    https://github.com/WebDevStudios/wp-search-with-algolia/tree/main/templates

    Based on your already handled searching, pun intended, for a possible solution, that whitelisting code would need to be added around this section of the instantsearch.php file:

    https://github.com/WebDevStudios/wp-search-with-algolia/blob/main/templates/instantsearch.php#L79-L102

    I would love to see how well it does so that we could consider potentially adding it to the default template ourselves, but we would need to do some extensive testing with hopefully live data first. Definitely not ruling it out though as I know I’ve seen similar spam logs in the past.

    Thread Starter Renners

    (@renners)

    Well, I added it into the code, in the customised template which I am already using, thus:

    				/* Instantiate instantsearch.js */
    				var search = instantsearch({
    					indexName: algolia.indices.searchable_posts.name,
    					searchClient: algoliasearch( algolia.application_id, algolia.search_api_key ),
    					routing: {
    						router: instantsearch.routers.history({ writeDelay: 1000 }),
    						stateMapping: {
    							stateToRoute( indexUiState ) {
    								return {
    									s: indexUiState[ algolia.indices.searchable_posts.name ].query,
    									page: indexUiState[ algolia.indices.searchable_posts.name ].page
    								}
    							},
    							routeToState( routeState ) {
    								const indexUiState = {};
    								indexUiState[ algolia.indices.searchable_posts.name ] = {
    									query: routeState.s,
    									page: routeState.page
    								};
    								return indexUiState;
    							}
    						}
    					}
    				});
    				/* Added sanitisation code */
    				const whitelist = /^(?:(?!<[^>]*>).)*$/;
    
    				search.addWidgets([
    
    					/* Search box widget */
    					instantsearch.widgets.searchBox({
    						container: '#algolia-search-box',
    						placeholder: 'Search constrained...',
    						queryHook: function(query, search) {
    							if (!whitelist.test(query)) {
    								return 'Invalid search terms';
    							}
    							return search(query);
    						},
    						showReset: false,
    						showSubmit: false,
    						showLoadingIndicator: false,
    					}),
    				/* end of Added sanitisation code */
    /*				search.addWidgets([
    
    					/* Search box widget */
    /*					instantsearch.widgets.searchBox({
    						container: '#algolia-search-box',
    						placeholder: 'Search for...',
    						showReset: false,
    						showSubmit: false,
    						showLoadingIndicator: false,
    					}),
    */

    It hasn’t made a blind bit of difference either way. It still happily accepts additional characters. As you can see in the example code above, I even tried a different regex to exclude HTML content only, to no effect.

    Regards,

    Renners.
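Before debugging the template itself, it can help to confirm what each pattern from this thread accepts and rejects. The following is a minimal sketch that can be pasted into the browser console or run with Node; `isAllowed` is a hypothetical helper name and the sample queries are illustrative only.

```javascript
// The two patterns that appear in this thread.
const alphanumericOnly = /^[a-zA-Z0-9\s]+$/;  // letters, digits and whitespace only
const noHtmlTags = /^(?:(?!<[^>]*>).)*$/;     // anything that contains no complete <...> tag

// Hypothetical helper: true when the query passes the given pattern.
function isAllowed(query, pattern) {
  return pattern.test(query);
}

const spamQuery = 'cfi</a><li><a class=page-numbers href=';

console.log(isAllowed('anniversary gifts', alphanumericOnly));      // true
console.log(isAllowed(spamQuery, alphanumericOnly));                // false
console.log(isAllowed(spamQuery, noHtmlTags));                      // false
console.log(isAllowed('10th-anniversary gifts', alphanumericOnly)); // false
console.log(isAllowed('10th-anniversary gifts', noHtmlTags));       // true
```

Both patterns reject the spam term, so if it still gets through on the page, the hook is probably not running at all (for example because a cached or different template is being served). Note that the strict alphanumeric pattern also rejects legitimate queries containing hyphens or other punctuation.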

    Plugin Contributor Michael Beckwith

    (@tw2113)

    The BenchPresser

    Any javascript errors showing up in your console with it?

    Is it presently implemented over at https://www.anniversaryideas.co.uk/gifts/anniversary-gifts/ ?

    Thread Starter Renners

    (@renners)

    Thank you for your reply and assistance. I’ve got it on a staging server at the moment so not on our live site. I’ve got no errors on the console although I appear to have different results between staging and live currently even though I copied it over at the beginning of this. I need to understand this first before proceeding further.

    Regards,

    Renners.

    Plugin Contributor Michael Beckwith

    (@tw2113)

    The BenchPresser

    Let us know when you need some extra assistance again, we’ll help out as best we can.

    Plugin Contributor Michael Beckwith

    (@tw2113)

    The BenchPresser

    Algolia’s dev advocates replied over at https://discourse.algolia.com/t/spam-queries-prevention/15111/4 and appear to agree that the approach above is generally right, though not the part that denies the search request itself. They suggest stripping out any undesired markup and whittling the query down as best possible to plain text.
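A rough sketch of that "strip rather than deny" idea follows. The function names `sanitizeQuery` and `sanitizingQueryHook` are hypothetical, and the exact characters to keep are an assumption that should be adjusted to the site's real search terms. The only InstantSearch.js feature relied on is the `queryHook(query, search)` option of the `searchBox` widget, which is already used in the template code earlier in this thread.

```javascript
// Hypothetical helper: reduce a raw query to plain text.
function sanitizeQuery(query) {
  return String(query)
    .replace(/<[^>]*>?/g, ' ')        // drop tags, including an unclosed trailing one
    .replace(/[^a-zA-Z0-9\s]/g, ' ')  // replace remaining non-alphanumerics with spaces
    .replace(/\s+/g, ' ')             // collapse runs of whitespace
    .trim();
}

// Hook with the same signature as the searchBox queryHook option:
// the query is cleaned and the search still goes ahead.
function sanitizingQueryHook(query, search) {
  search(sanitizeQuery(query));
}

// In the customised template this would be wired up roughly as:
//   instantsearch.widgets.searchBox({
//     container: '#algolia-search-box',
//     queryHook: sanitizingQueryHook,
//   })

console.log(sanitizeQuery('cfi</a><li><a class=page-numbers href=')); // "cfi"
```

The cleaned term is still sent to Algolia, so this tidies the queries rather than reducing how many are billed. The tag pattern also removes everything after a stray `<` that has no closing `>`, which would clip a query such as "5 < 10".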

    Thread Starter Renners

    (@renners)

    I finally got our staging server to function properly and have tested the changes. It works OK with a few caveats: 1. No error is produced when illicit characters are entered. 2. It didn’t solve our problem!

    I’ve left it in and pushed it live anyway, as it now stops any non-alphanumeric search terms someone enters from hitting Algolia (and us being charged for it).

    The issue we have, which became apparent once we dug further into it, is that we are getting hit by searchbots requesting the search page with terms such as domain.com/?s=”cfi</a><li><a class=page-numbers href=”. This is not parsed by instantsearch.js. We are looking into a solution based on their user agent/URL combination to prevent them from hitting the search page.

    Regards,

    Renners.
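The `queryHook` option only sees text typed into the search box, which would explain why terms supplied in the URL are not caught by it. If, and this is an assumption rather than something confirmed in this thread, those terms reach Algolia through the `s` parameter that the template's `routeToState` copies into the query, the same clean-up could be applied there. The sketch below uses a hypothetical `INDEX_NAME` constant in place of `algolia.indices.searchable_posts.name` so that it runs on its own; `sanitizeQuery` is also a hypothetical helper.

```javascript
// Hypothetical stand-in for algolia.indices.searchable_posts.name.
const INDEX_NAME = 'searchable_posts';

// Hypothetical helper: strip markup and non-alphanumerics from a query.
function sanitizeQuery(query) {
  return String(query)
    .replace(/<[^>]*>?/g, ' ')
    .replace(/[^a-zA-Z0-9\s]/g, ' ')
    .replace(/\s+/g, ' ')
    .trim();
}

// Same shape as the stateMapping object in the template, with the
// URL-supplied term cleaned before it becomes the query.
const stateMapping = {
  stateToRoute(indexUiState) {
    const state = indexUiState[INDEX_NAME] || {};
    return { s: state.query, page: state.page };
  },
  routeToState(routeState) {
    const raw = routeState.s;
    return {
      [INDEX_NAME]: {
        // Leave the query undefined when no ?s= parameter is present.
        query: typeof raw === 'string' ? sanitizeQuery(raw) : undefined,
        page: routeState.page,
      },
    };
  },
};
```

This does not replace blocking the bots by user agent or URL on the server: a request that loads the page and runs the script would still trigger a search for whatever plain text survives the clean-up.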

    Plugin Contributor Michael Beckwith

    (@tw2113)

    The BenchPresser

    Thanks for the follow-up so far. I definitely want to figure out what could be done here as a whole, so that, if nothing else, WebDevStudios can knowingly advise and point to accurate resources that can be used.

  • The topic ‘Sanitise search input on instant search’ is closed to new replies.