• Resolved RA_NPL

    (@ra_npl)


    Hello,

    I run https://www.ra-plutte.de, a law firm website with lots of legal posts. Recently I noticed that searching for court rulings/judgments by their file number does not return proper results.

    Example: If you search for 6 U 14/19 on https://www.ra-plutte.de, no result is returned. However, it should return the blog post https://www.ra-plutte.de/olg-frankfurt-irrefuehrung-werbung-gekaufte-bewertungen/ which contains the string in the first sentence.

    I found a post on your blog about Bible verses, but I am not sure whether it applies to my case: the suggested solution (searching in quotation marks) did not work in my test. A search for "6 U 14/19" on my blog also returns 0 results.

    Can you help me out on this?

    Best, Nick


Viewing 15 replies - 1 through 15 (of 19 total)
    Plugin Author Mikko Saari

    (@msaari)

    That’s something Relevanssi can’t really handle without some extra help. With the default settings, that query is unsearchable: all parts in it are under the three-letter minimum word length.
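    To illustrate the word-length point, here is a deliberately simplified model of how a query could be split into words. The function demo_tokenize is hypothetical and is not Relevanssi's actual code; it assumes punctuation such as the slash ends up as a word separator and applies the three-character minimum mentioned above.

```php
<?php
// Hypothetical, simplified tokenizer (NOT Relevanssi's real implementation):
// punctuation becomes whitespace, then tokens shorter than $min_length are dropped.
function demo_tokenize( $text, $min_length = 3 ) {
    // ASCII-only simplification: anything that is not a letter or digit separates words.
    $text   = preg_replace( '/[^A-Za-z0-9]+/', ' ', $text );
    $tokens = preg_split( '/\s+/', trim( $text ), -1, PREG_SPLIT_NO_EMPTY );

    $long_enough = array_filter(
        $tokens,
        function ( $token ) use ( $min_length ) {
            return strlen( $token ) >= $min_length;
        }
    );

    return array_values( $long_enough );
}

// "6", "U", "14" and "19" are all shorter than 3 characters, so nothing is left.
print_r( demo_tokenize( '6 U 14/19' ) );

// The same reference written as one word survives as a single token.
print_r( demo_tokenize( '6U1419' ) );
```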

    My recommendation would be to add code that converts the file numbers to a format Relevanssi understands better. If you could convert that file number to e.g. “6U1419”, that would be much better: it’s unique and easy to search.

    This would require a filter function that checks both post content and the search queries and when it sees a file number, it does the conversion automatically. Something like this:

    add_filter( 'relevanssi_remove_punctuation', 'rlv_convert_filenumbers' );
    add_filter( 'relevanssi_post_content', 'rlv_convert_filenumbers' );
    function rlv_convert_filenumbers( $a ) {
      $a = preg_replace_all( '#(\d)\s(\w)\s(\d+)/(\d+)#', '\1\2\3\4', $a );
      return $a;
    }

    That would handle a file number format like the one you mention (a digit, a space, a letter, a space, digits, slash, digits). I don’t know what the actual format is, so it probably needs some adjustment, but that’s the basic gist of it.
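    A conversion like this can be checked outside WordPress by running the pattern on sample strings. The sketch below uses the same pattern with PHP's preg_replace(), which replaces every match in the string (its $limit parameter defaults to -1). demo_convert_v1 is a hypothetical name used only for this test, and the second reference number in the sample sentence is just an example.

```php
<?php
// Same pattern as the snippet above: a digit, space, letter, space,
// digits, slash, digits. Every match in the string is converted.
function demo_convert_v1( $text ) {
    return preg_replace( '#(\d)\s(\w)\s(\d+)/(\d+)#', '\1\2\3\4', $text );
}

// A search query consisting only of the file number.
echo demo_convert_v1( '6 U 14/19' ), "\n";

// Post content with two file numbers: both are glued together.
echo demo_convert_v1( 'OLG Frankfurt, 6 U 14/19 and 5 O 84/21' ), "\n";
```

    Because the same function is hooked to both the content filter and the query filter, a post containing "6 U 14/19" and a search for "6 U 14/19" both end up with the same token, 6U1419.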

    Thread Starter RA_NPL

    (@ra_npl)

    Hi Mikko,

    thank you for your quick reply and the snippet. I understand that the plugin does not cover such odd search queries by default. However, the reference number pattern is a little more complex. Here are the possible variations:

    1. The typical way of citing a court ruling is something like LG Köln, Beschluss vom 11.10.2021, 28 O 350/21 or LG Köln, Beschluss vom 11.10.2021, Az. 28 O 350/21. These patterns contain three elements separated by commas.

    2. The first element is an abbreviation of the court type, specifically AG, LG, OLG, BGH, EuGH or BVerfG. There are more court types, but these are not relevant in our field. This abbreviation is followed by a city name starting with a capital letter and then arbitrary characters, including the German umlauts ä, ö and ü. Usually it is one word like Köln or Berlin, but it can be two words in some cases, e.g. Bad Homburg.

    3. The second element is the type of ruling with its date. The two options are Beschluss or Urteil, followed by the word vom and the date of the ruling.

    4. Finally, the citation closes with Az. (often left out by lawyers when searching) and the actual reference number. These reference numbers come in different variations.

    a. Let’s look at my example 28 O 350/21. The first number (28) can have one or two digits and is always followed, to my knowledge, by a space, then a single capital letter, and again a space.

    b. The last part of the number (in my example 350/21) can have one, two, three or four digits before the / and always two digits after it (they represent the year of the ruling).

    I know this is a lot to ask, but could you update your snippet according to these requirements? This would be of great help, so people can finally find court rulings with ease on my website.

    Also, where would I have to paste the snippet? In my functions.php file or somewhere else?

    Best, Nick

    Plugin Author Mikko Saari

    (@msaari)

    I think the critical part here is the reference number. The other parts of the ruling are already searchable (or not significant).

    add_filter( 'relevanssi_remove_punctuation', 'rlv_convert_filenumbers' );
    add_filter( 'relevanssi_post_content', 'rlv_convert_filenumbers' );
    function rlv_convert_filenumbers( $a ) {
      $a = preg_replace_all( '#(\d+)\s(\w)\s(\d+)/(\d\d)#', '\1\2\3\4', $a );
      return $a;
    }

    This is now

    – 1 or more digits
    – single letter
    – 1 or more digits
    – slash
    – 2 digits

    So this should cover the reference numbers. Try adding this to your theme functions.php. Rebuild the index. Searching for the reference numbers should then work better. You can confirm whether this works or not by checking one of your posts with the Relevanssi debugger (Settings > Relevanssi > Debugging). If there are reference numbers in the post content, they should appear there in the format “28O35021”.
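    To see that the revised pattern changes only the reference number, it can be run on a complete citation in the format described in the previous message. The citation below is a made-up sample, and demo_convert_v2 is a hypothetical test function using the same pattern as the snippet above.

```php
<?php
// Revised pattern: 1+ digits, space, single letter, space,
// 1+ digits, slash, exactly 2 digits (the year of the ruling).
function demo_convert_v2( $text ) {
    return preg_replace( '#(\d+)\s(\w)\s(\d+)/(\d\d)#', '\1\2\3\4', $text );
}

// Court, ruling type and date are left alone; only "28 O 350/21" is glued together.
echo demo_convert_v2( 'LG Berlin, Beschluss vom 11.10.2021, Az. 28 O 350/21' ), "\n";
```

    The date is not affected because the pattern requires whitespace right after the first group of digits, which "11.10.2021," never has.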

    Thread Starter RA_NPL

    (@ra_npl)

    Thanks a lot for your updated snippet. I added it to my functions.php, uploaded it via FTP and tried to rebuild the search index in my WP backend. However, this did not work. The rebuild process got stuck no matter what I tried (purging the cache, refreshing the page). I think the index was not built, or not fully built, because when I performed searches afterwards, I got critical errors. That is why I have removed the snippet from my functions.php until we have figured out the issue.

    Below please find the complete code of my functions.php with your snippet included. I am not a developer, but does it matter whether there is a closing ; in the last line of your snippet or not? I am asking because the other entries end with one.

    <?php
    
    add_filter( 'xmlrpc_enabled', '__return_false' );
    
    function raplutte_enq_stuff() {
    	wp_deregister_script('dorayaki-custom');
        wp_enqueue_style('fontawesome', get_stylesheet_directory_uri() . '/fonts/font-awesome.min.css');
        wp_dequeue_style('dorayaki-fonts');
        wp_enqueue_script('scripts', get_stylesheet_directory_uri() . '/js/scripts.js', ['jquery'], false, true);
    }
    
    add_action('wp_enqueue_scripts', 'raplutte_enq_stuff', 20, 1);
    
    // We also want to move around and remove default stuff.
    add_action('wp_enqueue_scripts', function() {
    
    	wp_deregister_style('wp-block-library');
    
    }, 200, 1);
    
    add_filter('jetpack_implode_frontend_css', '__return_true');
    
    // Here we filter the style tags.
    add_filter('style_loader_tag', function($html, $handle, $href, $media) {
    
    	$styles_to_modify = ['rpt', 'wordfenceAJAXcss', 'widgets-on-pages', 'fontawesome', 'borlabs-cookie', 'jetpack-top-posts-widget'];
    
    	if(!in_array($handle, $styles_to_modify)) {
    		return $html;
    	}
    
    	// Now we build our own link text.
    	$html = "<link rel='stylesheet' id='" . $handle . "-css'  href='" . $href . "' type='text/css' media='none' onload='this.media=\"" . $media . "\"' />\n";
    	return $html;
    
    }, 10, 4);
    
    // Here we filter the script tags.
    add_filter('script_loader_tag', function($tag, $handle, $src) {
    
    	$scripts_to_modify = ['fitvids'];
    
    	if(!in_array($handle, $scripts_to_modify)) {
    		return $tag;
    	}
    
    	// Now we build our own link text.
    	$tag = '<script type="text/javascript" src="' . $src . '" id="' . $handle . '-js" defer></script>' . "\n";
    	return $tag;
    
    }, 10, 4);
    
    // We also want to preload fonts.
    add_action('wp_head', function() {
    
    	echo '<link rel="preload" href="' . get_stylesheet_directory_uri() . '/fonts/open-sans-v15-latin-ext_latin-regular.woff2' . '" as="font" crossorigin="anonymous" />';
    	echo '<link rel="preload" href="' . get_stylesheet_directory_uri() . '/fonts/open-sans-v15-latin-ext_latin-600.woff2' . '" as="font" crossorigin="anonymous" />';
    	echo '<link rel="preload" href="' . get_template_directory_uri() . '/font/genericons-regular-webfont.woff' . '" as="font" crossorigin="anonymous" />';
    
    });
    
    function quickcheck() {
        ob_start(); ?>
    <script>
    	function seller_add() {
    		var newExclude = jQuery('<div><input type="text" name="exclude_seller_id[]" /><a href="javascript:void(0);">entfernen</a></div>');
    		jQuery('a', newExclude).click(function() {
    			newExclude.remove();
    			return false;
    		});
    
    		jQuery("#exclude_sellers").append(newExclude);
    		return false;
    	}
    </script>
    <div class="gform_wrapper">
    	<form action="https://www.anticopy.de/bilderklau-finder/" method="get" class="ebay-check gform_body" target="anticopy.de">
    		<input type="hidden" name="submit" value="Absenden">
    		<div class="ebay-name">
    			<p>Geben Sie hier Ihren eBay-Namen (Account) ein. Das Tool ermittelt sofort, wer für welche Auktionen Ihr Bildmaterial nutzt:</p>
    			<p>
    				<input name="ebay_name" id="ebay_name" type="text" value="">
    			</p>
    			<p>
    				<input type="submit" value="Absenden" class="button">
    			</p>
    		</div>
    		<div class="ebay-exclude" id="exclude_sellers">
    			<p>Sofern Sie mehrere ebay-Accounts haben oder aus anderen Gründen bestimmte ebay-Accounts vom Suchlauf ausschließen wollen, geben Sie hier die entsprechenden ebay-Namen ein:</p>
    			<p>
    				<input type="text" id="exclude_seller_id1" name="exclude_seller_id[]" value="">
    			</p>
    			<p>
    				<input type="submit" value="Absenden" class="button">
    			</p>
    		</div>
    		<p>
    			<a id="#add_exclude_seller_id" href="javascript:void(0);" onclick="seller_add();">Weiteren eBay-Namen ausschliessen</a>
    		</p>
    	</form>
    </div>
    <?php
        $out = ob_get_clean();
        return $out;
    }
    
    add_shortcode('quickcheck', 'quickcheck');
    
    // function change_to_http($content) {
    // 	$content = str_replace('data-url="https', 'data-url="http', $content);
    // 	return $content;
    // }
    // add_filter('the_content', 'change_to_http');
    
    add_filter('amp_post_template_file', 'xyz_amp_set_custom_template', 10, 3);
    
    function xyz_amp_set_custom_template($file, $type, $post) {
        if ('footer' === $type) {
            $file = dirname(__FILE__) . '/templates/footer.php';
        }
        return $file;
    }
    
    add_filter('gform_confirmation_anchor', '__return_false'); // create_function() was removed in PHP 8
    
    add_image_size('tablet_size', 768, 0, false);
    add_image_size('mobile_size', 400, 0, false);
    add_image_size('content-wide', 1180, 0, false);
    add_image_size('content-mobile', 315, 0, false);
    add_image_size('single-post', 800, 0, false);
    
    add_filter('image_size_names_choose', function($sizes) {
        return array_merge($sizes, [
            'content-wide' => __('Content Wide (1180px)'),
        ]);
    });
    
    function wpb_remove_commentsip($comment_author_ip) {
        return '';
    }
    add_filter('pre_comment_user_ip', 'wpb_remove_commentsip');
    
    // Change thumbsize of top posts from jetpack.
    add_filter('jetpack_top_posts_widget_image_options', function($array) {
    	$array['avatar_size'] = false;
    	$array['width'] = 60;
    	$array['height'] = 31;
    	return $array;
    });
    
    // Custom Fix for Relevanssi plugin to detect court rulings in searches
    add_filter( 'relevanssi_remove_punctuation', 'rlv_convert_filenumbers' );
    add_filter( 'relevanssi_post_content', 'rlv_convert_filenumbers' );
    function rlv_convert_filenumbers( $a ) {
      $a = preg_replace_all( '#(\d+)\s(\w)\s(\d+)/(\d\d)#', '\1\2\3\4', $a );
      return $a;
    }
    
    // Add our "top-themen" menu.
    register_nav_menu('top-bar', 'Heiße Themen');
    
    if (! function_exists('dorayaki_comment')) :
        /*-----------------------------------------------------------------------------------*/
        /* Comments template dorayaki_comment
        /*-----------------------------------------------------------------------------------*/
        function dorayaki_comment($comment, $args, $depth)
        {
            $GLOBALS['comment'] = $comment;
            switch ($comment->comment_type) :
                case 'comment':
            ?>
    
    		<li <?php comment_class(); ?> id="li-comment-<?php comment_ID(); ?>">
    			<article id="comment-<?php comment_ID(); ?>" class="comment">
    
    				<div class="comment-avatar">
    					<?php echo get_avatar($comment, 45); ?>
    				</div>
    
    	<div class="comment-content">
    					<ul class="comment-meta">
    						<li class="comment-author"><?php printf(__(' %s ', 'dorayaki'), sprintf(' %s ', get_comment_author_link())); ?></li>
    						<li class="comment-time"><a href="<?php echo esc_url(get_comment_link($comment->comment_ID)); ?>">
    						<?php
                                /* translators: 1: date */
                                printf(
                                    __('%1$s', 'dorayaki'),
                                    get_comment_date('d.m.y')
                                ); ?></a></li>
    						<li class="comment-edit"><?php edit_comment_link(__('Edit', 'dorayaki')); ?></li>
    					</ul>
    						<div class="comment-text">
    							<?php comment_text(); ?>
    							<?php if ($comment->comment_approved == '0') : ?>
    								<p class="comment-awaiting-moderation"><?php _e('Your comment is awaiting moderation.', 'dorayaki'); ?></p>
    							<?php endif; ?>
    							<p class="comment-reply"><?php comment_reply_link(array_merge($args, array( 'reply_text' => __('Reply', 'dorayaki'), 'depth' => $depth, 'max_depth' => $args['max_depth'] ))); ?></p>
    						</div><!-- end .comment-text -->
    
    				</div><!-- end .comment-content -->
    
    			</article><!-- end .comment -->
    
    		<?php
                    break;
            case 'pingback':
                case 'trackback':
            ?>
    		<li class="pingback">
    			<p><?php _e('<span>Pingback:</span>', 'dorayaki'); ?> <?php comment_author_link(); ?></p>
    			<p class="pingback-edit"><?php edit_comment_link(__('Edit &rarr;', 'dorayaki'), ' '); ?></p>
    		<?php
                    break;
            endswitch;
        }
        endif;
    
    Plugin Author Mikko Saari

    (@msaari)

    No, that’s not the problem (there shouldn’t be a semicolon after a curly brace); the problem is that the snippet calls preg_replace_all(), a function that doesn’t exist in PHP.

    The correct code is

    // Custom Fix for Relevanssi plugin to detect court rulings in searches
    add_filter( 'relevanssi_remove_punctuation', 'rlv_convert_filenumbers' );
    add_filter( 'relevanssi_post_content', 'rlv_convert_filenumbers' );
    function rlv_convert_filenumbers( $a ) {
      $a = preg_replace( '#(\d+)\s(\w)\s(\d+)/(\d\d)#', '\1\2\3\4', $a );
      return $a;
    }
    Thread Starter RA_NPL

    (@ra_npl)

    Hi Mikko,

    many thanks for your snippet. I implemented it as you explained and rebuilt the index. In the “Debugging” tab, the reference numbers are now displayed without spaces; for example, when debugging https://www.ra-plutte.de/schadensersatz-wegen-weiterleitung-von-ungeschwaerztem-urteil/, the file reference number 5 O 84/21 now appears as 5o8421 in the “Content” section. If I understand your approach correctly, this is the expected behaviour: with the spaces removed, the token is long enough to be matched in user searches.

    – However, if I search for 5 O 84/21 now via the regular search function in the frontend of https://www.ra-plutte.de, I get 0 results (of course I cleared the browser cache). The search url created is https://www.ra-plutte.de/?s=5+O+84%2F21&submit=Suche.

    – If I search for "5 O 84/21", I also get 0 results. The search url created is https://www.ra-plutte.de/?s=%225+O+84%2F21%22&submit=Suche

    Am I missing something?

    Plugin Author Mikko Saari

    (@msaari)

    No, I’m the one missing something. The way the code currently runs, the default Relevanssi punctuation removal removes the slashes first before this function is applied. The fix is simple: just run this function slightly earlier.

    // Custom Fix for Relevanssi plugin to detect court rulings in searches
    add_filter( 'relevanssi_remove_punctuation', 'rlv_convert_filenumbers', 9 );
    add_filter( 'relevanssi_post_content', 'rlv_convert_filenumbers' );
    function rlv_convert_filenumbers( $a ) {
      $a = preg_replace( '#(\d+)\s(\w)\s(\d+)/(\d\d)#', '\1\2\3\4', $a );
      return $a;
    }

    This little change should fix this.
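    The ordering problem can be reproduced outside WordPress. In this sketch, demo_remove_punctuation is a hypothetical stand-in for Relevanssi's default punctuation removal (it only turns slashes into spaces), and it assumes, as the reply above implies, that the default removal runs at the standard priority 10 and is registered before the theme's filter. WordPress runs callbacks with lower priority numbers first, so priority 9 puts the conversion ahead of it.

```php
<?php
// Hypothetical stand-in for the default punctuation removal: it only turns
// slashes into spaces, which is enough to show the ordering problem.
function demo_remove_punctuation( $text ) {
    return str_replace( '/', ' ', $text );
}

// Same pattern as the snippet above.
function demo_convert_refs( $text ) {
    return preg_replace( '#(\d+)\s(\w)\s(\d+)/(\d\d)#', '\1\2\3\4', $text );
}

$query = '5 O 84/21';

// Both at priority 10, punctuation removal first: the slash is already gone,
// so the pattern finds nothing and the query stays split into short words.
$too_late = demo_convert_refs( demo_remove_punctuation( $query ) );

// Conversion at priority 9 runs first and produces one searchable token.
$in_time = demo_remove_punctuation( demo_convert_refs( $query ) );

var_dump( $too_late, $in_time );
```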

    Thread Starter RA_NPL

    (@ra_npl)

    Hi Mikko, now it works like a charm. Very impressive, thanks for your outstanding support. This customization improves the search quality in my blog quite significantly.

    Plugin Author Mikko Saari

    (@msaari)

    Thread Starter RA_NPL

    (@ra_npl)

    Done.

    Thread Starter RA_NPL

    (@ra_npl)

    Hi Mikko,

    I hope you had a good start to 2022. Today I searched for the file reference number I ZR 20/21 on https://www.ra-plutte.de. 0 results were returned, even though this post should have been returned: https://www.ra-plutte.de/markenverletzung-auskunft-schadensersatz-uebersicht/. I guess the regex needs a change, right? If you find the time, I would greatly appreciate an update.

    Best, Nick

    Plugin Author Mikko Saari

    (@msaari)

    The format you specified was

    – 1 or more digits
    – single letter
    – 1 or more digits
    – slash
    – 2 digits

    “I ZR 20/21” does not fit the format. So what exactly is the correct format? Apparently, the first digits can also be letters, but can that be just “I”, or any other letter? And can the single letter actually be one or more letters?

    Please think through the pattern, and I can then make changes once you’re sure what the correct pattern should be.

    Thread Starter RA_NPL

    (@ra_npl)

    Hi Mikko,

    you are right. I was not aware that court reference numbers are more complex than I initially expected; in fact, they are far more complex: https://de.wikipedia.org/wiki/Aktenzeichen_(Deutschland).

    However, for my specific field (civil law), it should be sufficient to cover the following formats:

    Variant 1

    – 1, 2 or 3 digits
    – space
    – single letter
    – space
    – 1, 2, 3, 4 or 5 digits
    – slash
    – 2 digits

    Examples: 5 C 84/21, 15 W 84/21, 5 O 184/20, 5 K 1996/19, 15 S 11384/18

    Variant 2

    – 1, 2 or 3 digits
    – minus or slash
    – 1, 2 or 3 digits
    – space
    – single letter
    – space
    – 1, 2, 3, 4 or 5 digits
    – slash
    – 2 digits

    Example: 3-06 O 24/21

    Variant 3

    – 1 letter
    – minus
    – 1, 2, 3 or 4 digits
    – slash
    – 2 digits

    Example: C-236/08

    Variant 4

    – 1 or 2 letters
    – space
    – 2 or 3 letters
    – space
    – 1, 2, 3 or 4 digits
    – slash
    – 2 digits

    Examples: I ZR 20/21, XI ZR 26/15, VI ZR 488/19

    Variant 5

    – 3 letters
    – space
    – 1, 2 or 3 digits
    – slash
    – 2 digits

    Example: GSZ 1/20

    PS: Even though the correct format always uses capital letters, users might not use capital letters in their searches. This is why I think we can’t match capital letters only.
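    The five variants can be combined into one pattern. The following is only a sketch under stated assumptions, not a tested solution from the plugin author: it assumes reference numbers contain ASCII letters only, and it is meant to replace the whole existing snippet (the function and both add_filter() lines), because PHP does not allow rlv_convert_filenumbers to be declared twice. Instead of reassembling the number from numbered capture groups, each match goes through preg_replace_callback(), and the callback deletes the spaces, hyphens and slashes, which works the same way for every variant.

```php
<?php
// Hypothetical sketch covering the five variants listed above.
// Assumption: reference numbers only contain ASCII letters, so [A-Za-z] is used;
// \b stops a match from starting or ending in the middle of a longer word.
function rlv_convert_filenumbers( $text ) {
    $pattern = '~\b(?:'
        . '(?:\d{1,3}[-/])?\d{1,3}\s[A-Za-z]\s\d{1,5}/\d{2}' // variants 1 and 2
        . '|[A-Za-z]-\d{1,4}/\d{2}'                           // variant 3
        . '|[A-Za-z]{1,2}\s[A-Za-z]{2,3}\s\d{1,4}/\d{2}'      // variant 4
        . '|[A-Za-z]{3}\s\d{1,3}/\d{2}'                       // variant 5
        . ')\b~';

    // Glue each matched reference number together by deleting its
    // spaces, hyphens and slashes, e.g. "I ZR 20/21" becomes "IZR2021".
    return preg_replace_callback(
        $pattern,
        function ( $m ) {
            return preg_replace( '#[\s/-]+#', '', $m[0] );
        },
        $text
    );
}

// Register the filters only when running inside WordPress; priority 9 on the
// punctuation hook makes the conversion run before the default removal.
if ( function_exists( 'add_filter' ) ) {
    add_filter( 'relevanssi_remove_punctuation', 'rlv_convert_filenumbers', 9 );
    add_filter( 'relevanssi_post_content', 'rlv_convert_filenumbers' );
}

// Quick check with examples from the list above.
foreach ( array( '5 C 84/21', '3-06 O 24/21', 'C-236/08', 'I ZR 20/21', 'GSZ 1/20' ) as $ref ) {
    echo $ref, ' => ', rlv_convert_filenumbers( $ref ), "\n";
}
```

    After swapping in the new code, the index has to be rebuilt, and the Relevanssi debugger should then show tokens such as izr2021 (presumably in lower case, like 5o8421 earlier). One trade-off: the letter-only variants 4 and 5 can also match ordinary text of the same shape, for example "vom 12/21", which would then be glued together as well. That is probably harmless for search, but it is worth checking a few posts in the debugger.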

    Plugin Author Mikko Saari

    (@msaari)

    Quite a bit of work here, specific to your needs, to do for free. I’ll see when I’ll have time to look at this. If you need this done faster, you can show this thread to any skilled PHP developer, and they’ll be able to create the required regular expressions for you and it shouldn’t be terribly expensive.

    PS. Relevanssi is case-insensitive anyway, so it doesn’t matter.

    Thread Starter RA_NPL

    (@ra_npl)

    Hi Mikko, sure, I don’t want to bother you. You have been a great help. Just in case you find the time: I created the following regex, which matches all reference numbers in my examples (see https://regex101.com/r/TIcybY/1):

    ((\d{1,3}\-)?(\d{1,3})\s(\w)\s(\d{1,5})\/(\d\d))|(\w\-\d{1,4}\/\d\d)|(\w{1,2}\s\w{2,3}\s\d{1,4}\/\d\d)|(\w\w\w\s\d{1,3}\/\d\d)

    To avoid mistakes, would changing your snippet

    $a = preg_replace( '#(\d+)\s(\w)\s(\d+)/(\d\d)#', '\1\2\3\4', $a );

    to

    $a = preg_replace( '#((\d{1,3}\-)?(\d{1,3})\s(\w)\s(\d{1,5})\/(\d\d))|(\w\-\d{1,4}\/\d\d)|(\w{1,2}\s\w{2,3}\s\d{1,4}\/\d\d)|(\w\w\w\s\d{1,3}\/\d\d)#', '\1\2\3\4', $a );

    be correct? I am asking because of the opening and closing #, which were not necessary in my regex tests to match the criteria, and because of the ‘\1\2\3\4’ replacement after the pattern (I assume neither needs to change). Also, I had to escape the lone / in my tests to get matches.
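    On the replacement: '\1\2\3\4' refers to capture groups by number, and a regex with alternatives numbers its groups across all of them. In the proposed regex, groups 1 to 6 belong to the first alternative and groups 7, 8 and 9 to the other three, so for matches of variants 3 to 5 groups 1 to 4 are all empty (preg_replace() substitutes an empty string for groups that did not take part in the match), and for variants 1 and 2 they do not line up with the parts of the number either. The sketch runs the proposed call on two of the examples. As for the # characters: PHP's preg_* functions expect the pattern wrapped in delimiters, and regex101 adds / delimiters for you, which is why the slash had to be escaped there; inside #...# the escaped \/ is harmless but not required.

```php
<?php
// The regex proposed above, unchanged, wrapped in # delimiters.
$proposed = '#((\d{1,3}\-)?(\d{1,3})\s(\w)\s(\d{1,5})\/(\d\d))|(\w\-\d{1,4}\/\d\d)|(\w{1,2}\s\w{2,3}\s\d{1,4}\/\d\d)|(\w\w\w\s\d{1,3}\/\d\d)#';

// Variant 1: group 1 is the whole first alternative, group 2 did not take part
// (empty), group 3 is "28" and group 4 is "O", so the result is not "28O35021".
$variant1 = preg_replace( $proposed, '\1\2\3\4', '28 O 350/21' );

// Variant 4: only group 8 matched, so groups 1 to 4 are all empty and the
// reference number is deleted from the text entirely.
$variant4 = preg_replace( $proposed, '\1\2\3\4', 'I ZR 20/21' );

var_dump( $variant1, $variant4 );
```

    A replacement that works on the whole match ($m[0]) inside a preg_replace_callback() callback, for instance by deleting spaces, hyphens and slashes, avoids depending on group numbers at all.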

    Again, many thanks for your great support. If this is too much effort for my specific case, I would understand.

    Best, Nick

  • The topic ‘Search for exact multi-part strings’ is closed to new replies.