• Hey everyone, I’ve really been struggling to figure this out on my own, so far without success. So I thought, why not give it a shot and ask for a solution here on the WordPress forum.

    I have installed a plugin called OCR and it works like a charm. I have Tesseract and ImageMagick installed on my server. So the plugin works as it was developed.

    Now what I’m trying to achieve is this: instead of the extracted OCR text being placed in a custom field, I would like to have it in the image alt, or better said, the image alt of the thumbnail. I’m not very skilled when it comes to coding, so I hope someone can help me out here.

    Below is the code of the plugin from the developer:

    class OCR {

        function __construct(){
            if ( function_exists('register_uninstall_hook') ){
                register_uninstall_hook( __FILE__, array( $this, 'Uninstall' ) );
            }

            add_action( 'add_attachment', array( $this, 'AnalyzeImage' ) );
            add_action( 'admin_menu', array( $this, 'SubMenuItem' ) );

            add_filter( 'attachment_fields_to_edit', array( $this, 'EditOCRText' ), 10, 2);
            add_filter( 'attachment_fields_to_save', array( $this, 'SaveOCRText' ), 10, 2);

            add_option( 'ocr_resize_percent', 200 ); //set the default value for the resize percent
        }

        function AnalyzeImage($image_id){
            $upload_dir = wp_upload_dir();
            $upload_dir = $upload_dir['basedir'];
            $image_path = $upload_dir.'/'.get_post_meta($image_id, '_wp_attached_file', true);
            if(getimagesize($image_path)){ //only go through the steps for OCR if the file is an image
                $imagemagick = get_option('ocr_imagemagick_path');
                $tesseract = get_option('ocr_tesseract_path');
                $size_percent = get_option('ocr_resize_percent');
                if($imagemagick && $tesseract && $size_percent){ //only analyze the image if the plugin configuration has been filled in
                    $temp_image = $upload_dir.'/ocr_image.tif'; //tesseract requires a tiff
                    $temp_text = $upload_dir.'/ocr_text';
                    $command = $imagemagick.' -resize '.$size_percent.'% '.$image_path.' '.$temp_image.' && '.$tesseract.' '.$temp_image.' '.$temp_text.' && cat '.$temp_text.'.txt && rm -f '.$temp_text.'.txt '.$temp_image;
                    $ocr_text = shell_exec($command);
                    add_post_meta( $image_id, 'ocr_text', $ocr_text, true );
                }
            }
        }

        function SubMenuItem(){
            add_submenu_page( 'plugins.php', 'OCR Configuration', 'OCR', 'administrator', __FILE__, array( $this, 'SettingsPage' ) );
            add_action( 'admin_init', array( $this, 'RegisterSettings' ) );
        }

        function RegisterSettings() {
            register_setting( 'ocr-settings-group', 'ocr_imagemagick_path' );
            register_setting( 'ocr-settings-group', 'ocr_tesseract_path' );
            register_setting( 'ocr-settings-group', 'ocr_resize_percent' );
        }

        function SettingsPage(){
            ?>
            <div class="wrap">
            <h2>OCR Settings</h2>
            <p>
            The OCR plugin requires PHP5 and two command line utilities: ImageMagick for preparing the images and Tesseract for the actual OCR.
            These utilities must be manually installed on your server and executable by PHP. This process, and consequently this plugin, is recommended only for advanced users.
            </p>
            <form method="post" action="options.php">
            <?php settings_fields( 'ocr-settings-group' ); ?>
            <table class="form-table">
            <tr valign="top">
            <th scope="row">Absolute Path to ImageMagick’s convert<br><i style="font-size:10px;">(ex: /opt/local/bin/convert)</i></th>
            <td><input type="text" name="ocr_imagemagick_path" value="<?php echo get_option('ocr_imagemagick_path'); ?>" /></td>
            </tr>
            <tr valign="top">
            <th scope="row">Absolute Path to Tesseract<br><i style="font-size:10px;">(ex: /opt/local/bin/tesseract)</i></th>
            <td><input type="text" name="ocr_tesseract_path" value="<?php echo get_option('ocr_tesseract_path'); ?>" /></td>
            </tr>
            <tr valign="top">
            <th scope="row">Resize percentage<br><i style="font-size:10px;">A higher % might lead to more accurate OCR but will take longer to calculate. Default = 200%</i></th>
            <td><input type="text" name="ocr_resize_percent" value="<?php echo get_option('ocr_resize_percent'); ?>" />%</td>
            </tr>
            </table>
            <p class="submit">
            <input type="submit" class="button-primary" value="<?php _e('Save Changes') ?>" />
            </p>
            </form>
            </div>
            <?php
        }

        function EditOCRText( $form_fields, $post ){
            if ( substr($post->post_mime_type, 0, 5) == 'image' ) {
                $ocr_text = get_post_meta($post->ID, 'ocr_text', true);
                if ( empty($ocr_text) )
                    $ocr_text = '';

                $form_fields['ocr_text'] = array(
                    'value' => $ocr_text,
                    'label' => __('OCR Text'),
                    'helps' => __('Text automatically pulled from the image via the OCR plugin.'),
                    'input' => 'textarea'
                );
            }
            return $form_fields;
        }

        function SaveOCRText($post, $attachment){
            if ( isset($attachment['ocr_text']) && !empty($attachment['ocr_text']) ) {
                update_post_meta($post['ID'], 'ocr_text', $attachment['ocr_text']);
            }
            return $post;
        }

        function Uninstall(){
            delete_option( 'ocr_imagemagick_path' );
            delete_option( 'ocr_tesseract_path' );
            delete_option( 'ocr_resize_percent' );
        }
    }

    if(!$ocr_plugin){ $ocr_plugin = new OCR(); }

Viewing 1 replies (of 1 total)
  • Hi there,

    I found where the OCR text is being inserted into the database, but there’s no filter or action hook that will let you modify this action directly.

    What you could do is write a WP Cron job that periodically turns the inserted text into the alt text. How often it should run depends on how quickly you need the info and the volume of files you’re handling.

    Basically, the cron would execute a SQL query to find all the records in the wp_postmeta table where meta_key = ‘ocr_text’ and change the meta_key to ‘_wp_attachment_image_alt’, which is the name of the field the alt text is stored in. Some things you might want to have safety checks for are whether alt text is already set (in which case you’d end up with multiple alt text values for an image, which will create problems) and a check that the text isn’t too long for the field type.
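The cron idea above could be sketched roughly like this. It is an untested outline, not a drop-in solution. The hook name `ocr_copy_text_to_alt`, both function names, and the 125-character cut-off are assumptions made up for illustration. It also copies the OCR text into the alt field through the WordPress meta functions rather than renaming the meta_key with raw SQL, so the plugin's own OCR Text field keeps its value.

```php
<?php
// Sketch only: the hook name, function names and the 125-character limit
// are illustrative assumptions, not part of the OCR plugin.

/**
 * Collapse whitespace in raw OCR output and cut it to a maximum length.
 */
function ocr_prepare_alt_text( $ocr_text, $max_length = 125 ) {
    $text = trim( preg_replace( '/\s+/', ' ', (string) $ocr_text ) );
    $text = function_exists( 'mb_substr' )
        ? mb_substr( $text, 0, $max_length )
        : substr( $text, 0, $max_length );
    return rtrim( $text );
}

/**
 * Cron callback: copy OCR text into the alt field of images that have
 * OCR text but no alt text entry yet.
 */
function ocr_copy_text_to_alt() {
    $image_ids = get_posts( array(
        'post_type'      => 'attachment',
        'post_status'    => 'inherit',
        'posts_per_page' => -1,
        'fields'         => 'ids',
        'meta_query'     => array(
            'relation' => 'AND',
            array( 'key' => 'ocr_text', 'compare' => 'EXISTS' ),
            // Skip images that already have an alt entry, so an existing
            // alt text is never overwritten or duplicated.
            array( 'key' => '_wp_attachment_image_alt', 'compare' => 'NOT EXISTS' ),
        ),
    ) );

    foreach ( $image_ids as $image_id ) {
        $alt = ocr_prepare_alt_text( get_post_meta( $image_id, 'ocr_text', true ) );
        if ( '' !== $alt ) {
            update_post_meta( $image_id, '_wp_attachment_image_alt', $alt );
        }
    }
}

// Only register hooks when WordPress is loaded.
if ( function_exists( 'add_action' ) ) {
    add_action( 'ocr_copy_text_to_alt', 'ocr_copy_text_to_alt' );
    add_action( 'init', function () {
        if ( ! wp_next_scheduled( 'ocr_copy_text_to_alt' ) ) {
            wp_schedule_event( time(), 'hourly', 'ocr_copy_text_to_alt' );
        }
    } );
}
```

Something like this would normally live in a small plugin of your own or in the theme's functions.php. Note that the query only matches images with no alt entry at all. An image whose alt field was saved as an empty string is skipped.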

    If you could get the developer to add an action hook to the AnalyzeImage($image_id) function, right after the line

    add_post_meta( $image_id, 'ocr_text', $ocr_text, true );

    you wouldn’t need to do this as a cron. You could write a function to change the meta_key value and add it to execute immediately after the ocr_text is inserted into the database.
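For the hook route, a minimal sketch might look like the following. It assumes the developer adds a `do_action()` call with the hypothetical hook name `ocr_text_saved`. No such hook exists in the plugin code posted above, and the function names are made up as well.

```php
<?php
// Hypothetical: assumes the plugin author adds this line to AnalyzeImage(),
// right after the add_post_meta() call:
//     do_action( 'ocr_text_saved', $image_id, $ocr_text );

/**
 * Decide whether the OCR text should become the alt text: only when there
 * is some OCR text and no alt text has been set yet.
 */
function ocr_should_set_alt( $existing_alt, $ocr_text ) {
    return '' === trim( (string) $existing_alt )
        && '' !== trim( (string) $ocr_text );
}

/**
 * Listener for the hypothetical 'ocr_text_saved' action.
 */
function ocr_set_alt_on_save( $image_id, $ocr_text ) {
    $existing_alt = get_post_meta( $image_id, '_wp_attachment_image_alt', true );
    if ( ocr_should_set_alt( $existing_alt, $ocr_text ) ) {
        update_post_meta(
            $image_id,
            '_wp_attachment_image_alt',
            sanitize_text_field( (string) $ocr_text )
        );
    }
}

// Only register the listener when WordPress is loaded.
if ( function_exists( 'add_action' ) ) {
    add_action( 'ocr_text_saved', 'ocr_set_alt_on_save', 10, 2 );
}
```

Because the listener runs right after the OCR text is stored, no scheduled job is needed, and an alt text that was already entered by hand is left alone.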

    This is a broad outline rather than a step-by-step how-to, but if you google WP Cron and WP Query, you’ll find additional details.

  • The topic ‘How To Set Extracted OCR Text To Image Thumbnail Alt Instead Of Custom Field…’ is closed to new replies.