• Resolved Paul

    (@pawelszroeder)


    Hello, how are media files like .pdf indexed in Google? I have some files attached to my private page. In the YOAST settings option “Enable media pages” is disabled. Despite this, all my pdf files are indexed by Google.

Viewing 4 replies - 1 through 4 (of 4 total)
  • Plugin Support devnihil

    (@devnihil)

    @pawelszroeder Regarding your concern, the plugin does not contain a feature to set a .pdf on your site to noindex. The media setting you are referring to applies to the attachment pages that WordPress creates when you upload a media file.

    If you need to set .pdf files to noindex on your site, we would recommend exploring using the X-Robots tag to accomplish this. For example, I’ve used the following code in my own .htaccess file to do this:

    <FilesMatch "\.pdf$">
      Header set X-Robots-Tag "noindex"
    </FilesMatch>

    As .htaccess is a very sensitive file and any error can break a site, we would recommend consulting your hosting provider on the best way to implement this if you aren’t comfortable working with .htaccess, or alternatively testing on a staging site so as not to risk breaking a live site.

    You can also find more information on using the X-Robots tag here: https://yoast.com/x-robots-tag-play/
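    Once a rule like the one above is in place, it can help to confirm that the server really sends the header for a PDF. The following is only a minimal sketch in Python, not part of the plugin or of the original advice: the function names `fetch_headers` and `is_noindex` and the example URL are assumptions made for illustration.

```python
import urllib.request


def fetch_headers(url):
    """Send a HEAD request and return the response headers as (name, value) pairs."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=10) as response:
        return list(response.headers.items())


def is_noindex(header_pairs):
    """Return True if any X-Robots-Tag header carries a noindex (or none) directive."""
    for name, value in header_pairs:
        if name.lower() != "x-robots-tag":
            continue
        # One header value may hold several comma-separated directives,
        # optionally prefixed with a user agent, e.g. "googlebot: noindex".
        for directive in value.split(","):
            if directive.split(":")[-1].strip().lower() in {"noindex", "none"}:
                return True
    return False


# Hypothetical usage (replace the URL with one of your own PDF files):
# print(is_noindex(fetch_headers("https://example.com/wp-content/uploads/file.pdf")))
```

    If this reports False for a PDF URL, the rule is probably not being applied, for example because the server does not read .htaccess files or the headers module is not enabled.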

    Thread Starter Paul

    (@pawelszroeder)

    Thank you. Do you know how Google crawls PDFs or other documents from media uploads? Will it always index all my PDF files even if they are only added to a private page? I don’t understand how this works in WP with the default options.

    So if I upload a PDF to the media library, will its URL be visible in search engines even if I don’t add it to a page?

    • This reply was modified 2 years, 1 month ago by Paul.
    Plugin Support Maybellyne

    (@maybellyne)

    Hello Paul,

    According to Google, it can index textual content (written in any language) from PDF files that use various character encodings, provided they are not password protected or encrypted. If the text is embedded as images, Google may process the images with OCR algorithms to extract it. The general rule of thumb is that if you can copy and paste the text from a PDF document into a standard text document, Google should be able to index that text. You can read more on the subject by checking the resources below:

    Plugin Support Maybellyne

    (@maybellyne)

    This thread was marked resolved due to a lack of activity, but you’re always welcome to re-open the topic. Please read this post before opening a new request.

  • The topic ‘indexing pdf files’ is closed to new replies.