• nwevhajkan

    (@nwevhajkan)


    Hello,

    Two questions about the pages/URLs that WordPress auto-generates, for example image files (/image-gallery/ or /training-banner-image-3/), blog categories (/category/training/), icons used on the site (/trainer/icon/), etc.

    1. Where are all these pages/URLs in my Dashboard? They’re definitely not in Pages/All Pages.
    2. Can I have Google Search Console stop crawling them by adding noindex to them? (I realize this is sorta an SEO question, but it’s basic so please forgive me!)

    Basically, I don’t want these types of pages to show up in my GSC under indexed pages (I don’t want them indexed). Main question: how do I alter their HTML, and where do I find these pages in WP?

    Thank you!

    ~Don Gorr

  • clayp

    (@clayp)

    1. Your media files (/image-gallery/) are uploaded to the /wp-content/uploads/ folder, which you can browse from your hosting control panel.
    2. If you have access to the server configuration files, I suggest you also add an X-Robots-Tag header, so the images won’t be indexed even if they are hot-linked; see the example under “Stop crawling images using the server’s configuration file” below.

    Stop crawling images using the WordPress dashboard

    1. Log into your WordPress dashboard.
    2. Go to the media library.
    3. Find the image you want to prevent from being indexed.
    4. Search for the “Visibility” option in the right sidebar.
    5. Set the visibility option to hidden or private.
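
    If the “Visibility” control isn’t available in your media library, a small code snippet can achieve a similar result for attachment pages such as /training-banner-image-3/. The sketch below is only a minimal example, assuming WordPress 5.7 or later (where the wp_robots filter exists); it goes in your theme’s functions.php or a small plugin.

    PHP:
    // Minimal sketch: mark attachment pages as noindex via the wp_robots filter.
    // Assumes WordPress 5.7+, where this filter was introduced.
    add_filter( 'wp_robots', function ( $robots ) {
        if ( is_attachment() ) {
            $robots['noindex']  = true;
            $robots['nofollow'] = true;
        }
        // A similar check (e.g. is_category()) could cover category archives.
        return $robots;
    } );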

    Stop crawling images using robots.txt rules

    User-agent: Googlebot-Image
    Disallow: /images/ex1.jpg

    To exclude multiple images from being indexed on your site, add a disallow rule for each image. Alternatively, if the images follow a common naming pattern, such as a shared prefix or suffix in the filename, use the * wildcard character in the path. For example:

    User-agent: Googlebot-Image
    # Repeated 'disallow' rules for each image:
    Disallow: /images/ex1.jpg
    Disallow: /images/ex2.jpg
    Disallow: /images/ex3.jpg

    # Wildcard in the filename for images that
    # share a common naming pattern:
    Disallow: /images/pictures-*.jpg
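
    If you would rather keep Googlebot-Image out of every upload at once (assuming the default /wp-content/uploads/ path), a single directory rule is enough:

    User-agent: Googlebot-Image
    Disallow: /wp-content/uploads/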

    Stop crawling images using the server’s configuration file

    To include the X-Robots-Tag in a website’s HTTP responses, modify the configuration files of your site’s web server software. On Apache-based servers, for instance, you can use the .htaccess and httpd.conf files. Adding an X-Robots-Tag header to HTTP responses has the advantage of defining crawling rules that apply universally across a site, and regular expressions give you a high level of flexibility.

    You can use the X-Robots-Tag for non-HTML files, such as image files, where robots meta tags in HTML are not an option. Here’s an example of adding a noindex X-Robots-Tag rule for image files (.png, .jpeg, .jpg, .gif) across an entire site:

    Apache:
    <Files ~ "\.(png|jpe?g|gif)$">
        Header set X-Robots-Tag "noindex"
    </Files>

    Nginx:
    location ~* \.(png|jpe?g|gif)$ {
        add_header X-Robots-Tag "noindex";
    }
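
    Once the header is in place, you can check that it is actually being sent by requesting one of your images and looking at the response headers. The URL below is only a placeholder for one of your own uploads:

    curl -I https://example.com/wp-content/uploads/sample.jpg
    # Expect to see: X-Robots-Tag: noindex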

    Thread Starter nwevhajkan

    (@nwevhajkan)

    Thank you Clay.

  • The topic ‘Stop auto-generated URLs from being crawled’ is closed to new replies.