• Resolved nosamttimhcs

    (@nosamttimhcs)


    Hello,

    I received an email notification from Google Search Console, indicating that 3 ai1ec pages on my site are indexed, even though they are in my robots.txt file.

    I believe the ai1ec plugin adds entries to the robots.txt file with the intent of blocking specific pages from being indexed; however, according to Google, this is not the recommended way to prevent indexing.

    Would you please consider updating the plugin to use the noindex attribute?

    Thanks!

    The page I need help with: [log in to see the link]

Viewing 15 replies - 1 through 15 (of 20 total)
  • Hi @nosamttimhcs,

    Unfortunately, no, this isn’t ideal, as many users actually want their calendar and events to be indexed by search engines. The robots.txt file is a method of preventing search bots from listing particular pages from your domain. Be aware that updating this file may take time to be reflected in a search engine’s results.

    Thread Starter nosamttimhcs

    (@nosamttimhcs)

    Hi @sunny454,

    I too want the calendar on my website to be indexed by search engines, so I’m not requesting that all ai1ec pages be listed with the noindex attribute. There are however specific pages that the ai1ec plugin adds to the robots.txt file, such as the one I linked to in my original post, for which Google is recommending that the noindex attribute be used.

    Hi @nosamttimhcs,

    The plugin uses the “Disallow:” attribute for security purposes, as it supersedes “Noindex”.

    Noindex: tells search engines not to include your page in search results. Used for preferences.

    Disallow: tells them not to crawl your page. Used for security reasons.

    You can add your own noindex attribute for any pages you do not want indexed from your website.

    Thread Starter nosamttimhcs

    (@nosamttimhcs)

    I’m not sure that using Disallow: in robots.txt supersedes the Noindex attribute or that you can consider robots.txt to be a security tool. Reading and obeying the robots.txt file is purely voluntary, so it’s not like putting pages in the robots.txt file is going to somehow protect them from being accessed. If you want to secure specific pages, you can put a rule in a .htaccess file or the equivalent for non-Apache web servers, or you can use authentication to prevent access to those pages for non-authenticated users.

    What we’re talking about is whether the search engines include the page in their index or not, after they crawl your site. If another website links to a page on my website, Google’s crawler will crawl and submit the page to its index, regardless of what’s in the robots.txt. However, the noindex attribute will be obeyed by Google, regardless of whether another site links to a given page.
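    To illustrate the crawling side of this distinction, here is a minimal sketch using Python’s standard urllib.robotparser, with a hypothetical Disallow rule and example.com URLs (not the actual ai1ec rules):

    ```python
    from urllib.robotparser import RobotFileParser

    # Hypothetical robots.txt content; the real ai1ec rules differ.
    robots_txt = """\
    User-agent: *
    Disallow: /calendar/
    """

    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())

    # Disallow only governs whether a polite bot may *crawl* a URL...
    print(rp.can_fetch("*", "https://example.com/calendar/action~oneday/"))  # False
    print(rp.can_fetch("*", "https://example.com/about/"))                   # True

    # ...it says nothing about indexing: a URL discovered through an
    # external link can still land in the index, which only a noindex
    # meta tag (read during a crawl) reliably prevents.
    ```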

    I hope that helps clear up the confusion.

    Hi @nosamttimhcs,

    Clearly, the robots.txt file is not being used as a security tool.

    The plugin uses Disallow to prevent the pages from being crawled in the first place.

    The Noindex pages get crawled but the bot is told not to index them.

    Hence, since Disallow prevents the bot from crawling the page in the first place, there is no chance of those pages being listed or indexed and the data from those pages will not be part of the search engine’s data.

    Noindex is generally meant to be used on a page by page basis.

    For reference, Google’s John Mueller said not to use the Noindex attribute:
    https://www.seroundtable.com/google-do-not-use-noindex-in-robots-txt-20873.html

    Noindex is also not supported by Bing.

    • This reply was modified 6 years, 5 months ago by Sunny Lal.
    Thread Starter nosamttimhcs

    (@nosamttimhcs)

    Hi @sunny454,

    There appears to be some confusion between noindex being used in robots.txt and noindex being included in the page as a meta tag. John Mueller was saying that noindex should not be used in a robots.txt file – which is correct. Noindex is meant to be used as a meta tag in an HTML page.

    What I’m suggesting is that the ai1ec plugin add the noindex meta tag to pages that shouldn’t be indexed, rather than using the Disallow attribute in the robots.txt file, as this will ensure that the pages aren’t indexed even if someone else links to them.

    As mentioned in my original message: when I received an email notification from Google Search Console, I was directed to the following content on Google’s support site.

    Indexed, though blocked by robots.txt: The page was indexed, despite being blocked by robots.txt (Google always respects robots.txt, but this doesn’t help if someone else links to it). This is marked as a warning because we’re not sure if you intended to block the page from search results. If you do want to block this page, robots.txt is not the correct mechanism to avoid being indexed. To avoid being indexed you should either use ‘noindex’ or prohibit anonymous access to the page using auth. You can use the robots.txt tester to determine which rule is blocking this page. Because of the robots.txt, any snippet shown for the page will probably be sub-optimal. If you do not want to block this page, update your robots.txt file to unblock your page.

    Here’s the URL for the above quote – https://support.google.com/webmasters/answer/7440203#indexed_though_blocked_by_robots_txt
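    The “robots.txt tester” Google mentions essentially finds which Disallow rule matches a given URL path. A rough sketch of that idea in Python, using rules modelled on the ones ai1ec writes (fnmatch is only an approximation of Google’s matcher, which treats a Disallow value as a prefix match with ‘*’ as a wildcard, so a trailing ‘*’ is appended before testing):

    ```python
    from fnmatch import fnmatchcase

    # Rules modelled on the ones ai1ec adds to robots.txt.
    DISALLOW_RULES = [
        "/*controller=ai1ec_exporter_controller*",
        "/kalendar/action~oneday/",
    ]

    def blocking_rule(path):
        """Return the first Disallow rule that blocks `path`, or None."""
        for rule in DISALLOW_RULES:
            # Append '*' so the rule behaves as a prefix match, the way
            # search engines interpret Disallow values.
            if fnmatchcase(path, rule.rstrip("*") + "*"):
                return rule
        return None

    print(blocking_rule("/kalendar/action~oneday/exact_date~3-10-2018/"))
    # -> /kalendar/action~oneday/

    print(blocking_rule("/kalendar/"))
    # -> None
    ```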

    For my website, Google Search Console is providing warnings about the following 3 URLs:

    • This reply was modified 6 years, 5 months ago by nosamttimhcs.

    Hi @nosamttimhcs,

    There are certain directories that are blocked using robots.txt. What particular pages are you suggesting should be blocked via inline linking? The events pages, calendar views, etc are created dynamically, so links are dynamically created.

    Also if one page uses a noindex attribute to link to a page, Google can still index the page if another page links to it without a noindex attribute.

    Nevertheless, if you want to submit this to the development team as a feature request, you can do so here; please include the details:

    https://ideas.time.ly

    OK, I don’t know which of the two approaches is best, but I have the same problem on one of my customers’ sites.

    Then what can we do to avoid the Search Console errors? I don’t think the default settings should cause so many errors to show up.

    Thanks.

    Hi @taisa1984

    What are the errors your customer is receiving?

    Hello,

    Same errors here about the robots.txt file:

    https://xxxxxxxx.fr/?plugin=all-in-one-event-calendar&controller=ai1ec_exporter_controller&action=export_events

    Disallow: /*controller=ai1ec_exporter_controller*

    I have exactly the same problem as guys above. My calendar is on this page:
    https://ba2.casd.sk/kalendar/
    When you check source code, the page itself is correctly set as noindex,nofollow.
    Robots.txt shows for example this disallow:
    Disallow: /kalendar/action~oneday/
    Google found around 130 issues related to this. For example these pages:
    https://ba2.casd.sk/kalendar/action~oneday/exact_date~3-10-2018/
    https://ba2.casd.sk/kalendar/action~oneday/exact_date~20-12-2016/
    https://ba2.casd.sk/kalendar/action~oneday/exact_date~24-5-2017/
    etc.
    In plugin’s settings, I have setting “Exclude events from search results” marked since always.
    So, it seems like the plugin is not correctly adding it dynamically for all pages?

    Hi @skubko,

    The way the robots.txt file works is that there is a blanket statement for your directory; there are not individual noindex or disallow statements for each page. Your robots.txt file is already stating:

    Disallow: /kalendar/action~oneday/

    To explain what’s happening, per @nosamttimhcs above:

    “What we’re talking about is whether the search engines include the page in their index or not, after they crawl your site. If another website links to a page on my website, Google’s crawler will crawl and submit the page to its index, regardless of what’s in the robots.txt. However, the noindex attribute will be obeyed by Google, regardless of whether another site links to a given page.”

    The noindex attribute he refers to is in the link itself, if any, that points to your page.

    Hi @sunny454,

    I am aware of what @nosamttimhcs wrote. Exactly: Google obeys the noindex attribute on pages that have it. The default calendar page (in my case ba2.casd.sk/kalendar) has noindex set, so it is correctly obeyed. However, for whatever reason this is not the case for pages generated by your plugin, as those ARE indexed by Google. My request is basically exactly the same as what @nosamttimhcs wrote above:

    “What I’m suggesting, is that the ai1ec plugin add the noindex meta tag, to pages that shouldn’t be indexed”.

    Until this is achieved, I do not think we have any other option but to keep receiving warnings from Google about the plugin’s generated pages…

    Btw. I am using the Yoast SEO plugin, and that one apparently adds this meta tag to this generated page as well. When I checked the source code for
    https://ba2.casd.sk/kalendar/action~oneday/exact_date~3-10-2018/
    I see it has this line:
    <meta name="robots" content="noindex,nofollow"/>

    So now I am not sure about this one either…
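    One way to check which generated pages actually carry the tag is to scan a page’s HTML for a robots meta tag. A minimal sketch with Python’s standard html.parser (fetching the page over HTTP is left out; the HTML is passed in as a string):

    ```python
    from html.parser import HTMLParser

    class RobotsMetaFinder(HTMLParser):
        """Collect the content of any <meta name="robots" ...> tags."""
        def __init__(self):
            super().__init__()
            self.directives = []

        def handle_starttag(self, tag, attrs):
            attrs = dict(attrs)
            if tag == "meta" and (attrs.get("name") or "").lower() == "robots":
                self.directives.append(attrs.get("content") or "")

    def has_noindex(html):
        finder = RobotsMetaFinder()
        finder.feed(html)
        return any("noindex" in d.lower() for d in finder.directives)

    # Example with the tag Yoast emits, per the source-code line above:
    page = '<html><head><meta name="robots" content="noindex,nofollow"/></head><body></body></html>'
    print(has_noindex(page))  # True
    ```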

    skubko

    Hi @skubko,

    I don’t believe you fully understand what I quoted.

    Noindex on your own server can be overridden (forced) by the attribute in a link pointing to your site, so it is not a definitive method, and thus pages can still get indexed by a search engine. As you mentioned, your Yoast plugin does what was asked by adding the meta tag; if the page still appears in Google’s index, it may simply take time for it to drop off the search results, since it was indexed prior to the addition of the noindex meta tag.

    In any case, for your idea, you can submit a ticket directly to our Dev Team to implement this feature for a future release:

    https://ideas.time.ly

    keysnparrots

    (@keysnparrots)

    Noindex on your own server can be overridden (forced) by the attribute in a link pointing to your site

    There is no such thing as a noindex attribute on a link. You’re talking about nofollow, about which Google says, “In general, we don’t follow them.”

    noindex can only be used in a page’s meta tags, and Google does not allow links on other sites to override it; the idea is ridiculous. Imagine if other sites could tell Google not to index yours at all!

    The approach you are using causes warnings in Google Search Console and Google provides instructions for correcting this by using noindex. Period. There’s nothing left to debate. You’re doing it wrong.

  • The topic ‘Google Search Console warning’ is closed to new replies.