Essay URLs indexed on Google.
-
So here’s the deal. I do a site: search for my site entertheraptor.net on Google and I find Google has indexed all of these URLs that relate to essay writing, pages that don’t exist on my site. If I follow these URLs they lead either to my homepage or an archive page on my site.
I’ve used Wordfence to scan the site with all of the options turned on (files outside your WordPress install etc.) and I have searched the database using phpMyAdmin for any reference to essay and deleted any tables that turned up in the search. Wordfence finds nothing wrong and I still have the problem.
I’m a bit stuck. Does anyone have any ideas?
-
Hi @entertheraptor,
Because your website does not return anything different for these URLs, Google will continue to index the pages.
What I can recommend is to explicitly declare that these parameters should not be crawled.
To do this, edit your robots.txt and add the following lines:
User-agent: *
Disallow: /*?essay=*
Disallow: /*?number=*
Disallow: /*?solutions=*
I noticed while doing a Google search that there were other URLs also being indexed with number and solutions.
Once your robots.txt has been updated on your end, use your Google Search Console to recrawl your website at https://entertheraptor.net/robots.txt.
Dave
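For anyone wanting to check which URLs those three Disallow lines would actually cover, here is a minimal Python sketch. It assumes Google-style pattern matching ('*' matches any run of characters, a trailing '$' anchors the end, everything else is a prefix match on path plus query string). The function names and the example.com URLs are hypothetical, made up for illustration.

```python
import re
from typing import Iterable
from urllib.parse import urlsplit


def robots_pattern_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate one robots.txt path pattern into a compiled regex.

    Assumption: Google-style semantics, where '*' is a wildcard and a
    trailing '$' anchors the end of the URL.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and join them with '.*' for each wildcard.
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_disallowed(url: str, patterns: Iterable[str]) -> bool:
    """Return True if the URL's path + query matches any Disallow pattern."""
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    # re.match anchors at the start, which gives the prefix behaviour.
    return any(robots_pattern_to_regex(p).match(target) for p in patterns)


# The three rules suggested in the reply above.
RULES = ["/*?essay=*", "/*?number=*", "/*?solutions=*"]
```

One thing this makes visible: the rules only match when the parameter comes straight after the '?'. A URL such as /?p=1&essay=x (hypothetical) would not be covered, so it may be worth checking how the parameters appear in the indexed URLs before relying on these patterns.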
Hi @wfdave
Thanks for the suggestions. What I would really like to do though is find the malicious code that is creating these URLs and remove it rather than just tell Google not to index them. Because if Google can see these URLs anyway then Google may conclude that the site has been hacked and apply manual actions, which is exactly what has happened just this morning to another site I manage.
So the problem isn’t that Google can see these URLs and is indexing them, the problem is that the site has been hacked and these bogus URLs exist. And that this has occurred while the site was running Wordfence and Wordfence can’t see or detect that anything is wrong.
Also, I don’t actually have a robots.txt.
Cheers, any further ideas would be very much appreciated.
Ok so now I’m completely baffled.
So after all other avenues leading to dead ends I decided to delete the database and all of the site files and start again from scratch. That would have to eliminate any malicious code right?
So I clean installed WordPress with a clean database and then did a site: search and followed these links expecting them to now return a 404 not found right? Wrong!
If I follow these links to my site they still go to either the home page or an archive page and don’t return a 404. I’m at a complete loss.
Any ideas before Google gives me manual actions?
Do you have a Google Webmasters account?
I would go into this via your Google account and try to sort it from there. I haven’t the best memory of exactly where to do this and how, but I reckon that would be the place to start because, as far as I know, Google’s crawlers don’t necessarily re-index your website just because you change your robots.txt inside public_html. Google may not re-index your website for days or weeks or even longer. In the meantime, a lot of this will be cached, so it doesn’t immediately disappear just because you wiped your website and started over…
So the above is the reason for me suggesting you go to the source and attack it from that location (i.e. Google Webmasters).
Thanks for trying to help but the problem isn’t what appears in Google search, the problem is that the site displays content for URLs that don’t exist on the site. Even if I try to use Google’s URL removal tool it won’t remove the URLs from search, because it detects that the content the URL points to is still live on the site: it loads content (be it the home page or an archive page) rather than returning a 404, which is what you would expect it to do.
I can’t understand why these URLs don’t return a 404 even after I deleted the database and all of the site files and did a clean install of WordPress.
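To pin down exactly what the server sends back for each of these URLs (a real 404, a redirect to the home page, or a 200 with content), a small script helps more than clicking through in a browser. Below is a rough Python sketch using only the standard library. The function names, the User-Agent string and the category labels are all made up for illustration, and the URL list is something you would fill in from your own site: search.

```python
import urllib.error
import urllib.request


def fetch_status(url: str, timeout: float = 10.0) -> tuple:
    """Return (status_code, final_url) for a GET request.

    urllib follows redirects automatically, so final_url may differ
    from the URL that was requested.
    """
    request = urllib.request.Request(
        url, headers={"User-Agent": "status-check/0.1"}  # hypothetical UA
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status, response.geturl()
    except urllib.error.HTTPError as error:
        # 4xx/5xx responses are raised as exceptions but still carry a code.
        return error.code, getattr(error, "url", None) or url


def classify(requested_url: str, status: int, final_url: str) -> str:
    """Label a response: 'gone', 'redirected', 'live' or 'other'."""
    if status in (404, 410):
        return "gone"
    if status == 200 and final_url.rstrip("/") != requested_url.rstrip("/"):
        return "redirected"
    if status == 200:
        return "live"
    return "other"


def report(urls) -> None:
    """Print one line per URL; this performs real network requests."""
    for url in urls:
        status, final_url = fetch_status(url)
        print(classify(url, status, final_url), status, url, "->", final_url)
```

Calling report() on the bogus URLs would show which ones answer 200 at the requested address ("live") and which are being redirected elsewhere first. That is the distinction the removal tool is reacting to.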
Can you provide me with the essay URLs that you are referring to?
I opened internet explorer and the search engine returns only the following for me that relates to your .net website…
entertheraptor.net
Making an Impact Across the Globe. Consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.
entertheraptor.net/about
Services? What services? We could probably help you build a truly awesome website but it would cost you. Let’s face it, you get what you pay for.
entertheraptor.blogspot.com
Just kicking off my new blog. I’ll post something that’s actually worth while soon.
- This reply was modified 5 years, 5 months ago by adamjedgar.
Wow I’m not sure how you got that.
Anyway go to Google and type site:entertheraptor.net and you’ll see what I mean.
Ah ok, I see the references you are talking about now.
Most of them give me “page not found” errors, and the ones that don’t redirect to your new homepage.
I also note that at the top of my google browser search results google has placed the following entry…
Try Google Search Console
https://www.google.com/webmasters/
Do you own entertheraptor.net? Get indexing and ranking data from Google.
I am fairly sure these are cached index results that are not coming directly from your website… they are cached somewhere else (i.e. Google).
You need to look for any files on your server, or that have been uploaded to Google Webmasters, that contain these URLs (i.e. sitemap files such as sitemap.xml or something like that). If there is no sitemap on your webserver somewhere that Google has managed to find, then it must be in Webmaster Tools somewhere and you need to manually find and remove it.
Just one other thing to look at, are you using cloudflare or any kind of caching like that?
Have you used any kind of third party wordpress caching plugins that contain this kind of sitemapping on an external url somewhere that is backlinking to your site and that google is finding?
- This reply was modified 5 years, 5 months ago by adamjedgar.
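If a sitemap file does turn up, scanning it for the suspect query parameters is quick to automate. The following is a minimal Python sketch, assuming a standard sitemaps.org urlset document. The function name and the default parameter list (taken from the essay, number and solutions parameters mentioned earlier in the thread) are illustrative only.

```python
import xml.etree.ElementTree as ET
from urllib.parse import parse_qs, urlsplit

# Namespace used by standard sitemap files (sitemaps.org protocol 0.9).
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def flagged_sitemap_urls(sitemap_xml, suspect_params=("essay", "number", "solutions")):
    """Return the <loc> URLs whose query string uses a suspect parameter."""
    root = ET.fromstring(sitemap_xml)
    flagged = []
    for loc in root.iter(SITEMAP_NS + "loc"):
        url = (loc.text or "").strip()
        # keep_blank_values so that '?essay=' with no value is still caught.
        query = parse_qs(urlsplit(url).query, keep_blank_values=True)
        if any(name in query for name in suspect_params):
            flagged.append(url)
    return flagged
```

Feed it the text of the sitemap (read from disk or downloaded). An empty list would suggest the sitemap is not where Google picked these URLs up.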
The answer is pretty much no to everything, and some of them return a 404 but others load either the home page or an archive page and these pages are not cached. I deleted everything this morning and the pages I’m now getting are fresh installed WP running 2019 theme and not the site that was there before I deleted everything which is what you would expect to get if it was a cached page. And even Google’s removal tool thinks these pages are live (and not cached).
As for finding files on the server, refer to deleted everything.
Thanks anyway.
So you have 100% gone into Google webmaster tools (or setup a new account and added this website to it) and 100% ensured that there are no sitemap.xml files or any urls that refer to these?
It is absolutely impossible for this to happen on a fresh install unless something is either still in the database, or on your server somewhere, or the pages that are returning actual content are cached!
Can I ask, when you say you reinstalled, you completely deleted the entire wordpress database, and all files in your directory on the server, then created a new database (with a new name)?
If you indeed have performed a completely fresh install without any possibility of the old database or directories being inadvertently utilised, then this must be a caching issue!
One could argue some kind of masking, however, I would have thought if that was the case, then it would not show your new blank website on any of the urls I see in the google search results.
A final question…can you outline briefly your hosting?
Do you have VPS, shared hosting …?
Who is your host? I could look it up I suppose… oh! Australian Domains International! This isn’t part of crazydomains, is it?
Could I suggest that you also post onto https://forums.whirlpool.net.au/forum/63 (this is under the IT Industry forums/Web development)
There are some guys on the Whirlpool forums who are absolute geniuses when it comes to this kind of thing (particularly a guy called “Reg”). I am also on those forums (I’m no genius, I mostly use them for help with problems of my own with my hosting business). I will keep an eye out for your post and will “pm” you his details.
A little word of warning: don’t be afraid to mention Crazy Domains (if that is the network you are on), but they will all tell you to move away from them as a matter of urgency.
I am sorry I can’t help further.
kind regards
Adam
“It is absolutely impossible for this to happen on a fresh install unless something is either still in the database, or on your server somewhere,”
My thoughts exactly!
“Can I ask, when you say you reinstalled, you completely deleted the entire wordpress database, and all files in your directory on the server, then created a new database (with a new name)?”
YEP!
Yes CrazyDomains shared.
Thanks for the Whirlpool tip.
ok.
More than likely the boys over at whirlpool will additionally suggest you move hosts. I wouldn’t necessarily accept this as a solution to this issue…as it may have nothing to do with it…they just have a passionate dislike for crazydomains! (me too actually).
Anyway, I hope to read about your problem and its solution there.
I will sign off now.
kind regards
Adam
You might also check your .htaccess file to see if you have redirects in it that point any of those requests to your home page. As mentioned before, if Google sees a valid page, even if it was redirected there from the URL it originally requested, it is possible that Google will count the bad URL as valid because it sees a real page. A redirection plugin may be the culprit as well.
I can’t see whether or not you answered that you have a Google Webmaster account in the thread but asking to have the site re-indexed from there will likely clear the search results. With Google, unfortunately sometimes it is a waiting game.
Tim
Hi @wfsupport
Thanks for responding. I have checked the .htaccess file and it is fine. There is no page redirection, there are no pages.
I do have a Search Console account and have been trying to see what I can do from there but it doesn’t appear to be much. Like you said, because those URLs resolve to content that is live, Google thinks it’s ok.
The real problem here (you could say) is there is no way to communicate with Google and tell them these URLs are dodgy.
Do you know how to use .htaccess to make these URLs return a 404?
Hey @entertheraptor,
I looked around a bit and it looks like this Stack Overflow question might help. You might also reach out to your host; I would think that they could help with this.
https://stackoverflow.com/questions/18030491/redirect-one-url-to-another-url-using-htaccess
Please let us know how it goes.
Thanks,
Gerroald