Google (etc) indexing of data on page
-
I’ve just installed the plugin (on a localhost site for now) and see that when I view the page source of a page that includes a bibliography, none of the bibliography text appears. Google (etc.) will therefore not index that text (as it’s not there), and it will not show up in search queries. Is that right, or am I missing something fundamental?
I did a quick check of this site https://peml.uwaterloo.ca/?page_id=73, whose owner had posted on this forum, and sure enough, if I Google some of the references on that page that are generated by Zotpress, they don’t show up in Google results.
-
Yes, I believe that’s true, because as I recall Zotpress is using AJAX (to be specific WordPress’s AJAX functions) to load content.
That’s a great pity as that makes the plugin fundamentally not useful for anyone who wants to share their data and for their data to be found by others – using the plugin means that all the references are effectively black boxed and not able to be found by people searching the internet.
Yes, it’s not ideal. Over the years Zotpress has gone back and forth (between cURL and AJAX), because there are trade-offs either way. In the last big update, based on all the feedback I received (especially a survey I ran with current users), I prioritized syncing (accurate and automatic updating), loading speed, and user experience, and that meant AJAX, even though it made the content dynamic. For SEO, the hope is that other content will bring people to the website, and in any case many search engines’ algorithms are mysterious and ever-changing (i.e., having your name listed many times on a page, such as in a personal publications list, might be a bad sign to search engine crawlers rather than a good one, because people have tried to game Google’s system that way before).
I’ve been pondering a combined version but it’s a very complicated plugin under the hood; developers are welcome to provide ideas.
Thanks for the reply.
Re. “having your name listed many times on a page, such as in a personal publications list, might be a bad sign to search engine crawlers rather than a good one due to people trying to game Google’s system before”. I would definitely disagree with that: I have one publications list and two long bibliographies and I’m fairly certain it is not a bad thing, not just for SEO, but for actual people to be able to find your site and content. I really do think it is crucial for the information to be searchable and findable by people on the internet (sort of what the internet is for).
I wonder if the people who are using your plugin are aware that in using it they are effectively telling Google not to index their content, and that their data can’t be found?
Anyhow, I would strongly suggest for another update to bring the content back onto the pages as indexable content somehow.
The idea is that references and bibliographies should supplement existing rich content, which should help you more for SEO purposes. Google warns against “numerous unnecessary” keywords but it’s not clear what “numerous” means or if it’s smart enough to understand “necessary” in context. This is coming from the days when people would create pages filled with keywords to hijack search results. But as far as I know none of us can be sure if/how Google distinguishes these since it’s proprietary.
In any case, yes, it would be best to have a non-dynamic aspect. I’m already thinking about it as I said; I’m just not sure how it can/will work out.
I guess one needs to think beyond SEO as such and think of how other people find content; i.e. if someone searches for a given article or author, they can also find results where people are citing that article/author on a blog/WordPress site.
Another major limitation of the plugin is that it is impossible to search for authors/articles across a website. If a blog has 50 posts and references 200 articles by 250 authors, a reader may want to find all instances of ‘Author A’ or ‘Article C’ on the website. Using the search box, it should be possible to see all posts that mention Author A or Article C, but using Zotpress means that Author A etc. will not be returned as a result, because the text is not actually in any post. So any kind of searching by author, year, word in an article title, etc. is impossible!
I think if you could resolve the issues regarding functionality and speed while making the content searchable and indexable you would have an excellent plugin! As is, unfortunately, I can’t use it.
re: searching, it would work a different way, so it’s not impossible per se. When you create shortcodes, you have to supply an ID, and these are linked to the semantic content. The challenge is knowing what pages have shortcodes. I think I’d have to search all posts and check if the shortcodes exist on a given post and if so what they say. It sounds resource intensive, especially on a large site. Assuming WP search considers content generated by shortcodes, having some form of static content would be best to address both issues. Unfortunately I’m not sure if/when I can get this done.
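To illustrate the "search all posts and check if the shortcodes exist" idea above, here is a minimal sketch in Python rather than the plugin's own PHP. It is not how Zotpress works internally. The `[zotpress]` tag comes from this thread; the attribute names (`item`, `collection`, `style`), the `posts` dictionary, and the function name are assumptions made purely for illustration.

```python
import re

# Matches [zotpress] or [zotpress key="value" ...].
# The attribute names used in the sample data are hypothetical.
SHORTCODE_RE = re.compile(r'\[zotpress\b([^\]]*)\]', re.IGNORECASE)
ATTR_RE = re.compile(r'(\w+)\s*=\s*"([^"]*)"')


def find_shortcodes(posts):
    """Return {post_id: [attribute dict, ...]} for posts containing a shortcode."""
    found = {}
    for post_id, content in posts.items():
        matches = [
            dict(ATTR_RE.findall(attrs))
            for attrs in SHORTCODE_RE.findall(content)
        ]
        if matches:
            found[post_id] = matches
    return found


# Hypothetical post bodies standing in for rows of a posts table.
posts = {
    1: 'Lala [zotpress item="ABC123"]. Lala.',
    2: 'No bibliography here.',
    3: '[zotpress collection="XYZ789" style="apa"] and [zotpress]',
}

if __name__ == "__main__":
    print(find_shortcodes(posts))
```

A scan like this touches every post, which is why it sounds resource intensive on a large site; caching the result per post when it is saved would be one way to avoid repeating it on every search.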
Just for clarity’s sake, I wanted to add that it’s not true that Google doesn’t index dynamic (AJAX) content. Google’s crawler does in fact render the page with JavaScript enabled and can include dynamic content in its results. More info here.
Instead of viewing the page source (which is just the HTML delivered by the server), try using something like Chrome’s developer tools and view the “Elements” tab. There you’ll see the page content as it’s actually rendered with JavaScript active. That’s how Google’s crawler sees the DOM as well.
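The difference between the served page source and the rendered DOM can be shown with a small sketch. It assumes two hypothetical HTML strings: `served` stands in for what the server delivers before any AJAX call has run, and `rendered` for what the "Elements" tab would show afterwards. The function names and the `loadBibliography()` call are illustrative only, not part of Zotpress.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects text nodes, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


def visible_text(html):
    """Return the whitespace-normalized text a reader would see in `html`."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.parts).split())


def phrase_in_html(html, phrase):
    """Case-insensitive check for `phrase` in the visible text of `html`."""
    return phrase.lower() in visible_text(html).lower()


# Hypothetical markup: an empty container filled in later by JavaScript.
served = '<div id="bibliography"></div><script>loadBibliography();</script>'
rendered = '<div id="bibliography"><p>Doe, J. (2015). An example article.</p></div>'

if __name__ == "__main__":
    print(phrase_in_html(served, "An example article"))
    print(phrase_in_html(rendered, "An example article"))
```

A plain HTTP fetch only ever sees something like `served`, which is why "view source" shows no citations; whether a given crawler goes on to execute the script and see `rendered` is a separate question.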
That doesn’t solve the problem of local search, but that’s a much trickier problem. WP search does not consider content generated by shortcodes by default. Most likely you’d have to do some kind of sync with Zotero and save the results to the local database, or use a third-party search system that searches the generated content (like a Google custom search), not the database.
Thanks for clarifying, Dalton! I’m relieved re: Google, but still pondering re: WP search. I wonder if it’s possible to make use of a hook to get it to search the database (a local copy of just the used shortcodes is stored by Zotpress in a database).
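As a rough sketch of what searching a locally stored copy could look like, the example below uses an in-memory SQLite table. The table name `cached_citations`, its columns, and the sample rows are all hypothetical; this is not Zotpress's actual schema, and a real WordPress integration would be written in PHP against the site's own database.

```python
import sqlite3

# Hypothetical cache: one row per (post, rendered citation text).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cached_citations (post_id INTEGER, citation TEXT)")
conn.executemany(
    "INSERT INTO cached_citations VALUES (?, ?)",
    [
        (1, "Author A (2012). Article C. Journal of Examples."),
        (2, "Author B (2014). Article D. Example Press."),
        (3, "Author A and Author B (2016). Article E."),
    ],
)


def escape_like(term):
    """Escape LIKE wildcards so the term is matched literally."""
    return term.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")


def posts_mentioning(conn, term):
    """Return sorted post IDs whose cached citation text contains `term`."""
    rows = conn.execute(
        "SELECT DISTINCT post_id FROM cached_citations "
        "WHERE citation LIKE ? ESCAPE '\\' ORDER BY post_id",
        ("%" + escape_like(term) + "%",),
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    print(posts_mentioning(conn, "Author A"))
```

The point of the sketch is only that once citation text is stored next to a post ID, a site search for "Author A" can return the posts that cite that author, even though the text never appears in the post body itself.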
Re. “Just for clarity’s sake, I wanted to add that it’s not true that Google doesn’t index dynamic (AJAX) content.”
Be that as it may: if you run a few tests like I did, putting exact excerpts of content displayed by Zotpress in quotation marks and searching for them in Google (using pages that have been up for a long time, e.g. https://www.ads-software.com/support/topic/are-you-using-zotpress-post-your-url-here?replies=9 ), you will see that none of the content displayed by Zotpress shows up in Google searches. That means all of that content is effectively blackboxed and cannot be found by users, which is actually disruptive in terms of the internet being an interconnected and searchable entity.
I did a good few of these tests, and they all showed no results. Therefore, I wasn’t just basing this on what I can see on the page source, but actually testing to what degree Google is indexing the content that is presented via Zotpress.
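For anyone wanting to repeat this kind of check, here is a small sketch that builds an exact-phrase search URL, optionally restricted to one site with the `site:` operator. The function name and the sample phrase and domain are hypothetical; the sketch only constructs the URL and does not fetch or interpret any results.

```python
from urllib.parse import urlencode


def exact_phrase_query_url(phrase, site=None):
    """Build a Google search URL for `phrase` as an exact (quoted) match."""
    # Strip embedded double quotes so the phrase stays one quoted unit.
    query = '"' + phrase.replace('"', "") + '"'
    if site:
        query += " site:" + site
    return "https://www.google.com/search?" + urlencode({"q": query})


if __name__ == "__main__":
    print(exact_phrase_query_url("An example article", site="example.org"))
```

An empty result for such a query is suggestive but not conclusive: it does not by itself distinguish between text that was never rendered by the crawler and a page that has simply not been re-crawled since the content changed.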
My sneaking suspicion is that users of Zotpress do not know that they have effectively blackboxed their content and made it practically invisible on the internet!
Is it just the Zotero content, or is it the other content on the page as well? e.g. “Lala [zotpress]. Lala.” Will “Lala. Lala” show up at least?
I checked a few users’ content and it seems it’s just the Zotpress component’s content that’s not turning up in searches.
Thanks for checking. Let’s be careful about phrasing here so as not to scare people — it sounded like you meant all content was blackboxed, but it’s just the Zotero citations, to some degree. I’m still not clear on whether Google isn’t indexing it, or if it’s just not showing up on the preview text in the Google search results.
Hi, As I mentioned in the very first post: I’m just trying to figure this out, and not trying to scare anyone.
Using the example I used earlier (https://peml.uwaterloo.ca/?page_id=73 ), the word count on that page is about 4400 words, of which 4285 come via Zotpress and are not being indexed. That’s what I mean when I say the user’s content is not being indexed: in this example, probably 97% of the page’s content.
Re. “isn’t indexing it, or if it’s just not showing up on the preview text”:
I don’t care about showing up in a preview: I am talking about not being indexed and not being searchable. That is, when searching for exact matches, Google either came back with ‘no results’ or returned the same article on other webpages but not on the example page.
Again, I am not trying to scare anyone: I’d like to be proved wrong here, because I actually want to use this plugin!
- The topic ‘Google (etc) indexing of data on page’ is closed to new replies.