• I’m trying to load existing content from a website using the Links Finder, but I don’t want to build a long list of container URLs. Instead, I would like to use the sitemap.xml file. It already lists all the links, but for some reason I can’t make it work.

    Here is a sample wordpress.com generated sitemap.xml

    https://citizenwells.com/sitemap.xml

    As you can see, it would be very useful to be able to process these because they have:

    a) all the links;
    b) the URL of the featured image;
    c) the last modification date.

    From this you could even store the XML page, so when you process it again you can skip anything that is old (because you have the date) and only process the latest links (new and/or modified). For a modified link you can update the existing post, or just skip it.
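To illustrate the idea, here is a minimal sketch (in Python, not the plugin’s own code) of skipping old sitemap entries by comparing the `lastmod` date against the time of the previous run. The function name `new_entries` and the `last_run` cutoff are hypothetical; the namespace and tag names come from the sitemaps.org protocol:

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

# Namespace defined by the sitemaps.org protocol
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

def new_entries(sitemap_xml: str, last_run: datetime):
    """Yield (url, lastmod) pairs that are new or modified since last_run."""
    root = ET.fromstring(sitemap_xml)
    for url in root.findall("sm:url", NS):
        loc = url.findtext("sm:loc", namespaces=NS)
        lastmod = url.findtext("sm:lastmod", namespaces=NS)
        if loc is None:
            continue
        if lastmod is None:
            # No date given: process it, to be safe
            yield loc, None
            continue
        modified = datetime.fromisoformat(lastmod)
        if modified.tzinfo is None:
            # Date-only lastmod values are naive; assume UTC
            modified = modified.replace(tzinfo=timezone.utc)
        if modified > last_run:
            yield loc, modified
```

Entries with no `lastmod` are passed through rather than dropped, since the protocol makes the date optional.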

    The HTML parser could then get the title etc. from the page.

    This becomes an easy way to load existing content from most sites, and also to load new content from sites without RSS feeds. I think it would be easier than using the normal Links Finder in many cases.

    Almost all websites have a sitemap.xml because that is what Google uses to index a website. All sitemap.xml files have the link and the last-modified date, and some may have image URLs. More information is here:

    https://www.sitemaps.org/protocol.html
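For reference, a sitemap entry following that protocol looks roughly like this (the `example.com` URLs are placeholders; the image tags use Google’s image-sitemap extension namespace, which some generators include and others do not):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:image="http://www.google.com/schemas/sitemap-image/1.1">
  <url>
    <loc>https://example.com/2024/01/sample-post/</loc>
    <lastmod>2024-01-10T08:30:00+00:00</lastmod>
    <image:image>
      <image:loc>https://example.com/wp-content/uploads/featured.jpg</image:loc>
    </image:image>
  </url>
</urlset>
```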

    https://www.ads-software.com/plugins/wp-pipes/

Viewing 5 replies - 1 through 5 (of 5 total)
  • Thread Starter NanoWisdoms

    (@nanowisdoms)

    1,000,000 self-hosted blogs (not WordPress.com) use this plugin to generate a sitemap.xml for Google:

    https://en-ca.www.ads-software.com/plugins/google-sitemap-generator/

    WordPress.com automatically creates sitemap.xml (see my first post).

    Almost every website has a sitemap.xml because Google needs it. So if you can read sitemap.xml with Links Finder, that would make loading content from a website’s old and new links very easy.

    If a website does not have a sitemap.xml, we can use these tools to make our own sitemap.xml for a first-time load:

    https://www.xml-sitemaps.com/
    https://www.web-site-map.com/
    https://xmlsitemapgenerator.org/

    Thread Starter NanoWisdoms

    (@nanowisdoms)

    This is a sample sitemap made by the Google Sitemap Generator plugin (above):

    https://www.arnebrachhold.de/sitemap.xml

    This one is very easy to load with Links Finder because it is rendered as HTML, but not all sitemap.xml files are HTML; some are XML only (like WordPress.com’s), as in the sitemap.xml from my first post here.

    But if you can also use the lastmod date, then you can find new content much like RSS, without much trouble.

    Plugin Contributor Tung Pham

    (@phamtungpth)

    Hi Nanowisdoms,

    The Links Finder add-on is designed to detect <a> tags on a page, so it cannot read URLs from XML files. You will need a suitable source for your XML files.

    Best Regards!

    Thread Starter NanoWisdoms

    (@nanowisdoms)

    Yes, I understand. That is why I’m suggesting you create a new source to support sitemap.xml files: nearly every website has one.

    They contain the post link, the last-updated date, and sometimes the image file link too, and they are updated like RSS when new posts are published (changed old posts get a new date, but they keep the same slug, so they will just be skipped as duplicates).

    So for websites with no RSS, this makes things easier than using the DOM, regions of the home page, etc. A sitemap.xml source would also be more stable (like RSS, it does not matter if the website design changes) and therefore produce fewer errors.

    Plus they can be used to load old content very quickly and easily.
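The slug-based duplicate check mentioned above could be sketched like this (a hypothetical illustration, not plugin code; it assumes WordPress-style pretty permalinks where the last path segment is the post slug):

```python
from urllib.parse import urlparse

def slug_of(url: str) -> str:
    """Return the last non-empty path segment, i.e. the WordPress post slug."""
    return urlparse(url).path.rstrip("/").rsplit("/", 1)[-1]

# Slugs already imported in previous runs (would be persisted in practice)
seen_slugs = set()

def is_duplicate(url: str) -> bool:
    """True if a post with the same slug was already imported."""
    slug = slug_of(url)
    if slug in seen_slugs:
        return True
    seen_slugs.add(slug)
    return False
```

A re-dated post keeps its slug, so `is_duplicate` returns True for it even though its `lastmod` changed.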

    Plugin Contributor Tung Pham

    (@phamtungpth)

    Hi Nanowisdoms,

    Thank you for the very valuable suggestion! We will consider developing that new source!

    Best Regards!

  • The topic ‘Using Links Finder source with sitemap.xml files’ is closed to new replies.