• I am trying to scrape copyright-free, public-domain content from a large network of old websites (circa 2003) into a single new website on my localhost. The network has a little over 40,000 posts in total, spread across 12 domains. The posts contain only written content and images, no videos at all.

    I do not own these sites, and they do not even have a sitemap. When I tried importing from the output of an XML generator tool, the posts came out messed up or erroneous. So instead I used a web scraper plugin from the WordPress plugin repository, and at first everything worked well.

    When I first started scraping, the posts came in really fast, at around 100 posts per 10 minutes. But as the total number of posts kept increasing, the scraping speed got slower.

    After 10,000 posts, the speed dropped to 200 posts/hour, which is still workable. But now that I have added 25,000 posts, the scraping speed is down to 200 posts/day!

    I still have about 20,000 posts left to add, and it is not possible for me to continue at such a slow pace.
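To put those rates side by side, here is a rough back-of-the-envelope sketch in Python. It uses only the approximate figures quoted above; the variable and function names are made up for illustration.

```python
# Approximate scraping rates from this post, converted to posts per hour.
rates_per_hour = {
    "at the start": 100 * 6,          # about 100 posts per 10 minutes
    "after 10,000 posts": 200,        # about 200 posts per hour
    "after 25,000 posts": 200 / 24,   # about 200 posts per day
}

remaining_posts = 20_000  # rough number of posts still to add


def days_to_finish(posts, posts_per_hour):
    """Days needed to add `posts` at a constant hourly rate, running 24 hours a day."""
    return posts / (posts_per_hour * 24)


for stage, rate in rates_per_hour.items():
    days = days_to_finish(remaining_posts, rate)
    print(f"{stage}: {rate:.1f} posts/hour, about {days:.1f} days for the rest")
```

At the current rate the remaining posts would take on the order of 100 days of nonstop scraping, compared with a day or two at the original speed.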

    I created a fresh second site on my localhost and tried scraping there to see whether my WAMP setup was what had slowed things down. It was not: a new site scrapes just as fast as the main site did at the beginning.

    As an alternative, I tried creating the posts on a second site and importing them from its XML export into my main site, but importing through the XML file is just as slow, and the featured images often get messed up. On any site other than the main site, importing content through XML takes me less than a few minutes. So I am pretty sure it is only the main site that has slowed down so much.

    I tried using WP-Optimize to optimize the tables and database for the main site, but that has not helped at all. I have also tried clearing the cache with Ctrl + F5, but that has not helped either.

    I did expect the site to get slower as I added more content, but not this slow. And it seems that only adding new posts, whether through scraping or importing, is affected. The dashboard, site, navigation and everything else are working all right, even if a bit slower. So if someone could help me figure out how to make adding new content through scraping as fast as it was at the beginning, or even half as fast, I would be very thankful.

Viewing 3 replies - 1 through 3 (of 3 total)
  • The topic ‘Adding posts via scrapping getting slower and slower each da’ is closed to new replies.