My Apache server’s CPU usage hits 99.99% when I activate this plugin.
How can I fix it?

Hi Akshay,
Love your plugin, which we’ve been using for many years. Thanks for your help in the past for the Council of Industry (NY State, USA).
We’ve suddenly stopped getting results, and my troubleshooting returned this error message:
Error fetching: cURL error 35: error:14077458:SSL routines:SSL23_GET_SERVER_HELLO:reason(1112)
What does this mean?
Anything that could lead us towards a solution would be greatly appreciated!
Thanks,
– Paul

Hi,
I moved servers yesterday and the plugin just stopped working; it isn’t even giving me an error. Do I need to change something on the server side? Can someone help me with this?
Thanks,

Hello!
First of all, thanks for the plugin! It’s awesome!
At some point, while testing one of the sites, I got this error:
cURL error 60: SSL certificate problem: unable to get local issuer certificate
I solved the problem by adding the following line to class.wpws.php (after the line $request_args['user-agent'] = $request_args['useragent'];):
$request_args['sslverify'] = false;
It would be great if you added an option in the WP Web Scraper settings to enable or disable certificate verification. As it turns out, such an option would be useful.
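For readers wondering what this workaround amounts to: cURL error 60 means the client could not find the issuing authority’s certificate in its local trust store, and setting sslverify to false switches the check off altogether. The sketch below shows the same trade-off with Python’s standard ssl module rather than the plugin’s PHP. The helper make_context and its parameters are hypothetical names used only for illustration; pointing the client at a valid CA bundle is usually the safer fix than disabling verification.

```python
import ssl
from typing import Optional


def make_context(verify: bool = True, cafile: Optional[str] = None) -> ssl.SSLContext:
    """Build a TLS client context (hypothetical helper, not part of the plugin).

    verify=True  -> certificates are checked against the system trust store,
                    or against `cafile` if a CA bundle path is given.
    verify=False -> roughly what sslverify = false does: no checks at all.
    """
    context = ssl.create_default_context(cafile=cafile)
    if not verify:
        # check_hostname must be switched off before verify_mode can be
        # lowered to CERT_NONE, otherwise ssl raises ValueError.
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE
    return context


strict = make_context()               # verification on (the default)
relaxed = make_context(verify=False)  # verification off (the workaround)
print(strict.verify_mode, relaxed.verify_mode)
```

With verification off, the connection is still encrypted, but nothing confirms that the server is the one it claims to be.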

Hi,
I am trying to embed an external page that I wrote into my page. I have been able to use your plugin to bring the page in, but it includes sidebars, ads, header and footer. Is it possible to have it bring in only the content of the page and not the extras?
Currently this is the code:
[wpws url="https://www.huahintoday.com/food-wine/northern-thailand-foodies-guide/" query="" output=""]
Regards

Hello,
In the past your plugin worked well, but now it shows “Error parsing: Query returned empty response”. We are scraping the official soccer tables for our team. Could you please tell me what might be wrong?
Link: https://www.skrejsice.cz/tabulka/stredocesky-krajsky-prebor-2016-2017/
Error: “Error parsing: Query returned empty response”
Thank you.
Best,
Ladislav

Is there any way to get the contents of the element matched by a CSS selector, rather than the element itself? That is, for an element containing the text “Hey”, can I get the unstyled text “Hey” as the output, rather than “Hey” wrapped in its HTML tag?
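To make the distinction concrete, here is a minimal Python sketch of “text inside the element” versus “the element with its tag”. It is only an illustration and not the plugin’s code: the sample markup and the names TextOnly and inner_text are made up. Other posts in this forum use the shortcode argument output="text", which may be the setting for this, though that is an assumption about how the plugin behaves.

```python
from html.parser import HTMLParser
from typing import List


class TextOnly(HTMLParser):
    """Collects only the text nodes of a fragment, dropping every tag."""

    def __init__(self) -> None:
        super().__init__()
        self.parts: List[str] = []

    def handle_data(self, data: str) -> None:
        self.parts.append(data)


def inner_text(fragment: str) -> str:
    """Return the text of an HTML fragment without any of its markup."""
    parser = TextOnly()
    parser.feed(fragment)
    parser.close()
    return "".join(parser.parts).strip()


# Made-up sample: the whole element goes in, only its text comes out.
print(inner_text('<span id="price">Hey</span>'))
```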

Hi,
Thank you very much for the WP Web Scraper plugin; I’ve been using it for a few years and it has solved a tricky task for us.
We have run into a problem in which the cached version is never updated. With the Cache expiration (minutes) value set to 0, it works. If we set it back to any other value, it shows an old, outdated cached version.
So, following your advice, we installed the W3 Total Cache plugin and got it all set up (which of course has many other benefits!). Then I changed the Cache expiration (minutes) value in your plugin again, flushed the W3 Total Cache cache, and reloaded the page that displays the scraped info. Again, it fell back to the old, outdated cached version!
Can you advise on what we should do, or what the proper settings are? If we are using W3 Total Cache, should we just leave the Cache expiration (minutes) value in the WP Web Scraper options set to 0?
By the way, I did enable Object Caching in W3 Total Cache settings.
Thanks for your help!

I sent a paid support request a couple of days ago using the plugin site, but have not heard anything back since. Does anyone have experience with paid support, or know whether support is still alive?
Thanks

Hi guys,
I get this error with every site I try to test:
plugin: https://www.ads-software.com/plugins/wp-web-scrapper/
Output
Error parsing: Query returned empty response
Shortcode
[wpws url="https://www.ads-software.com/showcase/" query="col-7 main-content" ]
Template tag
<?php echo wpws_get_content('https://www.ads-software.com/showcase/', 'col-7 main-content'); ?>
Debug info
Scrap source and info
Source URL https://www.ads-software.com/showcase/
Query (cssselector) col-7 main-content
WPWS Cache Control Remote-fetch via WP_Http
Other arguments
headers string(0) “”
cache string(2) “60”
useragent string(33) “WPWS bot (https://mywebsite.com)”
timeout string(3) “360”
on_error string(10) “error_show”
output string(4) “html”
glue string(1) ” ”
eq string(0) “”
gt string(0) “”
lt string(0) “”
query_type string(11) “cssselector”
remove_query string(0) “”
remove_query_type string(11) “cssselector”
replace_query string(0) “”
replace_query_type string(11) “cssselector”
replace_with string(0) “”
basehref int(1)
a_target string(0) “”
callback_raw string(0) “”
callback string(0) “”
debug int(1)
charset string(5) “UTF-8”
Any idea?
Regards
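One thing that may be worth checking: read as a CSS selector, col-7 main-content (no dots) asks for a <main-content> element inside a <col-7> element, not for an element carrying those two classes. The class form would be .col-7.main-content. The Python sketch below, using a hypothetical helper name, turns a class attribute value into that class selector. Whether this accounts for the empty response here is an assumption.

```python
def class_attr_to_selector(class_attr: str) -> str:
    """Turn the value of an HTML class attribute into a CSS class selector.

    Hypothetical helper for illustration: "col-7 main-content" (as copied
    from class="...") becomes ".col-7.main-content", which matches a single
    element that carries both classes.
    """
    names = class_attr.split()
    if not names:
        raise ValueError("class attribute is empty")
    return "".join("." + name for name in names)


print(class_attr_to_selector("col-7 main-content"))
```

With that, the shortcode argument would read query=".col-7.main-content".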

I have installed your pro version on our website; however, I do not see how you would match the URL of the page I am scraping. It seems easy, but I just don’t see it.

Dear Sirs,
We wrote to you two days ago about an urgent request.
Do you have any news for me?
Thank you in advance.
Giusy

Hello. I am Petya Panayotova, and our company is in the process of building a booking site. As you know, all hotels change their prices, and I have one pre-purchase question. We need to know how to track and update the prices on a daily basis. Can this software download prices, rooms and ratings from multiple hotel URLs? If it can’t, could you please recommend software that does, or that would be right for us?

There are a couple of issues with the way caching is implemented in this plugin.
1) There is no way to clear cached pages. This feature would make the next issue less important.
2) The TTL on the cache is set when each page is cached. Subsequently changing the duration of the cache does not affect previously retrieved items.
Use case: Scraping a page, discovering an error on the scraped page, fixing the error on the scraped page, then re-scraping the page.
Currently there exists no way to re-scrape the page other than waiting for the cache, as it was when the page was originally scraped, to expire.
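The difference between the two behaviours can be sketched as follows. This is an illustration in Python under stated assumptions, not the plugin’s actual implementation: the class names, the injected clock and the delete method are all hypothetical. The first cache fixes the expiry when an entry is written (issue 2 above); the second stores the write time and applies the current TTL on every read, and delete shows the kind of manual purge asked for in issue 1.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class WriteTimeTTLCache:
    """Expiry is fixed when an entry is stored (the behaviour described above)."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl_seconds = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, str]] = {}  # key -> (stamp, value)

    def set(self, key: str, value: str) -> None:
        # The stamp is the absolute expiry time, frozen at write time.
        self._entries[key] = (self._clock() + self.ttl_seconds, value)

    def get(self, key: str) -> Optional[str]:
        entry = self._entries.get(key)
        if entry is None or self._clock() >= entry[0]:
            return None
        return entry[1]

    def delete(self, key: str) -> None:
        """Manual purge: forces a fresh scrape on the next request."""
        self._entries.pop(key, None)


class ReadTimeTTLCache(WriteTimeTTLCache):
    """Stores the write time instead, so a changed TTL applies to old entries."""

    def set(self, key: str, value: str) -> None:
        self._entries[key] = (self._clock(), value)

    def get(self, key: str) -> Optional[str]:
        entry = self._entries.get(key)
        if entry is None or self._clock() - entry[0] >= self.ttl_seconds:
            return None
        return entry[1]


# A fake clock keeps the demonstration deterministic.
now = [0.0]
frozen = WriteTimeTTLCache(ttl_seconds=3600, clock=lambda: now[0])
live = ReadTimeTTLCache(ttl_seconds=3600, clock=lambda: now[0])
frozen.set("page", "old copy")
live.set("page", "old copy")

frozen.ttl_seconds = live.ttl_seconds = 60  # the cache duration is shortened
now[0] = 120.0                              # two minutes later

print(frozen.get("page"))  # still served: expiry was fixed at 3600 s
print(live.get("page"))    # None: the new 60 s TTL applies immediately
```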

For some of the sites it scrapes, this plugin works great. For others, it doesn’t.
Here’s an example.
source url: https://www.xtrainingequipment.com/HiTemp-Bumper-Plate-45b-pair_p_155.html
query: #price
This shows the price, but it wraps it in a SPAN tag with id=”price”. It’s the only element with that id, so it shouldn’t be including the SPAN tag.

It works fine locally but gives a 403 Forbidden error on the live host. How can I solve it?

Hi all,
First of all, I want to say thank you to the developer. I’ve been looking for a plugin like WP Web Scraper for two years.
Issue:
I want to scrape only the price from a website. I have the following code, but I need to find out how to display only the price and not all the content in the selector="#main" area.
[wpws url="https://www.bol.com/nl/p/25/9200000051153125/?promo=music_202_AC-2voor40-LP_B7__1_9200000051153125" selector="#main_block"]
I’ve been looking at the CSS selectors page from W3Schools. But is there a way to narrow the shortcode down to only the div I want to display? In this case, the price.
How do I do that?
Thank you so much.
Cheers,

I installed this plugin and I get the following errors.
In the Settings, Sandbox tab, I get this error in the “Source url” box:
<b>Notice</b>: Undefined index: url in <b>/home/myaccount/public_html_blog/wp-content/plugins/wp-web-scrapper/views/settings.php</b> on line <b>31</b>
On the Import tab, these errors are at the top:
Notice: Undefined variable: post_id in /home/myaccount/public_html_blog/wp-content/plugins/wp-web-scrapper/views/settings.php on line 101
Notice: Undefined variable: post_id in /home/myaccount/public_html_blog/wp-content/plugins/wp-web-scrapper/views/settings.php on line 102
I have debug mode on, so that might be why. More importantly, the CSS selector isn’t working. I tested it in the sandbox: I cleared the errors above out of all the fields and entered these settings.
source url: https://www.roguefitness.com/rogue-echo-bar
query: .product-title
The output is:
Error parsing: Invalid CSS selector
the debug info:
Scrap source and info
Source URL https://www.roguefitness.com/rogue-echo-bar
Query (cssselector) .product-title
WPWS Cache Control Cache-hit Transients API
Other arguments
headers string(0) “”
cache string(3) “720”
useragent string(72) “Mozilla/5.0 (Windows NT 6.1; WOW64; rv:40.0) Gecko/20100101 Firefox/40.1”
timeout string(1) “2”
on_error string(10) “error_show”
output string(4) “html”
glue string(1) ” “
eq string(0) “”
gt string(0) “”
lt string(0) “”
query_type string(11) “cssselector”
remove_query string(0) “”
remove_query_type string(11) “cssselector”
replace_query string(0) “”
replace_query_type string(11) “cssselector”
replace_with string(0) “”
basehref int(1)
a_target string(0) “”
callback_raw string(0) “”
callback string(0) “”
debug int(1)
charset string(5) “UTF-8”
I tried the CSS selector in Chrome Developer Tools, with the selector “.product-title” as above and it came up with 1 result as it should.

Hello Akshay,
I would like to echo some text together with the output, but only when there is a successful response. Something like the opposite of ‘on_error’, such as ‘if_content’. Is there such a thing? I tried adding . ‘text’ . within the template tag but couldn’t get it to work. If not, please add it.
Thank you.

I went in via FTP with FileZilla and hand-deleted all 7998 files in http-cache, and the next day found that the files had repopulated that directory.
How can I purge that directory without the files returning?
The site I want to scrape has a page it inserts for first-time visitors, so all the scraper plugin sees is that page. Is there any way to have it save a cookie, or some other way of getting to the desired page?

Hi!
Is there a chance you could remove the PHP_EOL in the header and footer, or make it configurable (possibly with debug="0")?
I love your plugin, but as it stands I can’t build URLs from parts scraped from other websites.
EXAMPLE
<a href='https://repository.pietma.com/nexus/service/local/repositories/releases/content/com/pietma/billoader/[wpws url="https://repository.pietma.com/nexus/service/local/artifact/maven/resolve?r=releases&g=com.pietma&a=billoader&c=win32&p=x86.zip&v=RELEASE" query="version" output="text" debug="0"]/billoader-[wpws url="https://repository.pietma.com/nexus/service/local/artifact/maven/resolve?r=releases&g=com.pietma&a=billoader&c=win32&p=x86.zip&v=RELEASE" query="version" output="text" debug="0"]-win32.x86_64.zip'>download</a>
Should result in
<a href='https://repository.pietma.com/nexus/service/local/repositories/releases/content/com/pietma/billoader/0.0.2/billoader-0.0.2-win32.x86_64.zip'>download</a>
CHANGED CODE
} else {
    WP_Web_Scraper::$error = "Error fetching: ".$response->get_error_message();
}
$ob_header = ""; # PHP_EOL
$ob_footer = ""; # PHP_EOL
if ( WP_Web_Scraper::$args['debug'] == 1 ){
    $ob_header = PHP_EOL.
        '<!--' . PHP_EOL .
        ' Start of web scrap (created by wp-web-scraper)'
Thanks!
Marti

I have the newest version of WP and have been getting scraped property data from my IDX vendor using WP Web Scraper ver 2.7 and the line of code they suggest. However, when I upload ver 3.5 the properties disappear; in fact, the entire page below the navigation bar disappears except for the page title, leaving only the background. I also get a message in the WP admin dashboard saying the plugin was deactivated because “it has no header info”, and I have to reactivate it. Still no properties, just an empty page.
Yet on some of my sites, using the exact same theme (WP Twenty Eleven Theme), ver 3.5 of the plugin is working with the exact same code.
What would cause this?
The site is https://www.greatlassiterhighschoolhomes.com/lassiter-high-school-homes-for-sale/ – you can see the home listings only because I have the old Ver 2.7 of the plugin installed.
Any advice or insight would be appreciated. All my sites have the exact same plugins, so it should not be a plugin conflict.

Apostrophes, em dashes, and other simple characters are coming through the scraper as “?¢????”. I set the output to text and checked the source, and most are simple queries.
Example:
Source URL: https://www.youtube.com/feeds/videos.xml?playlist_id=PLXo2mTq4AYbJF2QrGtpJWrKAVxY0W4A0c
Query: description
arguments: eq=1
The apostrophe in Kasich’s name is a jumbled mess.
Any ideas how to fix?
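Garbling like this usually points to a character-set mismatch: the feed is UTF-8, but somewhere the bytes are read as a single-byte encoding. The Python sketch below reproduces the effect and reverses it. It is an illustration only: the function names are made up, Windows-1252 is an assumed culprit, and where in the plugin or server the wrong decode happens is not something the post establishes.

```python
def garble(text: str) -> str:
    """Reproduce the damage: UTF-8 bytes read as if they were Windows-1252.

    Note: a few byte values are undefined in Windows-1252, so some characters
    would raise UnicodeDecodeError here instead of producing garbage.
    """
    return text.encode("utf-8").decode("cp1252")


def repair(garbled: str) -> str:
    """Undo it by reversing the wrong decode (works only if no bytes were lost)."""
    return garbled.encode("cp1252").decode("utf-8")


original = "Kasich\u2019s"          # the name with a typographic apostrophe
damaged = garble(original)
print(damaged)                      # the apostrophe becomes three junk characters
print(repair(damaged) == original)  # True when the round trip is lossless
```

If the characters have already been replaced by literal question marks, the original bytes are gone and this kind of repair no longer works; the fix then has to happen where the page is first decoded.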

Hello. First, I would like to thank everyone who contributed to this plugin. I have one issue I hope somebody can help resolve: the font of the data I extract comes along with it. I use the shortcode when inserting the extracted data into a table.
I would like to know whether this is an issue with the table or with the web scraper and, if so, how I can avoid it.
I scrape the price from this site:
https://www.cdkeys.com/pc/games/tom-clancy-s-the-division-pc-cd-key-uplay
I hope you can help. Thank you.

I’m using the latest WP, the latest WP Web Scraper, and PHP 5.3.3.
But no matter what I scrape, I always get an empty response. I see no obvious errors.
This is an example of a trivial shortcode that does not work.
[wpws url="https://www.mooball.com" query="title" ]
What am I doing wrong? How do I resolve this?

Hey,
I scraped a text with a lot of links I don’t need. Is it possible to remove them?

Hi.
I’ve just started using WP Web Scraper; it is a great product. But now I have run into some trouble.
I have set up this:
[wpws url=”https://hirtshals-fyr.dk/cam3.htm” query=”/html/body/div[1]
This parses the image fine, but after 10 seconds, when the image refreshes, the whole window jumps to the URL it is scraping from.
What have I done wrong?