Forum Replies Created

Viewing 1 replies (of 1 total)
  • I am a crawl engineer for the Croatian Web Archive, and our bot has been sending this User-Agent header since 2004: Mozilla/5.0 (compatible; SrceHarvester/3.3.1 +https://haw.nsk.hr/)
    Our legitimate bot is blocked (status 403) by BPS simply because its User-Agent contains the word “harvest” (listed in a RewriteCond in the BPS htaccess code). Other national archives (https://www.netpreserve.org/) may well have the same problem, since the word “harvesting” is commonly used in the context of web archiving.

    We could suggest that website owners manually edit their htaccess file and remove the word “harvest”, but the average WordPress user does not feel comfortable editing htaccess.

    Would it be possible to exclude “harvest” from the rewrite rules in future BPS releases?
    If not, what should we suggest to website owners who want to have their sites archived?

    Thanks.
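
To illustrate the false positive described above, here is a minimal Python sketch of a case-insensitive substring filter. It is only an approximation: the real BPS RewriteCond lists many more patterns that are not reproduced here, the case-insensitive matching is an assumption (it would explain why “Harvester” matches “harvest”), and the names BLOCKED and is_blocked are hypothetical.

```python
import re

# User-Agent string quoted in the post above.
SRCE_UA = "Mozilla/5.0 (compatible; SrceHarvester/3.3.1 +https://haw.nsk.hr/)"

# Hypothetical stand-in for the BPS blocklist. Only the word "harvest" is
# confirmed by the post; case-insensitive matching is an assumption.
BLOCKED = re.compile(r"harvest", re.IGNORECASE)


def is_blocked(user_agent, pattern=BLOCKED):
    """Return True if the pattern occurs anywhere in the User-Agent string."""
    return pattern.search(user_agent) is not None


if __name__ == "__main__":
    # The archive crawler is caught by the substring match.
    print(is_blocked(SRCE_UA))
    # An ordinary browser-style User-Agent is not.
    print(is_blocked("Mozilla/5.0 (X11; Linux x86_64)"))
```

Because the match is on a bare substring, any crawler whose name includes “harvest” is rejected regardless of its purpose, which is why removing that one word from the rule would be enough to let archive bots through.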
