I've been using Data Extractor to scrape Craigslist with great results, but the software sometimes just stops and won't continue harvesting from its URL list, for no apparent reason. I'm mystified as to why this happens. I've tried randomizing the order of the URL list (all the URLs point to real, existing pages) so that there are no identifiable patterns in my scraping that the site might pick up on and block, e.g. 500 consecutive page requests to a single classifieds area (such as Antiques For Sale) in a single city (such as Detroit). That doesn't seem to help. Could they be monitoring my IP address? None of the IP-address-changing programs I've tried seem to work.

I'd also like to know whether there's a way for the software to resume scraping automatically when it hits an error page (e.g. page not found). These pages stop the software dead in its tracks, and I have to prompt it manually to get it going again. Many hours of productivity get lost this way, and I don't want to have to babysit the thing. If the software had a user-specifiable maximum wait per page request (e.g. 60 seconds), after which it moved on to the next URL to be harvested, this problem would be solved.
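I don't know whether Data Extractor exposes a setting like this, so here is only a generic sketch in Python of the behaviour I mean, not Data Extractor's API. The names `fetch` and `harvest` are made up for illustration. Each URL gets a timeout; an error page (e.g. HTTP 404) or a timeout is logged and skipped instead of halting the whole run. Note that `urlopen`'s `timeout` applies to each blocking socket operation (connect/read), not to the total download time.

```python
import socket
import urllib.error
import urllib.request
from typing import Callable, Dict, Iterable, List, Tuple


def fetch(url: str, timeout: float) -> bytes:
    """Download one page (hypothetical helper); timeout is per socket operation."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def harvest(
    urls: Iterable[str],
    timeout: float = 60.0,
    fetcher: Callable[[str, float], bytes] = fetch,
) -> Tuple[Dict[str, bytes], List[Tuple[str, str]]]:
    """Fetch every URL in order; record failures and keep going (hypothetical)."""
    pages: Dict[str, bytes] = {}
    failures: List[Tuple[str, str]] = []
    for url in urls:
        try:
            pages[url] = fetcher(url, timeout)
        except urllib.error.HTTPError as exc:
            # Error pages such as "404 Not Found": note the status and move on.
            # (Caught before URLError because HTTPError is a subclass of it.)
            failures.append((url, f"HTTP {exc.code}"))
        except (urllib.error.URLError, socket.timeout, TimeoutError) as exc:
            # Unreachable host or no response within `timeout`: skip this URL.
            failures.append((url, f"{type(exc).__name__}: {exc}"))
    return pages, failures
```

Passing `fetcher` as a parameter just makes it easy to swap in a fake for testing; the key point is that every failure is caught per URL, so one bad page never stops the rest of the list.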

Any insight? Thanks.


