Data Extractor

Extract any data, including email addresses and URLs, from your files and webpages.
Posted in the Data Extractor Forum.
Quit after first email address of each domain
Is it possible to crawl a list of domains searching for email addresses and stop searching a domain as soon as one email address has been found on it? In other words, I only want one email address for each domain in the list. I tried the code below on a short list of domains. It did terminate the search early, but it returned the same email address twice from one domain and no email addresses from the second domain. I wanted one email address from each domain.
DataExtractor.AddHeader(1, 'Email Addresses');
var objRE = /\w+([-+.]\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*/g;
var objResults = document.body.outerHTML.match(objRE);
if (objResults) {
    for (var i = 0; i < objResults.length; i++) {
        DataExtractor.StartNewResult();
        DataExtractor.SetColumns(2);
        DataExtractor.AddResult(1, objResults[i]);
        DataExtractor.AddResult(2, document.location.href);
        DataExtractor.QuitExtraction();
    }
}
Quit after first email address of each domain
You can do this in code, in the sense that you can show only the first entry even if Data Extractor retrieves them all.
I don't think there is a way to prevent Data Extractor from retrieving all entries that match a regular expression.
Quit after first email address of each domain
Can you post the code that will accomplish this?
Quit after first email address of each domain
My idea is to execute the lines inside the "for" loop only once, so you would have something like:
...
...
for (var i = 0; i < objResults.length; i++) {
    if (i == 0) {
        // the lines you want to execute to show the result go here
        ...
        ...
        // Instead of DataExtractor.QuitExtraction(); I would put a
        // "break;" instruction here to leave the "for" loop early
    }
}
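For anyone who wants the idea above as plain JavaScript, here is a minimal sketch. It is not taken from the Data Extractor documentation: firstEmail, firstEmailForHost and seenHosts are hypothetical names, and the sketch assumes that a script variable such as seenHosts survives from one crawled page to the next, which nothing in this thread confirms. It uses the regular expression from the original post without the "g" flag, so match() returns only the first hit and no loop is needed.

```javascript
// Pattern from the original post, minus the "g" flag, so that
// match() stops at the first address it finds.
var EMAIL_RE = /\w+([-+.]\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*/;

// Hypothetical helper: first email address in the text, or null if none.
function firstEmail(text) {
    var m = text.match(EMAIL_RE);
    return m ? m[0] : null;
}

// Hypothetical bookkeeping: hosts that have already produced a result.
// ASSUMPTION: this object persists between pages of the same crawl.
var seenHosts = {};

// Returns the first address for a host, or null if the host already
// yielded one earlier or the text contains no address.
function firstEmailForHost(host, text) {
    var key = host.toLowerCase();
    if (Object.prototype.hasOwnProperty.call(seenHosts, key)) {
        return null;
    }
    var email = firstEmail(text);
    if (email) {
        seenHosts[key] = true;
    }
    return email;
}

// Possible use inside the extraction script, with only the calls that
// already appear in this thread (left commented out here):
// var email = firstEmailForHost(document.location.hostname,
//                               document.body.outerHTML);
// if (email) {
//     DataExtractor.StartNewResult();
//     DataExtractor.SetColumns(2);
//     DataExtractor.AddResult(1, email);
//     DataExtractor.AddResult(2, document.location.href);
// }
```

This only controls what is shown. As noted earlier in the thread, it probably does not stop Data Extractor from continuing to crawl the rest of the domain.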