Data Extractor

Extract any data, including email addresses and URLs from your files and webpages.

Posted in the Data Extractor Forum.

How to extract any specific instances of @domain.com?

I want to use Data Extractor to extract every email address from a specific domain that appears in Google search results. For instance, if you search Google for @blah.com, the results show matching email addresses with the part from the @ sign onward in bold. This prevents Data Extractor from recognizing these strings as email addresses, because the program seems to look only for the conventional email format in the HTML, with no tags inside the address. For instance, the address steve@blah.com shows up in the HTML as steve<em>@blah.com</em>, and Data Extractor does not recognize it. However, if the program found john@florida.com, it would extract it, because that address has no HTML tags inside it.

So how do I train Data Extractor to extract any instances of @blah.com as well as any other email addresses? Thanks.
by Randy Macdonald on Aug 6 2008 7:10am

You can do this quite easily.

Go to the second tab in Data Extractor, select the "Extract Emails from Webpages" rule, then press "Edit Rule Details".

Look for the line:
var objResults = document.body.outerHTML.match(objRE)

Change it to:
var objResults = document.body.innerText.match(objRE)

That will then ignore the HTML tags and just search the text of the page.
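
A minimal sketch of why this change helps, runnable outside Data Extractor. The pattern emailRE and the stripTags helper are assumptions for illustration: the rule's actual objRE is not shown in this thread, and stripTags only roughly approximates what innerText returns in a browser.

```javascript
// Hypothetical email pattern; the rule's real objRE is not shown in this thread.
var emailRE = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;

// Rough stand-in for innerText: drop anything that looks like an HTML tag.
function stripTags(html) {
  return html.replace(/<[^>]*>/g, '');
}

// Google-style markup where the searched-for part is wrapped in <em>.
var html = '<p>steve<em>@blah.com</em> and john@florida.com</p>';

// Searching the raw markup misses the address that has a tag inside it.
var fromHtml = html.match(emailRE) || [];

// Searching the tag-free text finds both addresses.
var fromText = stripTags(html).match(emailRE) || [];

console.log(fromHtml);
console.log(fromText);
```

Here fromHtml contains only john@florida.com, while fromText also contains steve@blah.com, which mirrors the difference between matching against outerHTML and innerText.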
by Nico Westerdale on Aug 6 2008 7:28am

OK, that's awesome. But some email addresses come up in search results with extraneous characters or text around them. For instance, some come up as mailto:name@address.com or (name@address.com) or name@address.com. So I have 3 questions:

1) Do the default settings in Data Extractor recognize these types of emails?

2) What criteria does Data Extractor use to distinguish an email address from any other string?

3) Can these criteria be modified?

by Randy Macdonald on Aug 10 2008 5:17am

Data Extractor uses a Regular Expression to find the addresses.
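
As an illustration of how a regular expression can separate email-like strings from other text, and how editing the pattern changes what is matched, here is a small sketch. Both patterns are assumptions written for this example; they are not Data Extractor's built-in expression, which is not shown in this thread.

```javascript
// Hypothetical general pattern: local part, "@", domain, dot, top-level domain.
var anyEmailRE = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;

// Hypothetical narrower pattern: only addresses at blah.com (case-insensitive).
var blahOnlyRE = /[A-Za-z0-9._%+-]+@blah\.com\b/gi;

var text = 'Contact steve@blah.com, john@florida.com or visit www.blah.com.';

// The general pattern picks up both addresses but not the bare hostname,
// because a match requires an "@" between the two parts.
var all = text.match(anyEmailRE) || [];

// The narrower pattern keeps only the blah.com address.
var blahOnly = text.match(blahOnlyRE) || [];

console.log(all);
console.log(blahOnly);
```

Narrowing the domain part of the pattern in this way is one possible route to extracting only the @blah.com addresses asked about at the top of the thread.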

The easiest way to fix the problem is to replace this line:

DataExtractor.AddResult(1, objResults[i]);

with this:

DataExtractor.AddResult(1, objResults[i].replace('mailto:', '').replace('(', '').replace(')', ''));
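
The same cleanup can be sketched as a standalone helper that runs outside Data Extractor. The name cleanAddress and the sample strings are hypothetical. Note that replace with a string pattern, as in the line above, only removes the first occurrence, which is enough for a single matched address; the sketch uses regular expressions so that every parenthesis is removed.

```javascript
// Hypothetical helper: remove parentheses, then a leading "mailto:" prefix.
function cleanAddress(raw) {
  return raw
    .replace(/[()]/g, '')      // drop every "(" and ")"
    .replace(/^mailto:/i, ''); // drop a "mailto:" prefix, any letter case
}

var samples = [
  'mailto:name@address.com',
  '(name@address.com)',
  'name@address.com'
];

var cleaned = samples.map(cleanAddress);

console.log(cleaned);
```

Inside the rule, such a helper could presumably be used as DataExtractor.AddResult(1, cleanAddress(objResults[i])), assuming the rule editor accepts function definitions.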
by Nico Westerdale on Aug 10 2008 6:01am

I have a similar problem. I need to extract partially displayed email addresses from several webpages. The addresses are displayed as "name@...". The extractor does not extract them because it cannot find the domain portion after the @ sign. I need it to extract the partial address, and I will fill in the rest myself. In short, I need to extract only the "name@..." expression.
by Alex Adka on Mar 25 2010 5:59pm
