Extract any data, including email addresses and URLs from your files and webpages.
Posted in the Data Extractor Forum.
Specific Fields Required
I have tried many software packages in the past (Automation Anywhere, iMacros, ezExtract, etc.) and I still cannot find one that solves my problem.
I have a list of over 4 million URLs from the same website/layout and I am in the process of downloading the HTML source for all of them.
What I need is the following columns extracted:
- Registrant
- Company Name
- Address
- Phone
- Fax
I tried your demo and have been fiddling with the rules, but I can't work out how to get it working for my needs.
Here is a sample of what I need:
----------------------------------------------------------------------------------
----------------------------------------------------------------------------------
<table cellspacing="2" cellpadding="2" border="0" id="dvPubPersonRebba2002" width="520">
<tr bgcolor="#84C463">
<td colspan="2"><font color="White"><b>
<span id="dvPubPersonRebba2002_lblPubPersonDetailCaption"><font size="4">Registrant: TEST J TEST</font></span>
</b></font></td>
</tr><tr bgcolor="#E8F3E9">
<td valign="top" bgcolor="#ECF4E8" width="150"><font color="DarkGreen"><b>This Registrant is a </b></font></td><td><font color="#333333">
<span id="dvPubPersonRebba2002_lblPubPersonRegDetail">Salesperson</span>
</font></td>
</tr><tr bgcolor="#E8F3E9">
<td valign="top" bgcolor="#ECF4E8" width="150"><font color="DarkGreen"><b>Registered Tradename</b></font></td><td><font color="#333333"> </font></td>
</tr><tr bgcolor="#E8F3E9">
<td valign="top" bgcolor="#ECF4E8" width="150"><font color="DarkGreen"><b>Registration</b></font></td><td><font color="#333333">
<span id="dvPubPersonRebba2002_lblPubPersonRegistration">Registered</span>
</font></td>
</tr><tr bgcolor="#E8F3E9">
<td valign="top" bgcolor="#ECF4E8" width="150"><font color="DarkGreen"><b>Registration Expiry</b></font></td><td><font color="#333333">2010/11/20</font></td>
</tr><tr bgcolor="#E8F3E9">
<td valign="top" bgcolor="#ECF4E8" width="150"><font color="DarkGreen"><b>Proposal Information</b></font></td><td><font color="#333333"><span id="dvPubPersonRebba2002_lblPubPersonProposal"></span></font></td>
</tr><tr bgcolor="#E8F3E9">
<td valign="top" bgcolor="#ECF4E8" width="150"><font color="DarkGreen"><b>Suspension Information</b></font></td><td><font color="#333333"><span id="dvPubPersonRebba2002_lblPubPersonSuspension"></span></font></td>
</tr><tr bgcolor="#E8F3E9">
<td valign="top" bgcolor="#ECF4E8" width="150"><font color="DarkGreen"><b>Conditions</b></font></td><td><font color="#333333"><span id="dvPubPersonRebba2002_lblPubPersonTC"></span></font></td>
</tr><tr bgcolor="#E8F3E9">
<td valign="top" bgcolor="#ECF4E8" width="150"><font color="DarkGreen"><b>Charges</b></font></td><td><font color="#333333"><span id="dvPubPersonRebba2002_lblPubPersonCharges"></span></font></td>
</tr><tr bgcolor="#E8F3E9">
<td valign="top" bgcolor="#ECF4E8" width="150"><font color="DarkGreen"><b>Convictions</b></font></td><td><font color="#333333"><span id="dvPubPersonRebba2002_lblPubPersonConvictions"></span></font></td>
</tr><tr bgcolor="#E8F3E9">
<td valign="top" bgcolor="#ECF4E8" width="150"><font color="DarkGreen"><b>Discipline Decision</b></font></td><td><font color="#333333"><span id="dvPubPersonRebba2002_lblPubPersonCCD"></span></font></td>
</tr><tr bgcolor="#E8F3E9">
<td valign="top" bgcolor="#ECF4E8" width="150"><font color="DarkGreen"><b>Company Name</b></font></td><td><font color="#333333"><span id="dvPubPersonRebba2002_lblPubPersonCompany"><A href=URL HERE>TEST TEST INC.</a></span></font></td>
</tr><tr bgcolor="#E8F3E9">
<td valign="top" bgcolor="#ECF4E8" width="150"><font color="DarkGreen"><b>Business Address</b></font></td><td><font color="#333333"><span id="dvPubPersonRebba2002_lblPubPersonCompanyAddress">123 TEST Unit: 1000<BR>TORONTO, ON<BR>555 555<BR>CANADA</span></font></td>
</tr><tr bgcolor="#E8F3E9">
<td valign="top" bgcolor="#ECF4E8" width="150"><font color="DarkGreen"><b>Business Phone</b></font></td><td><font color="#333333">555-555-0494</font></td>
</tr><tr bgcolor="#E8F3E9">
<td valign="top" bgcolor="#ECF4E8" width="150"><font color="DarkGreen"><b>Business Fax</b></font></td><td><font color="#333333">555-555-7106</font></td>
</tr>
</table>
----------------------------------------------------------------------------------
----------------------------------------------------------------------------------
Much obliged,
Specific Fields Required
Better yet, after looking at the post, I will simply give the URL of what I need:
reco. on .ca/ SearchRegistrant.aspx?section=RAPubPerson&K= 1450900
I have put spaces in a few spots so that search engines cannot read it.
Thank you deeply!
Specific Fields Required
You would need a custom JavaScript rule to do this. If you're familiar with JavaScript you should be able to write one; just see the examples on the 2nd tab. If not, we have a service:
http://www.iconico.com/Da...mRule.aspx
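For anyone attempting the custom rule themselves, here is a rough sketch of the extraction logic in plain JavaScript. It is not written against Data Extractor's rule interface, which this thread does not show, so it would need adapting to whatever the examples on the 2nd tab expect. The function names cleanText, valueAfterLabel and extractRegistrant are made up for illustration. It assumes the downloaded pages contain ordinary HTML in which each label sits in a bold cell followed by the value cell, and that the registrant caption span id ends in lblPubPersonDetailCaption, as in the sample posted above.

```javascript
// Hypothetical helpers for illustration; not part of any Data Extractor API.

// Strip tags, turn <br> into ", " and collapse whitespace.
function cleanText(fragment) {
  return fragment
    .replace(/<br\s*\/?>/gi, ', ')
    .replace(/<[^>]+>/g, '')
    .replace(/&nbsp;/gi, ' ')
    .replace(/\s+/g, ' ')
    .trim();
}

// Return the text of the <td> that follows the bold label, or '' if absent.
function valueAfterLabel(html, label) {
  const escaped = label.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  const pattern = new RegExp(
    '<b>\\s*' + escaped + '\\s*</b>[\\s\\S]*?<td[^>]*>([\\s\\S]*?)</td>', 'i');
  const match = pattern.exec(html);
  return match ? cleanText(match[1]) : '';
}

// Pull the five requested columns out of one page's HTML source.
function extractRegistrant(html) {
  // Assumption: the caption span id ends in "lblPubPersonDetailCaption".
  const caption =
    /lblPubPersonDetailCaption"[^>]*>([\s\S]*?)<\/span>/i.exec(html);
  return {
    registrant: caption
      ? cleanText(caption[1]).replace(/^Registrant:\s*/i, '')
      : '',
    company: valueAfterLabel(html, 'Company Name'),
    address: valueAfterLabel(html, 'Business Address'),
    phone: valueAfterLabel(html, 'Business Phone'),
    fax: valueAfterLabel(html, 'Business Fax'),
  };
}

// Cut-down version of the sample table from the first post.
const sample = `
<table id="dvPubPersonRebba2002">
<tr><td colspan="2"><font color="White"><b>
<span id="dvPubPersonRebba2002_lblPubPersonDetailCaption"><font size="4">Registrant: TEST J TEST</font></span>
</b></font></td></tr>
<tr><td><font color="DarkGreen"><b>Company Name</b></font></td><td><font color="#333333"><span id="dvPubPersonRebba2002_lblPubPersonCompany"><A href=URL HERE>TEST TEST INC.</a></span></font></td></tr>
<tr><td><font color="DarkGreen"><b>Business Address</b></font></td><td><font color="#333333"><span id="dvPubPersonRebba2002_lblPubPersonCompanyAddress">123 TEST Unit: 1000<BR>TORONTO, ON<BR>555 555<BR>CANADA</span></font></td></tr>
<tr><td><font color="DarkGreen"><b>Business Phone</b></font></td><td><font color="#333333">555-555-0494</font></td></tr>
<tr><td><font color="DarkGreen"><b>Business Fax</b></font></td><td><font color="#333333">555-555-7106</font></td></tr>
</table>`;

console.log(extractRegistrant(sample));
```

Matching on the visible labels rather than on row position means rows that are blank for a given registrant (Charges, Convictions and so on) do not shift the results. With 4 million pages it would be worth spot-checking a few hundred outputs before trusting the whole run, since regular expressions are brittle if the layout varies.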