Data Extractor for Windows

Extract any data, including email addresses and URLs, from your files and webpages.

Posted in the Data Extractor Forum.




extraction rules

Are the pattern-based rules, which are developed using regular expressions, based on the standard regular expression rules available in every language? For example, if I create a regular expression in Expresso to extract phone numbers, will that regular expression work in Data Extractor?

Secondly, can you point me to a basic tutorial or link on rule creation?
by kadrwa badam on Oct 23 2010 7:46am

extraction rules

Can I append some text or a symbol to the extracted data? Say I extracted phone numbers, and at the end of every phone number I want to add a ;
by kadrwa badam on Oct 23 2010 7:57am

extraction rules

Hi,

Sure, you can develop the regular expressions in any tool you want and then use them in Data Extractor.

To read more about regular expressions please see here:
http://www.iconico.com/DataExtractor/help.aspx#pattern
and further here:
http://geekswithblogs.net/brcraju/articles/235.aspx
but you could also search Google for "regular expression tutorial" to find in-depth tutorials on regular expressions.
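
As a rough illustration of the kind of pattern discussed here, the sketch below matches North American style phone numbers. It is written as a JavaScript regular expression purely for demonstration; the pattern, the sample text, and the function name `findPhoneNumbers` are assumptions and do not come from the Data Extractor help. Regular expression flavors differ in small details, so a pattern built in another tool is still worth testing inside Data Extractor.

```javascript
// Hypothetical pattern: optional parentheses around the area code, then
// 3-3-4 digits optionally separated by a space, dot, or hyphen.
const PHONE_PATTERN = /\(?\b\d{3}\)?[ .-]?\d{3}[ .-]?\d{4}\b/g;

// Return every phone-like match found in the text (empty array if none).
function findPhoneNumbers(text) {
  return text.match(PHONE_PATTERN) || [];
}

// Sample text, made up for this example.
const sampleText = "Call (555) 123-4567 or 555.987.6543 today";
console.log(findPhoneNumbers(sampleText));
```

Only the pattern itself (the text between the two slashes) would be pasted into a regular expression rule; the surrounding function exists just to try the pattern out.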

Regarding your second question: please note that with regular expressions, Data Extractor can only extract data from different sources; it cannot append any characters or strings to what it finds.

However, Data Extractor can also work with HTML Webpage Script rules (not only regular expression rules), and with these rules you can control the output. They are created differently from regular expression rules, though. Read more about them here:
http://www.iconico.com/DataExtractor/help.aspx#jscript

Thanks!
Constantin
by Constantin Florea on Oct 23 2010 9:23am

extraction rules

However Data Extractor can work with HTML Webpage Script rules (not only regular expression rules) and using these HTML Webpage Script rules you may control the output. But these rules are created differently than regular expression rules. Read more about them here:
http://www.iconico.com/DataExtractor/help.aspx#jscript

>>>And can Webpage Script rules be applied to any file, whether it is HTML, XLS, CSV, or TXT?

Is Webpage Script Data Extractor's own independent scripting mechanism, or is it a standard like regular expressions?

Is there any tool you would recommend for appending a string to the extracted data?
by kadrwa badam on Oct 23 2010 2:50pm

extraction rules

Hi,

Regarding where these rules can be applied:
Answer: Webpage Script rules can be applied to HTML files only. As a workaround, you could place the data that needs to be parsed into an HTML file (if that is feasible) and then use these rules.
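
One possible way to do that wrapping is sketched below in JavaScript. It is only an illustration: the function names `escapeHtml` and `wrapTextAsHtml`, the `data` element id, and the sample contents are all made up for this example and are not part of Data Extractor.

```javascript
// Escape characters that have special meaning in HTML so the data is kept verbatim.
function escapeHtml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Wrap plain text (for example the contents of a TXT or CSV file) in a minimal
// HTML page. The <pre> element preserves the original line breaks.
function wrapTextAsHtml(text, title) {
  return [
    "<!DOCTYPE html>",
    "<html>",
    "<head><meta charset=\"utf-8\"><title>" + escapeHtml(title) + "</title></head>",
    "<body><pre id=\"data\">" + escapeHtml(text) + "</pre></body>",
    "</html>"
  ].join("\n");
}

// Sample data, made up for this example.
const sampleCsv = "Alice,555-123-4567\nBob,555-987-6543";
console.log(wrapTextAsHtml(sampleCsv, "contacts"));
```

Under Node, the text could be read with `fs.readFileSync` and the result saved with `fs.writeFileSync`; the resulting HTML file is then what would be given to Data Extractor.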

Regarding whether these rules are an independent scripting mechanism:
Answer: Writing these rules means writing JavaScript, so they are not an independent scripting mechanism. However, Data Extractor does offer a few additional JavaScript commands for better control:
DataExtractor.QuitExtraction();     After the current rule has completed Extraction will halt.
DataExtractor.SetColumns(columns);     Sets the number of columns in the extraction grid.
DataExtractor.AddResult(column, data);     Adds the result text specified in data, at the column number column.
DataExtractor.AddHeader(column, data);     Specifies the column header text specified in data, at the column number column.
DataExtractor.StartNewResult();     Instructs the Data Extractor to start a new result. This should be specified before any results are added.
DataExtractor.ShowError(errorText);     Reports an error to the user specified as errorText. This does not halt the extraction.
DataExtractor.ClearResults();     Clears all results from the Extraction grid.
DataExtractor.AddURL(URL);     Adds the specified URL to the 'Files for Extraction' list. If the Data Extractor is using that list to extract, then the URL that is being added will also be extracted. This command can be used to create custom webpage spiders.

Regarding any tool we recommend for appending a string to extracted data:
Answer: I am not very familiar with writing these rules myself, but since they let you control the output, and looking at the third command above:
DataExtractor.AddResult(column, data);
I believe it would be possible to format the output as you like by adding anything you want before, between, and after each found result.
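
A minimal sketch of that idea follows. It uses only the commands listed above, but several details are assumptions: the help text does not say whether columns are numbered from 0 or 1, the phone pattern and the sample text are made up, and the stand-in object exists only so the sketch can run outside Data Extractor. It is written with `var` and plain functions on the assumption that the embedded script engine may be an older one.

```javascript
// Stand-in for the DataExtractor object, used only when this sketch runs
// outside Data Extractor (for example in Node). It records each call.
var recordedCalls = [];
var extractor = (typeof DataExtractor !== "undefined") ? DataExtractor : {
  SetColumns: function (columns) { recordedCalls.push(["SetColumns", columns]); },
  AddHeader: function (column, data) { recordedCalls.push(["AddHeader", column, data]); },
  StartNewResult: function () { recordedCalls.push(["StartNewResult"]); },
  AddResult: function (column, data) { recordedCalls.push(["AddResult", column, data]); }
};

// Assumption: columns are numbered from 0. The help text does not say; try 1 if needed.
var FIRST_COLUMN = 0;

// Text to search: the page text inside Data Extractor, or sample text elsewhere.
var pageText = (typeof document !== "undefined" && document.body)
  ? (document.body.innerText || document.body.textContent || "")
  : "Sales: 555-123-4567, Support: 555-987-6543";

// Hypothetical phone pattern (3-3-4 digits); replace it with your own.
var phonePattern = /\b\d{3}[ .-]\d{3}[ .-]\d{4}\b/g;
var matches = pageText.match(phonePattern) || [];

extractor.SetColumns(1);
extractor.AddHeader(FIRST_COLUMN, "Phone");
for (var i = 0; i < matches.length; i++) {
  extractor.StartNewResult();
  // Append the ";" asked for in the question before handing the value over.
  extractor.AddResult(FIRST_COLUMN, matches[i] + ";");
}
```

The appended text is just string concatenation in the `AddResult` call, so a prefix or a separator between values could be added the same way.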

Please read the following text extracted from:
http://www.iconico.com/DataExtractor/help.aspx#jscript
explaining what I've just said :)

HTML Webpage Script
If you're extracting from the web, or any HTML files, then you may use an HTML Webpage Script to extract data. This option is the most flexible, and does require that you're familiar with JavaScript. The script that you enter will be executed directly on the webpage, so you will have access to the Document Object Model (DOM) in exactly the same manner you would if you were writing JavaScript on an HTML page.

The Data Extractor allows for several additional javascript commands that control the Data Extractor:

DataExtractor.QuitExtraction();     After the current rule has completed Extraction will halt.
DataExtractor.SetColumns(columns);     Sets the number of columns in the extraction grid.
DataExtractor.AddResult(column, data);     Adds the result text specified in data, at the column number column.
DataExtractor.AddHeader(column, data);     Specifies the column header text specified in data, at the column number column.
DataExtractor.StartNewResult();     Instructs the Data Extractor to start a new result. This should be specified before any results are added.
DataExtractor.ShowError(errorText);     Reports an error to the user specified as errorText. This does not halt the extraction.
DataExtractor.ClearResults();     Clears all results from the Extraction grid.
DataExtractor.AddURL(URL);     Adds the specified URL to the 'Files for Extraction' list. If the Data Extractor is using that list to extract, then the URL that is being added will also be extracted. This command can be used to create custom webpage spiders.
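
The spider idea mentioned for `AddURL` could look roughly like the sketch below. Apart from `DataExtractor.AddURL` and the standard `document.links` collection, everything here is an assumption made for illustration: the helper name `collectLinkUrls`, the sample links, and the stand-in object that lets the sketch run outside Data Extractor. The help text does not say whether Data Extractor skips duplicate URLs or limits crawl depth, so the sketch at least removes duplicates found on one page.

```javascript
// Stand-in used only outside Data Extractor; it records the URLs passed in.
var queuedUrls = [];
var spiderApi = (typeof DataExtractor !== "undefined") ? DataExtractor : {
  AddURL: function (url) { queuedUrls.push(url); }
};

// Collect href values from link-like objects, keeping each http(s) URL once.
function collectLinkUrls(links) {
  var seen = {};
  var urls = [];
  for (var i = 0; i < links.length; i++) {
    var href = links[i].href;
    if (typeof href === "string" && /^https?:\/\//i.test(href) && !seen[href]) {
      seen[href] = true;
      urls.push(href);
    }
  }
  return urls;
}

// Inside a page, document.links holds the anchors; elsewhere use sample data.
var pageLinks = (typeof document !== "undefined")
  ? document.links
  : [
      { href: "http://example.com/page1" },
      { href: "mailto:someone@example.com" },
      { href: "http://example.com/page1" },
      { href: "https://example.com/page2" }
    ];

var foundUrls = collectLinkUrls(pageLinks);
for (var j = 0; j < foundUrls.length; j++) {
  // Queue each linked page in the 'Files for Extraction' list.
  spiderApi.AddURL(foundUrls[j]);
}
```

In practice a spider like this would probably also need a filter, for example restricting URLs to one host, so that it does not keep adding pages indefinitely.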

If you require help making scripts please contact us for our rule making service.
    http://www.iconico.com/DataExtractor/customRule.aspx

Thanks!
Constantin
by Constantin Florea on Oct 24 2010 7:21am
