Data Extractor Help

Simply Extract any data from files or webpages

The Data Extractor lets you extract any type of information from files on your computer or from pages on the web. To install the Data Extractor, download it and unzip the file, then double-click DataExtractor.exe.
1. Where to Extract
Use the first tab to specify where you want to extract data from. The following sources are available.

Continually Extract From Internet Explorer as I browse different Pages
After selecting this option and starting extraction, browse in Internet Explorer to define the pages to extract from. As you surf to different pages, data is extracted from each webpage until you click Stop. Each page you visit is automatically added to the file list below, which you can save and reuse later.

Extract from the Text you enter here
To extract data from a piece of text, the easiest option is to enter that text into this text area.

Extract from Multiple Files and/or Web page URLs
To extract from a selection of files and/or URLs, enter them in this box. Add individual files to the list with the 'Add File' button, or drag and drop files directly into the file area. Add any number of URLs with the 'Add URL' button. To add the page currently open in Internet Explorer, use the 'IE URL' button. Every file and URL in the list will be scanned for extraction. The 'Save List' and 'Load List' buttons let you save long lists for later use. Lists are saved as text files with one file or URL on each line.

Version 3.1 includes a new checkbox, 'Follow all webpage links in the same domain and keep extracting'. When it is checked, the Data Extractor automatically follows links and extracts from entire domains. Note that this only works with JavaScript-based rules.

Extract from the Contents of a Folder
To extract information from all the files in a folder, use this option. Choose a folder, then check 'Include Subfolders' if you also want to extract from the files in its subfolders. Use the 'File Types' textbox to specify which file types are scanned. The default file type is '*.*', which includes every file. To extract only from text or HTML files, use a file type mask of '*.txt; *.html'. Note that masks are separated by semicolons.
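
The way a file type mask selects files can be sketched in JavaScript. This is an illustration only, not the Data Extractor's own code: the function names are hypothetical, and it assumes that matching ignores case and that '*.*' matches every file, including files without an extension.

```javascript
// Turn a semicolon-separated mask list such as '*.txt; *.html'
// into a list of regular expressions (hypothetical helper).
function parseFileMasks(maskList) {
  return maskList
    .split(';')
    .map((mask) => mask.trim())
    .filter((mask) => mask.length > 0)
    .map((mask) => {
      // ASSUMPTION: '*.*' means "every file", even without an extension.
      if (mask === '*.*') {
        return /^[\s\S]*$/;
      }
      // Escape regex metacharacters, then translate the mask wildcards.
      const escaped = mask.replace(/[.+^${}()|[\]\\]/g, '\\$&');
      const pattern = escaped.replace(/\*/g, '.*').replace(/\?/g, '.');
      // ASSUMPTION: file names are compared without regard to case.
      return new RegExp('^' + pattern + '$', 'i');
    });
}

// True when the file name matches at least one mask in the list.
function matchesFileMasks(fileName, maskList) {
  return parseFileMasks(maskList).some((re) => re.test(fileName));
}

console.log(matchesFileMasks('notes.txt', '*.txt; *.html')); // true
console.log(matchesFileMasks('photo.png', '*.txt; *.html')); // false
```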

Files to Scan
If you want the Data Extractor to scan binary files such as images, executables and compressed data files, select 'Binary Data Files'. Otherwise, leave the 'Text and HTML files' option selected.
2. What to Extract
After you have chosen where to extract from, use the second tab to specify what to extract. The Data Extractor comes with six predefined rules for extracting common pieces of data:

Email Address
Fully Typed Internet URLs
U.S. Phone Numbers
Extract Image details from webpage
Extract URLs from webpage
Extract Form Field details from webpage

These rules can be used simply by selecting them and clicking the 'Start Extracting' button. You can also add new rules by clicking the 'New' button and selecting the 'Edit Rule Details' button.

Details on how to construct your own rules follow. All rules are automatically saved as you create them.

Pattern Based Rule
This option uses regular expressions, an advanced pattern matching language, to match information. You can specify your own regular expression; to learn more about the regular expression language you can take a look at this tutorial. Select the 'Match Case' option to make matching case sensitive, or clear it to ignore the difference between uppercase and lowercase.
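
As an illustration of a pattern based rule, the sketch below applies a simple email pattern in JavaScript. The help text does not say which regular expression dialect the Data Extractor uses, so the pattern, the `extractMatches` helper and the way the 'Match Case' setting maps to a flag are all assumptions made for this example.

```javascript
// Return every match of a pattern in the text (hypothetical helper).
// matchCase = false adds the 'i' flag, which ignores letter case.
function extractMatches(text, pattern, matchCase) {
  const flags = matchCase ? 'g' : 'gi';
  const re = new RegExp(pattern, flags);
  return Array.from(text.matchAll(re), (m) => m[0]);
}

// A deliberately simple email pattern; real addresses can be more varied.
const emailPattern = '[a-z0-9._%+-]+@[a-z0-9.-]+\\.[a-z]{2,}';
const sample = 'Contact Sales@Example.com or support@example.org.';

console.log(extractMatches(sample, emailPattern, false));
// [ 'Sales@Example.com', 'support@example.org' ]
console.log(extractMatches(sample, emailPattern, true));
// [ 'support@example.org' ]
```

With case matching on, the lowercase-only pattern skips the address that contains uppercase letters.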

Text Based Rule
If you want to search for exact text then specify the 'Text Based Rule' option. Specifying 'Match Case' will ensure that your text is matched only when the uppercase and lowercase characters match.

To match text using wildcards, specify a wildcard search with the following characters:

?  match any single character
*  match any substring, including an empty string
#  match any numeric character (0 to 9)
@  match any alpha character (A to Z, or a to z)
$  match any alphanumeric character
~  match any non-alphanumeric, non-space character

For example, performing a wildcard search for 'f?nd' will match 'fend', 'find' and 'fond'.
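
The wildcard characters above can be understood as shorthand for regular expression character classes. The following JavaScript sketch shows one possible translation. It is not the Data Extractor's implementation, the names are hypothetical, and it assumes the pattern must match a whole word.

```javascript
// Each wildcard character and an equivalent regular expression fragment.
const WILDCARD_CLASSES = {
  '?': '[\\s\\S]',          // any single character
  '*': '[\\s\\S]*',         // any substring, including an empty one
  '#': '[0-9]',             // any numeric character
  '@': '[A-Za-z]',          // any alpha character
  '$': '[A-Za-z0-9]',       // any alphanumeric character
  '~': '[^A-Za-z0-9\\s]',   // any non-alphanumeric, non-space character
};

// Build a whole-word regular expression from a wildcard pattern.
function wildcardToRegExp(pattern, matchCase = false) {
  const source = Array.from(pattern, (ch) =>
    Object.prototype.hasOwnProperty.call(WILDCARD_CLASSES, ch)
      ? WILDCARD_CLASSES[ch]
      : ch.replace(/[.+^{}()|[\]\\]/g, '\\$&') // escape literal characters
  ).join('');
  return new RegExp('^' + source + '$', matchCase ? '' : 'i');
}

const words = ['fend', 'find', 'fond', 'found', 'fnd'];
console.log(words.filter((w) => wildcardToRegExp('f?nd').test(w)));
// [ 'fend', 'find', 'fond' ]
```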

You can also specify a fuzzy search. Fuzzy searches have a defect limit, which specifies how fuzzy the search should be. For example, a fuzzy search for 'found' with a defect limit of 1 will match 'bound'. Raising the defect limit to 3 will also match 'freed', as there are three character changes between 'found' and 'freed'.
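
The help text does not document exactly how the Data Extractor counts defects. One common way to count character changes is the edit (Levenshtein) distance, used in the sketch below as an assumption. It reproduces both examples above, but the tool's real metric may differ, and the function names are hypothetical.

```javascript
// Edit (Levenshtein) distance: the minimum number of single-character
// insertions, deletions and substitutions needed to turn a into b.
function editDistance(a, b) {
  // prev[j] holds the distance between the first i-1 chars of a
  // and the first j chars of b.
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1,        // deletion
        curr[j - 1] + 1,    // insertion
        prev[j - 1] + cost  // substitution (or exact match)
      );
    }
    prev = curr;
  }
  return prev[b.length];
}

// ASSUMPTION: a candidate matches when its distance is within the limit.
function fuzzyMatches(target, candidate, defectLimit) {
  return editDistance(target, candidate) <= defectLimit;
}

console.log(fuzzyMatches('found', 'bound', 1)); // true
console.log(fuzzyMatches('found', 'freed', 1)); // false
console.log(fuzzyMatches('found', 'freed', 3)); // true
```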

HTML Webpage Script
If you're extracting from the web or from any HTML files, you may use an HTML Webpage Script to extract data. This option is the most flexible, but it requires that you're familiar with JavaScript. The script that you enter is executed directly on the webpage, so you have access to the Document Object Model (DOM) in exactly the same way as if you were writing JavaScript on an HTML page.

The Data Extractor provides several additional JavaScript commands that control the Data Extractor:

DataExtractor.QuitExtraction(); Halts extraction after the current rule has completed.
DataExtractor.SetColumns(columns); Sets the number of columns in the extraction grid.
DataExtractor.AddResult(column, data); Adds the result text given in data at column number column.
DataExtractor.AddHeader(column, data); Sets the header text given in data for column number column.
DataExtractor.StartNewResult(); Instructs the Data Extractor to start a new result. This should be specified before any results are added.
DataExtractor.ShowError(errorText); Reports an error to the user specified as errorText. This does not halt the extraction.
DataExtractor.ClearResults(); Clears all results from the Extraction grid.
DataExtractor.AddURL(URL); Adds the specified URL to the 'Files for Extraction' list. If the Data Extractor is using that list to extract, then the URL that is being added will also be extracted. This command can be used to create custom webpage spiders.
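
A short script using these commands might look like the sketch below, which lists every link on the page in a two-column grid. Several details are assumptions, because the command list does not state them: that column numbers start at 0, that StartNewResult() is called once before each row, and that headers are set once up front. The `extractLinks` function and its parameters are hypothetical names. The script uses older `var` syntax in case the embedded script engine does not support newer JavaScript.

```javascript
// Sketch of an HTML Webpage Script that lists every link on the page.
// ASSUMPTION: column numbers start at 0; the help text does not say.
// ASSUMPTION: StartNewResult() is called once before each row.
function extractLinks(doc, extractor, followSameDomain) {
  var links = doc.getElementsByTagName('a');
  var count = 0;
  var i, link;

  extractor.SetColumns(2);
  extractor.AddHeader(0, 'Link Text');
  extractor.AddHeader(1, 'URL');

  for (i = 0; i < links.length; i++) {
    link = links[i];
    if (!link.href) {
      continue; // skip anchors without a target
    }
    extractor.StartNewResult();
    extractor.AddResult(0, link.innerText || link.textContent || '');
    extractor.AddResult(1, link.href);
    count++;

    // Optional spidering: queue links on the same host for extraction.
    if (followSameDomain && link.hostname === doc.location.hostname) {
      extractor.AddURL(link.href);
    }
  }

  if (count === 0) {
    extractor.ShowError('No links found on this page.');
  }
  return count;
}

// Run only when the page and the DataExtractor object are both available.
if (typeof document !== 'undefined' && typeof DataExtractor !== 'undefined') {
  extractLinks(document, DataExtractor, false);
}
```

Passing `true` as the third argument turns the script into a simple same-domain spider through AddURL, provided the Data Extractor is extracting from the 'Files for Extraction' list.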

If you require help making scripts please contact us for our rule making service.

3. Extraction Results
Your extraction results are displayed in the main area of this page, along with a list of the files that you extracted from. Using the toolbar buttons you can copy, save or print the extraction results, or export them to Microsoft Excel.

To view an individual file, double-click its filename and a preview window of the file will open. If you need to log in to a website, you can do so through this preview window.

If you have further questions, or suggestions for new features you would like to see, please get in touch.

© copyright 2004-2024 Iconico, Inc. Code & Design. All Rights Reserved.