Posted in the Data Extractor Forum.




Problems with extraction results layout

I have configured the JavaScript rule to extract the page title, all the meta tags, and the text contained in my webpages to a spreadsheet or database format.

The problem I have now is the results format.
I want a number of columns, each containing one section of information.
For example, column 1 is the page title, column 2 is the meta description, column 3 is the meta keywords, etc.

For every web page extracted, I want all the information to be contained in one row. I do not want it separated onto 2, 3, or 4 different rows.
All the page information must be kept on one row, and the following row will then contain the next webpage.

Could you please help?
I have tried a number of ways but with little result.
Craig
by craig hodges on Dec 19 2007 10:46am

Problems with extraction results layout

It should be very easy, just do:

DataExtractor.SetColumns(3);
DataExtractor.AddHeader(1, 'header1');
DataExtractor.AddHeader(2, 'header2');
DataExtractor.AddHeader(3, 'header3');

DataExtractor.StartNewResult();
DataExtractor.AddResult(1, 'data1');
DataExtractor.AddResult(2, 'data2');
DataExtractor.AddResult(3, 'data3');

Replace 'data1' etc. with your real data.
by Nico Westerdale on Dec 19 2007 11:05am

Problems with extraction results layout

This is the format in which I have written the JavaScript.
This method does extract the results that I want. The problem is that I want all the information for each page placed in a single row per result.
When I run the process, instead of having the result on one line, the results appear as follows:

Title        | Meta Name   | Meta content
page title 1 | description | meta description here
page title 1 | keyword     | meta keywords here
page title 1 | author      | joe blogs

What I need is this format:

title  | meta description | meta keywords       | meta author
page 1 | meta descrip 1   | keywords for page 1 | author page 1
page 2 | meta descrip 2   | keywords for page 2 | author page 2

I can extract the data; it is just a matter of re-formatting the data into the format above that I require.

By the way, thanks for the quick response.
by craig hodges on Dec 19 2007 12:55pm

Problems with extraction results layout

I see what you are trying to do. You would have to change the rule to loop over the meta tags; you can use a "for" loop in JavaScript. Look for the description and add that to column 2, look for the keywords and add that to column 3, and look for the author and add that to column 4.
by Nico Westerdale on Dec 19 2007 1:17pm
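
The advice above could be sketched roughly as follows. This is only an illustration, not the actual rule: `columnForMeta` and `buildRow` are hypothetical helper names, the column numbers follow the layout described in this thread (title in column 1, description in 2, keywords in 3, author in 4), and the meta tags are passed in as plain objects with `name` and `content` properties rather than read from a live page.

```javascript
// Hypothetical helper: decide which column a meta tag belongs in.
// Returns 0 when the tag is not one of the three we care about.
function columnForMeta(name) {
  var key = String(name || '').toLowerCase();
  if (key === 'description') { return 2; }
  if (key === 'keywords') { return 3; }
  if (key === 'author') { return 4; }
  return 0;
}

// Hypothetical helper: build one row (array position = column number - 1)
// for one page by looping over its meta tags.
function buildRow(title, metaTags) {
  var row = [title, '', '', ''];
  for (var i = 0; i < metaTags.length; i++) {
    var column = columnForMeta(metaTags[i].name);
    if (column > 0) {
      row[column - 1] = metaTags[i].content;
    }
  }
  return row;
}

// Example with made-up data for one page; the order of the tags does not matter.
var exampleRow = buildRow('page 1', [
  { name: 'keywords', content: 'keywords for page 1' },
  { name: 'description', content: 'meta descrip 1' },
  { name: 'author', content: 'author page 1' },
  { name: 'viewport', content: 'ignored' }
]);
```

In a real rule, each value would presumably be passed to DataExtractor.AddResult with its column number instead of being stored in an array.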

Problems with extraction results layout

How would you go about using the for command? I have heard about it but never used it. I had considered using the if command, but each time I tried this method I ended up with the word after the if command being placed in all the column fields.
Could you give an example for one of the fields, please?
by craig hodges on Dec 19 2007 1:27pm

Problems with extraction results layout

I don't really have time to go in depth, but start here:
http://www.w3schools.com/...op_for.asp
by Nico Westerdale on Dec 19 2007 1:30pm
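
As a rough illustration of what a for loop does, here is a minimal sketch. The list contents are made up for the example. The loop repeats the same block once for each item in the list, and the code after it then carries on as normal.

```javascript
// Made-up list standing in for the names of the meta tags on a page.
var names = ['description', 'keywords', 'author'];
var visited = [];

// i starts at 0, the body runs while i is less than names.length,
// and i goes up by one after each pass.
for (var i = 0; i < names.length; i++) {
  visited.push(i + ': ' + names[i]);
}

// The loop has finished; this line runs once.
console.log(visited.join(', ')); // prints "0: description, 1: keywords, 2: author"
```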

Problems with extraction results layout

I take it you mean to use the for command to run the program once, gathering the data, and then to stop the program once it reaches a determined value.
I am a bit confused about how the for command can be applied.
by craig hodges on Dec 19 2007 1:49pm

Problems with extraction results layout

Craig, without writing the rule for you I don't really know how to explain it any better.

You would have to change the rule to loop over the meta tags; you can use a "for" loop in JavaScript. Look for the description and add that to column 2, look for the keywords and add that to column 3, and look for the author and add that to column 4.
by Nico Westerdale on Dec 19 2007 1:55pm

Problems with extraction results layout

I have written the code, which starts by setting up the columns and adding the headings to each.

I have then declared a variable to get the meta tags and added a check so that the rest of the code only runs if any meta tags are present.

I have now inserted a for command, which then gets the data using the Data Extractor AddResult commands.

But I am still getting the data in column format rather than one page per row.
by craig hodges on Dec 19 2007 2:10pm

Problems with extraction results layout

I will put the code up so that you can see what it looks like.

DataExtractor.SetColumns(3);
DataExtractor.AddHeader(1, 'title');
And so on to set the column headers

var meta = document.getElementsByTagName('meta');
if (meta.length > 0) {
  for (i = 0; i < meta.length; i++) {
    DataExtractor.StartNewResult();
    DataExtractor.AddResult(1, document.title);
    DataExtractor.AddResult(2, meta[i].name);
And so on for all relevant data collection

But it still produces a column layout rather than one row per page.
by craig hodges on Dec 19 2007 2:16pm

Problems with extraction results layout

Move the StartNewResult call out of the loop and put it before the loop.
by Nico Westerdale on Dec 19 2007 2:24pm
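
Applied to the rule posted above, that change might look like the sketch below. It assumes the four-column layout described earlier in the thread and uses only the DataExtractor calls already shown here. `extractPage` is a hypothetical wrapper name, and the `recorder` object is a stand-in invented for this example so the effect can be seen outside Data Extractor.

```javascript
// Hypothetical wrapper around the rule body. `extractor` is expected to offer
// the StartNewResult and AddResult calls used earlier in this thread.
function extractPage(extractor, title, metaTags) {
  extractor.StartNewResult();          // once per page, before the loop
  extractor.AddResult(1, title);
  for (var i = 0; i < metaTags.length; i++) {
    var name = String(metaTags[i].name || '').toLowerCase();
    if (name === 'description') { extractor.AddResult(2, metaTags[i].content); }
    if (name === 'keywords') { extractor.AddResult(3, metaTags[i].content); }
    if (name === 'author') { extractor.AddResult(4, metaTags[i].content); }
  }
}

// Stand-in for DataExtractor (invented for this example): it only records rows.
var recorder = {
  rows: [],
  StartNewResult: function () { this.rows.push({}); },
  AddResult: function (column, value) {
    this.rows[this.rows.length - 1][column] = value;
  }
};

// Two made-up pages, each of which should end up as exactly one row.
extractPage(recorder, 'page 1', [
  { name: 'description', content: 'meta descrip 1' },
  { name: 'keywords', content: 'keywords for page 1' },
  { name: 'author', content: 'author page 1' }
]);
extractPage(recorder, 'page 2', [
  { name: 'description', content: 'meta descrip 2' }
]);
// recorder.rows now holds two entries, one per page.
```

Inside a real rule the call would presumably be extractPage(DataExtractor, document.title, document.getElementsByTagName('meta')), with the SetColumns and AddHeader calls made beforehand.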
