
Posted in the Data Extractor Forum.




h1 tags

I have modified your meta tags code to try and get the h1 tag text from each web page that I scan:

//Extracts all META tags and Title

DataExtractor.SetColumns(5);
DataExtractor.AddHeader(1, 'URL');
DataExtractor.AddHeader(2, 'Title');
DataExtractor.AddHeader(3, 'H1');
DataExtractor.AddHeader(4, 'Meta Content');
DataExtractor.AddHeader(5, 'Meta httpEquiv');

var mt = document.getElementsByTagName('meta');
if (mt.length > 0) {
    for (var i = 0; i < mt.length; i++) {
        DataExtractor.StartNewResult();
        DataExtractor.AddResult(1, document.URL);
        DataExtractor.AddResult(2, document.title);
        DataExtractor.AddResult(3, document.all.tags("h1"));
        DataExtractor.AddResult(4, mt[i].content);
        DataExtractor.AddResult(5, mt[i].httpEquiv);
    }
}


However, this just shows [object] in the results column. Will I be able to extract the h1 or h2 tags in this way?

Regards,
Simon
by Simon EN on Jul 5 2007 9:39am

h1 tags

document.all.tags("h1") is a collection of objects, so you'll need to iterate through the collection. See this for a good starting point:

http://msdn2.microsoft.co...36439.aspx
by Nico Westerdale on Jul 5 2007 9:46am
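
As a rough illustration of the iteration suggested in the reply above, a minimal sketch might look like the following. The helper name collectText and the ' | ' separator are assumptions made for this example and are not part of Data Extractor; the only Data Extractor call shown is the AddResult call already used in the thread.

```javascript
// Hypothetical helper (not part of Data Extractor): walks an array-like
// collection of elements and joins the text of each one into a single string.
function collectText(collection) {
    var parts = [];
    for (var i = 0; i < collection.length; i++) {
        var el = collection[i];
        // innerText is available in Internet Explorer; textContent is the
        // fallback for other browsers.
        var text = el.innerText || el.textContent || '';
        text = text.replace(/^\s+|\s+$/g, ''); // trim surrounding whitespace
        if (text) {
            parts.push(text);
        }
    }
    return parts.join(' | ');
}

// Assumed usage inside a rule, replacing the line that produced [object]:
// DataExtractor.AddResult(3, collectText(document.getElementsByTagName('h1')));
```

Passing the collection itself to AddResult produces [object] because the collection is converted to a string as a whole; reading the text of each element avoids that.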

h1 tags

That works great! Thanks for the speedy reply.

My next issue is that some of the URLs in my imported URL list do not exist. This causes an error when I try to extract data from them. Is there any way to skip a page if it no longer exists or is blank?
by Simon EN on Jul 5 2007 11:27am

h1 tags

Glad to hear it!

We have a version of the application in development that should help. I'll email it to you.
by Nico Westerdale on Jul 5 2007 1:54pm

h1 tags

Thank you. The new version works a treat. In case anyone else wants to do the same, I have included the code below for pulling out the h1 and p text. Not pretty, but it works for me.


//Extracts all META tags and Title

DataExtractor.SetColumns(5);
DataExtractor.AddHeader(1, 'URL');
DataExtractor.AddHeader(2, 'Title');
DataExtractor.AddHeader(3, 'H1');
DataExtractor.AddHeader(4, 'Meta Content');
DataExtractor.AddHeader(5, 'Meta httpEquiv');

var mt = document.getElementsByTagName('meta');
// Text of the first <p> element on the page (use 'h1' here to read the first heading instead)
var tg = document.getElementsByTagName('p')[0].firstChild.nodeValue;
if (mt.length > 0) {
    for (var i = 0; i < mt.length; i++) {
        DataExtractor.StartNewResult();
        DataExtractor.AddResult(1, document.URL);
        DataExtractor.AddResult(2, document.title);
        DataExtractor.AddResult(3, tg);
        DataExtractor.AddResult(4, mt[i].content);
        DataExtractor.AddResult(5, mt[i].httpEquiv);
    }
}
by Simon EN on Jul 6 2007 1:43am
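
One caveat with the script above: getElementsByTagName('p')[0] is undefined on a page with no matching element, so reading .firstChild from it raises an error, and firstChild.nodeValue is null when the first child is itself an element (for example a link inside a heading). A guarded version could be sketched as below. The helper name firstText is hypothetical and is not part of Data Extractor.

```javascript
// Hypothetical helper (not part of Data Extractor): returns the trimmed text
// of the first element with the given tag name, or '' when none exists.
function firstText(doc, tagName) {
    var els = doc.getElementsByTagName(tagName);
    if (!els || els.length === 0) {
        return ''; // page has no such tag, so return an empty cell
    }
    // innerText/textContent include text inside nested tags, unlike
    // firstChild.nodeValue.
    var text = els[0].innerText || els[0].textContent || '';
    return text.replace(/^\s+|\s+$/g, '');
}

// Assumed usage inside a rule:
// var tg = firstText(document, 'h1');
```

With this in place, a page that lacks the tag yields an empty column value rather than stopping the rule.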

h1 tags

This script seems to work: //Extracts all META tags and Title

I can't get the modified version of it to get the H1 tags to work. Using v3.3.

Also want to pull out H2, H3 etc.

Thanks for the help. This product is going to speed up our research.
by Dean Gannon on Jul 24 2007 3:06pm

h1 tags

Try this rule, it should extract all headers:

http://www.iconico.com/Da...eaders.txt

You can comment out the 'ExtractTag' lines for the different h tags that you do not need.
by Nico Westerdale on Jul 25 2007 8:45am
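
The linked rule is not reproduced in this thread, so the following is only a guess at the general shape such a rule could take, not its actual contents. The ExtractTag signature, the three-column layout, and the doc and out parameters are all assumptions made for this sketch; SetColumns, AddHeader, StartNewResult and AddResult are the calls already used earlier in the thread.

```javascript
// Hypothetical sketch, not the contents of the linked rule.
// doc is the page's document object; out is the DataExtractor object.
// Emits one result row per element with the given tag name.
function ExtractTag(doc, out, tagName) {
    var els = doc.getElementsByTagName(tagName);
    for (var i = 0; i < els.length; i++) {
        var text = els[i].innerText || els[i].textContent || '';
        out.StartNewResult();
        out.AddResult(1, doc.URL);
        out.AddResult(2, tagName.toUpperCase());
        out.AddResult(3, text);
    }
}

// Assumed usage inside a rule (comment out the tags you do not need):
// DataExtractor.SetColumns(3);
// DataExtractor.AddHeader(1, 'URL');
// DataExtractor.AddHeader(2, 'Tag');
// DataExtractor.AddHeader(3, 'Text');
// ExtractTag(document, DataExtractor, 'h1');
// ExtractTag(document, DataExtractor, 'h2');
// ExtractTag(document, DataExtractor, 'h3');
```

Writing one row per heading, rather than one row per page, keeps pages with several h2 or h3 elements from overwriting each other in a single column.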

h1 tags

This DOES work great!! Thanks for the prompt help. Ordering the full product now!!
by Dean G on Jul 25 2007 9:39am
