
Extract any data, including email addresses and URLs, from your files and webpages.
Posted in the Data Extractor Forum.
Not extracting all possible results
I have an issue with a JavaScript rule. The extractor does not seem to extract ALL of the available results on the page: it only gets the first result from each page and then skips ahead to the next page. What should happen is that it gets all of the results on a page (there is more than one per page) before moving on.
Let's say my page contains this. I have two 'date-header' tags and I would like to extract the text inside each of them.
content
more content
And my code is as follows:
DataExtractor.SetColumns(1);
DataExtractor.AddHeader(1, 'Text');
var div = document.getElementById('date-header');
DataExtractor.StartNewResult();
DataExtractor.AddResult(1, div.innerHTML);
I am also going to add a column for the title of the page, but I omitted it here for simplicity.
This problem does not appear to occur with the ready-made rules such as the email or URL extractors, but it does occur with every rule I write myself.
Any suggestions on how I can write this to make sure it catches all of the results on each page?
Not extracting all possible results
Whoops, it didn't print my HTML or line breaks, sorry about that.
The HTML of the page would be:
<div id="date-header">Tuesday, April 13, 2010</div>
content
<div id="date-header">Monday, April 12, 2010</div>
content
Not extracting all possible results
You need a "for" loop to cycle over the elements and add them one at a time.
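The loop suggestion can be sketched as follows. The first-result-only behavior comes from document.getElementById, which returns a single element (the first one with that id in document order), so the rule has to collect all matching elements itself and then loop over them, starting a new result for each. This is a minimal sketch under several assumptions: the rule runs once per page, the DataExtractor calls behave as they do in the question's code, column 2 can hold the page title the poster planned to add, and the rule engine may only support older (ES3-style) JavaScript. findDateHeaders and extractDateHeaders are hypothetical helper names, not part of Data Extractor.

```javascript
// Hypothetical helper: collects every <div> whose id is "date-header".
// getElementById would stop at the first match, because ids are
// expected to be unique within a page.
function findDateHeaders(doc) {
  var divs = doc.getElementsByTagName('div');
  var headers = [];
  for (var i = 0; i < divs.length; i++) {
    if (divs[i].id === 'date-header') {
      headers.push(divs[i]);
    }
  }
  return headers;
}

// Hypothetical helper: writes one result row per header.
// Column 1 is the header text; column 2 (an assumption) is the page title.
function extractDateHeaders(doc, extractor) {
  extractor.SetColumns(2);
  extractor.AddHeader(1, 'Text');
  extractor.AddHeader(2, 'Title');
  var headers = findDateHeaders(doc);
  for (var i = 0; i < headers.length; i++) {
    extractor.StartNewResult();            // start a new row for each header
    extractor.AddResult(1, headers[i].innerHTML);
    extractor.AddResult(2, doc.title);
  }
  return headers.length;                   // number of rows written
}

// Inside the rule itself this would be called with the real objects:
// extractDateHeaders(document, DataExtractor);
```

Passing document and DataExtractor in as parameters keeps the helpers testable outside the extractor. If the engine supports it, document.querySelectorAll('[id="date-header"]') is a shorter way to collect the same elements.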