Discussion Thread
Data Extractor
Extract any data, including email addresses and URLs from your files and webpages.
Posted in the Data Extractor Forum.
Stack Overflow Problem
Hello,
I have recently started using Data Extractor to extract certain text from web pages. However, I keep getting an error whenever I use JavaScript to extract text: "Stack Overflow at line: 0".
Any help would be appreciated.
Regards
Stack Overflow Problem
I'd try a reinstall first. Then let me know.
Stack Overflow Problem
Hello Nick,
I followed your advice and reinstalled the application, but the problem persists. Could you shed some light on this?
Regards
Stack Overflow Problem
That's very strange. Could you tell me exactly what your settings are on all tabs?
Stack Overflow Problem
Well, I run Data Extractor on a cached web page and apply JavaScript to extract the data. Do you want to see the JavaScript?
Stack Overflow Problem
Without having all the settings, there's not much I can do!
Stack Overflow Problem
OK, these are the settings on all my tabs:
1. Where to Extract - Extract from Multiple Files and/or Web page URLs
Here I use a cached web page from a third-party web crawler.
2. What to Extract - I made a custom rule called "Extract Info", where I combined some JavaScript code I found on the internet with code I wrote myself.
It is as follows, if you want a look:
// Collect every table on the page (IE-specific document.all API).
var tables = document.all.tags('TABLE');
var rows;
var cells;
var timeExp = /([0-1][0-9]|2[0-3]):([0-5][0-9])/;
var dateExp = /[0-3][0-9]-(Jan|Feb|Mrz|Apr|Mai|Jun|Jul|Aug|Sep|Okt|Nov|Dez)-[0-9][0-9][0-9][0-9]/g;
var counter = 0;
var time, date;
if (tables)
{
    for (var t = 0; t < tables.length; t++)
    {
        rows = tables[t].all.tags('TR');
        // Only process innermost tables (those containing no nested tables).
        if (tables[t].all.tags('TABLE').length == 0)
        {
            for (var r = 0; r < rows.length; r++)
            {
                if (rows[r].innerText != '')
                {
                    cells = rows[r].all.tags('TD');
                    DataExtractor.StartNewResult();
                    var text = rows[r].innerText;
                    DataExtractor.SetColumns(3);
                    // Time and date come from the preceding row, so a match in the first row (r == 0) is skipped.
                    if (r > 0 && text.match("Seramis"))
                    {
                        DataExtractor.AddResult(1, text);
                        time = rows[r-1].innerText.match(timeExp);
                        date = rows[r-1].innerText.match(dateExp);
                        DataExtractor.AddResult(2, time);
                        DataExtractor.AddResult(3, date);
                    }
                }
            }
        }
    }
}
Then, when I click Start Extraction, I get the error "Stack Overflow at line: 0". I also noticed something in the results: when I extract data from a large div tag that itself contains more div tags, the nested div tags are extracted along with the text. When I export the results to an Excel file, I see little square boxes in the places where the div tags would have been. Any thoughts on this as well?
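A side note on the rule above: in JavaScript, `match` returns an array of matches (or `null` when nothing matches), not a plain string. How `DataExtractor.AddResult` treats an array or `null` is not stated in this thread, so the following is only a sketch, assuming it is safer to pass strings. `firstMatch` and the sample text are hypothetical and not part of Data Extractor.

```javascript
// Hypothetical helper (not part of Data Extractor): returns the first
// match of a regular expression as a plain string, or '' if nothing matches.
function firstMatch(text, pattern) {
    var found = text.match(pattern); // an array of matches, or null
    return found ? found[0] : '';
}

// Same expressions as in the rule above.
var timeExp = /([0-1][0-9]|2[0-3]):([0-5][0-9])/;
var dateExp = /[0-3][0-9]-(Jan|Feb|Mrz|Apr|Mai|Jun|Jul|Aug|Sep|Okt|Nov|Dez)-[0-9][0-9][0-9][0-9]/g;

// Made-up row text, for illustration only.
var sample = 'Posted 12-Mai-2008 at 14:35';
var time = firstMatch(sample, timeExp);
var date = firstMatch(sample, dateExp);
```

In the rule, the two results could then be passed on as `DataExtractor.AddResult(2, time)` and `DataExtractor.AddResult(3, date)`, so a row without a time or date yields an empty column rather than `null`.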
Stack Overflow Problem
Again, without the URL there isn't much I can do. What happens when you try using a simpler rule on the page?
Stack Overflow Problem
The URL is:
http://www.tropenland.at/...p?TID=8844.
I did notice one thing: the JavaScript works on the live URL without giving the error. The error only happens when I use the cached web page from a third-party crawler.
However, the special characters that I wrote about in my previous post appear in both cases. Any help would be appreciated.
Stack Overflow Problem
Well, your answer clearly lies in the difference between the live and crawler pages.
Stack Overflow Problem
I agree on that issue. However, the special characters in the output are the same whether I use live or crawler pages. If you were to try it yourself, you would see what I am talking about.
These characters make the output look messy. I have tried several URLs, but the problem still persists.
Stack Overflow Problem
Then I suggest you use a different crawler as that's what's causing the problem!
Stack Overflow Problem
I have now tried another crawler and also tested live web pages, but the problem with the special characters in the output still persists. I suggest you also give it a try yourself to understand my problem.
I have also tried variations of my JavaScript to strip the characters from the output, but to no avail.
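For reference, one way such stripping is often attempted is sketched below. It assumes the square boxes are control characters (carriage returns, line feeds, tabs) left in `innerText` where the nested tags break the text. The thread does not confirm that this is the cause. `cleanText` and the sample string are hypothetical and not part of Data Extractor.

```javascript
// Hypothetical helper: replaces control characters with spaces,
// collapses runs of whitespace, and trims both ends.
function cleanText(text) {
    return String(text)
        .replace(/[\x00-\x1F\x7F]+/g, ' ') // control characters, including CR, LF and tab
        .replace(/\s+/g, ' ')              // collapse repeated whitespace into one space
        .replace(/^\s+|\s+$/g, '');        // trim without relying on String.prototype.trim
}

// Made-up text resembling a row that spans nested tags.
var raw = 'Seramis\r\n\tOffer  details\r\n';
var cleaned = cleanText(raw);
```

In the rule, this would be applied before storing the value, for example `DataExtractor.AddResult(1, cleanText(text))`. If the boxes remain after that, the characters are something other than control characters, and inspecting their character codes with `charCodeAt` would be the next step.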