Extract any data, including email addresses and URLs from your files and webpages.
Posted in the Data Extractor Forum.
JavaScript help requested: extracting DOM data
Hi folks,
I'm having trouble extracting data... Please help!
Issues I'm having:
1. I can't figure out how to get Data Extractor to read a local text file (not hosted on a server or domain) that contains the URLs of the pages I want to scrape.
2. I'd like it to follow links within the current domain.
3. My JavaScript doesn't seem to be working.
I'm trying to get the contact name and address data from this markup:
###
<div class="panel-container">
  <div class="panel" style="display: block;">
    <div class="panel-wrapper">
      <!--AGENT INFO-->
      <div id="agent-info">
        <span class="agent-info-h2">John T. Carmichael </span>
        <span class="agent-info"></span>
        <div class="clear"></div>
      </div>
      <!--AGENT INFO-->
      <p>
        1230 Yahoo Rd Ste A234<br>
        Lodi, CA 94515<br>
        P: (456) 123-4356<br>
        F: (456) 234-5678
      </p>
    </div>
  </div>
</div>
###
and here's the Data Extractor JavaScript I'm using...
###
DataExtractor.SetColumns(2);
DataExtractor.AddHeader(1, 'AgentName');
DataExtractor.AddHeader(2, 'ContactInfo');
var agentName = document.getElementById('agent-info').innerText;
if (agentName.nextSibling && agentName.nextSibling.nodeName == "P") {
  var contactInfo = agentName.nextSibling.innerText;
}
if (contactInfo) {
  for (var i = 0; i < contactInfo.length; i++) {
    DataExtractor.StartNewResult();
    DataExtractor.AddResult(1, agentName);
    DataExtractor.AddResult(2, contactInfo);
    //DataExtractor.ShowError(errorText);
  }
}
###
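One likely reason the script above produces nothing: `agentName` holds the string returned by `innerText`, and a string has no `nextSibling`, so `contactInfo` is never set. Looping up to `contactInfo.length` would also add one row per character of the text. Below is a minimal sketch of the same idea that keeps the element and steps to the following `<p>` with `nextElementSibling`. The names `readAgent`, `textOf`, and `writeAgent` are hypothetical, and `extractor` is assumed to be the `DataExtractor` object, using only the methods that already appear in the post. How Data Extractor exposes the page DOM to scripts is not confirmed here.

```javascript
// Sketch under assumptions: `doc` behaves like the browser `document`, and
// `extractor` behaves like DataExtractor (only methods seen in the post are used).
function readAgent(doc) {
  // Keep the element itself: sibling navigation exists on nodes, not on strings.
  var agentEl = doc.getElementById('agent-info');
  if (!agentEl) {
    return null;
  }
  // nextElementSibling skips the whitespace text node and the HTML comment
  // that sit between the div and the <p> in the posted markup.
  var addressEl = agentEl.nextElementSibling;
  if (!addressEl || addressEl.nodeName !== 'P') {
    return null;
  }
  return {
    agentName: textOf(agentEl),
    contactInfo: textOf(addressEl)
  };
}

// Read an element's text, preferring innerText and falling back to textContent.
function textOf(el) {
  var raw = el.innerText !== undefined ? el.innerText : el.textContent;
  return String(raw || '').trim();
}

// Write the headers and, if a record was found, a single result row.
function writeAgent(extractor, record) {
  extractor.SetColumns(2);
  extractor.AddHeader(1, 'AgentName');
  extractor.AddHeader(2, 'ContactInfo');
  if (record) {
    extractor.StartNewResult();
    extractor.AddResult(1, record.agentName);
    extractor.AddResult(2, record.contactInfo);
  }
}
```

Inside Data Extractor this would presumably be run as `writeAgent(DataExtractor, readAgent(document));`.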
JavaScript help requested: extracting DOM data
I have improved the JS a bit...
I've commented out the 'with(document) {' line in my 'improved' version, as I haven't been able to get it to work reliably...
Here's my improved version. It still doesn't work, but when I run it in the console it gets me closer to the information I'm looking for.
###
DataExtractor.SetColumns(3);
DataExtractor.AddHeader(1, 'URL');
DataExtractor.AddHeader(2, 'AgentName');
DataExtractor.AddHeader(3, 'ContactInfo');
var contactInfo = document.getElementById('panel-container');
var agentName = document.getElementById('agent-info');
var agentAddress = agentName.nextElementSibling;
//with(document) {
if (contactInfo) {
  for (var i = 0; i < contactInfo.length; i++) {
    DataExtractor.StartNewResult();
    DataExtractor.AddResult(1, URL);
    DataExtractor.AddResult(2, agentName);
    DataExtractor.AddResult(3, agentAddress);
    //DataExtractor.ShowError(errorText);
  }
}
//}
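
// Sketch under stated assumptions (collectRows, clean, writeRows are
// hypothetical names; `extractor` is assumed to behave like DataExtractor and
// only the methods already used above are called on it).
// Possible reasons the script above adds no rows:
//  * 'panel-container' is a class in the posted markup, so
//    getElementById('panel-container') returns null and the if-block is skipped.
//  * An element has no length property, so the loop would not run anyway.
//  * With the with(document) line commented out, bare URL is no longer
//    document.URL; in a browser it resolves to the global URL constructor.
//  * agentName and agentAddress are elements, not their text.
function collectRows(doc) {
  var rows = [];
  // Select by class; querySelectorAll returns a list that does have a length.
  var panels = doc.querySelectorAll('.panel-container');
  for (var i = 0; i < panels.length; i++) {
    var nameEl = panels[i].querySelector('.agent-info-h2');
    var infoEl = panels[i].querySelector('#agent-info');
    // The address <p> is the next element after the agent-info div.
    var addressEl = infoEl ? infoEl.nextElementSibling : null;
    if (!nameEl || !addressEl) {
      continue;
    }
    rows.push([doc.URL, clean(nameEl.textContent), clean(addressEl.textContent)]);
  }
  return rows;
}

// Collapse runs of whitespace (including line breaks) into single spaces.
function clean(text) {
  return String(text || '').replace(/\s+/g, ' ').trim();
}

// Write three headers, then one result per collected row.
function writeRows(extractor, rows) {
  extractor.SetColumns(3);
  extractor.AddHeader(1, 'URL');
  extractor.AddHeader(2, 'AgentName');
  extractor.AddHeader(3, 'ContactInfo');
  for (var i = 0; i < rows.length; i++) {
    extractor.StartNewResult();
    for (var col = 0; col < rows[i].length; col++) {
      extractor.AddResult(col + 1, rows[i][col]);
    }
  }
}
// Presumed usage inside Data Extractor:
//   writeRows(DataExtractor, collectRows(document));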