Discussion Thread
Data Extractor

Extract any data, including email addresses and URLs from your files and webpages.
Posted in the Data Extractor Forum.
Question
Hi!
Could you help me with a simple script that will identify and list e-mail addresses? The addresses are "hidden": the program would have to find a string such as "lg0 = "maciej_puk";at = "@";lg1 = "op.pl";" and output "maciej_puk@op.pl". This is too difficult for me...
Best regards!
Matt from Poland
Question
Although the whole thing can probably be done in JavaScript, I know little about it either, so I would split the task up :-
Run the following script :-
var obj=document.getElementsByTagName('script');
DataExtractor.SetColumns(1);
for (var t=0; t<obj.length; t++)
{
    // Keep only script blocks that contain "lto:" (part of "mailto:")
    var match=/lto:/;
    var matchpos=obj[t].innerHTML.search(match);
    if (matchpos != -1)
    {
        // Output the whole raw script text as one result
        DataExtractor.StartNewResult();
        DataExtractor.AddResult(1,obj[t].innerHTML);
    }
}
This will get you raw, messy data that can easily be cleaned up in Notepad, Word, Excel etc. using search and replace. E.g.
load the extracted results into Notepad and do :-
Replace (Ctrl+H)
Find what: pr_1 = " and leave the Replace with: field blank
Find what: ";pr_2 = " and again leave the Replace with: field blank
Find what: ";lg0 = " and put a single space in the Replace with: field
Find what: ";at = "@";lg1 = " and put @ in the Replace with: field
Finally
Find what: "; and replace with a single comma ,
This assumes that the email addresses are "hidden" in the same way on every page.
Save the file as ExtractionResults.csv and then open it in Excel. All your emails should be in the first column and you can simply delete any other columns. If you want, you can do another search/replace to get rid of the "mailto: " bit.
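For anyone who would rather not do the replacements by hand, the same five steps can be sketched in JavaScript. This is only a sketch under assumptions: cleanRawResult is a made-up helper name (not part of Data Extractor), and the sample string is a guess at what one raw result looks like, pieced together from the fragments in the steps above (the pr_1 and pr_2 values in particular are assumed).

```javascript
// Hypothetical helper: applies the manual search-and-replace steps in order.
// split/join is used so every occurrence of each literal string is replaced.
function cleanRawResult(raw) {
  return raw
    .split('pr_1 = "').join('')             // step 1: replace with nothing
    .split('";pr_2 = "').join('')           // step 2: replace with nothing
    .split('";lg0 = "').join(' ')           // step 3: replace with a single space
    .split('";at = "@";lg1 = "').join('@')  // step 4: replace with @
    .split('";').join(',');                 // step 5: replace with a comma
}

// ASSUMED sample of one raw result; the real page may differ.
var rawSample = 'pr_1 = "mai";pr_2 = "lto:";lg0 = "maciej_puk";at = "@";lg1 = "op.pl";';
var cleaned = cleanRawResult(rawSample);
console.log(cleaned); // mailto: maciej_puk@op.pl,
```

The order matters in the same way as in Notepad: the generic "; replacement has to come last, otherwise it would break up the longer strings the earlier steps look for.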
Question
I've done a bit more playing, and the following script will do the whole job for the example page you posted, but it is unlikely to work if the pages in question are formatted significantly differently :-
First it finds the bit of HTML that contains the email. This is done by checking every script tag for the text lto: (part of the string "mailto:"). Then it finds the leftmost part of the email. This can be tweaked by changing the lines :-
var match=/lg0 =/; ("lg0" is the text to find) and
left=left+7; (7 is how far from the found text the email address starts)
It then finds the rightmost part of the email address and the innerleft and innerright parts (i.e. the bit where we need to put an @). Again, these can be tweaked as above.
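var obj=document.getElementsByTagName('script');
DataExtractor.SetColumns(1);
for (var t=0; t<obj.length; t++)
{
    // Only process script blocks that contain "lto:" (part of "mailto:")
    var match=/lto:/;
    var matchpos=obj[t].innerHTML.search(match);
    if (matchpos != -1)
    {
        // left: start of the user name, 7 characters past the start of lg0 = "
        var match=/lg0 = /;
        var left=obj[t].innerHTML.search(match);
        left=left+7;
        // right: end of the domain, 2 characters before " document.write"
        var match=/ document\.write/;
        var right=obj[t].innerHTML.search(match);
        right=right-2;
        // innerleft: end of the user name, 2 characters before "at ="
        var match=/at =/;
        var innerleft=obj[t].innerHTML.search(match);
        innerleft=innerleft-2;
        // innerright: start of the domain, 7 characters past the start of lg1 = "
        var match=/lg1 = /;
        var innerright=obj[t].innerHTML.search(match);
        innerright=innerright+7;
        // Cut out both halves and join them with @
        var st=obj[t].innerHTML;
        var op=st.slice(left,innerleft);
        var op2=st.slice(innerright,right);
        DataExtractor.StartNewResult();
        DataExtractor.AddResult(1,op+"@"+op2);
    }
}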
And to any experts out there: yes, I know it's messy :-)
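A possibly tidier variant, offered only as a sketch: instead of counting characters from four separate searches, one regular expression with two capture groups can pick out both halves in a single step. extractHiddenEmail is a made-up helper name (not part of Data Extractor), and the pattern assumes the exact lg0 / at / lg1 layout quoted in the original question.

```javascript
// Hypothetical helper: returns "user@domain" if the text contains
//   lg0 = "user";at = "@";lg1 = "domain"
// (optional spaces allowed around = and ;), otherwise null.
function extractHiddenEmail(scriptText) {
  var pattern = /lg0\s*=\s*"([^"]*)"\s*;\s*at\s*=\s*"@"\s*;\s*lg1\s*=\s*"([^"]*)"/;
  var m = pattern.exec(scriptText);
  return m ? m[1] + "@" + m[2] : null;
}

// Sample taken from the string quoted in the question.
var hiddenSample = 'lg0 = "maciej_puk";at = "@";lg1 = "op.pl";';
console.log(extractHiddenEmail(hiddenSample)); // maciej_puk@op.pl
```

Inside the loop above, the result of extractHiddenEmail(obj[t].innerHTML) could be passed to DataExtractor.AddResult whenever it is not null, which would remove the need for the +7 and -2 offsets. Like the original, it will not work on pages that hide the address in a different way.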
Question
Oops:-
var match=/lg0 =/; ("lg0" is the text to find) and
Should read
var match=/lg0 =/; ("lg0 =" is the text to find) and
Question
Hm! A little thanks/feedback from the original poster would have been nice. Not sure I'll be posting any more freebies :-(