Data Extractor

Extract any data, including email addresses and URLs, from your files and webpages.
Posted in the Data Extractor Forum.
Regex help
I'm having a really hard time writing the regex for two sets of data I need to extract. The first is a plain-text URL that appears on multiple websites and is set up like this:
http://www.site.com/news/...tory.cgi/5
The problem is that it's not a hyperlink/clickable URL, and I am not sure whether Data Extractor can harvest URLs that are in plain text.
The second regex I am having problems with needs to extract a plain-text URL that comes after certain text, which looks like this:
text link for this article:
http://www.site.com/RANDO..._FILE_NAME
This one is the hardest for me to figure out because the URLs are all really different. Sometimes the file is in one folder and at other times it's three folders deep, and it could have an html extension, a php extension, or no extension at all.
I would really appreciate any help or advice. Thanks.
Regex help
Data Extractor usually extracts actual links, so your best bet is to create a new JavaScript-based rule.
If you make a copy of the 'Extract Emails from Webpages' rule, you can change the regular expression embedded in that rule to:
https?://([\w-]+\.)+[\w-]+(/[\w- ./?%&=]*)?
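To try the idea outside Data Extractor first, here is a rough JavaScript sketch covering both cases from the question. It is not the code inside the built-in rule. The function names extractUrls and extractUrlsAfterLabel and the example.com addresses are made up for illustration. One deliberate difference from the expression above: the space is left out of the path character class, on the assumption that a plain-text URL should end at the first whitespace instead of running on into the following words.

```javascript
// Variant of the pattern above. Assumption: the space is dropped from the
// path character class so that a match ends at the first whitespace.
const URL_PATTERN = /https?:\/\/([\w-]+\.)+[\w-]+(\/[\w.\/?%&=-]*)?/g;

// Case 1: return every plain-text URL found anywhere in `text`.
function extractUrls(text) {
  return text.match(URL_PATTERN) || [];
}

// Case 2: return only the URLs that directly follow a label such as
// "text link for this article:". The label is escaped so it is matched
// literally, and \S+ accepts any folder depth or file extension.
function extractUrlsAfterLabel(text, label) {
  const escaped = label.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  const pattern = new RegExp(escaped + "\\s*(https?:\\/\\/\\S+)", "gi");
  return Array.from(text.matchAll(pattern), (m) => m[1]);
}

// Hypothetical sample text; the addresses are placeholders.
const sample = [
  "Plain text address: http://www.example.com/news/story.cgi/5 (not a link)",
  "text link for this article:",
  "http://www.example.com/folder/sub/page.php",
].join("\n");

console.log(extractUrls(sample));
console.log(extractUrlsAfterLabel(sample, "text link for this article:"));
```

The second function does not try to describe the URL's shape at all. It just takes everything up to the next whitespace after the label, which is why it copes with one folder, three folders, or no extension. If the label text varies between sites, that part would need adjusting.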