Discussion Thread

Data Extractor

Extract any data, including email addresses and URLs from your files and webpages.

Posted in the Data Extractor Forum.

Regex help

I'm having a really hard time writing the regex for two sets of data I need to extract. The first is a plain-text URL that appears on multiple websites and is set up like this:

http://www.site.com/news/...tory.cgi/5

The problem is that it's not a hyperlink/clickable URL, and I am not sure whether Data Extractor can harvest URLs that are in plain text.

The second regex I am having problems with is for extracting a plain-text URL that comes after certain text, which looks like this:

text link for this article: http://www.site.com/RANDO..._FILE_NAME

This one is the hardest for me to figure out because the URLs all differ. Sometimes they are in one folder, at other times in three folders, and the file may have an .html extension, a .php extension, or no extension at all.

I would really appreciate any help or advice given, thanks.

by jake smith on Mar 9 2007 10:55pm

Regex help

Data Extractor usually extracts actual links, so your best bet is to create a new JavaScript-based rule.

If you make a copy of the 'Extract Emails from Webpages' rule you can change the regular expression embedded in that rule to:

https?://([\w-]+\.)+[\w-]+(/[\w- ./?%&=]*)?

by Nico Westerdale on Mar 13 2007 12:29pm
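
For anyone wanting to try the expression outside the tool first, here is a minimal sketch in plain JavaScript. It is not the Data Extractor rule format, which this thread does not show. The function names, the `example.com` sample URLs, and the second pattern (for the "text link for this article:" case, which the reply does not cover) are assumptions made for illustration.

```javascript
// Pattern from the reply above, with the global flag added so every match is found.
// Caveat: the last character class contains a space, so a match can run on past
// the end of the URL into the words that follow it on the same line.
const urlPattern = /https?:\/\/([\w-]+\.)+[\w-]+(\/[\w- .\/?%&=]*)?/g;

// Hypothetical pattern for the second case: capture only the URL that follows
// the fixed label. \S+ stops at the first whitespace character, so the number
// of folders and the file extension (or lack of one) do not matter.
const labelledPattern = /text link for this article:\s*(https?:\/\/\S+)/g;

// Return every plain-text URL found in the text.
function extractUrls(text) {
  return text.match(urlPattern) || [];
}

// Return only the URLs that directly follow the label.
function extractLabelledUrls(text) {
  return Array.from(text.matchAll(labelledPattern), (m) => m[1]);
}

// Made-up sample text; neither URL is wrapped in a hyperlink.
const sample =
  "Story at http://www.example.com/news/story.cgi/5\n" +
  "text link for this article: http://www.example.com/a/b/page.php\n";

console.log(extractUrls(sample));
console.log(extractLabelledUrls(sample));
```

If the trailing-words caveat causes trouble, removing the space from the final character class is one option, at the cost of no longer matching URLs that contain literal spaces.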


© copyright 2004-2024 Iconico, Inc. Code & Design. All Rights Reserved.