Discussion Thread

Data Extractor

Posted in the Data Extractor Forum.

extract text between two keywords.

Hi, my XML file contains lines like this:

<bookmark filepath="66286_toc.pdf" page="1" title="Table of Contents" type="GoToR" view="XYZ" view-left="-19" view-top="847" />

I want to extract only 66286_toc.pdf
I am not sure how to write a regex for this. Please help.
by ping pong on Apr 5 2007 9:19am

This depends on the exact structure of your data, but you would probably want to match digits, followed by an underscore, followed by letters, then a dot, then "pdf".

You may find this regular expression help file useful in formulating regular expressions:
by Nico Westerdale on Apr 5 2007 9:47am
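
The shape described above (digits, an underscore, letters, a dot, then "pdf") can be sketched as an ordinary regular expression. This is a minimal illustration in Python's `re` syntax, not Data Extractor's own pattern syntax, which may differ. The name `PDF_NAME` is hypothetical, and the sample line is the one from the question.

```python
import re

# Digits, underscore, letters, a literal dot, then "pdf".
# This file-name shape is an assumption based on the single sample given.
PDF_NAME = re.compile(r"\d+_[A-Za-z]+\.pdf")

line = 'bookmark filepath="66286_toc.pdf" page="1" title="Table of Contents" type="GoToR"'

# findall returns every non-overlapping match in the line.
print(PDF_NAME.findall(line))  # ['66286_toc.pdf']
```

A pattern this specific would miss names that do not follow the digits_letters.pdf shape, so it only suits data that is as regular as the sample.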

Thanks for your quick reply. I read that but still cannot make it work. Maybe it's just because I don't understand it. Basically, I want to extract all PDF filenames from the bookmark tags in my XML files. Below is what I tried.

The first keyword is bookmark filepath=".
The second keyword is " page.

I tried ^{bookmark filepath=}*{" page}$

but it didn't work. Any thoughts?
by ping pong on Apr 5 2007 10:41am
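
One general way to express "everything between two keywords" is a lazy capture group bounded by the two literal keywords. The sketch below uses Python's `re` module rather than Data Extractor's pattern syntax. The helper name `between` is hypothetical, and the second sample line is made up for illustration.

```python
import re

def between(text, start, end):
    """Return each shortest run of text found between start and end."""
    # re.escape makes both keywords literal; (.*?) is lazy, so each match
    # stops at the first occurrence of the end keyword.
    pattern = re.escape(start) + r"(.*?)" + re.escape(end)
    return re.findall(pattern, text)

# First line is from the question; the second is an invented example.
xml = ('bookmark filepath="66286_toc.pdf" page="1" title="Table of Contents"\n'
       'bookmark filepath="66286_ch01.pdf" page="1" title="Chapter 1"')

print(between(xml, 'bookmark filepath="', '" page'))
# ['66286_toc.pdf', '66286_ch01.pdf']
```

In standard regex syntax, `^` and `$` anchor a pattern to the start and end of a line, so a pattern wrapped in them matches only when the keywords are the first and last things on that line. Whether the same applies to Data Extractor's syntax is not confirmed here.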

Try this pattern

by Nico Westerdale on Apr 10 2007 8:16am

Did this work? I don't want to hijack the thread, but it's so relevant: I want to do something very similar, extracting everything between the words "option value" and "option".

This is to extract shopping cart info out of a biggish web site so it will need to search through all the sub-folders of the main site folder on my hard drive.

If anyone could help me I would be very grateful. I'm a total noob at this coding stuff.
by Mark Williams on Jul 17 2007 2:25am
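
A rough sketch of the task Mark describes, walking every sub-folder and collecting the text between "option value" and "option", could look like the following in Python. The `*.html` file filter and the names `OPTION` and `extract_options` are assumptions for illustration, not anything Data Extractor prescribes.

```python
import re
from pathlib import Path

# Lazy match between the two keywords; DOTALL lets a match span line breaks.
OPTION = re.compile(r"option value(.*?)option", re.DOTALL)

def extract_options(root):
    """Yield (file path, matched text) for every .html file under root."""
    # rglob descends into all sub-folders of root.
    for path in sorted(Path(root).rglob("*.html")):
        text = path.read_text(encoding="utf-8", errors="replace")
        for match in OPTION.findall(text):
            yield path, match
```

Because the keywords are matched literally, each result still carries the surrounding markup characters (for example the `=` and the quotes after `option value`), which would need trimming in a later step.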

It worked in my test case, Mark. I can't really say more than that!
by Nico Westerdale on Jul 17 2007 4:39am

Is it possible to create a pattern for web pages where the text looks like this?

Broker Name : A J Banford
Company Name : The Radford Company
12308 Ocean Gateway #5

ocean City, Maryland 21842
530-546-7963 (fax)

Brokers By Name

I have thousands of web pages that all have the same formatted text in the middle. I need the program to copy everything from "Broker Name:" through "(fax)", inclusive, into a text file, then skip a line or two and add the text from the next web page.

The result will be a text file with all the names and addresses in it. I have another program that will parse the names and addresses and process them.

by Bruce steinberger on Jan 13 2008 6:52pm
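
As an illustration of the kind of rule Bruce is asking about, the sketch below captures from "Broker Name" through "(fax)", inclusive, and joins the blocks from successive pages with blank lines. It is written in Python, not as a Data Extractor rule. It assumes each page has already been reduced to plain text, and the names `BLOCK` and `collect_blocks` are hypothetical.

```python
import re

# From "Broker Name :" up to and including the first "(fax)" that follows.
# DOTALL lets the lazy .*? run across line breaks inside the block.
BLOCK = re.compile(r"Broker Name\s*:.*?\(fax\)", re.DOTALL)

def collect_blocks(pages):
    """Return every matched block, separated by two blank lines."""
    blocks = []
    for page_text in pages:
        blocks.extend(BLOCK.findall(page_text))
    return "\n\n\n".join(blocks)

# Sample text copied from the post above.
page = """Broker Name : A J Banford
Company Name : The Radford Company
12308 Ocean Gateway #5

ocean City, Maryland 21842
530-546-7963 (fax)

Brokers By Name"""

print(collect_blocks([page]))
```

The returned string could then be written to a single text file for the downstream parser; the trailing "Brokers By Name" line is left out because the match stops at "(fax)".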

I'm sure it would be, we do have a custom rule service if you're interested:
by Nico Westerdale on Jan 15 2008 3:30pm
