regexp help

red penguin
08-18-2003, 10:32 AM
Hello coders...

To set up the question: here is what I have thus far (working), but I am looking for advice and/or improvements.

1. To search a file for any reference to an <a > tag. Grab the href value. done
2. Slap said values into an array. done
3. Display these in a checkbox form so the user can select as many as they'd like. done
4. Submit these values (an array) back to a script that will: grab the very same links we are speaking about from the very same file, figure out which links the user has selected, and replace the selected links with a different extension. done
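
For anyone following along, step 3 above could be sketched roughly like this. It is only an illustration, not the code from my project: the function name renderLinkCheckboxes and the target script convert.php are made up for the example, and it assumes the href values have already had their surrounding quotes stripped. Naming the inputs links[] is what makes PHP hand the selection back as an array in $_POST['links'] (step 4).

```php
<?php
// Hypothetical helper (not from the original post): render each href as a
// checkbox named links[] so the selection arrives as an array on submit.
function renderLinkCheckboxes(array $links, $action)
{
    $html = '<form method="post" action="' . htmlspecialchars($action, ENT_QUOTES) . '">' . "\n";
    foreach ($links as $link) {
        // Escape the value: hrefs can contain quotes and ampersands.
        $safe = htmlspecialchars($link, ENT_QUOTES);
        $html .= '<label><input type="checkbox" name="links[]" value="' . $safe . '"> '
               . $safe . "</label><br>\n";
    }
    $html .= '<input type="submit" value="Change extensions">' . "\n</form>\n";
    return $html;
}

// Usage: 'convert.php' is an assumed name for the receiving script.
echo renderLinkCheckboxes(array('about.html', 'contact/index.htm'), 'convert.php');
```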

Okay, so you ask, what's the problem? Well, the idea here is to allow the user to select links from a page (file) and change the extension from .htm/.html TO .php. Yes. This works fine via a few functions working closely together and passing a few variables around. (see above)

Now, the regexp that I use to REPLACE is:
Easy, right? Originally, I was using:
This works fine on all relative links, which, btw, are really the only ones the user wants to select/change. However, it is not working very well on a value such as http://www.foobar.com/contact/index.html. (There should be no real reason to change that absolute link anyway...)
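
To make the relative-vs-absolute problem concrete, here is a minimal sketch of an extension swap that simply leaves absolute-looking values alone. This is not my actual replace regexp; swapExtension is a hypothetical name, and the sketch assumes the href value is passed in without its surrounding quotes. It only touches .htm/.html where it ends the path, so a query string or #fragment after it is preserved.

```php
<?php
// Hypothetical helper (an illustration, not the poster's pattern):
// swap .htm/.html for .php on one unquoted href value.
function swapExtension($href)
{
    // Leave anything that starts with a scheme (http:, mailto:, ...),
    // a protocol-relative "//", or "www." untouched.
    if (preg_match('~^(?:[a-z][a-z0-9+.-]*:|//|www\.)~i', $href)) {
        return $href;
    }
    // Group 1 is the path up to the extension; the lookahead requires the
    // extension to be followed by end of string, "?" or "#".
    return preg_replace('~^([^?#]*)\.html?(?=$|[?#])~i', '${1}.php', $href);
}
```

Because the first group cannot cross a ? or #, something like a.php?next=b.html is left as it is, while docs/index.htm#top becomes docs/index.php#top.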

(FYI: I am building an app that writes some PHP dynamically to a page, even a page that is .html, and changes its extension. If the user's site is built around HTML and not PHP, there should be relative links within the site, and this allows them to selectively change extensions quite easily.)

So...Should I modify the search regexp to ONLY include relative links? Do I keep it as is and just tell the user that (s)he shouldn't select any absolute links (unless they are part of his/her site)?

Sorry if this is confusing...Again, it is working, however ALL href values are selected from a page. I am really only concerned with relative links. I'm thinking it'd be a better process if only those (rel links) were returned by my search regexp:

## from my search function
if(preg_match_all("%<a href=(['\"]+[^<]*['\"]+)%",$orig_content,$args)) {
    # $args[1] holds every captured href value, quotes included
    for($i=0;$i<count($args[1]);$i++) array_push($arr,$args[1][$i]);
}
return $arr;


08-19-2003, 01:30 AM
Well I am too lazy to come up with the negating regex tonight. But this will get the job done.

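function getRelativeLinks($data)
{
    $retVal = array();
    preg_match_all("%<a href=(['\"]+?[^>]*['\"]+)%",$data,$args);
    # walk every captured href and keep those without an absolute-looking prefix
    for($i=0;$i<count($args[1]);$i++)
    {
        if(!preg_match("#^('|\")?(javascript|http|www|ftp|mailto)#", $args[1][$i], $sub))
            $retVal[] = $args[1][$i];
        else continue;
    }
    return $retVal;
}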
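
For what it's worth, the "negating regex" could probably be done with a negative lookahead, so the search pattern itself skips absolute-looking values. The following is only a sketch under a few assumptions: the name getRelativeLinksLookahead is made up, href is taken to be the first attribute after <a (as in the patterns above), and the value is always quoted. Unlike the function above, it returns the values without their quotes.

```php
<?php
// Hypothetical variant: reject absolute-looking hrefs inside the search
// pattern itself instead of filtering the matches afterwards.
function getRelativeLinksLookahead($data)
{
    $pattern = '%<a\s+href=(["\'])'                 // opening quote, remembered as \1
             . '(?!javascript|http|www|ftp|mailto)' // value must not start with these
             . '([^"\'>]*)\1%i';                    // the value, then the same quote
    preg_match_all($pattern, $data, $args);
    return $args[2]; // group 2 = href values, quotes stripped
}

$html = '<a href="about.html">A</a> <a href=\'http://x.com/i.html\'>B</a> '
      . '<a href="mailto:me@x.com">C</a> <a href="docs/faq.htm">D</a>';
print_r(getRelativeLinksLookahead($html));
```

Same caveat as the prefix check above: a relative file that happens to start with one of those words (say, httpd-notes.html) would be skipped too.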

red penguin
08-25-2003, 01:42 AM
Thanks freddy...

Have been vacationing and not doing much else but I shall take this function and try to fit it into the existing project.

I will post results to this thread.

Back to the beach!