Jul 15 2010

Regex for URL grabbing from HTML content

Category: Code @ 18:22
I won't go into the details, but a project I worked on recently had to grab the HREF attributes from A tags inside a mass of HTML, and it had to do so with a 100% success rate no matter how poorly structured the HTML was. I think the regex below pretty much covers it, except for one small case: if you don't put quotes around your href, AND you make up your own attributes for the A tag, AND you have actual spaces in the href, it may end up cutting the end of the URL off. Otherwise, bullet-proof. The URL lands in the first capture group when the href is quoted and in the second when it isn't.

"href+ ?=+ ?(?:(?:(?:""|')(.+?)(?:""|'))|(.+?)(?: class ?=| onclick ?=| id ?=| accesskey ?=| dir ?=| ltr ?=| lang ?=| style ?=| tabindex ?=| title ?=| onblur ?=| ondblclick ?=| onfocus ?=| onmousedown ?=| onmousemove ?=| onmouseout ?=| onmouseover ?=| onmouseup ?=| onkeydown ?=| onkeypress ?=| onkeyup ?=|>))"
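To see the pattern in action, here is a rough Python sketch. It is not the code from the original project. It assumes the doubled "" in the pattern above is just a quote escaped for a string literal, so it is written here as a single ". The case-insensitive flag and the names ATTRS, HREF_RE and extract_hrefs are my own additions for illustration.

```python
import re

# Attribute names the pattern treats as "the unquoted href ends here".
# The list is copied from the regex in the post.
ATTRS = (
    "class onclick id accesskey dir ltr lang style tabindex title onblur "
    "ondblclick onfocus onmousedown onmousemove onmouseout onmouseover "
    "onmouseup onkeydown onkeypress onkeyup"
).split()

# Rebuild the terminator alternation: " class ?=| onclick ?=| ... |>"
TERMINATORS = "|".join(f" {name} ?=" for name in ATTRS) + "|>"

# Same pattern as in the post, with "" written as a single ".
# re.IGNORECASE is an assumption so that HREF= also matches.
HREF_RE = re.compile(
    r"""href+ ?=+ ?(?:(?:(?:"|')(.+?)(?:"|'))|(.+?)(?:""" + TERMINATORS + "))",
    re.IGNORECASE,
)


def extract_hrefs(html):
    """Return every href value found, quoted (group 1) or unquoted (group 2)."""
    return [m.group(1) or m.group(2) for m in HREF_RE.finditer(html)]


if __name__ == "__main__":
    sample = (
        '<a href="http://example.com/a">A</a> '
        "<A HREF='/b' class=\"x\">B</A> "
        "<a href=/c class=x>C</a> "
        "<a href=/d>D</a>"
    )
    for url in extract_hrefs(sample):
        print(url)
```

One thing this makes easy to check: an unquoted href followed by an attribute that is not in the list (say, a made-up data-x=1) keeps matching until the next known attribute or the closing >, so the captured value picks up the extra text.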
