Converting Relative Links to Absolute

I was recently given the task of adding a new API call to syndicate some articles from our site, articles that contain links to other content on our site. The question when writing site content is whether to make intra-site links absolute or not. If you do (e.g. http://foo.org/other-article) you’re tied to a specific domain name; if you use relative links (e.g. /other-article) they’ll break when you syndicate your content. My company chose the latter path anyway, so rather than strip all links from the text, I decided to make the article links absolute.

Note that this doesn’t handle <img> element src attributes. To do that, write a similar regex that captures the src value of each <img> tag and run the same replacement loop over those matches.

function absolutify($string) {

  $base_url = 'http://foo.org';

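  // Require whitespace after "<a" and stay inside the tag, so <abbr>, <area> etc. are not picked up.
  preg_match_all('/<a\s[^>]*?href\s*=\s*["\']([^"\']+)[^>]*>(.*?)<\/a>/si', $string, $hrefs);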
  foreach ($hrefs[1] as $href) {

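    // If the link is blank, already has a scheme (http:, https:, mailto:, ...),
    // is protocol-relative (//host/path), or is a same-page anchor (#top), nothing to do here
    if (!$href || preg_match('/^(?:[a-z][a-z0-9+.\-]*:|\/\/|#)/i', $href)) {
      continue;
    }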

    // Create the absolute URL. We account for paths that start with a slash and ones that don't.
    $absolute_href = substr($href, 0, 1) == '/' ? $base_url . $href : $base_url . '/' . $href;

    // Replace the relative URL in the content with the absolute URL.
    // We use a preg function instead of str_replace because it allows us to limit it to 1 result,
    // so each pass through the loop rewrites exactly one occurrence.
    // We match the URL together with its surrounding quotes (single or double) so that a link
    // "/foo" does not also match inside another link "/foo/bar". The quotes also avoid a problem
    // with repeated URLs: once the first "/foo" becomes "http://foo.org/foo", the quoted pattern
    // no longer matches inside it, so the next pass finds the second occurrence.
    // preg_quote() escapes regex metacharacters in the URL (e.g. "?" and "." in /page.php?id=1),
    // and the callback keeps "$" or "\" in the URL from being read as backreferences.
    $pattern = '|(["\'])' . preg_quote($href, '|') . '\1|';
    $string = preg_replace_callback($pattern, function ($m) use ($absolute_href) {
      return $m[1] . $absolute_href . $m[1];
    }, $string, 1);

  }

  return $string;
}
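
The note above about <img> src attributes could be handled with a more general helper. What follows is only a rough sketch, not the code we run in production: the function name absolutify_attribute() and its parameters are made up for illustration, and the cdn.example.com URL is a placeholder. It also swaps the collect-then-replace loop for a single preg_replace_callback() pass, which rewrites each match in place and so sidesteps the repeated-URL problem described in the comments above.

```php
<?php
// Hypothetical helper (not part of the original function): makes the values of
// one attribute on one tag absolute, e.g. <img src> or <a href>.
function absolutify_attribute($string, $tag, $attribute, $base_url) {
  // Group 1 is the quote character, group 2 is the attribute value.
  $pattern = '/<' . preg_quote($tag, '/') . '\s[^>]*?\b' . preg_quote($attribute, '/')
           . '\s*=\s*(["\'])(.*?)\1/si';

  return preg_replace_callback($pattern, function ($m) use ($base_url) {
    $url = $m[2];

    // Leave blank values, URLs with a scheme, protocol-relative URLs and anchors alone.
    if ($url === '' || preg_match('/^(?:[a-z][a-z0-9+.\-]*:|\/\/|#)/i', $url)) {
      return $m[0];
    }

    // Account for paths that start with a slash and ones that don't.
    $absolute = $url[0] === '/' ? $base_url . $url : $base_url . '/' . $url;

    // The match ends with the value and its closing quote; swap in the new value.
    $prefix = substr($m[0], 0, strlen($m[0]) - strlen($url) - 1);
    return $prefix . $absolute . $m[1];
  }, $string);
}

// Usage: rewrite image sources, leaving the already-absolute one untouched.
$html = '<p><img src="/img/logo.png"> <img src=\'photo.jpg\'> <img src="https://cdn.example.com/x.png"></p>';
echo absolutify_attribute($html, 'img', 'src', 'http://foo.org'), "\n";
```

Calling it with 'a' and 'href' instead of 'img' and 'src' covers the article links as well, so the base URL only has to be passed in one place. Like the original, this is regex-based and will not cope with every piece of malformed HTML.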