How to retrieve remote files in your web apps and still be friends with the server


When you’re building a web page or app, you often want to include some content from a remote server. Say it’s a statistic that the remote server outputs as HTML or plain text, and you want to retrieve it and either process it or display it directly as part of your own page. And you’re working in PHP.

PHP provides a fancy way of opening and including files directly over HTTP, which they call “URL wrappers”. So you might be tempted to do something like

include("http://www.example.com/index.html");

or

$f = fopen("http://www.example.com/index.txt", "r");

As tempting as it may seem, doing remote opens with URL wrappers is not the best practice in the long run. First, many sites disallow URL wrappers altogether for security reasons (and rightly so). Second, as your site grows in popularity (and hopefully it will; why else are you building it?), you’ll notice these requests eat bandwidth and increase page load times, because every page view triggers a fresh remote request. That is mostly waste: most probably the file content doesn’t change in real time, and a delay of a few seconds or minutes before the content refreshes would be fine for the user and would save you a whole lot of HTTP (FTP, …) calls.
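To see whether a given host has URL wrappers switched off, you can look at the allow_url_fopen setting (include over HTTP is additionally governed by allow_url_include). A minimal sketch; the helper name remote_open_allowed() is made up for illustration:

```php
<?php
// Report whether this PHP installation allows URL wrappers at all.
// remote_open_allowed() is a hypothetical helper name, not a PHP built-in.
function remote_open_allowed() {
    // ini_get() returns the setting as a string ("1", "0", "On", "")
    // or false if the directive is unknown; normalise it to a boolean.
    return filter_var(ini_get('allow_url_fopen'), FILTER_VALIDATE_BOOLEAN);
}

if (!remote_open_allowed()) {
    echo "URL wrappers are disabled here; fopen('http://...') will fail.\n";
}
```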

So here’s what I came up with when I needed this kind of caching myself. It requires the cURL module to be installed and the web server to be able to read and write in /tmp. The usual security caveats apply: in a shared environment you may want to change the directory to another location, make sure you validate the input, and so on. It works for me, but may eat your cat and suck up your credit. Use at your own risk.

Just copy and paste this function into your own code and call it like echo get_remote_file("http://www.example.com/index.txt") (or capture the result in a variable instead of echoing it).

# url - URL to retrieve
# timeout - seconds between each refresh, default 300, or 5 minutes
function get_remote_file($url, $timeout = 300) {
    # on Windows, either change the dir name or make /tmp
    $cached_name = '/tmp/'.sha1($url).'cachedwebfile';
    clearstatcache(); # file_exists()/filemtime() results are cached
    if (!file_exists($cached_name) ||
        ((time() - filemtime($cached_name)) > $timeout)) {
        # "cb" (PHP 5.2.6 or later) opens without truncating, so the file
        # is only emptied once we hold the exclusive lock
        $f = fopen($cached_name, "cb");
        if ($f === false) {
            return false;
        }
        flock($f, LOCK_EX);
        ftruncate($f, 0);
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_HEADER, 0);
        /* I'm not sure whether cURL can stream to both stdout/variable and
           a file at the same time. Output buffering here (capture the stream,
           return it, and write it to the file) would save one open/read:
           currently, on a refresh, the file is written and then read back. */
        curl_setopt($ch, CURLOPT_FILE, $f);
        curl_exec($ch);
        curl_close($ch);
        fclose($f);
    }
    $f = fopen($cached_name, "rb");
    if ($f === false) {
        return false;
    }
    flock($f, LOCK_SH);
    # read to EOF; fread() with filesize() breaks on an empty file
    $retval = stream_get_contents($f);
    fclose($f);
    return $retval;
}
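On the question in the code comment about capturing the response and writing the file in one go: one way that should work is to ask cURL for the body as a string (CURLOPT_RETURNTRANSFER) and write the cache file yourself. The following is only a sketch under my own assumptions, not a drop-in replacement: the function name get_remote_file_buffered() and the extra $cache_dir and $fetch parameters are invented for illustration ($fetch just lets you swap in a different downloader), and it needs PHP 5.3 or later for the closure.

```php
<?php
// Hypothetical variant of get_remote_file(): fetch into a string, then
// write the cache file once. $cache_dir and $fetch are illustrative
// additions; $fetch is any callable taking a URL and returning the body
// as a string, or false on failure.
function get_remote_file_buffered($url, $timeout = 300, $cache_dir = null, $fetch = null) {
    if ($cache_dir === null) {
        $cache_dir = sys_get_temp_dir();
    }
    $cached_name = $cache_dir . DIRECTORY_SEPARATOR . sha1($url) . 'cachedwebfile';

    clearstatcache();
    if (is_file($cached_name) && (time() - filemtime($cached_name)) <= $timeout) {
        // Cache is fresh: serve it without touching the network.
        return file_get_contents($cached_name);
    }

    if ($fetch === null) {
        // Default downloader: cURL returning the body as a string.
        $fetch = function ($url) {
            $ch = curl_init($url);
            curl_setopt($ch, CURLOPT_HEADER, false);
            curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
            // curl_exec() returns false on failure; the handle is
            // released when $ch goes out of scope.
            return curl_exec($ch);
        };
    }

    $body = call_user_func($fetch, $url);
    if ($body === false) {
        // Download failed: fall back to a stale copy if one exists.
        return is_file($cached_name) ? file_get_contents($cached_name) : false;
    }

    // Write to a temporary file and rename it into place, so readers
    // never see a half-written cache file.
    $tmp = tempnam($cache_dir, 'rf');
    file_put_contents($tmp, $body);
    rename($tmp, $cached_name);

    return $body;
}
```

Call it the same way as the original, e.g. echo get_remote_file_buffered("http://www.example.com/index.txt"). The trade-off is that the whole response sits in memory, which is fine for small text files but probably not what you want for large downloads.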

2 Comments

Fatal error: Call to undefined function: sha1() in /…/index.php on line 97

The PHP manual lists the version requirements for sha1() as (PHP 4 >= 4.3.0, PHP 5). If you’re on an older version of PHP, you may need to upgrade.
