Friday, June 6, 2008

Blocking Advertisements with a Hosts file, Apache and PHP

The hosts file is located at /etc/hosts on Linux and at %SystemRoot%\system32\drivers\etc\hosts on Windows XP and Vista. It maps host names to IP addresses and, by default, is consulted before the DNS server. So if you add this entry to your hosts file:

207.68.172.246 google.com 
Then every time you type google.com you will be taken to msn.com instead, since 207.68.172.246 is the IP address of msn.com.

Knowing this, you can use the hosts file to point any domain at an IP address of your choice. We can therefore use it to block domains that host unwanted advertising or malware.
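To make the mapping concrete, here is a minimal Python sketch of how hosts-file text can be read into a name-to-IP table. The function name parse_hosts and the sample text are illustrative, and keeping the first entry seen for a name is an assumption of this sketch rather than a statement about any particular resolver.

```python
def parse_hosts(text):
    """Map each host name in hosts-file text to its IP address."""
    mapping = {}
    for line in text.splitlines():
        # Everything after '#' is a comment.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        ip, names = fields[0], fields[1:]
        for name in names:
            # Assumption of this sketch: the first entry for a name wins.
            mapping.setdefault(name.lower(), ip)
    return mapping


sample = """
127.0.0.1  localhost
207.68.172.246 google.com
127.0.0.1  www.abx4.com #[Adware.ABXToolbar]
"""

hosts = parse_hosts(sample)
print(hosts["google.com"])
print(hosts["www.abx4.com"])
```

Any name listed against 127.0.0.1 ends up pointing at your own machine, which is what the blocking technique relies on.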

Modifying your Hosts File to block Advertisements and Malware

There are many sites offering hosts files that block advertisements and malware. I use the one at http://www.mvps.org/winhelp2002/hosts.htm.
Here is the txt version of the hosts file: http://www.mvps.org/winhelp2002/hosts.txt

Here is an example of what the entries look like. The full list contains many more, about 18,000 entries at this time.

# [Misc A - Z]
127.0.0.1  ad.a8.net
127.0.0.1  asy.a8ww.net
127.0.0.1  www.abx4.com #[Adware.ABXToolbar]
127.0.0.1  acezip.net #[SiteAdvisor.acezip.net]
127.0.0.1  www.acezip.net #[Win32/Adware.180Solutions]
127.0.0.1  phpadsnew.abac.com
127.0.0.1  a.abnad.net
127.0.0.1  b.abnad.net
127.0.0.1  c.abnad.net #[eTrust.Tracking.Cookie]
127.0.0.1  d.abnad.net
127.0.0.1  e.abnad.net
127.0.0.1  t.abnad.net
127.0.0.1  banners.absolpublisher.com
127.0.0.1  tracking.absolstats.com
127.0.0.1  adv.abv.bg
127.0.0.1  bimg.abv.bg
127.0.0.1  www2.a-counter.kiev.ua
127.0.0.1  accuserveadsystem.com
127.0.0.1  www.accuserveadsystem.com
127.0.0.1  gtb5.acecounter.com
127.0.0.1  gtcc1.acecounter.com
127.0.0.1  gtp1.acecounter.com #[eTrust.Tracking.Cookie]
127.0.0.1  acestats.com
127.0.0.1  www.acestats.com
127.0.0.1  achmedia.com
127.0.0.1  ads.active.com
127.0.0.1  am1.activemeter.com
127.0.0.1  www.activemeter.com #[eTrust.Tracking.Cookie]
127.0.0.1  ads.activepower.net
127.0.0.1  stat.active24stats.nl #[eTrust.Tracking.Cookie]
127.0.0.1  web.acumenpi.com #[AdvertPro]
127.0.0.1  ad.ad24.ru
127.0.0.1  at.ad2click.nl
127.0.0.1  cms.ad2click.nl
127.0.0.1  banner.ad.nu
127.0.0.1  ad-up.com
127.0.0.1  www.ad-up.com
You will need to download the txt file and append the entries to your hosts file.
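If you would rather not paste thousands of lines by hand, the append step can be scripted. The following is a rough Python sketch, assuming you have already read both files into strings. The name merge_hosts is hypothetical, and the sketch deliberately works on text only, so writing the result back to the real hosts file (which needs root or administrator rights) is left to you.

```python
def merge_hosts(existing, downloaded):
    """Return the existing hosts text with new blocking entries appended."""

    def names(line):
        # Host names on a line, ignoring the IP and any trailing comment.
        fields = line.split("#", 1)[0].split()
        return [n.lower() for n in fields[1:]]

    known = set()
    for line in existing.splitlines():
        known.update(names(line))

    additions = []
    for line in downloaded.splitlines():
        line_names = names(line)
        # Keep only lines that define names we do not have yet.
        if line_names and not any(n in known for n in line_names):
            additions.append(line.rstrip())
            known.update(line_names)

    if not additions:
        return existing
    return existing.rstrip("\n") + "\n" + "\n".join(additions) + "\n"


existing = "127.0.0.1  localhost\n127.0.0.1  ad.a8.net\n"
downloaded = (
    "# [Misc A - Z]\n"
    "127.0.0.1  ad.a8.net\n"
    "127.0.0.1  asy.a8ww.net\n"
    "127.0.0.1  localhost\n"
)
merged = merge_hosts(existing, downloaded)
print(merged)
```

Skipping names that are already present means the script can be run again when the list is updated without duplicating entries.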

Once the hosts file is in effect, the vast majority of advertisements will no longer be displayed when you browse any website in Firefox, IE or any other browser.

Setting up Apache to display a custom page or message for blocked Advertisements and Malware

Each entry in the hosts file blocks an unwanted site by resolving its domain name to 127.0.0.1, the loopback address, which always refers to your own machine. So all requests for advertising sites are sent to your own machine instead. The problem is that if there is no web server on your localhost, the browser will display an error in place of the ads.

If you're a web developer, you likely have Apache or some other HTTP server running on your localhost, so you will get that server's 404 error page in place of the ads. You can resolve this by adding a virtual host to your httpd.conf file that catches all requests made to your Apache server for the blocked hosts and displays a custom page instead of the 404.

Assuming you always access your local server via the URL http://localhost/, you probably don't need any other host names on 127.0.0.1. So your virtual host could look something like:

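# Catch-all virtual host for requests to blocked domains
<VirtualHost 127.0.0.1>
ServerAdmin webmaster@adblock
DocumentRoot /var/www/adblock/
# Served for any path that does not exist under DocumentRoot
ErrorDocument 404 /404.html
# The log directory must already exist
ErrorLog /var/log/adblock/error.log
TransferLog /var/log/adblock/access.log
</VirtualHost>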
This will catch all requests made to 127.0.0.1. The requests will most likely have a path that doesn't exist under /var/www/adblock/, so Apache will generate a 404 error. You therefore need a custom 404 document, which is what ErrorDocument 404 /404.html defines. It can contain a single line such as "ad or malware blocked", or something along those lines.

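Since localhost also resolves to 127.0.0.1, you will need a separate name-based virtual host with ServerName localhost for your own site. Apache uses the first virtual host defined for an address as the default for any host name that matches no ServerName, so the catch-all virtual host should be listed first. On Apache 2.2 and earlier this also requires a NameVirtualHost 127.0.0.1 directive.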

Alternatively, and this may be simpler, you can skip the virtual host and create a custom 404 document for your current setup. You can do this with a directive directly in httpd.conf, such as ErrorDocument 404 /404.php. Notice that it is a PHP file, so you can use some PHP code to customize the error message. What you want is for the PHP to detect whether the request was for a blocked site and, if so, show your message, "site blocked", while showing the regular 404 page for your actual website.

You detect whether the request is for one of the blocked hosts by comparing the requested host with the list of blocked hosts in your hosts file. The requested host is in the $_SERVER['SERVER_NAME'] variable (with Apache's UseCanonicalName set to Off, this reflects the host name the browser asked for). Since the list of blocked hosts is large, and you probably do not want your PHP script to read all of it each time an advertisement is blocked, you can apply the reverse comparison: if the requested host is not in the list of your valid hosts, then it is a blocked host. An example:

<?php
// our valid hosts
$valid_hosts = array('localhost', 'my.host.joe', 'my.other.host.peter');
// check if the requested host is a valid one
if (!in_array($_SERVER['SERVER_NAME'], $valid_hosts)) {
    echo 'ad or malware blocked'; // display message in place of blocked ad
} else {
    // display regular 404 page; include() takes a filesystem path,
    // so build it from the document root rather than using '/404.html'
    include($_SERVER['DOCUMENT_ROOT'] . '/404.html');
}
Now, when you visit those websites with pesky advertisements and popups, you get a neat little line saying "ad or malware blocked" in the place of those ads.
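One thing to watch with the valid-hosts comparison is that the host name a browser sends can differ in case, carry a port such as localhost:8080, or end with a trailing dot. The following Python sketch only illustrates the normalization logic, which could be ported to 404.php. The function name is_blocked is hypothetical, the host names are the placeholders from the PHP example, and whether you need this at all depends on which server variable you compare against.

```python
VALID_HOSTS = {"localhost", "my.host.joe", "my.other.host.peter"}


def is_blocked(host_header, valid_hosts=VALID_HOSTS):
    """Return True when the requested host is not one of our own sites."""
    host = host_header.strip().lower()
    # Drop a trailing ":port" if present (IPv6 literals are not handled here).
    host = host.split(":", 1)[0]
    # A trailing dot marks a fully qualified name; treat it the same.
    host = host.rstrip(".")
    return host not in valid_hosts


for candidate in ["ad.a8.net", "localhost:8080", "LOCALHOST", "My.Host.Joe."]:
    print(candidate, is_blocked(candidate))
```

Normalizing before the lookup keeps your own sites from being mistaken for blocked hosts just because of how the name was typed.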
