Basic Traffic Analysis with Unix

Written on 2019-04-15

So you want to know how many people visit your website, but don't want to set up Google Analytics or anything like that? That at least was the situation I found myself in when I started this blog. Turns out, if you just want a daily number of visitors, standard Unix tools are perfectly sufficient.

The pipeline

The solution I found assumes that you have root access to your webserver (or at least read access to its logs). Basically, all it does is read in the Apache logs and count how many distinct IP addresses accessed the website on a given day. This is how you do it:

echo "$(date +"%F") \
    $(grep "terranostra.one" /var/log/apache2/access.log \
        | grep "$(date +"%d")/" \
        | cut -f 1 -d " " \
        | sort \
        | uniq \
        | wc -l)" \
    >> terranostra_stats.log

Clear as mud? Let's look at the components:

  • echo "$(date +"%F") $(<expr>)" >> terranostra_stats.log The top-level expression appends the date and the number of visitors to the output file.

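  • grep "terranostra.one" /var/log/apache2/access.log | grep "$(date +"%d")/" Find all lines in the Apache log that refer to this domain, then keep only those containing today's day of the month followed by a slash (as in 15/Apr/2019). Note that this matches the day number only, so it relies on the log being rotated often enough that no entries from the same day of an earlier month are left in it.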

  • cut -f 1 -d " " | sort Take the first field of each log entry, which is the IP address, and sort the results.

  • uniq | wc -l Get rid of all duplicate entries and count the remainder. (uniq only collapses adjacent duplicates, which is why the sort has to come first.)
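To make the steps above easier to try out on a copy of a log, here is a minimal sketch that wraps them in a function. The name count_visitors and its arguments are my own invention for illustration, not part of the original command. It also differs in two small ways: it matches the full date stamp (e.g. 15/Apr/2019) rather than just the day number, and it uses sort -u in place of sort | uniq.

```shell
#!/bin/sh
# Hypothetical helper: count distinct client IPs for one site on one day.
# Usage: count_visitors LOGFILE DOMAIN DATESTAMP   (DATESTAMP like 15/Apr/2019)
count_visitors() {
    # -F treats the patterns as fixed strings, so "[" and "." are literal.
    grep -F -- "$2" "$1" \
        | grep -F -- "[$3:" \
        | cut -f 1 -d " " \
        | sort -u \
        | wc -l \
        | tr -d " "    # some wc implementations pad the count with spaces
}
```

For today's count you could call it as count_visitors /var/log/apache2/access.log terranostra.one "$(LC_ALL=C date +"%d/%b/%Y")". The LC_ALL=C is there because %b prints the month abbreviation in the current locale, whereas the Apache log uses English abbreviations.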

(If you don't know one of these commands or their arguments, man is your friend…)

Getting it to run

If you like, you can of course run this command manually, but that soon stops being fun. Instead, you can let the system do it for you using cron. Stick this in your root crontab (using sudo crontab -e):

59 23 * * * /path/to/site_traffic.sh

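This line specifies that the shell script site_traffic.sh (which contains the command explained above) is to be executed daily at 23:59, i.e. one minute before midnight. Make sure your script is only modifiable by root! (It will be executed with root privileges, so you don't want just anybody to be able to change it. For example, make root the owner with sudo chown root:root /path/to/site_traffic.sh, then use sudo chmod 755 /path/to/site_traffic.sh so that only the owner can write to it.) Also keep in mind that the command writes to a relative path, so terranostra_stats.log ends up in whatever directory cron runs the job from, which is normally root's home directory. In the end, this is what your output file will look like: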

2019-03-09 16
2019-03-10 13
2019-03-11 27
2019-03-12 17
2019-03-13 17
2019-03-14 12
2019-03-15 14
2019-03-16 15
2019-03-17 16
2019-03-18 19
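For reference, here is a rough sketch of how site_traffic.sh itself could be laid out. The function name record_traffic and the two paths in the final comment are assumptions of mine, so adjust them to your setup. The pipeline is the one from above, apart from a trailing tr that strips the padding some wc implementations add.

```shell
#!/bin/sh
# Sketch of site_traffic.sh (record_traffic is a hypothetical name).
# Usage: record_traffic LOGFILE OUTFILE
record_traffic() {
    log=$1
    out=$2
    # Same pipeline as in the post: distinct IPs for this domain and day.
    count=$(grep "terranostra.one" "$log" \
        | grep "$(date +"%d")/" \
        | cut -f 1 -d " " \
        | sort \
        | uniq \
        | wc -l \
        | tr -d " ")
    # Append one "YYYY-MM-DD N" line per run.
    echo "$(date +"%F") $count" >> "$out"
}

# In the real script, finish with a call like this one (both paths are
# assumptions; an absolute output path keeps the result independent of
# the directory cron happens to run the job from):
# record_traffic /var/log/apache2/access.log /root/terranostra_stats.log
```

Passing the output file as an absolute path is a design choice rather than a requirement. It just means the numbers always land in the same place, whether the script is started by cron or by hand.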

Have fun!

Tagged as computers, tutorials, terranostra


Unless otherwise credited all material Creative Commons License by Daniel Vedder.