From time to time you need to check the access logs of your servers, right? Sometimes you want to find out if or why something is wrong, and sometimes you just want to gather some data (to understand if or why something is wrong).
Here are some neat tools and commands I use to check the access log files of my servers.
I love console-based, pretty-looking tools, and goaccess is certainly one of those.
This tool lets you see what is going on on your website, giving you stats and information about your visitors: the most requested URLs, the number of visitors per day, the hours with the most traffic, referrers, etc.
You can run it with
goaccess -f /var/log/nginx/access.log
And it basically looks like this:
However, it also provides an HTML interface, so you can periodically generate a static HTML page to check the stats in a more graphical format.
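For example, something like the following (the output path is just an example, and you may need --log-format=COMBINED if goaccess cannot guess your log format) writes a self-contained report you can serve with nginx itself; run it from a cron job and you get a periodically refreshed stats page:
goaccess -f /var/log/nginx/access.log -o /var/www/html/report.html --log-format=COMBINED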
The name says it all: it is a top command for Apache. However, since it only requires an access.log file as input, you can also use it with nginx as long as you keep the common log format.
One of the features I like about this command is that it shows the current rate of requests per second, averaged both since the command started and over a specified interval (default 30s). It also shows the most requested URLs and other stats.
You can run it with
apachetop -f /var/log/nginx/access.log
And it looks like this:
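If the defaults do not fit your case, apachetop has a few tuning flags; as far as I remember (check your version's man page), -T changes the averaging interval in seconds and -d the refresh delay, so something like this averages over 60s and refreshes every 5s:
apachetop -f /var/log/nginx/access.log -T 60 -d 5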
Yes... grep and friends help me a lot to find useful information and stats.
Let me show you some examples.
List the HTTP return codes in all access logs (compressed too)
$ zgrep -Poh '" \d{3} ' access.log* | sort | uniq -c | sort -rn 196071 " 200 30397 " 302 3928 " 404 590 " 301 339 " 304 48 " 502 38 " 470 32 " 499 27 " 504 24 " 500 18 " 400 9 " 418 4 " 405 2 " 206 1 " 403
List the number of requests per day in November
$ zgrep -Poh '\d{2}/Nov/2019' access.log* | sort | uniq -c
[ ... ]
   1601 21/Nov/2019
   1603 22/Nov/2019
   1552 23/Nov/2019
   1517 24/Nov/2019
   1601 25/Nov/2019
   1545 26/Nov/2019
   1624 27/Nov/2019
   1615 28/Nov/2019
   1542 29/Nov/2019
   1527 30/Nov/2019
List the number of requests per hour on December 3rd
$ zgrep -Poh '03/Dec/2019:\d{2}' access.log* | cut -d' ' -f1 | sort | uniq -c
     63 03/Dec/2019:00
     66 03/Dec/2019:01
     61 03/Dec/2019:02
     64 03/Dec/2019:03
     62 03/Dec/2019:04
     65 03/Dec/2019:05
     61 03/Dec/2019:06
     60 03/Dec/2019:07
     73 03/Dec/2019:08
     73 03/Dec/2019:09
     73 03/Dec/2019:10
     64 03/Dec/2019:11
    164 03/Dec/2019:12
    137 03/Dec/2019:13
    135 03/Dec/2019:14
     93 03/Dec/2019:15
    103 03/Dec/2019:16
    195 03/Dec/2019:17
    145 03/Dec/2019:18
    139 03/Dec/2019:19
     94 03/Dec/2019:20
    114 03/Dec/2019:21
    109 03/Dec/2019:22
    106 03/Dec/2019:23
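Seeing that 17h was the busiest hour, the same trick drills down to minutes:
$ zgrep -Poh '03/Dec/2019:17:\d{2}' access.log* | sort | uniq -c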
List the IPs that made the most HEAD requests on December 3rd
$ zgrep -Poh '.* \[03/Dec/2019.*\] "HEAD ' access.log* | cut -d' ' -f1 | sort | uniq -c | sort -rn
      4 54.209.251.246
      3 35.247.113.5
      3 217.182.175.162
      2 3.82.218.185
      2 35.230.122.175
      2 34.83.11.15
      1 96.28.180.117
      1 95.216.13.24
      1 94.130.53.35
      1 91.121.79.122
      1 69.113.35.243
      1 51.91.13.105
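Dropping the date filter, a similar pattern counts the requests per HTTP method instead:
$ zgrep -Poh '"(GET|POST|HEAD) ' access.log* | sort | uniq -c | sort -rn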
I know what you are thinking: "this guy has no traffic at all"... and you are right: this server gets almost no traffic.
Jokes aside, I hope you can see how easily you can combine the commands above, or change a few parameters, to get exactly the information you are looking for.
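For instance, combining the status-code and per-day filters from above tells you how many 404s you served each day in November:
$ zgrep -h '" 404 ' access.log* | grep -Po '\d{2}/Nov/2019' | sort | uniq -c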
BONUS
$ ansible -o -m shell -a "zgrep -h '\"GET /public/awesome_85.03.img' /var/log/nginx/public.access.log* | egrep 'ROUTERCOM' | grep '\" 200' | egrep '/Nov/2019' | awk '{ print \$1\" \"\$9 }' | sort | uniq | wc -l" 'frontends:!TRASH' | awk '{sum += $8} END {print sum}'
217
This is based on a real case. It looks in all the frontend servers defined in the Ansible inventory (the ones not in the TRASH group) for successful downloads (code 200) of the firmware awesome_85.03.img by the user agent "ROUTERCOM" during November 2019. Since -o makes Ansible print each host's reply on a single line, the per-host count from wc -l lands in the eighth whitespace-separated field, which the final awk sums across all hosts, returning a single number of downloads: 217.
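By the way, if you want to sanity-check the inner pipeline on a single frontend first, it is the same command without the Ansible quoting:
$ zgrep -h '"GET /public/awesome_85.03.img' /var/log/nginx/public.access.log* | egrep 'ROUTERCOM' | grep '" 200' | egrep '/Nov/2019' | awk '{ print $1" "$9 }' | sort | uniq | wc -l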