From time to time you need to check the access logs of some of your servers, right? Sometimes you want to find out if or why something is wrong, and sometimes you just want to gather some data (to understand if or why something is wrong).
Here are some neat tools and commands I use to check the access log files of my servers.
I love console-based, pretty-looking tools, and goaccess is definitely one of those.
This tool lets you see what is going on in your website, giving you stats and information about your visitors. You can see the most requested URLs, the number of visitors per day, the hours with the most traffic, referrers, etc.
You can run it with
goaccess -f /var/log/nginx/access.log
And it basically looks like this:
However, it also provides an HTML interface, so you can periodically generate a static HTML page to check the stats in a more graphical format.
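For example, something like the following should generate a standalone report you can open in a browser (the output path here is just an illustration, and you may need to adjust --log-format to match your server's configuration):

goaccess -f /var/log/nginx/access.log --log-format=COMBINED -o /var/www/html/report.html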
The name says it all: it is a top command for Apache. However, since all it needs as input is an access.log file, you can also use it with nginx as long as you keep the common log format.
One of the features I like about this command is that it shows the current rate of requests per second, averaged both since the command started and over a specified interval (30 seconds by default). It also shows the most requested URLs and other stats.
You can run it with
apachetop -f /var/log/nginx/access.log
And it looks like this:
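By the way, if the default 30-second window is too short for you, the -T option sets how many seconds of requests apachetop remembers for its averages; for example (check your man page, since options can vary between versions):

apachetop -f /var/log/nginx/access.log -T 120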
Yes... grep and friends help me a lot to find useful information and stats.
Let me show you some examples.
List the HTTP return codes in all access logs (including the compressed ones)
$ zgrep -Poh '" \d{3} ' access.log* | sort | uniq -c | sort -rn
196071 " 200
30397 " 302
3928 " 404
590 " 301
339 " 304
48 " 502
38 " 470
32 " 499
27 " 504
24 " 500
18 " 400
9 " 418
4 " 405
2 " 206
1 " 403
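Once something looks odd you can drill down. For instance, assuming the default combined log format (where the request path is the seventh field), this lists the URLs behind those 404s:

$ zgrep -h '" 404 ' access.log* | awk '{print $7}' | sort | uniq -c | sort -rn | head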
List the number of requests per day in November
$ zgrep -Poh '\d{2}/Nov/2019' access.log* | sort | uniq -c
[ ... ]
1601 21/Nov/2019
1603 22/Nov/2019
1552 23/Nov/2019
1517 24/Nov/2019
1601 25/Nov/2019
1545 26/Nov/2019
1624 27/Nov/2019
1615 28/Nov/2019
1542 29/Nov/2019
1527 30/Nov/2019
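You can also combine both ideas; for example, counting only the 404s per day in November:

$ zgrep -h '" 404 ' access.log* | grep -Po '\d{2}/Nov/2019' | sort | uniq -c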
List the number of requests per hour on December 3rd
$ zgrep -Poh '03/Dec/2019:\d{2}' access.log* | cut -d' ' -f1 | sort | uniq -c
63 03/Dec/2019:00
66 03/Dec/2019:01
61 03/Dec/2019:02
64 03/Dec/2019:03
62 03/Dec/2019:04
65 03/Dec/2019:05
61 03/Dec/2019:06
60 03/Dec/2019:07
73 03/Dec/2019:08
73 03/Dec/2019:09
73 03/Dec/2019:10
64 03/Dec/2019:11
164 03/Dec/2019:12
137 03/Dec/2019:13
135 03/Dec/2019:14
93 03/Dec/2019:15
103 03/Dec/2019:16
195 03/Dec/2019:17
145 03/Dec/2019:18
139 03/Dec/2019:19
94 03/Dec/2019:20
114 03/Dec/2019:21
109 03/Dec/2019:22
106 03/Dec/2019:23
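If one hour looks suspicious you can zoom in further, for example per minute during the 17h spike:

$ zgrep -Poh '03/Dec/2019:17:\d{2}' access.log* | sort | uniq -c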
List the IPs that made the most HEAD requests on December 3rd
$ zgrep -Poh '.* \[03/Dec/2019.*\] "HEAD ' access.log* | cut -d' ' -f1 | sort | uniq -c | sort -rn
4 54.209.251.246
3 35.247.113.5
3 217.182.175.162
2 3.82.218.185
2 35.230.122.175
2 34.83.11.15
1 96.28.180.117
1 95.216.13.24
1 94.130.53.35
1 91.121.79.122
1 69.113.35.243
1 51.91.13.105
I know what you are thinking: "This guy has no traffic at all"... and you are right. This server has almost no traffic.
Jokes aside, I hope you can see that by combining the commands above in different ways, or changing a few parameters, you can get the information you are looking for very easily.
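As one more combination, this lists the busiest client IPs across all the logs, compressed or not (zcat -f passes plain files through untouched):

$ zcat -f access.log* | cut -d' ' -f1 | sort | uniq -c | sort -rn | head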
BONUS
ansible -o -m shell -a "zgrep -h '\"GET /public/awesome_85.03.img' /var/log/nginx/public.access.log* | egrep 'ROUTERCOM' | grep '\" 200' | egrep '/Nov/2019' | awk '{ print \$1\" \"\$9 }' | sort | uniq | wc -l" 'frontends:!TRASH' | awk '{sum += $8} END {print sum}'
217
This is based on a real case. It looks through all the frontend servers defined in the Ansible inventory (the ones not in the TRASH group) for successful downloads (code 200) of the firmware awesome_85.03.img made with the user agent "ROUTERCOM" during November 2019, then takes the output from Ansible and sums all the per-host values, returning a single number of downloads: 217.
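For the record, the shell pipeline buried inside the -a argument is easier to follow when split across lines; this is just a readability rewrite of the same steps (the field positions assume the default combined log format):

zgrep -h '"GET /public/awesome_85.03.img' /var/log/nginx/public.access.log* |
  egrep 'ROUTERCOM' |         # only this user agent
  grep '" 200' |              # only successful responses
  egrep '/Nov/2019' |         # only November 2019
  awk '{ print $1" "$9 }' |   # keep client IP and status code
  sort | uniq |               # one line per unique client
  wc -l                       # count them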