What is slowing down my ssh login process

On one box, I had this strange problem. Every login could take 40-60 seconds, but when first in, everything worked as a charm. As I use ssh for login, I checked the obvious; that reverse DNS lookup did not time out (sshd_config: UseDNS no), and that unnecessary gssapi was not used, but to no avail. So I fetched out old uncle strace from the drawer, and was to run sshd in debug mode, on the console. Then I realized that login on the console took at least as long as via ssh.

So, the problem had to be somewhere else, probably som pam module. strace to the rescue

# strace -e file -ff /usr/sbin/sshd -D -e -ddd -p 2122

and logged in via ssh on port 2122

There it was. An old /var/log/btmp had grown and grown and grown, and login, via pam_lastlog.so (in fedora called in from session), scans it to check for previous logins, using a lot of cpu, io and time in the process.

But why had the file grown so large? Because the btmp log saves failed logins, and this box (by design) had an open ssh to the world, and was often hit by scanners. But also because of missing log rotation. /etc/logrotate.conf on fedora actually rotates /var/log/btmp once a month, but to save space, someone had gzipped the last rotation (again, because of size). And by some strange reasoning (bug?), logrotate on fedora won’t rotate /var/log/btmp at all, if there exists some /var/log/btmp-20140606.gz (unless compress is switched on, which it by default, is not).

Leave a Reply