unix – Sheep Guarding Llama
https://sheepguardingllama.com
Scott Alan Miller :: A Life Online

Testing Socket Connections Programmatically
https://sheepguardingllama.com/2010/01/testing-socket-connections-programmatically/
Tue, 12 Jan 2010

Often we have to use “telnet remotehost.somewhere.com 80” to test whether a remote socket connection can be established.  This is fine for one-time tests but becomes a problem when it comes time to test a number of connections – especially if we want to test them programmatically from a script.  Perl to the rescue:

#!/usr/bin/perl
use strict;
use warnings;
use IO::Socket;

# Usage: sockettest.pl host port
my $sock = IO::Socket::INET->new(
                   PeerAddr => $ARGV[0],
                   PeerPort => $ARGV[1],
                   Proto    => 'tcp',
);

if ($sock) {
    print "Success!\n";
    close($sock);    # only close a socket that actually opened
} else {
    print "Failure!\n";
}

Just copy this code into a file called “sockettest.pl” and “chmod 755 sockettest.pl” so that it is executable and you are ready to go.  (This presumes that you are using UNIX.  As the script is Perl, it should work anywhere.)
To use the code to test, for example, a website on port 80 or an SSH connection on port 22 just try these:

./sockettest.pl www.yahoo.com 80
./sockettest.pl myserver 22

You aren’t limited to known services.  You can test any socket that you want.  Very handy.  Now, if you have a bunch of servers, you could test them all from a simple, one-line BASH command like so (broken across lines here for ease of reading):

for i in myserver1 myserver2 yourserver1 yourserver2 someoneelsesserver1
do
  echo $i $(./sockettest.pl "$i" 80)
done
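If Perl is not available, bash itself can open TCP sockets through its /dev/tcp pseudo-device.  This is a sketch, not a drop-in replacement for the Perl script: /dev/tcp is a bash-specific feature (it will not work under a strict POSIX sh), and it assumes the coreutils timeout command is installed.

```shell
#!/bin/bash
# Pure-bash socket check using /dev/tcp. The redirection is handled by
# bash itself, so no external client is needed; timeout guards against
# hosts that silently drop packets instead of refusing the connection.
check_socket() {
    local host=$1 port=$2
    if timeout 3 bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null; then
        echo "Success!"
    else
        echo "Failure!"
    fi
}

# Only runs when arguments are supplied, e.g.: ./bashsock.sh www.yahoo.com 80
if [ $# -ge 2 ]; then
    check_socket "$1" "$2"
fi
```

It drops into the same one-line server loop as the Perl version.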
Linux Active Directory Integration with LikeWise Open
https://sheepguardingllama.com/2009/03/linux-active-directory-integration-with-likewise-open/
Sun, 01 Mar 2009

I downloaded the latest RPM package (for Red Hat, SUSE, CentOS and Fedora) from the LikeWise web site (you need to register before starting your download) to the /tmp directory.  The version that I am testing is the Winter 2009 Edition.

Warning: LikeWise modifies many configuration files and its uninstall routine does not replace these.  Installing LikeWise and then uninstalling again will likely cause you to lose the ability to log back in to your machine.  Treat modifying authentication systems with the utmost care.

The RPM download still uses a script so you will need to add execute permissions.

chmod a+x LikewiseIdentityServiceOpen-5.1.0.5220-linux-x86_64-rpm.sh

./LikewiseIdentityServiceOpen-5.1.0.5220-linux-x86_64-rpm.sh

The package steps you through the installation program.  You will need to accept the license, as there are actually several packages, covered under various licenses, that need to be installed to support LikeWise.  If you are installing on an AMD64 platform you will be asked whether you want to install 32-bit support libraries.  Unless you really know what you need, just select the “auto” option.  After that, the installation will take care of itself.

If you use SELinux, as you should, you will need to set it to permissive mode during the configuration.

setenforce Permissive

Then we can join the Linux machine to the Active Directory domain.

/opt/likewise/bin/domainjoin-cli join exampledomain.com domainadminuser

At this point basic authentication is already working.  You will need to make some changes to your setup if you have existing accounts as well, but we can address that later.

Test your login:

ssh -l exampledomain\\username linuxhostname

Once you are all set do not forget to turn SELinux back on.

setenforce Enforcing

The big caveat with using LikeWise Open for your Unix to AD integration needs is that there is no Windows to UNIX GID/UID mapping so your UNIX (Linux, Solaris, Mac OSX, etc.) machines are stuck using Windows IDs.  This is not necessarily the end of the world depending on your environmental needs but it can be quite a pain if you are introducing AD into a large, established Unix environment.  LikeWise Enterprise does not suffer from this limitation, but it is obviously not free.

WordPress on Red Hat / CentOS Linux
https://sheepguardingllama.com/2009/02/wordpress-on-red-hat-centos-linux/
Thu, 26 Feb 2009

If you run WordPress on Red Hat Enterprise Linux (RHEL) or its free cousin CentOS then you will likely run into the following error after you have unpacked WordPress, installed it and tried to do your initial setup:

Error establishing a database connection

This either means that the username and password information in your wp-config.php file is incorrect or we can’t contact the database server at databasename. This could mean your host’s database server is down.

  • Are you sure you have the correct username and password?
  • Are you sure that you have typed the correct hostname?
  • Are you sure that the database server is running?

If you’re unsure what these terms mean you should probably contact your host. If you still need help you can always visit the WordPress Support Forums.

You are not alone; this happens to everyone.  If you do some searching on this you will find that pretty much no one has an answer for what is wrong.  People running the MySQL server locally already know the trick necessary to fix this problem, but if you are running MySQL remotely, as I am, then you can easily be misled into thinking that the fix does not apply to you.  It does.

The issue here, surprisingly, is that SELinux is enabled on the web server and is keeping the MySQL library from communicating with the MySQL server whether local or remote.  Simply set SELinux to Permissive rather than Enforcing and voila, you should be working well.

The command to set SELinux to Permissive mode is:

setenforce 0

You can verify that the mode has changed correctly with:

getenforce

It is important to note that this SELinux issue (bug, I am told) does NOT affect the MySQL client but does affect PHP.  So if you are testing your database connection with “mysql” and it works but WordPress throws the error, then you are a prime candidate for this problem.
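If you would rather not leave SELinux permissive across the board, the targeted policy has a boolean for exactly this case.  This is a sketch assuming your policy version ships the httpd_can_network_connect_db boolean; older policy releases may only offer the broader httpd_can_network_connect.

```shell
# Permit Apache (and thus PHP) to make outbound database connections.
# The -P flag writes the change into the policy so it survives reboots.
setsebool -P httpd_can_network_connect_db 1
# On older policies that lack the _db boolean, the broader version works:
# setsebool -P httpd_can_network_connect 1
```

This keeps the rest of SELinux enforcing while unblocking only the web server's network access.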

Also, be sure that PHP has the MySQL module installed:

yum install php-mysql

I have seen this issue on several versions of all of the software components but specifically just dealt with it in CentOS 5.2 with PHP 5.1.6 and WordPress 2.7.1.

How To – Easy NTP on Solaris 10
https://sheepguardingllama.com/2009/02/how-to-easy-ntp-on-solaris-10/
Fri, 20 Feb 2009

Setting up NTP (the Network Time Protocol) on Solaris 10 is very simple but requires a few less than obvious steps that can trip up someone looking to set up a basic NTP daemon to sync their local machine.

The first step is to install the NTP packages SUNWntpr and SUNWntpu, both of which are available from the first CD of the Solaris 10 installation CDs.  These packages are located, along with the others, in /mnt/cdrom/Solaris_10/Product/ – assuming that you mounted your Solaris 10 CD 1 or its ISO image to /mnt/cdrom, of course.  Personally, I keep an ISO copy of this CD available on the network for easy access to these packages, although they could very easily be copied off into a package directory.  It depends on the number of machines that you need to maintain.

Go ahead and install the two packages.  This can be done easily by moving into the Product directory and using the “pkgadd -d .” command and selecting the two packages from the menu.  There are no options to worry about with these packages so just install and then we are ready to configure.

The “gotcha” with NTP on Solaris is that there is no default configuration to get you up and running automatically and most online information about the installation either leaves out this portion or supplies details unlikely to be used under common scenarios.

Solaris’ NTP comes with two sample configuration files, /etc/inet/ntp.client and /etc/inet/ntp.server.  Confusingly, for the most basic use we are going to want to work from the ntp.server sample file rather than from the ntp.client sample file.  NTP uses /etc/inet/ntp.conf as its actual configuration file and, as you will notice, after a default installation this file does not exist.  So we start by making a copy of ntp.server.

# cp /etc/inet/ntp.server /etc/inet/ntp.conf

Now we can make our changes to the new configuration file that we have just created.  I will ignore any of the commented lines here and only publish those lines actually being used by my configuration.  In this case I have gone with the simplest scenario, which includes using an external clock source and ignoring my local clock.  On a production machine you should set up the local clock as a fallback device.

For my example here, I am syncing NTP on Solaris 10 to the same machine pool to which my CentOS Linux machines get their time, the CentOS pool at ntp.org.  You should replace the NTP server names in this sample configuration with the names of the NTP servers in the pool which you will use.

server 0.centos.pool.ntp.org
server 1.centos.pool.ntp.org
server 2.centos.pool.ntp.org
server 3.centos.pool.ntp.org
broadcast 224.0.1.1 ttl 4
enable auth monitor
driftfile /var/ntp/ntp.drift
statsdir /var/ntp/ntpstats/
filegen peerstats file peerstats type day enable
filegen loopstats file loopstats type day enable
filegen clockstats file clockstats type day enable
keys /etc/inet/ntp.keys
trustedkey 0
requestkey 0
controlkey 0

This very standard and simple setup provides you with four servers from which to obtain NTP data and also rebroadcasts this data on the local network via multicast using the NTP standard multicast address of 224.0.1.1.  Feel free to remove or comment out the broadcast line if you have no desire to have any local machines getting their NTP data from this machine.  The ease with which you can republish NTP locally via multicast is just too good to pass up.

Now that we have a working configuration file, we need to fire up NTP and let it sync up with our chosen servers.  The best practice here is to use the ntpdate command a few times to get the box’s date and time as close to accurate as is reasonable before turning NTP loose to do its thing.  The NTP daemon is designed to adjust the clock slowly, whereas ntpdate will set it correctly immediately, so this gets the initial time correct right away.

# ntpdate pool.ntp.org; ntpdate pool.ntp.org

# svcadm enable ntp

At this point, the NTP daemon should be running and your time should be extremely accurate.  You can verify that NTP is running by looking in the process list for /usr/lib/inet/xntpd, which is the actual name of the NTP daemon running on Solaris 10.
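You can also ask the daemon directly which servers it is actually talking to.  The ntpq utility ships in the SUNWntpu package installed above; the peer flagged with an asterisk is the one currently selected for synchronization.

```shell
# Print each configured peer with its reachability register, delay,
# offset and jitter; "reach" climbs toward 377 (octal) as polls succeed.
ntpq -p
# Confirm the service itself is online under SMF:
svcs ntp
```

Give the daemon a few minutes after startup before expecting a peer to be selected.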

Installing Subversion on RHEL5
https://sheepguardingllama.com/2008/11/installing-subversion-on-rhel5/
Sun, 16 Nov 2008

Subversion (SVN) is a popular open source version control package.  Today we are going to install and configure Subversion on Red Hat Enterprise Linux 5.2 (a.k.a. RHEL 5.2).  I will actually be doing my testing on CentOS 5.2, but the process should be completely identical.

Installing Subversion on Linux

Installation of Subversion is very simple if you are using yum.  In addition to Subversion itself, you will also want to install Apache, as you will most likely want to access Subversion through a WebDAV interface.  You can simply run:

yum -y install subversion httpd mod_dav_svn

Once Subversion is successfully installed we need to create the initial repository. This can be done on the local file system but I prefer to keep high priority and highly volatile data stored directly on the NAS filer as this is far more appropriate for this type of data.

As an aside, I like to keep low-volatility data (say, website HTML) stored on local discs, in general, for performance reasons and because backups are not difficult to take using traditional backup methods (e.g. tar, cpio, Amanda, Bacula, etc.).  High-volatility files I prefer to keep on dedicated network storage units where backups can be easily taken using more advanced methods like Solaris 10’s ZFS snapshot capability.  It is not always clear when data makes sense to keep locally or to store remotely, but I feel that you can gauge a lot of the decision on two factors: the frequency of data changes – that is, changes to existing files, not necessarily the addition of new files – and the degree to which the data is the focus of the application – that is, whether the data is incidental or key to the application.  In the case of Subversion, the entire application is nothing but a complex filesystem frontend, so we are clearly on the side of a “data focused” application.

I started writing this article on RHEL4 on a system with a small, local file system.  When I returned to the uncompleted article and continued with it I was implementing this on a RHEL5 system with massive local storage and decided to keep my Subversion repository local on a dedicated logical volume for easy Linux based snapshots.

Subversion has two backend storage options.  The original method of storing Subversion data was with the venerable Berkeley DB, known as BDB, which is now a product of Oracle.  The newer method, and the default choice since Subversion 1.2, is FSFS (I don’t know exactly for what its initials stand), which uses native filesystem mechanisms for storage.  In my example here, and for my own use, I choose FSFS as I think it is more often the better choice.  Of most important note, FSFS supports remote filesystems over NFS and CIFS while BDB does not.  FSFS is also easier to deal with when it comes to creating backups.  My feeling is that unless you really know why you want to use BDB, stick with the default FSFS – there is a reason that it was selected as the default.

Another note about creating Subversion repositories: some sources recommend putting Subversion repos under /opt.  All I have to say is “No No No!”  The /opt filesystem is not appropriate for regularly changing data.  Any data that is expected to change on a regular basis (e.g. log files, source code repos, etc.) belongs in /var.  That is the entire purpose of the /var filesystem: it stands for “variable” and is purposed for regular filesystem changes.  Data destined for /var is another indicator that an external network filesystem may be appropriate as well.

mkdir -p /var/projects/svn

At this point you can either use /var/projects/svn as a normal local directory or mount it remotely in some manner such as NFS, CIFS or iSCSI.  Regardless of how the repository is set up, the rest of this document will function identically.

We are now in a position to use svnadmin to create our repository directory:

svnadmin create /var/projects/svn/

At this point, Subversion should already be working for you.  If you are new to Subversion, we will do a simple import to test our installation.  To perform this test, create a directory called “testproject” and put it in the /tmp directory.  Now touch a couple of files inside that directory so that we have something with which to work.  Then we will do our first Subversion import.

mkdir /tmp/testproject; cd /tmp/testproject; touch test1 test2 test3

svn import /tmp/testproject/ file:///var/projects/svn/testproject -m "First Import"

Your Subversion installation is now working, but few people will be happy accessing their Subversion repositories only from the local machine as we have done here.  If you are used to working from the UNIX (Linux, Mac OSX, Cygwin, etc.) command line you may want to try accessing your new Subversion repository using SVN+SSH.  Here is an example taken from an OpenSUSE workstation with the Subversion client installed:

svn list svn+ssh://myserver/var/projects/svn
testproject/

At this point you now have access from your external machines and can perform a checkout to get a working copy of your code.  To make the process really simple be sure to set up your OpenSSH keys so that you are not prompted for a password.  For many users, most notably Windows users, you are going to want access over the HTTP protocol since Windows does not natively support the SSH protocol.
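That checkout round trip looks like this end to end from the remote workstation – a sketch reusing the svn+ssh URL and test project from above, with the working-copy path /tmp/wc being just an example:

```shell
# Check out a working copy, make a change, and commit it back.
svn checkout svn+ssh://myserver/var/projects/svn/testproject /tmp/wc
cd /tmp/wc
echo "first change" >> test1
svn commit -m "First commit from a working copy"
# Pull down anything committed by others since the checkout:
svn update
```

With SSH keys in place none of these steps will prompt for a password.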

The first thing that you are going to need to do, if you are running SELinux and firewall security on your RHEL server like I am, is to open ports 80 and 443 in your firewall so that Apache is reachable.  Normally I shy away from management tools but this one I like.  Just use “system-config-securitylevel-tui” and select the appropriate services to allow.

You will also need to label the Subversion repository location so that SELinux allows the Apache web server to read and write it.  A common approach is to apply the httpd content type to the tree:

chcon -R -t httpd_sys_content_t /var/projects/svn/

We have one little trick left to perform.  This trick is necessary because of what appears to be a bug in the way that Subversion sets the user ID when it runs.  It is not necessary for all users but can be a pretty tough sticking point for anyone who runs into it and is not aware of what can be done to remedy the situation.

mkdir -p ~apache/.subversion; cp -r /root/.subversion/* ~apache/.subversion/

Configuring Apache 2 on Red Hat 5 is a little tricky so we will walk through it together.  The first thing that needs to be added is the LoadModule line for the WebDAV protocol.  This goes into the LoadModule section of the main /etc/httpd/conf/httpd.conf configuration file.

LoadModule dav_module         modules/mod_dav.so

The rest of our configuration changes for Apache 2 will go into a dedicated configuration file just for our subversion repository: /etc/httpd/conf.d/subversion.conf

I am including here my entire configuration file sans comments.  You will need to modify your SVNPath variable accordingly, of course.

# grep -v \# /etc/httpd/conf.d/subversion.conf

LoadModule dav_svn_module       modules/mod_dav_svn.so
<Location /svn>
  DAV svn
  SVNPath /var/projects/svn/
</Location>

At this stage you should not only have a working Subversion repository but should also be able to access it via the web.  You can test web access from your local box with the svn command.  Here is an example:

svn list http://localhost/svn/

References:

Mason, Mike. “Pragmatic Version Control Using Subversion, 2nd Edition”, The Pragmatic Programmers, 2006.

Installing Subversion on Apache by Marc Grabanski

Subversion Setup on Red Hat by Paul Valentino

Setting Up Subversion and Trac As Virtual Hosts on Ubuntu Server, How To Forge

The SVN Book, RedBean

Additional Material:

Subversion Version Control: Using the Subversion Version Control System in Development Projects

Twitter from the Linux Command Line
https://sheepguardingllama.com/2008/10/twitter-from-the-linux-command-line/
Fri, 31 Oct 2008

Okay, so you are a crazy BASH or Korn shell nut (DASH, ASH, TCSH, CSH, ZSH, etc., etc. yes, I mean all of you) and you totally want to be able to Tweet on your Twitter feed without going to one of those krufty GUI utilities.  Such overkill for such a simple task.  I feel your pain.  When I found this little nugget of command line coolness I just had to share it with all of you.  Special thanks to Marco Kotrotsos from Incredicorp who published this on IBM Developer Works.

If you have curl installed, all you need to do is:

curl -u username:pass -d status="text" http://twitter.com/statuses/update.xml

So, to give you a real world example, if you are “bobtheuser” and your password is “pass1234” and you want to say “Hey, my first UNIX Shell Tweet.” then you just need to:

curl -u bobtheuser:pass1234 -d status="Hey, my first UNIX Shell Tweet." \
http://twitter.com/statuses/update.xml

You will get some feedback in the form of a response XML file. Happy Tweeting!
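One wrinkle: -d sends the data exactly as given, so a tweet containing characters like & or = will be chopped up on the server side.  curl’s --data-urlencode option (available since curl 7.18.0) encodes the value for you:

```shell
# Same update call, but with the status text URL-encoded by curl itself;
# everything after "status=" is encoded before being sent.
curl -u bobtheuser:pass1234 \
  --data-urlencode "status=Ampersands & equals = signs survive intact" \
  http://twitter.com/statuses/update.xml
```

On older curl builds you would have to percent-encode the troublesome characters by hand.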

Disclaimer: I realize that using “Linux” in the subject is misleading.  This is not a Linux specific post but will apply to FreeBSD, OpenBSD, NetBSD, Mac OSX, UNIX, Solaris, AIX, Windows with Cygwin or just about any system with a command line and the curl utility installed.

I use this as the basis for my Ruby based Twitter Client for the command line.

Updating Zimbra on Linux
https://sheepguardingllama.com/2008/09/updating-zimbra-on-linux/
Sat, 13 Sep 2008

Having been a Zimbra Administrator for some time and having always worked on the Zimbra Open Source platform I have found that documentation on the update process has been very much lacking.  The process is actually quite simple and straightforward under most circumstances but for someone without direct experience with the process it can be rather daunting.

My personal experience with Zimbra, thus far, is running the 4.5.x series on CentOS 4 (RHEL 4).  Using CentOS instead of actual Red Hat Enterprise Linux presents a few extra issues with the installer, but have no fear – the process does work.

While this document is based on the Red Hat Enterprise Linux version of Zimbra, I expect that non-RPM based systems will behave similarly.

To upgrade an existing installation of Zimbra, first do a complete backup.  I cannot overstate the importance of having a complete and completely up-to-date backup of your entire system.  Zimbra is a massive package that is highly complex.  You will want to be absolutely sure that you are backed up and prepared for disaster.  If you use the open source version of Zimbra, as I do, that means taking Zimbra offline so that a backup can be performed.  I won’t go into backup details here, but LVM or virtual instances of your server will likely be your best friend for regular backups.  Email systems can get very large very quickly.
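As a concrete example of the LVM approach just mentioned, here is a snapshot-based backup sketch.  The volume name (/dev/vg0/zimbra) and paths are assumptions – substitute your own layout – and the snapshot volume needs enough space allocated to absorb writes made while it exists:

```shell
# Stop Zimbra so the mail store is quiescent, snapshot the volume,
# then bring Zimbra right back up; the archive is taken from the
# read-only snapshot at leisure.
su - zimbra -c "zmcontrol stop"
lvcreate --size 10G --snapshot --name zimbra-snap /dev/vg0/zimbra
su - zimbra -c "zmcontrol start"

mkdir -p /mnt/zimbra-snap
mount -o ro /dev/vg0/zimbra-snap /mnt/zimbra-snap
tar -czf /backup/zimbra-$(date +%Y%m%d).tar.gz -C /mnt/zimbra-snap .
umount /mnt/zimbra-snap
lvremove -f /dev/vg0/zimbra-snap
```

The downtime is only as long as the lvcreate takes, not the whole tar run.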

Go to the Zimbra website and download the latest package for your platform.  If you use CentOS, get your matching RHEL package.  It will work fine for you.  I find that the easiest way to move the package to your Zimbra server is with wget.  Downloading to /tmp is fine as long as you have enough space.

Unpack your fresh Zimbra package.  Zimbra downloads as a tarball (gzip’ed tar package) but contains little more than a handy installation script that automates RPM deployments.  It is actually a very nice package.

tar -xzvf zimbra-package.tar.gz

You can cd into your newly unpacked directory and inside you will find a script, install.sh.  Yes, the installation process is really that simple.  On most platforms you may simply run the install script.  If you are on CentOS, rather than RHEL, you will need one extra parameter: --platform-override.

./install.sh --platform-override

Be prepared for this process to run for quite some time – easily an hour or more.  The versions of the platform that you are upgrading from and to will affect how long it takes, and the size of your mail store will impact the speed of the process as well.

The installation script will fire off checking for currently installed instances of Zimbra, checking your platform for compatibility (be sure to check this manually if using the override option but CentOS users can rest assured that RHEL packages work perfectly for them), performing an integrity check on your database and checking prerequisite packages.  Chances are that you will need to do something in order to prepare your system for the upgrade.

In my case, upgrading from 4.5.9 to 5.0.9, I needed to install the libtool-libs package.

yum install libtool-libs

While there are steps here that can certainly go wrong, the Zimbra upgrade process is very simple and straightforward.  As long as you have good backups (make sure not to start Zimbra and receive new mail after having made your last backup) you should not be afraid to upgrade your Zimbra Open Source system.

You can also purchase a support contract from Yahoo/Zimbra so that you can move to the Network version of Zimbra and Zimbra support staff are happy to walk you through the process.  Having someone there to make sure everything is okay is always nice.

References:

Linux Zimbra Upgrade HowTo from GeekZine

Linux’ kscand
https://sheepguardingllama.com/2008/04/linux-kscand/
Wed, 02 Apr 2008

In Linux the kscand process is a kernel process which is a key component of the virtual memory (vm) system.  According to Unix Tech Tips & Tricks’ excellent Understanding Virtual Memory article “The kscand task periodically sweeps through all the pages in memory, taking note of the amount of time the page has been in memory since it was last accessed. If kscand finds that a page has been accessed since it last visited the page, it increments the page’s age counter; otherwise, it decrements that counter. If kscand finds a page with its age counter at zero, it moves the page to the inactive dirty state.”

For the majority of Linux users and even system administrators on large servers this kernel process requires no intervention.  It is a simple process that works in the background doing its job well.  Nonetheless, under certain circumstances it can become necessary to tune kscand in order to improve system performance in a desirable way.

Issues with kscand are most likely to arise where a Linux box has an extremely large amount of memory, and will be even more noticeable on boxes with slower memory.  The most notable example is probably the HP Proliant DL585 G1, which can support 128GB of memory but in doing so drops memory bandwidth to a paltry 266MHz.  I first came across this particular issue on a server with 32GB of memory with approximately 31.5GB of it in use.  No swap space was being used and most of the memory was being used for cache, so there was no strain on the memory system, but the total amount of memory being scanned by the kscand process is where the issue truly lies.

Even on a busy server with gobs of memory (that’s the technical term) it would be extremely rare that kscand would cause any issues.  It is a very light process that runs quite quickly.  You are most likely to see kscand as a culprit when investigating problems with latency sensitive applications on memory intensive servers.  The first time that I came across the need to tune kscand was while diagnosing a strange latency pattern of network traffic going to a high-performance messaging bus.  The latency was minor but small spikes were causing concern in the very sensitive environment.  kscand was spotted as the only questionable process receiving much system attention during the high latency periods.

Under normal conditions, that is default tuning, kscand will run every thirty seconds and will scan 100% of the system memory looking for memory pages that can be freed.  This sweep is quick but can easily cause measurable system latency if you look carefully.  Through careful tuning we can reduce the latency caused by this process, but we do so as a tradeoff against memory utilization efficiency.  If you have a box with significant extra memory or extremely static memory, such as large cache sizes that change very slowly, you can safely tune away from memory efficiency towards low latency with nominal pain and good results.

kscand is controlled by the proc filesystem with just a single setting, /proc/sys/vm/kscand_work_percent.  Like any kernel setting this can be changed on the fly on a live system (be careful) or can be set to persist through reboots by adding it to your /etc/sysctl.conf file.  Before we make any permanent changes we will want to do some testing.  This kernel parameter tells kscand what percentage of the system memory to scan each time that a memory scan is performed.  Since it is normally set to 100, kscand normally scans all in-use memory each time that it is called.  You can verify your current setting quite easily.

cat /proc/sys/vm/kscand_work_percent

A good starting point with kscand_work_percent is to set it to 50.  A very small adjustment may not be noticeable, so comparing 100 and then 50 should provide a good starting point for evaluating the changes in system performance.  It is not recommended to set kscand_work_percent below 10, and I would be quite wary of dropping even below 20 unless you truly have a tremendous amount of unused memory and your usage is quite static.

echo 50 > /proc/sys/vm/kscand_work_percent

Once you have determined the balance of latency and memory utilization that makes sense for your environment, you can make your changes permanent.  Be sure to only use the echo technique if this is the first time that this setting is being added to the file; you will need to edit the file by hand after that.

echo "vm.kscand_work_percent = 50" >> /etc/sysctl.conf

Keep in mind that the need to edit this particular kernel parameter is extremely uncommon and should be done only under extraordinary circumstances.  You will not need to do this in normal, everyday Linux life, and even a senior Linux administrator could easily never have need to modify this setting.  Only very specific conditions will cause this performance characteristic to be measurable or its modification to be desirable.

All of my testing was done on Red Hat Enterprise Linux 3 Update 6.  This parameter is the same across many versions although the performance characteristics of kscand vary between kernel revisions so do not assume that the need to modify the parameters in one situation will mean that it is needed in another.

RHEL 3 prior to update 6 had a much less efficient kscand process and much greater benefit is likely to be found moving to a later 2.4 family kernel revision.  RHEL 4 and later, on the 2.6 series kernels, is completely different and the latency issues are, I believe, less pronounced.  In my own testing the same application on the same servers moving from RHEL 3 U6 to RHEL 4.5 removed all need for this tweak even under identical load.  [Edit – In RHEL 4 and later (kernel series 2.6) the kscand process has been removed and replaced with kswapd and pdflush.]

Things that are likely to impact the behavior of kscand that you should consider include the following:

  • Total Used Memory Size, regardless of total available memory size.  The more you have the more kscand will impact you.  Determined by: free -m | grep Mem | awk '{print $3}'
  • Memory Latency, check with your memory hardware vendor. Higher latency will cause kscand to have a larger impact.
  • Memory Bandwidth.  Currently in speeds ranging from 266MHz to 1066MHz.  The slower the memory the more likely a scan will impact you and tuning will be useful.
  • Value in kscand_work_percent. The lower the value the lower the latency.  The higher the value the better the memory utilization.
  • Memory Access Hops.  The number of system bus hops necessary to access memory resources.  For example, a two socket AMD Opteron server (HP Proliant DL385) never has more than one hop, but a four socket AMD Opteron server (HP Proliant DL585) can have two hops, increasing effective memory latency.  So a DL585 is more likely to be affected than a DL385, all other factors being equal (as long as three or four of its processor sockets are occupied.)
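Taken together, the /proc check and echo shown earlier can be wrapped in a small helper so the change is only attempted on kernels that actually expose the tunable.  This is a sketch: the `set_kscand` function name and the existence check are my own additions, while the path and value come from the examples above.

```shell
#!/bin/sh
# Set kscand_work_percent only on kernels that expose it (RHEL 3 era 2.4).
set_kscand() {
    tunable="$1"   # path to the /proc tunable
    value="$2"     # desired percentage, e.g. 50
    if [ -f "$tunable" ]; then
        echo "$value" > "$tunable"
        echo "set $(basename "$tunable") to $value"
    else
        echo "$(basename "$tunable") not present on this kernel" >&2
        return 1
    fi
}
```

On a RHEL 3 system this would be invoked as `set_kscand /proc/sys/vm/kscand_work_percent 50`; on a 2.6 kernel it simply reports that the tunable is absent.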
]]>
https://sheepguardingllama.com/2008/04/linux-kscand/feed/ 5
Linux Memory Monitoring https://sheepguardingllama.com/2008/02/linux-memory-monitoring/ https://sheepguardingllama.com/2008/02/linux-memory-monitoring/#comments Tue, 12 Feb 2008 22:52:50 +0000 http://www.sheepguardingllama.com/?p=2252 Continue reading "Linux Memory Monitoring"

]]>
As a Linux System Administrator, one question that I am asked quite often is to look into memory issues. The Linux memory model is rather more complex than many other systems, and looking into memory utilization is not always as straightforward as one would hope, but this is for a good reason that will become apparent once we discuss it.

The Linux Virtual Memory System (or VM colloquially) is a complex beast and my objective here is to provide a practitioner’s overview and not to look at it from the standpoint of an architect.

The Linux VM handles all memory within Linux. Traditionally we think of memory as being either “in use” or “free”, but this view is far too simplistic with modern virtual memory management systems (modern here meaning since Digital Equipment Corporation produced VMS in the late 1970s.) In a system like Linux, memory is not simply either “in use” or “free” but can also be in use as buffer or cache space.

Memory buffers and memory cache are advanced virtual memory techniques used by Linux, and many other operating systems, to make the overall system perform better by making more efficient use of the memory subsystem. Memory space used as cache, for example, is not strictly “in use” in the traditional sense of the term, as no userspace process is holding it open, and that space, should it be requested by an application, would become available. It is used as cache because the VM believes that this memory will be used again before the space is needed, and that it is more efficient to keep it cached than to flush it to disk and later reload it, which is a very slow process.

On the other hand, there are times when there is not enough true memory available in the system for everything that we want loaded to fit at the same time. When this occurs, the VM looks for the portions of memory that it believes are the least likely to be used, or those that can be moved onto and off of disk most effectively, and transfers these out to the swap space. Any time that we have to use swap instead of real memory we take a performance hit, but this is far better than having the system simply “run out of memory” and either crash or stop loading new processes. By using swap space the system degrades gracefully under excessive load instead of failing completely. This is very important when we consider that heavy memory utilization might last only a few seconds or minutes when a spike of usage occurs.

In the recent past, memory was traditionally an extreme bottleneck for most systems because memory was expensive. Today most companies, as well as most home users, can afford memory far in excess of what their systems need on a daily basis. In the 1990s we would commonly install as little memory as we could get away with and expect the machine to swap constantly, because disk space was cheap in comparison. But over time, more and more people at home, and all but the most backwards businesses, have come to recognize the immense performance gains made by supplying the computer with ample memory resources. Additionally, having plenty of memory means that your disks do not work as hard or as often, leading to lower power consumption, higher reliability and lower costs for parts replacement.

Because systems often have such ample memory resources today, instead of seeing the system working to move less-needed memory areas out to disk, we instead see it looking for likely-to-be-used sections of disk and moving them into memory. This reversal has proven incredibly effective at speeding up our computing experiences, but it has also been very confusing to many users who are not prepared to look so deeply into their memory subsystems.

Now that we have an overview of the basics behind the Linux VM (this has been a very simplistic overview hitting just the highlights of how the system works), let’s look at what tools we have to discover how our VM is behaving at any given point in time. Our first and most basic tool, but probably the most useful, is “free”. The “free” command provides us with basic usage information about true memory as well as swap space, also known as “virtual memory”. The most common flag to use with “free” is “-m”, which displays memory in megabytes. You can also use “-b” for bytes, “-k” for kilobytes or “-g” for gigabytes.

# free -m
             total  used  free  shared  buffers  cached
Mem:          3945   967  2977       0       54     725
-/+ buffers/cache:   187  3757
Swap:         4094     0  4094

From this output we can gain a lot of insight into our Linux system. On the first line we can see that our server has 4GB of memory (mem/total). These numbers are not always perfectly exact, so some rounding may be in order. The mem/used amount here is 967MB and mem/free is 2977MB, or ~3GB. So, according to the top line of output, our system is using ~1GB out of a total of 4GB. But also on the first line we see that mem/buffers is 54MB and that mem/cached is 725MB. Those are significant amounts compared to our mem/used number.

In the second line we see the same output but without buffers and cache being counted in the used and free metrics. This is the line that we really want to pay attention to. Here we see the truth: the total “used” memory – in the traditional sense of the term – is a mere 187MB (mem/used minus buffers minus cached: 967 − 54 − 725 ≈ 187MB) and we actually have ~3.7GB free! Quite a different picture of our memory utilization.

According to these measurements, approximately 81% of all memory in use in our system is for performance enhancing buffer/cache and not for traditional uses and less than 25% of our total memory is even in use for that.

In the third line we see basic information about our swap space. In this particular case we can see that we have 4GB total and that none of it is in use at all. Very simple indeed.

When reading information about the swap space, keep in mind that if memory utilization spikes and some memory has to be “swapped out”, some of that data may not be needed again for a very long time, even after memory utilization drops and swap is no longer needed. This means that some amount of your swap space may be “in use” even though the VM is not actively reading or writing to the swap space. So seeing that swap is “in use” is only an indication that further investigation may be warranted and does not provide any concrete information on its own. If you see that some of your swap is used but there is plenty of free memory in line two, then your current situation is fine, but you should be aware that your memory utilization likely spiked in the recent past.
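To make that rule of thumb concrete, here is a hedged sketch that classifies the situation from the two numbers just discussed, swap used and the line-two free figure. The `swap_verdict` function name and its wording are mine; the interpretation follows the paragraph above.

```shell
#!/bin/sh
# Classify the swap situation: swap in use alongside plenty of free
# memory suggests a past spike, not a current problem.
swap_verdict() {
    swap_used_mb="$1"   # swap currently used, in MB
    avail_mb="$2"       # "-/+ buffers/cache" free figure, in MB
    if [ "$swap_used_mb" -eq 0 ]; then
        echo "no swap in use"
    elif [ "$avail_mb" -gt 0 ]; then
        echo "swap used but memory free: likely a past usage spike"
    else
        echo "swap used and memory exhausted: investigate now"
    fi
}

# On a live system the inputs would come from free, e.g.:
# swap_verdict "$(free -m | awk '/Swap/{print $3}')" \
#              "$(free -m | awk '/buffers\//{print $4}')"
```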

Often all you want to know is the current “available” memory on your Linux system. Here is a quick one-liner that will provide you with this information. You can save it in a script called “avail”, place it in /usr/sbin, and use it anytime you need to see how much headroom your memory system has without touching the swap space. Just add “#!/bin/bash” at the top of your script and away you go.

echo $(free -m | grep 'buffers/' | awk '{print $4}') MB

This one liner is especially effective for non-administrator users to have at their disposal as it provides the information that they are usually looking for quickly and easily. This is generally preferred over giving extra information and having to explain how to decipher what is needed.
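Wrapped up as suggested, the script would look like the sketch below. I have factored the parsing into a function (named `avail` to match the suggested script name) so that it can also be exercised against captured “free” output; on a live system it reads straight from “free -m”.

```shell
#!/bin/bash
# avail - print memory headroom in MB (free + buffers + cache).
# Parses the "-/+ buffers/cache" line of free's output; the fourth
# field is the memory available without touching swap.
avail() {
    grep 'buffers/' | awk '{print $4 " MB"}'
}

# Live usage, matching the one-liner in the article:
# free -m | avail
```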

Moving on from “free” – now that we know the instantaneous state of our memory system, we can look at more detailed information about the subsystem’s ongoing activities. The tool that we use for this is, very appropriately, named “vmstat”.

“vmstat” is most useful for investigating VM utilization when swap space is in use to some degree. When running “vmstat” we generally want to pass it a few basic parameters to get the most useful output from it. If you run the command without any options you will get some basic statistics collected since the time that the system was last rebooted – interesting, but not what we are looking for in particular. The real power of “vmstat” lies in its ability to output several instances (count) of activity over a period (delay.) We will see this in our example.

I prefer to use “-n”, which suppresses header reprints. Do a few large “vmstat” runs and you will know what I mean. I also like to translate everything to megabytes, which we do with “-S M” on some systems and “-m” on others. To make “vmstat” produce more than a single line of output we need to feed it a delay and a count. A good starting point is a delay of 10 and a count of 5, which will take 40 seconds to complete ((count − 1) × delay). I always like to run “free” just before my “vmstat” so that I have all of the information on my screen at one time. Let’s take a look at a memory intensive server.

free -m
             total     used      free    shared   buffers    cached
Mem:         64207    64151        56         0        73     24313
-/+ buffers/cache:    39763     24444
Swap:        32767        2     32765
vmstat -n -S M 10 5
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
 7  1      2     53     75  24313    0    0  8058    62   19     1 10  7 71 12
 2  2      2     44     74  24325    0    0 103251   209 2512 20476 15 13 54 18
 3  1      2     43     74  24325    0    0 105376   216 2155 16790 11 12 59 17
 2  1      2     50     74  24320    0    0 107762   553 2205 16819 12 10 60 17
 2  2      2     49     73  24322    0    0 105104   302 2323 18009 23  9 51 17

That is a lot of data. From the “free” command we see that we are dealing with a machine with 64GB of memory that has almost 40GB of that in use and 24GB of it being used for cache. A busy machine with a lot of memory but definitely not out of memory.

The “vmstat” output is rather complex, so we will break it down into sections. Notice that the stats in the very first row are significantly different from those in the subsequent rows. This is because the first row contains average stats since the box last rebooted, while each subsequent row contains stats over the last delay period which, in this case, is ten seconds.

The first section is “procs”. In this section we have “r” and “b”. “R” is for runnable processes – that is, processes waiting for their turn to execute in the CPU. “B” is for blocked processes – those that are in uninterruptible sleep. This is process, not memory, data.

The next section is “memory” and this provides the same information that we can get from the free command. “Swpd” is the same as swap/used in “free”. “Free” is the same as mem/free. “Buff” is the same as mem/buffers and “cache” is the same as mem/cached. This view will help you see how the different levels are constantly in motion on a busy system.

If your system is being worked very hard then the really interesting stuff starts to appear in the “swap” section. Under “swap” we have “si” for swap in and “so” for swap out. Swapping in means moving virtual memory off of disk and into real memory. Swapping out means taking real memory and storing it on disk. Seeing an occasional swap in or swap out event is not significant. What you should look for here is a large amount of constant si/so activity. If si/so is constantly busy then you have a system that is actively swapping. This does not mean that the box is overloaded, but it does indicate that you need to pay attention to your memory subsystem because it certainly could be overloaded. At the very least, performance is beginning to degrade, even if gracefully and possibly imperceptibly.
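As a sketch of how one might watch for that condition automatically, the helper below counts samples whose combined si+so exceeds a threshold, skipping the two header lines and the since-boot first sample. The `swap_pressure` name and the threshold idea are mine; si and so are columns 7 and 8 of vmstat's default layout, as in the output above.

```shell
#!/bin/sh
# Count vmstat samples showing swap activity above a threshold.
# Expects default vmstat output on stdin: two header lines, then one
# since-boot line, then one sample per delay interval.
swap_pressure() {
    threshold="${1:-0}"   # combined si+so above this counts as "swapping"
    tail -n +4 | awk -v t="$threshold" '
        { if ($7 + $8 > t) busy++; total++ }
        END { printf "%d of %d samples swapping\n", busy, total }'
}

# Live usage: vmstat 5 10 | swap_pressure 10
```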

Under “io” we see “bi” and “bo”, which are blocks in and blocks out. These refer to total transfers through the system’s block devices. They help you get a feel for how “busy” the i/o subsystem is alongside the memory, CPU and other statistics, painting an overall picture of system health and performance. But this is not directly memory related, so be careful not to read the bi/bo numbers and attempt to make direct memory usage correlations.

The next section is “system” and has “in” for interrupts per second. The “in” number includes clock interrupts so even the most truly idle system will still have some interrupts. “Cs” is context switches per second. I will not go into details about interrupts and context switching at this time.

The final section is “cpu” which contains fairly standard user “us”, system “sy”, idle “id” and waiting on i/o “wa” numbers. These are percentage numbers that should add up to roughly 100 but will not always do so due to rounding. I will refrain from going into details about CPU utilization monitoring here. That is a separate topic entirely but it is important to be able to study memory, processor and i/o loads all in a single view which “vmstat” provides.

If you will be using a script to automate the collection of “vmstat” output you should use “tail” to trim off the header information and the first line of numerical output since that is cumulative since system reboot. It is the second numerical line that you are interested in if you will be collecting data. In this example one-liner we take a single snapshot of a fifteen second interval that could be used easily in a larger script.

vmstat 15 2 | tail -n1

This will provide just a single line of useful output for sending to a log, attaching to a bug report or for filing into a database.
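For ongoing collection, that one-liner can be wrapped with a timestamp as sketched below. The `log_vmstat` name, the log path, and the cron schedule are illustrative, not anything from the article.

```shell
#!/bin/sh
# Append one timestamped vmstat sample per invocation, e.g. from cron.
log_vmstat() {
    logfile="$1"   # destination log; the path is up to you
    echo "$(date '+%Y-%m-%d %H:%M:%S') $(vmstat 15 2 | tail -n1)" >> "$logfile"
}

# Example cron entry, sampling every five minutes (illustrative path):
# */5 * * * * /usr/local/bin/log_vmstat /var/log/vmstat.log
```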

Additional, deeper, memory information can be found from a few important sources. Firstly let’s look at “vmstat -s”. This is a very different use of this command.

vmstat -s
  32752428  total memory
  32108524  used memory
  13610016  active memory
  17805412  inactive memory
    643904  free memory
    152168  buffer memory
  23799404  swap cache
  25165812  total swap
       224  used swap
  25165588  free swap
   4464930 non-nice user cpu ticks
         4 nice user cpu ticks
   1967484 system cpu ticks
 193031159 idle cpu ticks
    308124 IO-wait cpu ticks
     37364 IRQ cpu ticks
    262095 softirq cpu ticks
  69340924 pages paged in
  86966611 pages paged out
        15 pages swapped in
        62 pages swapped out
 488064743 interrupts
  9370356 CPU context switches
1202605359 boot time
   1304028 forks

As you can see, we get a lot of information from this “-s” summary option. Many of these numbers, such as “total memory”, are static and will not change on a running system. Others, such as “interrupts”, are counters that are used to generate the “since reboot” statistics for the regular “vmstat” output.

The “free” and “vmstat” commands draw their data from the /proc file system as you would expect. You can see their underlying information through:

cat /proc/meminfo
cat /proc/vmstat
cat /proc/slabinfo

Of course, no discussion of Linux performance monitoring would be complete without mentioning the king of monitoring, the “top” command. However, the memory information available from “top” is little more than what we can see more concisely in “vmstat” and “free”. At the top of “top” we get the “free” summary in a slightly modified form but containing the same data. If this does not appear, you can toggle it on and off with the “m” command. You can then sort top’s listing with “M” to force it to sort processes by memory utilization instead of processor utilization – very handy for finding memory-hogging applications.

Armed with this information you should now be able to diagnose a running Linux system to determine how its memory is being used, when it is being used, how much is being used, how much headroom is available, and when more memory should be added if necessary.

References:

Monitoring Virtual Memory with vmstat from Linux Journal
vmstat Man Page on Die.net
free Man Page on Die.net
/proc/meminfo from Red Hat

]]>
https://sheepguardingllama.com/2008/02/linux-memory-monitoring/feed/ 6