Linux Memory Monitoring

As a Linux System Administrator, one request that I get quite often is to look into memory issues. The Linux memory model is rather more complex than that of many other systems, and looking into memory utilization is not always as straightforward as one would hope, but this is for a good reason that will become apparent once we discuss it.

The Linux Virtual Memory System (or VM colloquially) is a complex beast, and my objective here is to provide a practitioner’s overview rather than to look at it from the standpoint of an architect.

The Linux VM handles all memory within Linux. Traditionally we think of memory as being either “in use” or “free”, but this view is far too simplistic for modern virtual memory management systems (modern here meaning since Digital Equipment Corporation produced VMS in the 1970s). In a system like Linux, memory is not simply “in use” or “free”; it can also be used as buffer or cache space.

Memory buffers and memory cache are advanced virtual memory techniques used by Linux, and many other operating systems, to make the overall system perform better by making more efficient use of the memory subsystem. Memory space used as cache, for example, is not strictly “in use” in the traditional sense of the term: no userspace process is holding it open, and that space, should it be requested by an application, would become available. It is used as cache because the VM believes that this memory will be needed again before the space is required elsewhere, and that it is more efficient to keep it cached than to flush it to disk and reload it later, which is a very slow process.
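As a quick illustration of the cache in action, you can read a large file and watch the cache figures change. The sketch below uses a hypothetical file path and simply runs “free” before and after.

free -m                                     # note the "cached" column
cat /path/to/some/large/file > /dev/null    # hypothetical large file; reading it pulls its contents into the page cache
free -m                                     # "cached" grows and "free" shrinks, but that memory is still reclaimable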

On the other hand, there are times when there is not enough true memory available in the system for everything that we want to have loaded at the same time. When this occurs the VM looks for the portions of memory that it believes are the least likely to be used, or those that can be moved onto and off of disk most effectively, and transfers these out to the swap space. Any time that we have to use swap instead of real memory we take a performance hit, but this is far more effective than having the system simply “run out of memory” and either crash or stop loading new processes. By using swap space the system degrades gracefully under excessive load instead of failing completely. This is very important when we consider that heavy memory utilization might only last a few seconds or minutes when a spike of usage occurs.
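How eager the VM is to swap is tunable. On most distributions the kernel exposes this as vm.swappiness (commonly defaulting to 60), and you can inspect it without changing anything:

cat /proc/sys/vm/swappiness    # higher values make the kernel more willing to move memory out to swap
sysctl vm.swappiness           # same value via sysctl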

In the recent past, memory was an extreme bottleneck for most systems. Memory was expensive. Today most companies, as well as most home users, are able to afford memory far in excess of what their systems need on a daily basis. In the 1990s we would commonly install as little memory as we could get away with and expect the machine to swap constantly, because disk space was cheap in comparison. But over time more and more people at home, and all but the most backwards businesses, have come to recognize the immense performance gains made by supplying the computer with ample memory resources. Additionally, having plenty of memory means that your disks are not working as hard or as often, leading to lower power consumption, higher reliability and lower costs for parts replacement.

Because systems often have so much memory today, instead of seeing the system work to move less-needed memory areas out to disk, we see it looking for likely-to-be-used sections of disk and moving them into memory. This reversal has proved to be incredibly effective at speeding up our computing experiences, but it has also been very confusing to many users who are not prepared to look so deeply into their memory subsystems.

Now that we have an overview of the basics behind the Linux VM (this has been a very simplistic overview hitting just the highlights of how the system works), let’s look at what tools we have to discover how our VM is behaving at any given point in time. Our first and most basic tool, but probably the most useful, is “free”. The “free” command provides us with basic usage information about true memory as well as swap space, also known as “virtual memory”. The most common flag to use with “free” is “-m”, which displays memory in megabytes. You can also use “-b” for bytes, “-k” for kilobytes or “-g” for gigabytes.

# free -m
             total  used  free  shared  buffers  cached
Mem:          3945   967  2977       0       54     725
-/+ buffers/cache:   187  3757
Swap:         4094     0  4094

From this output we can gain a lot of insight into our Linux system. On the first line we can see that our server has 4GB of memory (Mem/Total). These numbers are not always absolutely accurate, so some rounding may be in order. The mem/used amount here is 967MB and mem/free is 2977MB, or ~3GB. So, according to the top line of output, our system is using ~1GB out of a total of 4GB. But also on the first line we see that mem/buffers is 54MB and that mem/cached is 725MB. Those are significant amounts compared to our mem/used number.

In the second line we see the same output but without buffers and cache being added into the used and free metrics. This is the line that we really want to pay attention to. Here we see the truth: the total “used” memory, in the traditional sense of the term, is a mere 187MB, and we actually have ~3.76GB free! Quite a different picture of our memory utilization.

According to these measurements, approximately 81% of the memory in use in our system is performance-enhancing buffer/cache rather than traditional usage, and less than 25% of our total memory is in use at all.
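If you want to reproduce that arithmetic yourself, a rough awk sketch against the older “free -m” layout shown above (buffers in the sixth column and cached in the seventh column of the Mem: line) looks like this:

free -m | awk '/^Mem:/ {printf "buffers+cache: %d MB (%.1f%% of used), used: %.1f%% of total\n", $6+$7, ($6+$7)/$3*100, $3/$2*100}'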

In the third line we see basic information about our swap space. In this particular case we can see that we have 4GB total and that none of it is in use at all. Very simple indeed.

When reading information about the swap space, keep in mind that if memory utilization spikes and some memory has to be “swapped out”, some of that data may not be needed again for a very long time, even after memory utilization drops and swap is no longer needed. This means that some amount of your swap space may be “in use” even though the VM is not actively reading or writing to the swap space. So seeing that swap is “in use” is only an indication that further investigation may be warranted and does not provide any concrete information on its own. If you see that some of your swap is used but there is plenty of free memory in line two, then your current situation is fine, but you need to be aware that your memory utilization likely spiked in the recent past.
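If you do see swap in use and want a rough idea of what is sitting there, newer kernels (2.6.34 and later) report a per-process VmSwap figure in /proc. A small sketch that lists the biggest swap users might look like this:

for d in /proc/[0-9]*; do
    swap=$(awk '/^VmSwap:/ {print $2}' "$d/status" 2>/dev/null)    # swapped-out size in kB, if reported
    [ -n "$swap" ] && [ "$swap" -gt 0 ] && echo "$swap kB  $(cat "$d/comm" 2>/dev/null) (pid ${d##*/})"
done | sort -rn | head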

Often all you want to know is the current “available” memory on your Linux system. Here is a quick one-liner that will provide you with this information. You can save it in a script called “avail”, place it in /usr/sbin, and use it anytime you need to see how much headroom your memory system has without touching the swap space. Just add “#!/bin/bash” at the top of your script and away you go.

echo $(free -m | grep 'buffers/' | awk '{print $4}') MB

This one-liner is especially effective for non-administrator users to have at their disposal, as it provides the information that they are usually looking for quickly and easily. This is generally preferred over giving extra information and having to explain how to decipher what is needed.
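Wrapped up as the “avail” script described above, a minimal sketch might look like the following. Note that it assumes the older “free” output with a “-/+ buffers/cache:” line; newer versions of free (procps-ng 3.3.10 and later) drop that line in favour of an “available” column, in which case the commented alternative applies.

#!/bin/bash
# avail - print how much memory is available without touching swap
# older free output (with a "-/+ buffers/cache:" line):
echo "$(free -m | grep 'buffers/' | awk '{print $4}') MB"
# newer free output (procps-ng 3.3.10+) provides an "available" column instead:
# echo "$(free -m | awk '/^Mem:/ {print $7}') MB"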

Moving on from “free”: now that we know the instantaneous state of our memory system, we can look at more detailed information about the subsystem’s ongoing activities. The tool that we use for this is, very appropriately, named “vmstat”.

“vmstat” is most useful for investigating VM utilization when swap space is in use to some degree. When running “vmstat” we generally want to pass it a few basic parameters to get the most useful output from it. If you run the command without any options you will get some basic statistics collected since the time that the system was last rebooted. Interesting, but not what we are looking for in particular. The real power of “vmstat” lies in its ability to output several instances (count) of activity over a period (delay). We will see this in our example.

I prefer to use “-n”, which suppresses header reprints. Do a few large “vmstat”s and you will know what I mean. I also like to translate everything to megabytes, which we do with “-S M” on some systems and “-m” on others. To make “vmstat” output more than a single line we need to feed it a delay and a count. A good starting point is a delay of 10 seconds and a count of 5. This will take 40 seconds to complete ( (count - 1) * delay ). I always like to run “free” just before my “vmstat” so that I have all of the information on my screen at one time. Let’s take a look at a memory intensive server.

free -m
             total     used      free    shared   buffers    cached
Mem:         64207    64151        56         0        73     24313
-/+ buffers/cache:    39763     24444
Swap:        32767        2     32765
vmstat -n -S M 10 5
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
 7  1      2     53     75  24313    0    0  8058    62   19     1 10  7 71 12
 2  2      2     44     74  24325    0    0 103251   209 2512 20476 15 13 54 18
 3  1      2     43     74  24325    0    0 105376   216 2155 16790 11 12 59 17
 2  1      2     50     74  24320    0    0 107762   553 2205 16819 12 10 60 17
 2  2      2     49     73  24322    0    0 105104   302 2323 18009 23  9 51 17

That is a lot of data. From the “free” command we see that we are dealing with a machine with 64GB of memory that has almost 40GB of that in use and 24GB of it being used for cache. A busy machine with a lot of memory but definitely not out of memory.

The “vmstat” output is rather complex, so we will break it down into sections. Notice that the stats in the very first row are significantly different from the stats in the subsequent rows. This is because the first row contains average stats since the box last rebooted, while each subsequent row contains stats over the last delay period, which in this case is ten seconds.

The first section is “procs”. In this section we have “r” and “b”. “r” is for runnable processes, that is, processes waiting for their turn to execute on the CPU. “b” is for blocked processes, those that are in uninterruptible sleep. This is process, not memory, data.

The next section is “memory” and this provides the same information that we can get from the free command. “Swpd” is the same as swap/used in “free”. “Free” is the same as mem/free. “Buff” is the same as mem/buffers and “cache” is the same as mem/cached. This view will help you see how the different levels are constantly in motion on a busy system.

If your system is being worked very hard then the really interesting stuff starts to come in the “swap” section. Under “swap” we have “si” for swap in and “so” for swap out. Swapping in means to move virtual memory off of disk and into real memory. Swapping out means to take real memory and to store it on disk. Seeing an occasional swap in or swap out event is not significant. What you should be interested in here is whether there is a large amount of constant si/so activity. If si/so is constantly busy then you have a system that is actively swapping. This does not mean that the box is overloaded, but it does indicate that you need to be paying attention to your memory subsystem because it definitely could be overloaded. At the very least, performance is beginning to degrade, even if still gracefully and possibly imperceptibly.
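One rough way to check for sustained swapping, as opposed to the occasional event, is to count how many sample intervals show any si/so activity at all. The sketch below assumes the column layout shown above (si and so are the seventh and eighth columns) and samples twelve five-second intervals:

vmstat -n 5 13 | awk 'NR > 3 && ($7 + $8) > 0 {active++} END {printf "%d of %d intervals showed swap activity\n", active + 0, NR - 3}'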

Under “io” we see “bi” and “bo”, which are blocks in and blocks out. This refers to total transfers through the system’s block devices. This is to help you get a feel for how “busy” the i/o subsystem is alongside the memory, CPU and other statistics. This helps to paint an overall picture of system health and performance. But it is not directly memory related, so be careful not to read the bi/bo numbers and attempt to make direct memory usage correlations.

The next section is “system” and has “in” for interrupts per second. The “in” number includes clock interrupts so even the most truly idle system will still have some interrupts. “Cs” is context switches per second. I will not go into details about interrupts and context switching at this time.

The final section is “cpu” which contains fairly standard user “us”, system “sy”, idle “id” and waiting on i/o “wa” numbers. These are percentage numbers that should add up to roughly 100 but will not always do so due to rounding. I will refrain from going into details about CPU utilization monitoring here. That is a separate topic entirely but it is important to be able to study memory, processor and i/o loads all in a single view which “vmstat” provides.

If you will be using a script to automate the collection of “vmstat” output you should use “tail” to trim off the header information and the first line of numerical output since that is cumulative since system reboot. It is the second numerical line that you are interested in if you will be collecting data. In this example one-liner we take a single snapshot of a fifteen second interval that could be used easily in a larger script.

vmstat 15 2 | tail -n1

This will provide just a single line of useful output for sending to a log, attaching to a bug report or for filing into a database.
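A larger collection script built around that one-liner only needs a timestamp and somewhere to write. A minimal sketch, with the log path being just an example, could be:

#!/bin/bash
# collect one fifteen-second vmstat sample and append it to a log
LOG=/var/log/vmstat-samples.log    # example path; adjust to suit
echo "$(date '+%Y-%m-%d %H:%M:%S') $(vmstat 15 2 | tail -n1)" >> "$LOG"

Run it from cron every few minutes and you have a simple history of memory, swap, i/o and CPU activity to look back over.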

Additional, deeper memory information can be found from a few important sources. First, let’s look at “vmstat -s”. This is a very different use of this command.

vmstat -s
  32752428  total memory
  32108524  used memory
  13610016  active memory
  17805412  inactive memory
    643904  free memory
    152168  buffer memory
  23799404  swap cache
  25165812  total swap
       224  used swap
  25165588  free swap
   4464930 non-nice user cpu ticks
         4 nice user cpu ticks
   1967484 system cpu ticks
 193031159 idle cpu ticks
    308124 IO-wait cpu ticks
     37364 IRQ cpu ticks
    262095 softirq cpu ticks
  69340924 pages paged in
  86966611 pages paged out
        15 pages swapped in
        62 pages swapped out
 488064743 interrupts
  9370356 CPU context switches
1202605359 boot time
   1304028 forks

As you can see, we get a lot of information from this “-s” summary option. Many of these numbers, such as “total memory”, are static and will not change on a running system. Others, such as “interrupts”, are counters that are used to generate the “since reboot” statistics for the regular “vmstat” output.
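Because these are raw counters, you can turn them into rates by sampling twice and taking the difference. For example, a rough interrupts-per-second figure over ten seconds:

i1=$(vmstat -s | awk '/interrupts/ {print $1}')
sleep 10
i2=$(vmstat -s | awk '/interrupts/ {print $1}')
echo "$(( (i2 - i1) / 10 )) interrupts per second"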

The “free” and “vmstat” commands draw their data from the /proc file system as you would expect. You can see their underlying information through:

cat /proc/meminfo
cat /proc/vmstat
cat /proc/slabinfo
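For a quick look at just the headline figures without wading through the whole of /proc/meminfo, you can pull out the fields that “free” summarises. MemAvailable only appears on newer kernels (3.14 and later):

grep -E '^(MemTotal|MemFree|MemAvailable|Buffers|Cached|SwapTotal|SwapFree):' /proc/meminfo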

Of course, no talk of Linux performance monitoring can be complete without mentioning the king of monitoring, the “top” command. However, the memory information available from “top” is little more than what we can see more concisely in “vmstat” and “free”. At the top of “top” we get the “free” summary in a slightly modified form but containing the same data. If this does not appear you can toggle it on and off with the “m” command. You can then sort top’s listing with “M” to force it to sort processes by memory utilization instead of processor utilization. Very handy for finding memory hogging applications.
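For scripting, or just a quick glance, “top” can also be run non-interactively in batch mode. The sort flag in the second example exists in newer procps-ng versions of top, so treat it as an assumption if yours is older:

top -b -n 1 | head -n 5            # one snapshot of the summary area, including the memory and swap lines
top -b -n 1 -o %MEM | head -n 15   # newer procps-ng top: sort the process list by memory usage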

Armed with this information you should now be able to diagnose a running Linux system to determine how its memory is being used, when is it being used, how much is being used, how much headroom is available and when more memory should be added if necessary.

References:

Monitoring Virtual Memory with vmstat from Linux Journal
vmstat Man Page on Die.net
free Man Page on Die.net
/proc/meminfo from Red Hat
