High IOWait on VMWare Server on Linux

While running VMWare Server on Red Hat Enterprise Linux 5 (CentOS 5), I ran into a rather difficult problem. My setup has Red Hat Enterprise Linux 5.2, Solaris 10 and Windows Server 2003 guests on a Red Hat 5.2 host, all 64-bit except Windows. The host has AMD Opteron multicore processors and is an HP ProLiant DL145 G3.

The Windows guest had serious performance problems. The box froze regularly and its networking stalled: pings still got through, but Remote Desktop (RDP) sessions were interrupted. The System Event Log consistently showed symmpi errors:

The device, \Device\Scsi\symmpi1, did not respond within the timeout period.

Because the problems showed up only on Windows and not on the Linux or Solaris guests, I was convinced the issue was Windows-related. I could also see that the Linux host was showing massive IOWait (visible in top, or with the iostat command from the sysstat package). I assumed the Windows guest was causing this; it was not.
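Both top and iostat derive the iowait figure from the kernel's CPU time counters in /proc/stat. The following is a minimal Python sketch of the same calculation, not a replacement for those tools. It assumes a Linux host whose kernel reports at least the eight columns listed in FIELDS, which RHEL 5's 2.6.18 kernel does. It takes two samples of the aggregate cpu line and divides the growth of the iowait counter by the growth of all counters. All function names here are hypothetical helpers.

```python
import time

# Column order of the aggregate "cpu" line in /proc/stat (values in jiffies).
# Newer kernels append guest/guest_nice, which are already counted inside
# user/nice, so only these eight columns are summed.
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_cpu_line(line):
    """Turn the aggregate 'cpu' line of /proc/stat into a dict of counters."""
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    values = [int(v) for v in parts[1:1 + len(FIELDS)]]
    values += [0] * (len(FIELDS) - len(values))  # pad if fewer columns exist
    return dict(zip(FIELDS, values))


def cpu_percentages(before, after):
    """Share of elapsed CPU time spent in each state between two samples."""
    delta = {k: after[k] - before[k] for k in FIELDS}
    total = sum(delta.values())
    if total <= 0:
        return {k: 0.0 for k in FIELDS}
    return {k: 100.0 * v / total for k, v in delta.items()}


def read_cpu_line(path="/proc/stat"):
    """Return the aggregate (all CPUs) line from /proc/stat."""
    with open(path) as f:
        for line in f:
            if line.startswith("cpu "):
                return line
    raise RuntimeError("no aggregate 'cpu' line in " + path)


def sample_iowait(interval=5.0, path="/proc/stat"):
    """Measure %iowait on the running host over `interval` seconds."""
    first = parse_cpu_line(read_cpu_line(path))
    time.sleep(interval)
    second = parse_cpu_line(read_cpu_line(path))
    return cpu_percentages(first, second)["iowait"]
```

On the host, `print(sample_iowait(5.0))` prints the %iowait over a five-second window. Running it once with the vmware service stopped and once with it started repeats the comparison in the next paragraph.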

I shut down all three guests and saw almost no drop in IOWait. However, when I stopped the VMWare Server service (/etc/init.d/vmware stop), IOWait dropped almost instantly. It came back as soon as I restarted the service, even without starting any virtual machines. Clearly the problem was VMWare Server itself.

My first thought was to make sure VMWare Server was up to date. I had been running VMWare Server 1.0.7, so I downloaded and installed the very recent 1.0.8 release to check whether it fixed the issue. It did not. I am aware that the 2.0 series is now available, but this box is in active use, so I would rather not move to the new series unless absolutely necessary.

Once I had narrowed the problem down to VMWare Server on Linux, I was able to track down a solution. Special thanks to Mr. Pointy for publishing the fix for Ubuntu 7.10 (Gutsy Gibbon); Red Hat and Ubuntu share this problem.

The problem lies in VMWare Server's default memory settings on this platform. The fix very likely applies to Novell SUSE Linux, openSUSE, Fedora and others as well, but I have not tested them. Add the following lines to the main VMWare Server configuration file (/etc/vmware/config):

prefvmx.useRecommendedLockedMemSize = "TRUE"
prefvmx.minVmMemPct = "100"

Then add the following to each virtual machine's configuration file (*.vmx):

sched.mem.pshare.enable = "FALSE"
mainMem.useNamedFile = "FALSE"
MemTrimRate = "0"
MemAllowAutoScaleDown = "FALSE"
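With several guests, the .vmx edits are easy to miss or duplicate. The following is a minimal Python sketch that applies the settings above. It assumes the files use the key = "value" line format shown, and it treats keys as case-insensitive, which is an assumption about how VMware reads them. The default .vmx glob in apply_all is a hypothetical location; point it at wherever your virtual machines live. To be safe, run it as root with the virtual machines powered off.

```python
import glob
import re
import shutil

# Settings from the post; VMware config files use lines of the form key = "value".
GLOBAL_SETTINGS = {  # for /etc/vmware/config
    "prefvmx.useRecommendedLockedMemSize": "TRUE",
    "prefvmx.minVmMemPct": "100",
}
VMX_SETTINGS = {  # for each *.vmx file
    "sched.mem.pshare.enable": "FALSE",
    "mainMem.useNamedFile": "FALSE",
    "MemTrimRate": "0",
    "MemAllowAutoScaleDown": "FALSE",
}

# Captures the key of a "key = value" line; comment lines (starting with #) never match.
KEY_RE = re.compile(r"\s*([^=#\s]+)\s*=")


def set_options(text, settings):
    """Return text with every key in settings set to key = "value".

    An existing line for a key is rewritten in place, later duplicates are
    dropped, and keys not present yet are appended. Keys are compared
    case-insensitively (an assumption about how VMware reads them).
    """
    by_lower = {k.lower(): k for k in settings}
    pending = dict(settings)
    out = []
    for line in text.splitlines():
        match = KEY_RE.match(line)
        key = by_lower.get(match.group(1).lower()) if match else None
        if key is None:
            out.append(line)  # unrelated setting or comment: keep untouched
        elif key in pending:
            out.append('%s = "%s"' % (key, pending.pop(key)))
        # any other case is a duplicate of a key already written, so drop it
    out.extend('%s = "%s"' % (k, v) for k, v in pending.items())
    return "\n".join(out) + "\n"


def update_file(path, settings):
    """Apply settings to one file, keeping a .bak copy if anything changes."""
    with open(path) as f:
        original = f.read()
    updated = set_options(original, settings)
    if updated != original:
        shutil.copy2(path, path + ".bak")
        with open(path, "w") as f:
            f.write(updated)


def apply_all(vmx_pattern="/var/lib/vmware/Virtual Machines/*/*.vmx"):
    """Hypothetical driver; the default glob is an assumed VM location."""
    update_file("/etc/vmware/config", GLOBAL_SETTINGS)
    for vmx in glob.glob(vmx_pattern):
        update_file(vmx, VMX_SETTINGS)
```

Because set_options rewrites existing keys in place instead of appending blindly, running apply_all() a second time leaves the files unchanged, and the .bak copies keep the pre-change versions.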

These changes are taken directly from Mr. Pointy’s blog. Once they are in place, restart VMWare Server (/etc/init.d/vmware restart); the difference should be visible immediately. Mr. Pointy posted his own sar results, and here are mine. The change in the %iowait column is clear at 10:10 PM, when I restarted VMWare with the new configuration. The numbers are lower around 7:00 PM because VMWare was stopped for much of that hour.

06:40:01 PM       CPU     %user     %nice   %system   %iowait    %steal     %idle
06:50:01 PM       all      0.16      0.00      1.77     43.15      0.00     54.92
07:00:01 PM       all      2.83      0.00      6.83      9.51      0.00     80.82
07:10:01 PM       all      0.10      0.00      1.38      4.93      0.00     93.59
07:20:01 PM       all      0.11      0.20      1.84     14.78      0.00     83.07
07:30:01 PM       all      0.10      0.00      2.08      8.84      0.00     88.98
07:40:02 PM       all      0.11      0.00      2.36     26.84      0.00     70.70
07:50:01 PM       all      0.11      0.00      2.32     28.54      0.00     69.04
08:00:01 PM       all      0.10      0.00      2.13     30.63      0.00     67.14
08:10:01 PM       all      0.10      0.00      2.06     22.74      0.00     75.10
08:20:01 PM       all      0.09      0.20      2.02     22.75      0.00     74.94
08:30:04 PM       all      0.09      0.00      2.21     23.22      0.00     74.48
08:40:01 PM       all      0.09      0.00      3.03     25.06      0.00     71.81
08:50:01 PM       all      0.09      0.00      3.09     27.21      0.00     69.61
09:00:01 PM       all      0.10      0.00      3.13     29.40      0.00     67.37
09:10:01 PM       all      0.09      0.00      3.11     25.56      0.00     71.23
09:20:01 PM       all      0.09      0.19      3.07     23.79      0.00     72.86
09:30:01 PM       all      0.09      0.00      2.98     21.50      0.00     75.43
09:40:01 PM       all      0.10      0.00      2.97     25.94      0.00     70.99
09:50:01 PM       all      0.10      0.00      3.28     32.70      0.00     63.93
10:00:01 PM       all      0.20      0.00      4.96     40.73      0.00     54.11
10:10:01 PM       all      0.69      0.00      8.57      1.23      0.00     89.50
10:20:01 PM       all      0.88      0.21      6.34      0.67      0.00     91.90
10:30:01 PM       all      0.81      0.00      6.04      0.26      0.00     92.89
10:40:01 PM       all      0.78      0.00      5.55      0.20      0.00     93.47
10:50:01 PM       all      0.77      0.00      5.47      0.07      0.00     93.69
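To put a number on the difference, the following minimal Python sketch parses `sar -u` text like the table above. It averages the %iowait column before and after a cutoff time. It assumes sar's 12-hour "HH:MM:SS AM/PM" timestamps, as in this output, and data from a single day. The helper names are hypothetical.

```python
from datetime import datetime, time
from statistics import mean

# Value columns of `sar -u` after the timestamp and the CPU column.
COLUMNS = ("user", "nice", "system", "iowait", "steal", "idle")


def parse_sar_cpu(text):
    """Yield (time_of_day, {column: value}) for each 'all' row of sar -u output."""
    for line in text.splitlines():
        parts = line.split()
        # Data rows look like: 06:50:01 PM all 0.16 0.00 1.77 43.15 0.00 54.92
        # The header row has "CPU" in the third field and is skipped.
        if len(parts) != 3 + len(COLUMNS) or parts[2] != "all":
            continue
        stamp = datetime.strptime(parts[0] + " " + parts[1], "%I:%M:%S %p").time()
        yield stamp, dict(zip(COLUMNS, (float(v) for v in parts[3:])))


def mean_before_after(rows, cutoff, column="iowait"):
    """Mean of `column` for rows before `cutoff` and for rows at or after it.

    Raises statistics.StatisticsError if either side has no rows.
    """
    rows = list(rows)
    before = [values[column] for stamp, values in rows if stamp < cutoff]
    after = [values[column] for stamp, values in rows if stamp >= cutoff]
    return mean(before), mean(after)
```

Fed the full table above with a cutoff of time(22, 10), this gives a mean %iowait of about 24.4 for the rows before 10:10 PM and about 0.5 from 10:10 PM on. The earlier figure is even pulled down by the 7:00 PM hour, when VMWare was stopped.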

After the change, I had no problem running I/O-intensive operations such as disk compression and defragmentation.

Original solution from: Mr. Pointy – Gutsy and VMWare Server – You’re In for Some Pain.
