Sheep Guarding Llama – Scott Alan Miller :: A Life Online

Third Party Hard Drive for HP Proliant DL185 G5 (Sun, 31 May 2009)

This document applies directly to the Hewlett Packard Proliant DL185 G5 server.  I have tested this with the twelve front bay configuration and will test shortly with the rear-facing drive configuration as well.  [Edit – Tested with fourteen drive configuration and it checked out just fine.]

When buying hot-swap SAS or SATA 3.5″ hard drives for use in your new HP Proliant DL185 G5 you can acquire them directly from HP with the drive carriers (or sleds, caddies) already attached.  This is the easiest method.  If you are like me and prefer to select your own drives from third party makers (in my case, I want to use low power, high capacity Seagate Barracuda LP drives) then you must purchase your hot swap drive sleds separately.  Finding the correct part number from HP can be quite a hassle.  Even calling them for support can be tricky as almost no one buys this part directly.

If you wish to get your drive trays separately and not through HP you may be in tough shape.  HP does not stock this part and, in fact, is unable to even look up this part number for you.  I spent some time working with HP in the US on this issue and they were able to provide a visual confirmation of the part for me but could not verify the quality or the usability of the third party drives that I was able to find.  So I was stuck taking a risk to see if these drives would work.  For some machines HP can provide a part number and sometimes can even sell the caddy themselves, but not in the case of the DL185 G5.  I have taken the time, both with HP and with third party vendors and with the server in hand, to verify these parts so that you do not have to.

The part that you need to purchase is HP Part Number: 373211-001.  This part is generally priced around $35 USD.  You will need as many as fourteen of them to fully populate the DL185 G5 with the optional large drive bay configuration (twelve in front and two in back) but you can use them individually as well, of course.  I have had good luck and have gotten a good price getting these trays from Discount Technology: DL185 G5 Hot Swap Drive Tray.

Beware of shops attempting to sell you a much lower cost alternative to this part number.  Quite often the lower cost part is actually a drive blank.  A drive blank is simply a plastic air dam that corrects airflow through the server chassis when a drive is not present. Many of these drive blanks should ship with your DL185 G5 when it is new.  They are readily available and very inexpensive but, mostly, useless.

The big advantage of working with third party sleds and drives is that the DL185 G5 can be populated for thousands of dollars less and can house as much as 28TB of storage in a tiny 2U server.  This is possibly the second densest storage unit on the market when used with the Seagate Barracuda LP 2TB drives – the densest is the Sun x4500 “Thumper” 4U storage server at many times the cost of the DL185.
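
The arithmetic behind that 28TB figure is simple (a quick sketch; the drive counts come from the fully optioned configuration described above):

```shell
# 12 front bays + 2 rear bays, each holding a 2TB Barracuda LP
drives=14
per_drive_tb=2
echo "$((drives * per_drive_tb))TB raw in a 2U chassis"   # 28TB raw
```

That is raw capacity, of course; any RAID level you apply will reduce the usable total.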

Also, when ordering a DL185 G5 you should be aware that if you get the larger twelve hard drive front drive bay then you cannot also have a front loading optical drive and will need to get your optical drive rear facing.  If you also get the dual hot swap rear facing drive option then you cannot have a rear facing optical drive either.  If you choose both of these options you must use a USB-based optical drive in order to boot from optical media.  This is not always obvious when you are attempting to order one of these machines.

Third Party Hard Drive for HP Proliant DL385 G5 (Sat, 28 Feb 2009)

This document applies directly to the Hewlett Packard Proliant DL385 G2 and DL385 G5 servers which share a physical chassis.  To the best of my knowledge, this will also apply to the DL585 G2 and DL585 G5 which should share an eight bay drive cage with their 3xx series cousins.  I also believe that this applies to the Intel based DL380 G5 as well as the DL580 G5.  (The DL380 G4 and the DL580 G4 use different drive configurations as does the DL385 G5p.)

When buying hot-swap SAS or SATA 2.5″ hard drives for use in your new HP DL385 G5 you can acquire them directly from HP with the drive carriers (or sleds, caddies) already attached.  This is the easiest method.  If you are like me and prefer to select your own drives from third party makers (in my case, I want to use high performance Seagate drives) then you must purchase your hot swap drive sleds separately.  Finding the correct part number from HP can be quite a hassle.  Even calling them for support can be tricky as almost no one buys this part directly.

I have already done the legwork to find the correct part number and have purchased and tested this part to be sure that it is correct.  The part that you need to purchase is HP Part Number: 378343-002.  This part is generally priced around $50 USD.  You will need eight of them to fully populate the DL385 G5 drive housing but you can use them individually as well, of course.

Beware of shops attempting to sell you a much lower cost alternative to this part number.  Quite often the lower cost part is actually a drive blank.  A drive blank is simply a plastic air dam that corrects airflow through the server chassis when a drive is not present.  Seven of these drive blanks should ship with your DL385 G5 when it is new.  They are readily available and very inexpensive but, mostly, useless.

If you need to reach HP’s Parts Store directly you can call them at (800) 227-8164 in the US.

The Case Against SAN (Sat, 16 Aug 2008)

Despite an inflammatory post title, I believe that SAN (Storage Area Networks) is a great technology with numerous scenarios where it is the exact right technology and several scenarios that only exist because of SAN’s availability.  However, that being said, many enterprises today use SAN without doing any proper strategy, architecture or engineering.  It is being chosen as a technology not because of its appropriateness to the task at hand but simply because technology managers see it as easier, or more popular, to use it broadly than to carefully evaluate each system in question based on technical and financial factors.

SAN is an amazing technology that wonderfully complements virtualization, clustering and other advanced use case scenarios.  Not every machine is using these types of scenarios and SAN has many downsides that need to be carefully considered before implementing it blindly.

SAN is Complex. Simply by choosing to use SAN we introduce another layer of complexity into the server equation.  (I am assuming server use situations here as SAN is nearly unheard of in the desktop space.  That being said, I use SAN on my own desktop.)  Having SAN means that either your system administrators need to wear yet another hat or you need to hire and maintain a dedicated storage administration (and possibly engineering) staff.

It also means that you will probably need to deal with sourcing and managing a fibre channel network along with the associated HBAs, fiber optics, etc.  Servers that would otherwise have just three simple Ethernet connections (I’m generalizing horribly here) are suddenly up to five or more connections making your datacenter folks oh so happy.

SAN is Expensive. Unless you opt to use a shared network SAN technology like iSCSI (or Z-SAN) then SAN introduces an expensive array of proprietary networking hardware, cabling and host bus adapters.  Only after all of those expenses must we consider the cost of the SAN itself.  SAN systems are generally quite expensive and only begin to approach being cost effective when utilization rates are extremely high and the systems are very large.  Heavy up front investments can make SAN difficult to cost justify even if long term utilization rates might be high.

SAN is Not Performant. High speed SAN networks, massive switching fabrics and huge drive arrays all play into an expensive and mostly futile attempt to get SAN technologies to perform at or near traditional direct attached storage technologies.  During the Parallel SCSI and PATA drive era, fibre channel SAN had an advantage over most local drives simply because of the high performance of its networking infrastructure.  Today this is not the case.

Unlike shared bandwidth technologies like Parallel SCSI and Parallel ATA (PATA), SAS and SATA drives have dedicated, full duplex bandwidth per device providing greatly increased transfer rates while lowering latency.  Only the largest, most expensive of high performance SAN systems could hope to overcome this gap in technology.

Typical SAN systems tend to use, in my experience, SATA devices traditionally running at 7,200 RPM.  Local drives are often SAS drives running at 15,000 RPM.  Often, especially in the AMD and Intel server worlds, local drives are handled via high powered RAID controller cards with dedicated processors and their own cache.  These cards move the cache closer to the system memory making their burstable throughput far greater than can normally be achieved in a SAN situation.

SAN is Not Easily Tunable. In most situations, SAN is managed as a single, giant storage entity.  Tuning is performed to an entire array but little thought is generally given to small segments within an array.

This is made nearly impossible and definitely impractical by the simple fact that physical drive resources are often shared and the concerns of each accessing system must be considered.  The obvious solution is to just tune for “average” use given no special considerations to any particular system.  If drive resources are not dedicated then we must question where the value of the SAN comes into play.

Drives located on a local machine can easily be tuned for cost and performance as needed.  Careful consideration of high speed SAS versus large volume SATA can be made on a volume by volume basis by the system engineer.  Drives can be grouped as needed into carefully chosen RAID levels such as 0 for raw performance, 5 for high speed random access with some additional reliability, 1 for good sequential access with full redundancy, 6 for additional redundancy over 5, etc.
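
A small helper like this illustrates the capacity trade-off between those levels (a sketch using the simplest textbook formulas; real arrays add metadata and rounding overhead):

```shell
# Usable capacity in TB for n same-size drives at a given RAID level
usable_tb() {
  local level=$1 n=$2 size_tb=$3
  case $level in
    0) echo $((n * size_tb)) ;;          # pure striping, no redundancy
    1) echo $(((n / 2) * size_tb)) ;;    # mirrored pairs
    5) echo $(((n - 1) * size_tb)) ;;    # one drive's worth of parity
    6) echo $(((n - 2) * size_tb)) ;;    # two drives' worth of parity
  esac
}

usable_tb 5 8 1   # eight 1TB drives in RAID 5 -> 7
usable_tb 6 8 1   # the same drives in RAID 6  -> 6
```

The point is that with local storage the engineer can run this trade-off per volume rather than once for an entire shared array.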

Drive volumes can also be isolated so that drive systems often accessed simultaneously do not share command paths.  Careful filesystem design can greatly reduce drive contention and minimize drive head movement for increased performance and reliability.

SAN is Often Political. Simply by introducing SAN to a large organization we risk introducing new management, new skill sets, new job descriptions and, inevitably, confusion and paperwork.  By separating the storage from the server we create another point of coordination, keeping the system administrator from being a single point of contact and troubleshooting for system issues.

Anytime that we introduce a separation of duties we introduce company politics and a chain of communication.  Instead of troubleshooting a single system when a server goes down we have to, in the case of SAN, consider the server, the SAN box and the connecting network plus peripheral pieces like the host bus adapters and the local configuration.  What might otherwise be a simple, almost meaningless change, like adding another drive to expand a server’s capacity by a terabyte, can suddenly scale into a major enterprise issue requiring much lead time, planning and expenditure.  And, of course, a system outage that used to take minutes to repair can easily become hours as company departments seek shelter rather than simply fixing the issue at hand.

SAN uses Additional Datacenter Footprint. Because almost any server already comes with internal storage capacity, the datacenter space needed by SAN equipment is generally redundant.  Until additional storage capacity is needed beyond that which can fit inside of the existing server chassis the SAN storage is completely additional within what are generally cramped and overutilized data centers.  In many cases when a server needs additional drive capacity SAN is still not necessarily a good option from a footprint perspective as many external drive array systems can be locally attached and use very little datacenter space.

SAN systems require more than simply physical space within the datacenter for their switching and storage pieces, they also require additional power and cooling.  In an era when we are fighting to make our datacenters as green as possible, SAN needs to be considered carefully with respect to its overall power draw.

SAN does not address Solid State Drives. Solid State Drive technology, or SSD, poses yet another obstacle for SAN in the enterprise.  SSD drives are much smaller capacity, currently, than traditional spindle based hard drives but often provide better performance at a fraction of the power consumption.  A traditional hard drive generally draws roughly fifteen watts while a standard SSD generally draws around one watt – a very significant power reduction indeed.

SSDs often have very high burstable transfer rates which swing the performance balance far in favor of the locally attached storage options based on their greatly superior throughput.  For example, a standard Hewlett-Packard DL385 G5 server, a very popular model, has eight 3Gb/s SAS channels available to it for a total aggregate of 24Gb/s, six times that of the most common SAN connections.
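
The comparison works out like this (assuming the common 4Gb/s fibre channel links of the era as the SAN-side baseline):

```shell
channels=8
per_channel_gbps=3      # 3Gb/s SAS, one dedicated link per drive bay
fc_gbps=4               # a typical single 4Gb/s fibre channel connection
local_gbps=$((channels * per_channel_gbps))
echo "${local_gbps}Gb/s local vs ${fc_gbps}Gb/s FC: $((local_gbps / fc_gbps))x"
```

This is aggregate link bandwidth only; real-world throughput depends on the drives behind the links, which is exactly where SSDs change the balance.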

SANs which choose to use SSD, which is likely to take quite some time because SANs generally lean towards large capacity over performance, will suffer from a lack of available throughput but will have the benefit of eliminating almost all of the issues mentioned earlier in regard to drive contention from shared drive resources.

SAN is Confusing. While this factor comes into play less often, it still holds true that a majority of server “customers”, those people who utilize servers but are not the server or storage administrators, have a very poor understanding of SAN, NAS, DAS or filesystems in general and by introducing SAN we can inadvertently introduce forms of complexity that cause communications and support issues.  While not an issue with SAN itself, in some cases technical confusion can impede adoption even when the technology is appropriate.

Bottom Line. SAN suffers from performance, organization, cost and issues of complexity while local storage is well understood, extremely inexpensive, simple to manage and offers extreme performance.  With rare exception, SAN, in my opinion, has little place competing with traditional direct attached storage options until DAS is unable to deliver necessary features such as resource sharing, certain types of replication, distance or capacity.

August 10, 2008: The Blissful Life of the Unemployed (Mon, 11 Aug 2008)

Our high stress weekend continues.  Nothing has changed – and that is the source of the stress.  On Friday evening, when talking to real people with real influence, you get the sense that everything is fine and that come Monday morning we will be able to work things out and have a good resolution to the issue at hand.  But then spending the weekend with no communications (even though we were not expecting any communications) gives ample time to sit around considering all of the things that could go wrong and to worry that things won’t go well Monday morning.  Inaction, at least for me, is a huge source of stress.

Oreo had a great time at the party last night.  He had a whole yard and house in which to run around freely and two dogs to play with.  The one collie was eleven and very aged so they could not play but was very friendly and looking for attention from everyone.  It is very sad seeing a sweet dog get so old.

Dudley was there, Katie’s dog, and he and Oreo spent a lot of time running around together.  Oreo does not often get wide open space so it was a nice change for him.  They played pretty well until some kabobs were given to the dogs and some territoriality came into play.  In a surprise move, Duds, who is close to three times Oreo’s size, got into a little argument with Oreo and in a flash Oreo was flipped over on his back and panicking.  We had to pull them apart pretty quickly.  That was the end of the fun night for Oreo.  After that he just wanted to be held and to relax.

We had to sleep in a bit this morning just to make up for getting in so late last night.  It was around ten thirty when we finally got out of bed.  I did a little work in the office but only a tiny bit.  Today is my last official day with a contract so I figured that I should at least do something, even if it was only symbolic.

We found out this morning that the Mazda PR5 is not going to be purchased as we had hoped.  We have been waiting for the final approval of the purchase for two weeks, or so, thinking that everything was pretty much finalized and then today, in the midst of everything else, found out that they weren’t actually interested in it.  Of course, bolstering my already hearty dislike for people’s concepts of “vacations”, we would have known this quite some time ago but people went “on vacation” and stopped communicating with the outside world – ignoring obligations because somehow some parts of society have approved the idea of a “vacation” as exempting the vacationers not only from their work obligations but from their personal ones as well.

I think that this concept is probably quite old.  When I was a child (and obviously any time before that) going on a vacation (one that involved travel, at least) meant going to a remote location where postal mail and telephones were impossible to get or unreasonably expensive for anything less than a full emergency.  But that world has passed and today with the Internet, mobile phones, BlackBerries, etc. you are no less accessible while in a remote location than when sitting in your living room.  Today, having a telephone that doesn’t reach you everywhere usually costs you more than one that does.

Basically, we live in a world where the traditional concept of escapism in vacations is no longer an intrinsic feature of travel but now requires active, intentional ignorance (in the traditional, true meaning of the word as a derivative of the word ignore.)  You have to ignore people trying to reach you.  You have to avoid responding to people.  It is a completely different animal these days.  And this phenomenon is not new.  Mobile phones have been making this shift occur since the early 1990s and the Internet has been changing it since the late 1990s.  It has been roughly eight years now, a decently long time, that there has been little to no excuse to ever be out of reach for more than half a day or less.  And now that most people use instant messaging and text messaging via mobile devices all day long any breach in ongoing communications because of a “vacation” has to be completely intentional.

I am not suggesting that people never stop working and never take a break from work.  Rather, I am saying that personal responsibilities are not curtailed in any way by a claim of “vacationing” or being out of town.  People have traditionally used the idea of vacationing as a way to avoid responsibilities and communications because it was a difficult claim to dispute.  No one would be able to know if you were truly stuck in a situation without communications or not.  Today that is not true and there are so many free or nominal cost communications modes, and so little change between home, office and hotel in relation to those modes, that not responding to responsibilities while away is exactly the same as not responding to them when standing face to face with someone.

If you want some sympathy from me in reference to being helplessly out of reach you had better be backpacking through Kyrgyzstan, and even there you will likely have intermittent phone and Internet access.  There are very, very few places left on earth where you are truly out of touch and fewer and fewer people who are comfortable being in those situations.  Most people today desperately want to keep in contact via email, phone, web, etc.  Recently I even had a conversation with my friend David while he was hanging out in a cafe in Tunisia.  He was just checking up on his email, Facebook, etc.  It’s far more interesting, I think, vacationing in places where you can still communicate with the outside world instead of just “disappearing” for a few days and then returning with some pictures.

All of that aside, we are rather happy that we are not selling the car as we think that we will most likely want to have it once the baby arrives in November.  We need a car that can haul some things and will easily fit the baby’s car seat, Oreo, both of us and the baby’s things.  The PR5 also gets good gas mileage and has amazing snow tires.  It just had a bit of work done to it and has been sitting all summer not getting any older so its value to us is probably much higher than its street value and we had been planning on selling it at rather a bargain.  So, other than a certain desperation for cash right at the moment because of the house, we would prefer to hold on to the car.

My afternoon was spent writing a very large BASH script that will take our newly built Castile Christian Academy workstations and turn them into fully ready desktops.  It has to remove all of the unnecessary and inappropriate packages, change repositories, add in needed educational packages, change system files, detect the system’s identity and do all of our standard customizations.  It is rather involved.
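
The identity-detection piece of such a script can be as simple as keying off the hostname (a sketch with made-up naming conventions, not the actual Castile Christian Academy script):

```shell
#!/bin/bash
# Map a workstation's hostname to the customization profile it receives.
role_for_host() {
  case $1 in
    lab-*)     echo classroom ;;   # student lab machines
    office-*)  echo admin ;;       # front-office machines
    *)         echo generic ;;
  esac
}

role=$(role_for_host "$(hostname -s)")
echo "Applying the ${role} profile"
# ...package removal, repository changes and file customizations
# would then branch on $role.
```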

I got some word, finally, from the consulting firm this afternoon but it wasn’t encouraging.  Basically, they claim that their hands are tied and they have no contracts to protect them.  It would appear that doing the “right thing” is way too much effort and so instead they see me as a scapegoat and are just passing the cuts on to me… including massive monetary gains for themselves.  The original cut was just 7.5% but it escalated to 15.73% by the time that it reached me.  That means that while there was a cut at the beginning (which was at their discretion and which they opted to take), I am taking more of a cut than anyone and the only person losing here is me.

In fact, everyone else is making a fortune on the deal – coming completely out of my pockets.  In addition, I took the furlough earlier in the year which was an additional 3.5% or so.  So my total cut, between March and August comes to 19.5%!!  This is insane.  And they wonder why I won’t even discuss the possibility of accepting the cut.  To make things even more stressful I have a very large amount of comp time and 401K money on the line that could very easily be taken away.  At least things look promising to have my contract moved to another pass-through vendor, but who knows what all impacts there could be along the way.  I think I need ulcer medication 😉

For dinner we ordered in Brazilian Pizza again.  It was awesome.  We ate pizza and watched two episodes of Frasier.  We are on the third season still, I think.

The weather is cooler today than it has been in a while so we decided to open the windows and let some fresh air into the apartment.  The apartment has gotten musty and stale.  The air conditioning units did not get cleaned like they are supposed to be because our bed takes up the entire room and there was no way to clear space to do the cleaning.  Or at least we imagine that that is the reason.  Nothing was said to us so we are giving the building the benefit of the doubt that the cleaning process even occurred.  It might easily have not taken place at all.

I was doing some shopping on eBay and discovered an amazing price on a high efficiency Hewlett-Packard DL145 G3 rack mount AMD Opteron based server.  It even comes with the rack mounting kit which is nice.

Andy called and we talked for an hour or so this evening.  Then it was time to walk Oreo, wrap up SGL, do a little work for the office (in the minutes running up to the end of my contract), answer emails, update Twitter and head off for bed.

No wonder it is hard for me to ever actually make it to bed!

This coming Saturday, Dominica and I have Nadine and Clarence’s wedding to attend.  So we will be gone for most of the day.  Every moment that we are not gone I am scheduled to be working – although that is obviously in some question at this point.

Choosing a Linux Distro in the Enterprise (Thu, 10 Jul 2008)

Linux is popular in big business today. No longer, and not for a long time now, has Linux been the purview of the geek community but it is a solid, core piece of today’s mainstream IT infrastructure. That being said, Linux is still plagued by confusion over its plethora of distributions. This being the case I have decided to weigh in with some guidance for businesses looking to use Linux in their organizations.

For those unfamiliar with the landscape, Linux is a family of operating systems that are generally considered to fall under the Unix umbrella, although Linux is legally not Unix, just highly Unix-like. Individual Linux variants are referred to as distributions, or distros for short. Unlike Windows or Mac OS X which come from a single vendor, Linux is available from many commercial vendors as well as from non-profit groups and individual distribution makers. Instead of there being just one Linux there are actually hundreds or thousands of different distributions. Each one is different in some way. This creates choice but also confusion. To make matters even worse, some major vendors such as Red Hat and Novell release more than one Linux distribution targeted at different markets, and within a single distribution will often package features separately. This myriad of choices, before you even acquire your first installation disc, does not help Linux uptake in companies go any faster.

In reality the choices for business use are few and obvious with a little bit of research. To make things easier for you, I will just tell you what you need to know. Problem solved. Now if only managing your Linux environment could be so easy!

Before we get started I want to stress that this article is about using Linux for enterprise infrastructure – that is, as a server operating system in a business. I am not looking into desktop Linux or high performance computational clusters and grid or specialty applications or home use. This article is about standard, traditional server applications that require stability, up time, reliability, accessibility, manageability, etc. If you are looking for my guide to the “ultimate Linux desktop environment”, this isn’t it. Desktops, even in the enterprise, do not necessarily have the same criteria as servers. They might, but not necessarily so.

When choosing a distribution for servers we must first consider the target purpose of the distro. Only a handful of Linux distros are built with the primary purpose of being used as a server. If your distro maintainer does not have the same principles in mind that you do it is probably best to avoid that distro for this particular purpose. Server distributions target longer time between releases, security over features, stability over features, rapid patching, support, documentation, etc.

In addition to targeting the distribution in harmony with our own goals we also need to work with a company that is reliable, has the resources necessary to support the product and has a track record with a successful product. Choosing a distribution is a vendor selection process. There are three key enterprise players in the Linux space: Red Hat, Novell and Canonical.

For many Red Hat is synonymous with Linux, having been one of the earliest American Linux distributions and having been a driving force behind the enterprise adoption of Linux globally. Red Hat makes “Red Hat Enterprise Linux”, known widely as RHEL, as well as Fedora Linux. Red Hat is the biggest Linux vendor and important in any Linux vendor discussion.

Novell is the second big Linux vendor having purchased German Linux vendor SUSE some years ago. Novell makes two products as well, Suse Linux Enterprise and OpenSUSE.

The third big Linux enterprise vendor is Canonical well known for the Ubuntu family of Linux distributions. While the Ubuntu distro family includes many members we are only interested in discussing Canonical’s own Ubuntu LTS distribution. LTS stands for “Long Term Support” and is effectively Canonical’s server offering. Their approach to versioning and packaging is quite different from Red Hat and Novell and can be rather confusing.

Before we become overwhelmed with choices (we have presented five so far) we have one here that we can further eliminate. Red Hat’s Fedora is not an “enterprise targeted” distribution. It is a “testing” and “community” platform designed primarily as a desktop and research vehicle and not as a stable server operating system. To be sure, it is extremely valuable and a great contribution to the Linux community and has its place, but as a server operating system it does not shine. Nevertheless, without Fedora as a proving ground for new technologies it is unlikely that Red Hat Enterprise Linux would be as robust and capable as it is.

We can also effectively eliminate OpenSUSE.  OpenSUSE is the unsupported, community driven sibling to Novell SUSE Linux Enterprise.  However, unlike Fedora which is an independent product from RHEL, OpenSUSE is the same code base as SUSE Linux Enterprise but without Novell’s support.  This is a great advantage to the SUSE product line as there is a very large base of home and hobby users in addition to the enterprise users all using the exact same code and finding bugs for each other.  Going forward we will only consider SUSE Linux Enterprise as support is a key factor in the enterprise.  But OpenSUSE, for shops not needing commercial support from the vendor, is a great option as the product is the same, stable release as the supported version.

So we are left with three serious competitors for your enterprise Linux platform: Red Hat Linux, Novell Suse Linux and Ubuntu LTS. All three of these competitors are solid, reliable offerings for the enterprise. Red Hat and Novell obviously have the advantage of having been in the server operating system market for a long time and have experience on their side. But Canonical has really made a lot of headway in the last few years and is definitely worth considering.

Red Hat Enterprise Linux and SUSE Linux Enterprise have a few key advantages over Ubuntu. The first is that they both share the standard RPM package management system. Because RPM is the standard in the enterprise, it is well tested and understood, and most Linux administrators are well versed in its functionality. Ubuntu uses the Debian-based .deb package format, which is far less common in the enterprise, and finding administrators with existing knowledge of it is far less likely – although this is changing rapidly as Ubuntu has recently become the leading home desktop Linux distribution.
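To make the split concrete, here is a sketch of how the everyday commands differ between the two ecosystems. The helper function and its distro-family labels are my own illustration, not standard tooling; the commands in the comments are the real ones an administrator would use day to day.

```shell
#!/bin/sh
# Hypothetical helper: map a distro family to its low-level package query tool.
pkg_query_tool() {
  case "$1" in
    rhel|centos|suse) echo "rpm -q"  ;;  # RPM-based: RHEL, CentOS, SUSE
    ubuntu|debian)    echo "dpkg -s" ;;  # deb-based: Ubuntu, Debian
    *)                echo "unknown" ;;
  esac
}

# The common admin tasks, side by side (RPM world vs deb world):
#   list all installed packages:   rpm -qa           |  dpkg -l
#   query one installed package:   rpm -q bash       |  dpkg -s bash
#   install with dependencies:     yum install foo   |  apt-get install foo

pkg_query_tool rhel     # prints: rpm -q
pkg_query_tool ubuntu   # prints: dpkg -s
```

An administrator fluent in one column can learn the other, but as the hiring discussion below notes, existing RPM experience is simply more common in enterprise shops.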

In general, Red Hat Enterprise Linux and SUSE Linux Enterprise have more in common with each other, allowing them to share resources more easily and giving administrators a broader platform on which to focus their skills. This is a significant advantage when it comes time to staff up and support your infrastructure.

Ubuntu suffers from having a direct tie to a “non-enterprise” operating system that is particularly popular with the desktop “tweaking” crowd.  Unlike Red Hat and SUSE, Ubuntu is coming at the enterprise from the home market and brings a stigma with it.  Administrators trained on RHEL, for example, tend to be taught enterprise tasks performed in a businesslike manner.  Administrators with Ubuntu experience tend to be home users who have been running Linux for their own desktop and entertainment tasks.  This makes the interview and hiring process that much more difficult.  This is in no way a slight against the Ubuntu LTS product, which is an amazing, enterprise-ready operating system that should seriously be considered, but shops need to be aware that the vast majority of Ubuntu users are not enterprise system administrators and their experience may come mostly from a non-critical, desktop-focused role.  It is rare to find anyone running RHEL or SUSE Linux in this manner.

In my own experience, having software popular with home users in the enterprise also brings misguided user expectations.  Users expect enterprise installations to include any package that they can install at home and expect update cycles to be similar.  This can cause additional headaches, although the Windows world has been dealing with these issues since the beginning.

At this point you have probably noticed that choosing either SUSE or Ubuntu leaves you with the option of both free and fully supported versions, direct from the vendors.  This is a major feature of these distributions because it provides great cost savings and greater flexibility.  For example, development machines can run OpenSUSE and production machines SUSE Linux Enterprise, lowering the overall cost if full support isn’t necessary for development environments.  You can run labs on free versions for learning and testing, or pay for support only on critical infrastructure pieces.  Or, if you are really looking to save money or feel that your internal support is good enough, running completely on the free, unsupported versions is a viable option because you are still using the stable, enterprise-class code base.

Red Hat, as a vendor, does not supply a freely available edition of Red Hat Enterprise Linux.  Instead, they make their code repositories available to the public and expect interested parties to build their own version of RHEL using these repositories.  If you are interested in a freely available version of RHEL, look no further than CentOS.

CentOS, or the Community ENTerprise Operating System, is a code-identical rebuild of RHEL.  It is identical in every way except for branding.  CentOS is completely free – but unsupported.  CentOS is used in organizations of all sizes exactly as a free copy of RHEL would be expected to be used, and many businesses choose to run CentOS exclusively.  As RHEL is the most popular Linux distribution in large businesses, and as the commercially supported version is rather expensive, CentOS also provides a very important resource to the community by allowing new administrators to experience RHEL at home without the expense of unneeded support.
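Because branding is essentially the only difference, telling the two apart on a running box comes down to the release string. Here is a minimal sketch; the function is my own illustration and the sample strings are representative examples, but on a real system the input would come from /etc/redhat-release (which CentOS ships, branding aside, just like RHEL).

```shell
#!/bin/sh
# Classify a system as RHEL or its CentOS rebuild from its release string.
identify_distro() {
  case "$1" in
    *"Red Hat Enterprise Linux"*) echo "RHEL"    ;;
    *CentOS*)                     echo "CentOS"  ;;
    *)                            echo "unknown" ;;
  esac
}

# On a live system: identify_distro "$(cat /etc/redhat-release)"
identify_distro "CentOS release 5.2 (Final)"                           # prints: CentOS
identify_distro "Red Hat Enterprise Linux Server release 5 (Tikanga)"  # prints: RHEL
```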

Choosing between the Red Hat, Suse and Ubuntu families is much more difficult than whittling the list down to these three.  In many cases choosing between these three will be based upon cost, application demands, existing administration experience and features.  It is not uncommon for larger businesses to use two or possibly all three of these distributions as features are needed, but most commonly a single distribution is chosen for ease of management.  All three distributions are solid and capable.

Another potentially deciding factor is whether your enterprise is considering using Linux on the desktop.  While RHEL can be used as a desktop operating system, it is generally considered to be substantially weaker than SUSE and Ubuntu when it comes to desktop environments.  Because of this, Fedora is generally seen as Red Hat’s desktop option, but Fedora is not supported by Red Hat nor does it share a code base with RHEL, making support somewhat less than unified even though the two are very similar.

For mixed server and desktop environments, Suse and Ubuntu have a very strong lead.  Both of these distributions focus a great many resources onto their desktop systems and they keep these components very much up to date and pay great attention to the user experience.  For a small company that can manage to use only one single distribution on every machine that they own this can be a major advantage.  Homogeneous environments can be extremely cost effective as a much narrower skill set is needed to manage and support them.

In conclusion: Red Hat Enterprise Linux, Novell SUSE Linux Enterprise and Ubuntu LTS, in both their supported versions and their free versions (CentOS in the case of RHEL, OpenSUSE in the case of Novell; Ubuntu’s free and supported versions are one and the same), all represent great opportunities for the data center.  Do not be lulled into using non-enterprise Linux distributions because they are cool, flashy or popular.  Linux lends itself to being in the news often and to generating excess hype.  None of these things are good indicators of data center stability.  The data center is a serious business component and should not be treated lightly.  Linux is a great choice for the corporate IT department, but you will be very unhappy if you pick your backbone server architecture based on its popularity as a gaming platform rather than on its uptime and management cost.

Linux’ kscand https://sheepguardingllama.com/2008/04/linux-kscand/ Wed, 02 Apr 2008 21:54:40 +0000

In Linux the kscand process is a kernel process which is a key component of the virtual memory (vm) system.  According to Unix Tech Tips & Tricks’ excellent Understanding Virtual Memory article “The kscand task periodically sweeps through all the pages in memory, taking note of the amount of time the page has been in memory since it was last accessed. If kscand finds that a page has been accessed since it last visited the page, it increments the page’s age counter; otherwise, it decrements that counter. If kscand finds a page with its age counter at zero, it moves the page to the inactive dirty state.”

For the majority of Linux users, and even system administrators on large servers, this kernel process requires no intervention.  It is a simple process that works in the background doing its job well.  Nonetheless, under certain circumstances it can become necessary to tune kscand to improve system performance.

Issues with kscand are most likely to arise where a Linux box has an extremely large amount of memory, and will be even more noticeable on boxes with slower memory.  The most notable example is probably the HP Proliant DL585 G1, which can support 128GB of memory but in doing so drops memory speed to a paltry 266MHz.  I first came across this particular issue on a server with 32GB of memory with approximately 31.5GB of it in use.  No swap space was being used and most of the memory was being used for cache, so there was no strain on the memory system, but the total amount of memory being scanned by the kscand process is where the issue truly lies.

Even on a busy server with gobs of memory (that’s the technical term) it would be extremely rare that kscand would cause any issues.  It is a very light process that runs quite quickly.  You are most likely to see kscand as a culprit when investigating problems with latency sensitive applications on memory intensive servers.  The first time that I came across the need to tune kscand was while diagnosing a strange latency pattern of network traffic going to a high-performance messaging bus.  The latency was minor but small spikes were causing concern in the very sensitive environment.  kscand was spotted as the only questionable process receiving much system attention during the high latency periods.

Under normal conditions, that is with default tuning, kscand will run every thirty seconds and will scan 100% of the system memory looking for memory pages that can be freed.  This sweep is quick but can easily cause measurable system latency if you look carefully.  Through careful tuning we can reduce the latency caused by this process, but we do so as a tradeoff against memory utilization efficiency.  If you have a box with significant extra memory or extremely static memory, such as large cache sizes that change very slowly, you can safely tune away from memory efficiency towards low latency with nominal pain and good results.

kscand is controlled through the proc filesystem with just a single setting, /proc/sys/vm/kscand_work_percent.  Like any kernel setting this can be changed on the fly on a live system (be careful) or can be made to persist through reboots by adding it to your /etc/sysctl.conf file.  Before we make any permanent changes we will want to do some testing.  This kernel parameter tells kscand what percentage of the system memory to scan each time a memory scan is performed.  Since it is normally set to 100, kscand normally scans all in-use memory each time that it is called.  You can verify your current setting quite easily.

cat /proc/sys/vm/kscand_work_percent

A good starting point with kscand_work_percent is to set it to 50.  A very small adjustment may not be noticeable, so comparing 100 against 50 should provide a good baseline for evaluating the change in system performance.  It is not recommended to set kscand_work_percent below 10, and I would be quite wary of dropping even below 20 unless you truly have a tremendous amount of unused memory and your usage is quite static.
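That range guidance can be put into a simple guard before touching the live system. This is a sketch of my own, not official tooling: the KSCAND_PROC variable defaults to the real proc path but can be pointed at a scratch file for a dry run before you trust it on a production box.

```shell
#!/bin/sh
# Refuse to apply a kscand_work_percent outside the 10-100 range
# recommended above; otherwise write it to the proc setting.
KSCAND_PROC="${KSCAND_PROC:-/proc/sys/vm/kscand_work_percent}"

set_kscand_percent() {
  pct="$1"
  if [ "$pct" -lt 10 ] || [ "$pct" -gt 100 ]; then
    echo "refusing kscand_work_percent=$pct (sane range is 10-100)" >&2
    return 1
  fi
  echo "$pct" > "$KSCAND_PROC"
}

# Usage on a live system (as root): set_kscand_percent 50
```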

echo 50 > /proc/sys/vm/kscand_work_percent

Once you have determined the best balance of latency and memory utilization for your environment you can make your changes permanent.  Be sure to only use the echo technique if this is the first time that this setting is being added to the file; you will need to edit the file by hand after that.

echo "vm.kscand_work_percent = 50" >> /etc/sysctl.conf
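The hand-editing caveat can be scripted away. Below is a sketch of an idempotent version; note that in /etc/sysctl.conf the key needs the vm. prefix to match its /proc/sys/vm path. The SYSCTL_CONF variable is parameterized only so the snippet can be tried against a scratch file first.

```shell
#!/bin/sh
# Update vm.kscand_work_percent in place if it is already in the file,
# otherwise append it, so the function is safe to run any number of times.
SYSCTL_CONF="${SYSCTL_CONF:-/etc/sysctl.conf}"

persist_kscand_percent() {
  pct="$1"
  if grep -q '^vm\.kscand_work_percent' "$SYSCTL_CONF"; then
    # Key already present: rewrite that line rather than appending a duplicate.
    sed -i "s/^vm\.kscand_work_percent.*/vm.kscand_work_percent = $pct/" "$SYSCTL_CONF"
  else
    echo "vm.kscand_work_percent = $pct" >> "$SYSCTL_CONF"
  fi
}
```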

Keep in mind that the need to edit this particular kernel parameter is extremely uncommon and will arise only under extraordinary circumstances.  You will not need to do this in normal, everyday Linux life, and even a senior Linux administrator could easily never have need to modify this setting.  Only very specific conditions will cause this performance characteristic to be measurable or its modification to be desirable.

All of my testing was done on Red Hat Enterprise Linux 3 Update 6.  This parameter is the same across many versions, although the performance characteristics of kscand vary between kernel revisions, so do not assume that the need to modify the parameter in one situation means that it is needed in another.

RHEL 3 prior to update 6 had a much less efficient kscand process and much greater benefit is likely to be found moving to a later 2.4 family kernel revision.  RHEL 4 and later, on the 2.6 series kernels, is completely different and the latency issues are, I believe, less pronounced.  In my own testing the same application on the same servers moving from RHEL 3 U6 to RHEL 4.5 removed all need for this tweak even under identical load.  [Edit – In RHEL 4 and later (kernel series 2.6) the kscand process has been removed and replaced with kswapd and pdflush.]

Things that are likely to impact the behavior of kscand that you should consider include the following:

  • Total Used Memory Size, regardless of total available memory size.  The more you have, the more kscand will impact you.  Determined by: free -m | grep Mem | awk '{print $3}'
  • Memory Latency, check with your memory hardware vendor.  Higher latency will cause kscand to have a larger impact.
  • Memory Bandwidth.  Currently in speeds ranging from 266MHz to 1066MHz.  The slower the memory, the more likely a scan will impact you and the more useful tuning will be.
  • Value of kscand_work_percent.  The lower the value, the lower the latency.  The higher the value, the better the memory utilization.
  • Memory Access Hops.  The number of system bus hops necessary to access memory resources.  For example, a two socket AMD Opteron server (HP Proliant DL385) never has more than one hop, but a four socket AMD Opteron server (HP Proliant DL585) can have two hops, increasing effective memory latency.  So a DL585 is more likely to be affected than a DL385, all other factors being equal (as long as all three or four processor sockets are occupied).
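The first factor in the list above is the one you can check programmatically. Here is a sketch that computes the same "used" figure the free -m pipeline reports, but directly from /proc/meminfo so the arithmetic is visible; the sample input is made up for illustration.

```shell
#!/bin/sh
# "Used" memory in MB as old versions of free report it on the Mem: line:
# MemTotal - MemFree, converted from kB to MB.  Takes the meminfo text as
# an argument so the calculation can be checked against sample data.
used_mem_mb() {
  echo "$1" | awk '
    /^MemTotal:/ { total = $2 }
    /^MemFree:/  { free  = $2 }
    END { print int((total - free) / 1024) }'
}

# On a live system: used_mem_mb "$(cat /proc/meminfo)"
sample="MemTotal: 33554432 kB
MemFree:   524288 kB"
used_mem_mb "$sample"   # prints: 32256  (a 32GB box with 512MB free)
```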