A few weeks ago I started down the road to building a server that boots from a 32GB MicroSD card. See the post at
http://blog.smallbizthoughts.com/2016/08/booting-hp-proliant-server-from-sd-card.html.
In that post I promised a follow-up on using the Integrated Lights Out (ILO) card to manage the machine and run basic diagnostics on the MicroSD card. So here we go.
If you haven't used an ILO (or Dell's DRAC - Dell Remote Access Card), I highly recommend you take a look. These devices basically have two levels of management. Basic management is configured by default and allows you to see a bit of what's going on, manage the power, boot up remotely, and get some logging. Advanced management can give you total machine control and sophisticated remote management at both the console level and once the machine is booted up and logged on.
We're going to talk about the basic management. And in particular, the SD card diagnostics.
Note: I have no connection to HP whatsoever. If you want the official word on what's what with ILO, start here:
https://www.hpe.com/us/en/servers/integrated-lights-out-ilo.html.
Physical Stuff
The ILO can be accessed from one of two ports. There is a dedicated ILO port on the HP MicroServer. But if you have limited network connections, you can also configure one of the two standard GB NICs to also access the ILO port. I never do this. We always install servers in racks or right next to the network equipment. So I can always run an extra cable to the ILO port.
You can port-forward from an Internet address to the ILO port if you wish to access the machine remotely. A very handy way to do this, even if you only have one public IP, is to translate ports at the firewall. So, for example, say xxx.151.126.12 is the public address. Configure the firewall to forward ports 80 and 443 to the server's ILO port. Or, better yet, translate other ports such as xxx.151.126.12:9999 and xxx.151.126.12:9998 to :80 and :443 on the ILO port, respectively.
Once you log into the ILO, you can manage power. This is pretty basic stuff. You can set the "default" configuration of power to be always on, always off, or in last state after a power outage. You can also turn the power on and boot up the machine remotely. This is extremely handy if a machine is off and you are remote.
Server Console
Once the machine starts to boot, you can use ILO to access the console view. There are several options for this. I've had the most luck with the basic Java web app. Click the button and up comes the console. This allows you to press the relevant keys to configure the RAID array, CMOS, and anything else you might be managing if you were sitting in front of the machine.
One great example of how you might use this is to break a mirror and boot from the good drive. That configuration change is identical whether you're sitting at the machine or connecting remotely.
If you only have the basic management license, note that the console will disconnect once the operating system boots. If you want to continue remote management after that, you'll need the upgraded ILO license or a remote management tool such as TeamViewer or VNC. Of course, you may also use Remote Desktop Connection or another remote access tool built into the operating system.
Diagnostics
The other nice thing about ILO is that you can get some good information about how smoothly your server is operating at the hardware level. From basics like temperature to troubleshooting the storage array, there's lots of information to be had - even at the basic management level.
In the previous blog post I was concerned about the lifetime of the poor little 32GB MicroSD card I was using to boot the machine. Recall that my O.S. is Windows Server 2012 R2 Essentials with the sysvol and swap files on the "D" drive (a spinning SATA drive).
The MicroSD card is rated as having a lifetime of 13,107,200,000 committed block writes. HP has designed their MicroSD cards so that they have error correcting code that allows them to do a bit of caching and reduce the number of write commits. Still, I was worried about how long this setup would last. So I started a little tracking.
That's when a little weirdness reared its head.
First, let's look at some lifetime estimates. With just over 13 billion write commits, we would expect the following lifespan for this little card:
5 Million writes per day = 1.825 Bil writes per year. Expected lifespan = 7.18 years.
10 Million writes per day = 3.65 Bil writes per year. Expected lifespan = 3.59 years.
20 Million writes per day = 7.3 Bil writes per year. Expected lifespan = 1.8 years.
50 Million writes per day = 18.25 Bil writes per year. Expected lifespan = 0.72 years.
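If you want to check the arithmetic or plug in your own numbers, here is a minimal Python sketch of the calculation above. It assumes a constant daily write rate and a 365-day year. The function and constant names are mine, not anything from HP.

```python
# Rated lifetime of the card in committed block writes (figure quoted above).
RATED_WRITES = 13_107_200_000
DAYS_PER_YEAR = 365  # assumption: ignore leap years


def lifespan_years(writes_per_day):
    """Years until the rated write count is used up at a constant daily rate."""
    return RATED_WRITES / (writes_per_day * DAYS_PER_YEAR)


for millions in (5, 10, 20, 50):
    per_day = millions * 1_000_000
    per_year_billions = per_day * DAYS_PER_YEAR / 1e9
    print(f"{millions:>2} million/day = {per_year_billions:.3f} billion/year, "
          f"lifespan = {lifespan_years(per_day):.2f} years")
```

Real servers do not write at a constant rate, so treat the results as rough estimates rather than predictions.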
Of course I have no idea how many write commits a server performs in a year. That's never mattered to me before. BUT I have ILO! So one of the easiest things I can do is bring up the page above and record the "Write Counter" stat for the MicroSD card.
In the server build process and the first two weeks of life, I was adding software, setting up users, applying updates, etc. In other words, I was doing lots of work related to building the machine as opposed to using the machine for daily use. During this period, daily write commits as reported by the ILO screen above ranged from 2 million/day to roughly 26 million/day.
That's good. The peak of 26.5M/day would give me an expected lifespan of 1.36 years. That's a bit short in my opinion. I really want to set up a server and not perform a major operation, like moving the operating system to new media, for at least three years. Our plan is to replace it in three years, so I want an ideal lifespan of four years in case there are delays.
It looks like the overall average for the machine in daily use has settled down to about 8-10 million writes per day. At 13.1 billion rated writes, that works out to roughly 3.6-4.5 years of service. Good enough!
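The same math can be run backwards to get a daily write "budget" for a target lifespan. This is only a sketch under the same assumptions as before (the rated figure quoted above, a steady write rate, a 365-day year). The function name is made up for illustration.

```python
RATED_WRITES = 13_107_200_000  # rated committed block writes, from the post
DAYS_PER_YEAR = 365            # assumption: ignore leap years


def max_writes_per_day(target_years):
    """Largest whole number of daily writes that still lasts target_years."""
    return RATED_WRITES // (target_years * DAYS_PER_YEAR)


for years in (3, 4):
    print(f"{years} years: up to {max_writes_per_day(years):,} writes/day")
```

That comes to about 12.0 million writes per day for three years and about 9.0 million per day for four, so an average of 8-10 million per day sits right around the four-year line.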
Weirdness
Sadly, there's a problem with the diagnostics.
On three different occasions, the write counter for the MicroSD card reset itself! That's not good, especially when I'm counting down a precious resource critical to the life of the machine.
At first I thought this might be due to a hardware change. I noticed one reset the day I installed extra RAM.
The reset is definitely not related to shutting down, rebooting, unplugging, or moving from one place to another after a shutdown. All those things appear in the ILO event log (well, except moving from one place to another). :-)
There are notes about the ILO being reset, but these do not correlate with the diagnostic counter being reset. I believe the ILO reset is literally a hardware reset. When I click the ILO Reset button on the diagnostics page, I get kicked out and can get right back in. The Write Counter is not affected.
I verified that all HPE firmware updates were applied when I set up the server. They were. But you never know when there's an update while you're doing something.
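One way to keep a running total despite the resets is to log the Write Counter by hand and treat any drop in the value as a reset. Here is a rough Python sketch of that idea. The readings in the example are invented for illustration and are not my real numbers. The function name is hypothetical too. Note that writes made between the last reading and the reset are lost, so the total is a lower bound.

```python
def total_writes(readings):
    """Sum writes across raw counter readings, treating any drop as a reset.

    readings: counter values in the order they were recorded.
    Returns (estimated_total_writes, number_of_resets_seen).
    """
    total = 0
    resets = 0
    previous = None
    for value in readings:
        if previous is None:
            total = value              # assume the counter started at zero
        elif value >= previous:
            total += value - previous  # normal case: add the increase
        else:
            resets += 1                # counter went backwards: a reset
            total += value             # count only the writes since the reset
        previous = value
    return total, resets


# Made-up readings: the counter climbs, resets, then climbs again.
example = [10_000_000, 18_000_000, 27_000_000, 4_000_000, 12_000_000]
print(total_writes(example))
```

The more often you record the counter, the less you lose when a reset happens between readings.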
Bottom Line (for now)
I'm going to keep an eye on this. My great fear is that, with the counter resets, I have no idea what the expected lifetime of the server's operating system disk is. I feel very confident that it's at least a year. Maybe two. But I don't have confidence that it will last three or four years.
Again, my ideal goal would be to find a setup I can "set and forget" for at least three and preferably four years.
My biggest concern when I started down this road was the write-commit limit of the SD card. And that is exactly the concern I continue to have with a working machine.
In case you're interested, the machine is nice and fast and works perfectly.
Just don't know for how long.
:-)
- - - - -
Questions and comments welcome.