Tuesday, September 06, 2016

Followup: HP MicroSD Boot and ILO Diagnostics

A few weeks ago I started down the road to building a server that boots from a 32GB MicroSD card. See the post at http://blog.smallbizthoughts.com/2016/08/booting-hp-proliant-server-from-sd-card.html.

In that post I promised a follow-up on using the Integrated Lights Out (ILO) card to manage the machine and run basic diagnostics on the MicroSD card. So here we go.

If you haven't used an ILO (or Dell's DRAC - Dell Remote Access Card), I highly recommend you take a look. These devices basically have two levels of management. Basic management is configured by default and allows you to see a bit of what's going on, manage the power, boot up remotely, and get some logging. Advanced management can give you total machine control and sophisticated remote management at both the console level and once the machine is booted up and logged on.

We're going to talk about basic management, and in particular the SD card diagnostics.

Note: I have no connection to HP whatsoever. If you want the official word on what's what with ILO, start here: https://www.hpe.com/us/en/servers/integrated-lights-out-ilo.html.


Physical Stuff

The ILO can be accessed from one of two ports. There is a dedicated ILO port on the HP MicroServer. But if you have limited network connections, you can also configure one of the two standard gigabit NICs to share access to the ILO. I never do this. We always install servers in racks or right next to the network equipment, so I can always run an extra cable to the ILO port.

You can port-forward from an Internet address to the ILO port if you wish to access the machine remotely. A very handy way to do this, even if you only have one public IP, is to translate ports at the firewall. For example, say xxx.151.126.12 is the public address. You could configure the firewall to forward ports 80 and 443 to the server's ILO port. Or, better yet, translate other ports, such as xxx.151.126.12:9999 and xxx.151.126.12:9998, to :80 and :443 respectively.
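The port translation described above amounts to a small lookup table on the firewall. The sketch below only models that table; real firewalls are configured through their own interfaces, and the internal ILO address (192.168.1.50) and the names used here are hypothetical.

```python
from typing import Dict, Optional, Tuple

# Hypothetical internal address of the server's dedicated ILO port.
ILO_ADDRESS = "192.168.1.50"

# Public (WAN-side) port -> (internal host, internal port).
# Non-standard public ports are translated to the ILO's standard web ports.
PORT_TRANSLATIONS: Dict[int, Tuple[str, int]] = {
    9999: (ILO_ADDRESS, 80),   # HTTP
    9998: (ILO_ADDRESS, 443),  # HTTPS
}


def translate(public_port: int) -> Optional[Tuple[str, int]]:
    """Return where the firewall sends traffic arriving on public_port, or None if not forwarded."""
    return PORT_TRANSLATIONS.get(public_port)


print(translate(9998))  # ('192.168.1.50', 443)
print(translate(443))   # None: port 443 on the public address is not forwarded
```

With this layout, ports 80 and 443 on the public address stay free for other uses while the ILO is still reachable from outside.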

Once you log into the ILO, you can manage power. This is pretty basic stuff. You can set the default power behavior after a power outage to always on, always off, or return to the last state. You can also turn the power on and boot up the machine remotely. This is extremely handy if a machine is off and you are remote.


Server Console

Once the machine starts to boot, you can use ILO to access the console view. There are several options for this. I've had the most luck with the basic Java web app. Click the button and up comes the console. This allows you to press the relevant keys to configure the RAID array, CMOS, and anything else you might be managing if you were sitting in front of the machine.

One great example of how you might use this is to break a mirror and boot from the good drive. That configuration change is identical whether you're sitting at the machine or connecting remotely.

If you only have the basic management license, note that the console will disconnect once the operating system boots. If you want to continue remote management after that, you'll need the upgraded ILO license or a remote management tool such as TeamViewer or VNC. Of course, you may also use Remote Desktop Connection or another remote access tool built into the operating system.



Diagnostics

The other nice thing about ILO is that you can get some good information about how smoothly your server is operating at the hardware level. From basics like temperature to troubleshooting the storage array, there's lots of information to be had, even at the basic management level.

In the previous blog post I was concerned about the lifetime of the poor little 32GB MicroSD card I was using to boot the machine. Recall that my O.S. is Windows Server 2012 R2 Essentials with the sysvol and swap files on the "D" drive (a spinning SATA drive).

The MicroSD card is rated as having a lifetime of 13,107,200,000 committed block writes. HP has designed their MicroSD cards so that they have error correcting code that allows them to do a bit of caching and reduce the number of write commits. Still, I was worried about how long this setup would last. So I started a little tracking.

That's when a little weirdness reared its head.

First, let's look at some lifetime estimates. With just over 13 billion write commits, we would expect the following lifespan for this little card:

5 million writes per day = 1.825 billion writes per year. Expected lifespan = 7.18 years.

10 million writes per day = 3.65 billion writes per year. Expected lifespan = 3.59 years.

20 million writes per day = 7.3 billion writes per year. Expected lifespan = 1.8 years.

50 million writes per day = 18.25 billion writes per year. Expected lifespan = 0.72 years.
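Each of these figures is simple division: the rated write count divided by writes per year. A minimal sketch that reproduces them, assuming a constant daily write rate and 365-day years (a real server's workload varies from day to day):

```python
RATED_WRITES = 13_107_200_000  # rated committed block writes for the 32GB MicroSD card
DAYS_PER_YEAR = 365


def lifespan_years(writes_per_day: float, rated_writes: int = RATED_WRITES) -> float:
    """Expected years until the rated write count is used up at a constant daily rate."""
    return rated_writes / (writes_per_day * DAYS_PER_YEAR)


for per_day in (5e6, 10e6, 20e6, 50e6):
    per_year = per_day * DAYS_PER_YEAR
    print(f"{per_day / 1e6:.0f} million/day = {per_year / 1e9:.3f} billion/year, "
          f"expected lifespan = {lifespan_years(per_day):.2f} years")
```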

Of course, I have no idea how many write commits a server performs in a year. That's never mattered to me before. BUT I have ILO! So one of the easiest things I can do is bring up the ILO page that reports the MicroSD card's statistics and record its "Write Counter" value.
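Turning two recorded Write Counter values into a daily rate is straightforward. A minimal sketch, assuming you note the counter and the time of each reading by hand; the function name and the sample readings are hypothetical:

```python
from datetime import datetime


def writes_per_day(first_count: int, first_time: datetime,
                   second_count: int, second_time: datetime) -> float:
    """Average Write Counter increase per day between two ILO readings."""
    elapsed_days = (second_time - first_time).total_seconds() / 86_400
    if elapsed_days <= 0:
        raise ValueError("the second reading must be taken after the first")
    delta = second_count - first_count
    if delta < 0:
        # A lower second reading means the counter was reset in between.
        raise ValueError("counter decreased; it was probably reset between readings")
    return delta / elapsed_days


# Hypothetical readings taken exactly two days apart.
rate = writes_per_day(100_000_000, datetime(2016, 8, 20, 9, 0),
                      120_000_000, datetime(2016, 8, 22, 9, 0))
print(f"{rate / 1e6:.1f} million writes/day")
```

Refusing to compute a rate when the counter goes down is deliberate: a negative difference would otherwise produce a meaningless (negative) rate.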

During the server build process and the first two weeks of life, I was adding software, setting up users, applying updates, etc. In other words, I was doing lots of work related to building the machine as opposed to using it day to day. During this period, daily write commits as reported by ILO ranged from about 2 million/day to 26.5 million/day.

That's useful data. The peak of 26.5M/day would give me an expected lifespan of 1.36 years. That's a bit short in my opinion. I really want to set up a server and not have to perform a major operation, such as moving the operating system to new media, for at least three years. Our plan is to replace the server in three years, so I want an ideal lifespan of four years in case there are delays.

It looks like the overall average for the machine in daily use has settled down to about 8-10 million writes per day. That works out to roughly 3.6 to 4.5 years of service. Good enough!
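Turning the lifespan goal around gives a daily write budget: divide the rated writes by the number of days you want the card to last. A minimal sketch, assuming the same 13,107,200,000 rating and a constant daily rate:

```python
RATED_WRITES = 13_107_200_000  # rated committed block writes for the MicroSD card
DAYS_PER_YEAR = 365


def daily_write_budget(target_years: float, rated_writes: int = RATED_WRITES) -> float:
    """Largest constant writes/day that still lets the card last target_years."""
    return rated_writes / (target_years * DAYS_PER_YEAR)


for years in (3, 4):
    print(f"{years} years -> at most {daily_write_budget(years) / 1e6:.1f} million writes/day")
```

By this arithmetic, four years allows just under 9 million writes per day and three years about 12 million, so an average of 8-10 million per day sits right at the edge of the four-year target.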


Weirdness

Sadly, there's a problem with the diagnostics.

On three different occasions, the write counter for the MicroSD card reset itself! That's not good, especially when I'm counting down a precious resource critical to the life of the machine.
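One way to keep a running total despite resets is to log the counter regularly and treat any drop in value as a reset. A sketch, assuming readings are recorded in time order; the function name and sample readings are hypothetical. Writes made between the last reading before a reset and the reset itself are lost, so the result is a lower bound, and a reset followed by enough writes to climb back above the previous reading would go unnoticed.

```python
from typing import List, Tuple


def total_writes_with_resets(readings: List[int]) -> Tuple[int, int]:
    """Sum counter increases across ordered readings, tolerating resets.

    A reading lower than the one before it is treated as a reset to zero,
    so the new reading itself counts as writes since the reset.
    Returns (total_writes_observed, resets_detected).
    """
    total = 0
    resets = 0
    for previous, current in zip(readings, readings[1:]):
        if current >= previous:
            total += current - previous
        else:
            resets += 1
            total += current
    return total, resets


# Hypothetical daily readings with one reset between the 2nd and 3rd.
print(total_writes_with_resets([2_000_000, 12_000_000, 3_000_000, 11_000_000]))
```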

At first I thought this might be due to a hardware change. I noticed one reset the day I installed extra RAM.

The reset is definitely not related to shutting down, rebooting, unplugging, or moving from one place to another after a shutdown. All those things appear in the ILO event log (well, except moving from one place to another). :-)

The event log does contain entries about the ILO being reset, but these do not correlate with the diagnostic counter being reset. I believe the ILO reset is literally a hardware reset. When I click the ILO Reset button on the diagnostics page, I get kicked out and can get right back in, and the Write Counter is not affected.

I verified that all HPE firmware updates were applied when I set up the server. They were. But you never know when a new update will come out while you're in the middle of something.


Bottom Line (for now)

I'm going to keep an eye on this. My great fear is that, with the counter resets, I have no idea what the expected lifetime of the server's operating system disk is. I feel very confident that it's at least a year. Maybe two. But I don't have confidence that it will last three or four years.

Again, my ideal goal would be to find a setup I can "set and forget" for at least three and preferably four years.

My biggest concern when I started down this road was the write-commit limit of the SD card. And that is exactly the concern I continue to have with a working machine.

In case you're interested, the machine is nice and fast and works perfectly.

Just don't know for how long.

:-)

- - - - -

Questions and comments welcome.

4 comments:

  1. Hi Karl,

    The SDCard slot in the HP servers is intended for provisioning an OS (i.e. boot to the SDCard, which deploys an OS to more write-tolerant media), at least according to the HP support people I've dealt with.

    The only way I'd be running a production OS off an SDCard is if that OS was ESXi, any other *nix based system where the file systems on the SDCard were mounted read-only, or Windows Embedded/Windows 10 with the Write Filter enabled. There's no way I'd put a Windows Server image on an SDCard, even if I could redirect everything off to other storage, because the write profile of Windows sucks.

    As a reference, I'm onto my third SDCard in as many years on my Raspberry Pi that pulls in weather data every 5 minutes, generating graphs and web pages, as well as retrieving stats from my solar panels and pushing them to pvoutput.
    It's not a big deal, as I've scripted an SDCard image build from the backups of the Raspberry Pi. Glad I did this, as the failure is sudden and ugly.

    You might be better off with a large USB flash drive that uses SLC memory and reducing the partition size down to a comfortable minimum, thereby giving further over-provisioning capacity for the wear levelling algorithms to extend the lifetime of the flash drive. You'd still want to push as much write-intensive stuff off to other storage though.

  2. This comment has been removed by a blog administrator.

  3. Hey Chris. Thanks for the note. The intelligent provisioning for the SD card has only a few options, including Windows and their favorite flavor of Linux. It does not have an option to configure the SD card as you describe. Since you can choose to boot to any medium on the machine, including the internal USB, I'm not sure why you'd need an SD card to bootstrap that.

    Booting to USB doesn't appeal to me since only USB 2 is visible to the hardware until after the OS is loaded. So that would be a pretty slow interface compared to SATA (SSD or spinning).

    When I get back from Colombia I'll update stats on this just for fun.

  4. OK. Back from Colombia.
    I now have an additional two weeks of data.

    The write counter is now at 341,123,548. I assume there has not been a reset since the last one reported above. There is literally no activity in the ILO event log for the last 15 days. So that's about 22 million write commits per day, giving an estimated remaining lifetime of about 18 months.

    That's enough to tell me that this is a great solution for one year, but not for an expected lifespan of three years.

    So I'll chalk it up to a delightful learning experience and set the system to boot from the hard drives again.

    This may be a technology with no practical purpose since it's being eclipsed right now by SSD and Hybrid SSD drives.

    ... It's never a dull moment in the world of technology!
    - kp

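The estimate in the last comment can be checked with the same arithmetic as the lifespan figures in the post. A sketch that uses the comment's own inputs (the 341,123,548 counter reading and the 15-day window) and, like the comment, treats the counter as the total number of writes the card has seen, which the earlier resets make uncertain:

```python
RATED_WRITES = 13_107_200_000  # rated committed block writes for the MicroSD card
DAYS_PER_YEAR = 365


def remaining_months(counter: int, days: float, rated_writes: int = RATED_WRITES) -> float:
    """Months left at the observed average rate, assuming counter = total writes so far."""
    per_day = counter / days
    remaining_days = (rated_writes - counter) / per_day
    return remaining_days / DAYS_PER_YEAR * 12


counter = 341_123_548
days = 15
print(f"{counter / days / 1e6:.1f} million writes/day, "
      f"about {remaining_months(counter, days):.0f} months remaining")
```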
