Wednesday, July 05, 2017

Storage Server: FreeNAS: use your SSD efficiently

FreeNAS: use your SSD efficiently

ZIL and Cache

Not open for discussion: I think it is a complete waste of resources to dedicate a whole 120GB or 250GB SSD to logs, let alone to cache, as FreeNAS will (and should!) use RAM for that. So, I searched and found a way to create two partitions on a single SSD, and expose these to the pool as ZIL (ZFS Intent Log) and cache.
Mind you - there are performance tests around that question the benefit of a separate ZIL and cache. I am not going to test this; I will just add ZIL and cache, as I have an SSD especially for this purpose. I am just mentioning it because your use case might differ.

The pool?

Erhm, in the meantime, I have created a pool consisting of two virtual devices of 6 hard disks each. According to several sources on the interweb, this will deliver resilience as well as performance.
I set this as goal to begin with. In the FreeNAS GUI, it looks like:
First of all, start the command line interface (CLI). You may opt for a remote SSH session, use the IPMI, or use the "Shell" link in the GUI. I'd use something that allows copy/paste from the screen - a remote SSH session would allow for that.
Now, to find your SSD:
root@store1:~ # camcontrol devlist
<ATA TOSHIBA MQ03ABB3 0U>              at scbus0 target 0 lun 0 (pass0,da0)
<ATA TOSHIBA MQ03ABB3 0U>              at scbus0 target 1 lun 0 (pass1,da1)
<ATA TOSHIBA MQ03ABB3 0U>              at scbus0 target 2 lun 0 (pass2,da2)
<ATA TOSHIBA MQ03ABB3 0U>              at scbus0 target 3 lun 0 (pass3,da3)
<ATA TOSHIBA MQ03ABB3 0U>              at scbus0 target 4 lun 0 (pass4,da4)
<ATA TOSHIBA MQ03ABB3 0U>              at scbus0 target 5 lun 0 (pass5,da5)
<ATA TOSHIBA MQ03ABB3 0U>              at scbus0 target 6 lun 0 (pass6,da6)
<ATA TOSHIBA MQ03ABB3 0U>              at scbus0 target 7 lun 0 (pass7,da7)
<TOSHIBA DT01ACA300 MX6OABB0>          at scbus1 target 0 lun 0 (pass8,ada0)
<TOSHIBA DT01ACA300 MX6OABB0>          at scbus2 target 0 lun 0 (pass9,ada1)
<TOSHIBA DT01ACA300 MX6OABB0>          at scbus3 target 0 lun 0 (pass10,ada2)
<TOSHIBA DT01ACA300 MX6OABB0>          at scbus4 target 0 lun 0 (pass11,ada3)
<Samsung SSD 850 EVO 250GB EMT02B6Q>   at scbus5 target 0 lun 0 (pass12,ada4)
<TOSHIBA DT01ACA300 MX6OABB0>          at scbus6 target 0 lun 0 (pass13,ada5)
<Kingston DT microDuo PMAP>            at scbus8 target 0 lun 0 (pass14,da8)
<USB Flash Disk 1100>                  at scbus9 target 0 lun 0 (pass15,da9)
OK - my SSD is at ada4. Check if it is formatted or partitioned:
root@store1:~ # gpart show ada4
gpart: No such geom: ada4.
It isn't. If it were, I would have to destroy that first. Now create GPT-based partitions, align them on 4k, leave the first 128 sectors (64KB, at 512 bytes per sector) alone, so BSD/ZFS can do their magic at the start of the disk, and finally, make it a freebsd-zfs partition type:
root@store1:~ # gpart create -s gpt ada4
ada4 created
root@store1:~ # gpart add -a 4k -b 128 -t freebsd-zfs -s 20G ada4
ada4p1 added
root@store1:~ # gpart add -a 4k -t freebsd-zfs -s 80G ada4
ada4p2 added
Note, I only specify the starting block once, on the first partition to be created.
Also, I used one 20GB partition, and one 80GB, still leaving about 133GB untouched. That's overprovisioning for ya!
The 20GB will be LOG, and the 80GB will be cache.
Now, let's find the GUIDs of the partitions, in order to add them to the pool:
root@store1:~ # gpart list ada4
Geom name: ada4
modified: false
state: OK
fwheads: 16
fwsectors: 63
last: 488397127
first: 40
entries: 128
scheme: GPT
Providers:
1. Name: ada4p1
   Mediasize: 21474836480 (20G)
   Sectorsize: 512
   Stripesize: 4096
   Stripeoffset: 0
   Mode: r0w0e0
   rawuuid: 3d70bd91-5a52-11e7-ab6a-d05099c1356a
   rawtype: 516e7cba-6ecf-11d6-8ff8-00022d09712b
   label: (null)
   length: 21474836480
   offset: 65536
   type: freebsd-zfs
   index: 1
   end: 41943167
   start: 128
2. Name: ada4p2
   Mediasize: 85899345920 (80G)
   Sectorsize: 512
   Stripesize: 4096
   Stripeoffset: 0
   Mode: r0w0e0
   rawuuid: 6b11bc08-5a52-11e7-ab6a-d05099c1356a
   rawtype: 516e7cba-6ecf-11d6-8ff8-00022d09712b
   label: (null)
   length: 85899345920
   offset: 21474902016
   type: freebsd-zfs
   index: 2
   end: 209715327
   start: 41943168
Consumers:
1. Name: ada4
   Mediasize: 250059350016 (233G)
   Sectorsize: 512
   Stripesize: 4096
   Stripeoffset: 0
   Mode: r0w0e0
I will use the rawuuid codes, so a session with copy-n-paste functionality would be handy... Let's check, shorthand (note the 128-sector offset):
root@store1:~ # gpart show ada4
=>        40  488397088  ada4  GPT  (233G)
          40         88        - free -  (44K)
         128   41943040     1  freebsd-zfs  (20G)
    41943168  167772160     2  freebsd-zfs  (80G)
   209715328  278681800        - free -  (133G)
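The numbers in this listing are sector counts, not bytes, which is easy to misread. A minimal sketch to double-check the layout, assuming the 512-byte sector size that gpart list reports; the variable and function names are mine, only the figures come from the output above:

```python
SECTOR = 512   # bytes per sector (Sectorsize: 512 in gpart list)
ALIGN = 4096   # the 4k alignment requested with "-a 4k"
GIB = 2**30


def to_bytes(sectors):
    """Convert a sector count from gpart into bytes."""
    return sectors * SECTOR


def is_aligned(start_sector):
    """True when the partition starts on a 4k boundary."""
    return to_bytes(start_sector) % ALIGN == 0


# (start sector, length in sectors), copied from "gpart show ada4"
layout = {
    "log (ada4p1)": (128, 41943040),
    "cache (ada4p2)": (41943168, 167772160),
}
free_tail_sectors = 278681800

for name, (start, length) in layout.items():
    print(name, "offset", to_bytes(start), "bytes,",
          "aligned" if is_aligned(start) else "NOT aligned",
          "size", to_bytes(length) / GIB, "GiB")

print("left unpartitioned:", round(to_bytes(free_tail_sectors) / GIB, 1), "GiB")
```

The first partition starts at 128 x 512 = 65536 bytes, which matches the "offset: 65536" line in gpart list, and both partitions land on a 4k boundary.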
Now, let's add 20GB LOG to the tank1 volume, and 80GB CACHE:
root@store1:~ # zpool add tank1 log gptid/3d70bd91-5a52-11e7-ab6a-d05099c1356a
root@store1:~ # zpool add tank1 cache gptid/6b11bc08-5a52-11e7-ab6a-d05099c1356a
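The gptid/ paths used here are simply the rawuuid values from gpart list with a prefix. If copy-n-paste is not available, something like the following could pull them out of a saved listing. This is only a sketch: the function name is made up, and the sample text is a trimmed copy of the listing shown earlier.

```python
import re

# Trimmed excerpt of the "gpart list ada4" output shown earlier.
GPART_LIST = """\
Providers:
1. Name: ada4p1
   Mediasize: 21474836480 (20G)
   rawuuid: 3d70bd91-5a52-11e7-ab6a-d05099c1356a
2. Name: ada4p2
   Mediasize: 85899345920 (80G)
   rawuuid: 6b11bc08-5a52-11e7-ab6a-d05099c1356a
Consumers:
1. Name: ada4
   Mediasize: 250059350016 (233G)
"""


def gptid_paths(gpart_output):
    """Map each partition name to its gptid/<rawuuid> device path."""
    paths = {}
    current = None
    for raw_line in gpart_output.splitlines():
        line = raw_line.strip()
        name = re.match(r"(?:\d+\.\s+)?Name:\s+(\S+)", line)
        if name:
            current = name.group(1)
            continue
        uuid = re.match(r"rawuuid:\s+([0-9a-f-]+)", line)
        if uuid and current:
            paths[current] = "gptid/" + uuid.group(1)
    return paths


paths = gptid_paths(GPART_LIST)
print("zpool add tank1 log", paths["ada4p1"])
print("zpool add tank1 cache", paths["ada4p2"])
```

The consumer entry (ada4 itself) has no rawuuid line, so it never ends up in the result.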
Now, the pool looks like this:
root@store1:~ # zpool list -v
NAME                                             SIZE  ALLOC   FREE  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
freenas-boot                                    7.06G   744M  6.34G         -      -    10%  1.00x  ONLINE  -
  mirror                                        7.06G   744M  6.34G         -      -    10%
    da8p2                                           -      -      -         -      -      -
    da9p2                                           -      -      -         -      -      -
tank1                                           32.5T  1.70M  32.5T         -     0%     0%  1.00x  ONLINE  /mnt
  raidz2                                        16.2T  1.09M  16.2T         -     0%     0%
    gptid/04919b88-5a4d-11e7-ab6a-d05099c1356a      -      -      -         -      -      -
    gptid/0541d551-5a4d-11e7-ab6a-d05099c1356a      -      -      -         -      -      -
    gptid/05f6e0ac-5a4d-11e7-ab6a-d05099c1356a      -      -      -         -      -      -
    gptid/072ed4f7-5a4d-11e7-ab6a-d05099c1356a      -      -      -         -      -      -
    gptid/08553e1a-5a4d-11e7-ab6a-d05099c1356a      -      -      -         -      -      -
    gptid/0994cc2f-5a4d-11e7-ab6a-d05099c1356a      -      -      -         -      -      -
  raidz2                                        16.2T   624K  16.2T         -     0%     0%
    gptid/3a8f5a91-5a4f-11e7-ab6a-d05099c1356a      -      -      -         -      -      -
    gptid/3b38fc02-5a4f-11e7-ab6a-d05099c1356a      -      -      -         -      -      -
    gptid/3c30f8f7-5a4f-11e7-ab6a-d05099c1356a      -      -      -         -      -      -
    gptid/3d2d5c9c-5a4f-11e7-ab6a-d05099c1356a      -      -      -         -      -      -
    gptid/3e2fff05-5a4f-11e7-ab6a-d05099c1356a      -      -      -         -      -      -
    gptid/3f3aafe4-5a4f-11e7-ab6a-d05099c1356a      -      -      -         -      -      -
log                                                 -      -      -         -      -      -
  gptid/3d70bd91-5a52-11e7-ab6a-d05099c1356a    19.9G      0  19.9G         -     0%     0%
cache                                               -      -      -         -      -      -
  gptid/6b11bc08-5a52-11e7-ab6a-d05099c1356a    80.0G     1K  80.0G         -     0%     0%
So - there it is. 32TB of raw storage, in 2 RAIDZ2 VDEVs, leaving about 21TB usable space.
Next entry will be about datasets, snapshots and performance.
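Where do those two figures come from? A back-of-the-envelope sketch, assuming 12 drives of a nominal 3TB (decimal) each and two parity disks per RAIDZ2 vdev. It ignores swap partitions, ZFS metadata and padding, so the real numbers come out a bit lower; the function name is just for illustration.

```python
TB = 10**12   # decimal terabyte, as printed on the drive label
TIB = 2**40   # binary tebibyte, which zpool shows as "T"


def raidz_capacity(vdevs, disks_per_vdev, parity, disk_bytes):
    """Return (raw, usable) bytes for identical RAIDZ vdevs, before overhead."""
    raw = vdevs * disks_per_vdev * disk_bytes
    usable = vdevs * (disks_per_vdev - parity) * disk_bytes
    return raw, usable


raw, usable = raidz_capacity(vdevs=2, disks_per_vdev=6, parity=2, disk_bytes=3 * TB)
print("raw:   ", round(raw / TIB, 1), "TiB")
print("usable:", round(usable / TIB, 1), "TiB")
```

So 36TB on the labels is roughly 32.7TiB raw, and with 4 data disks per vdev roughly 21.8TiB is left for data, in line with the 32.5T that zpool list reports and the "about 21TB" estimate.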

Storage Server: Firmware

Firmware

The first thing to do, in order to get any software RAID program to run, is to flash the controller out of RAID mode. Only then will all of the disks be seen as just a bunch of disks - nothing else. JBOD, that is, for short.
The board I have comes with an LSI SAS2308 controller, to which I want to connect 12 SATA drives using three SAS-to-SATA breakout cables.

Drivers

There are two locations where you can get the drivers for those LSI SAS2308 controllers: the SuperMicro site, which at the time of writing offers the P20 version, and the ASRock Rack site, which describes quite nicely how to flash to P16 in order to support FreeNAS.

My current version is P19, RAID mode:
LSI Corporation SAS2 Flash Utility
Version 17.00.00.00 (2013.07.19)
Copyright (c) 2008-2013 LSI Corporation. All rights reserved

Adapter Selected is a LSI SAS: SAS2308_1(D1)

Controller Number              : 0
Controller                     : SAS2308_1(D1)
PCI Address                    : 00:02:00:00
SAS Address                    : 5d05099-0-0000-5198
NVDATA Version (Default)       : 11.00.00.01
NVDATA Version (Persistent)    : 11.00.00.01
Firmware Product ID            : 0x2714 (IR)
Firmware Version               : 19.00.00.00
NVDATA Vendor                  : LSI
NVDATA Product ID              : Undefined
BIOS Version                   : 07.37.00.00
UEFI BSD Version               : N/A
FCODE Version                  : N/A
Board Name                     : ASRSAS2308
Board Assembly                 : N/A
Board Tracer Number            : N/A
Note the "(IR)" in the Firmware Product ID section; I want that to read "(IT)". And, as FreeNAS now supports P20, I'll go for the P20 version, off the SuperMicro site.

Flashing

Download, unzip and copy the DOS subdirectory contents to a bootable USB stick. I use Rufus and a freedos image for that purpose. Boot from it, and start the flashing:



Never mind the typo :)
After a while, you will see this:


You can find the address as "SAS Address" listed above (which is output of the ASRock utility), or you can find it on the controller configuration pages (make sure the controller is marked as bootable, and press CTRL-C).
On this screen copy, you can see the new firmware version, 20, and mode: IT. Which is what I wanted. You can also see the address, by the way, formatted a bit differently. The last nine digits are displayed as '0:00005198'; SuperMicro seems to use a prefix of '5003048', the original ASRock one being '5D05099'. We'll find out if it makes a difference in the next sequence: Storage Server: Software - FreeNAS!
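Since the two utilities punctuate the SAS address differently, comparing them by eye is error-prone. A small sketch that strips the separators first; the helper name is made up, and the only inputs are the two strings quoted in this post:

```python
HEX_DIGITS = "0123456789abcdef"


def sas_digits(address):
    """Drop separators such as '-' and ':' and lowercase the hex digits."""
    return "".join(ch for ch in address.lower() if ch in HEX_DIGITS)


asrock = sas_digits("5d05099-0-0000-5198")   # as printed by the flash utility
tail = sas_digits("0:00005198")              # as shown after flashing

print("full address:", asrock)
print("prefix:      ", asrock[:7])
print("tail matches:", asrock[-9:] == tail)
```

Only the last nine digits are compared here; the vendor prefix is expected to differ between the two formats.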

Storage Server: Software - FreeNAS

Software: FreeNAS

All hardware has been installed: all 13 hard disks and one SSD are connected, and serial numbers, as well as physical and logical locations, have been noted.
Cramming 4 2.5" disks in the CD bay
Time to add some software. I will install the latest and the greatest(?) FreeNAS software, V11.

Installation

The installation, due to IPMI being capable of mounting remote images, is a walk in the park. With the machine powered down, I mount the CD image:
At the boot process, I press F11 to get the boot options menu, and I choose the virtual CD:
You will be greeted by the FreeNAS installer screen (well... actually, there's a Grub message before that). Just hit the Return key, and be patient.
At some point, there will be messages scrolling over your screen.
Then, you'll see this - just take option 1: Install.
Scroll to your USB drive, or drives. Slight differences in size (yes, amazingly, 8GB drives are not 8GB drives!) do not matter at this stage; FreeNAS will resize the larger one to the size of the smaller. Select the drive(s) you want to install on, using the spacebar.
Yeah, I know - I selected these two.
Make up a password.
As I have a UEFI motherboard, I select UEFI.
Be patient... the installation process is quite slow; this may be due to the USB drives - don't know.

At some point you can restart the machine, and you will see the menu in IPMI:
You should now be setting up a fixed IP address, or two - depending on your requirements. I will create a volume now, and expand on cost-efficient use of the SSD - after all, I'm Dutch ;).
This continues with: Storage Server: FreeNAS: use your SSD efficiently

Sunday, June 25, 2017

Asrock E3C224DI-14S IPMI not reachable

Storage Server: Documentation missing

There's definitely some documentation missing on the IPMI settings. I managed to lock myself out of the IPMI management interface (comparable to what HP calls 'Integrated Lights-Out', or iLO). Not sure how I managed to do that, but in the quest to find out how to restore divine powers, I noticed quite a lot of people suffered from this. And the solution is quite simple, when you know it. As usual...

Two configurations

The cause of the error probably was me updating the network configuration using the dashboard, instead of using the BIOS.
IPMI configuration at BIOS
Please note, in the BIOS, you have eth0 and eth1. Eth0 usually is the first interface, so if you assume this would be the IPMI interface, you assume as I did.
Link to Network configuration on the dashboard
Which is completely and utterly wrong... Eth1, which has the label IPMI for a reason, is the correct one, and is found as Port 8 (above the 2 USB ports).

what's with the eth0/eth1, then?

It turns out there's a nifty, quite undocumented feature: IPMI fallback. The BIOS eth0/NCSI item can be used as IPMI fallback - in case eth1 is not connected... I found this because after the lockout, I could actually use the IPMI when all cables were plugged in. The NCSI port is port 6, or LAN1 (designated as such in the manual), or eth0 (as seen in the BIOS BMC configuration). For completeness' sake, LAN2 is not mentioned in the BIOS, only in the MegaRAC SP configuration (and hard to find).

My recommendation

Stay away from the MegaRAC SP network configuration items. Use only the BIOS, which takes precedence over the MegaRAC SP settings anyway, and only configure IPMI/eth1 for a fixed IP address. You can always use arp -a to find the DHCP-assigned IP addresses of the other interfaces. You can find the MAC addresses of your LAN1 and LAN2 interfaces in the BMC configuration section of the BIOS, under BMC MAC Restore Tool:
Hope this helps anyone.

Wednesday, June 21, 2017

Storage server

Storage server

Hardware

Aiming at 2 VDEVs of 5 or 6 disks each, I'd need a motherboard capable of running 12 disks. I used a SuperMicro board in the ESXi build, mainly because virtualization using bare metal hypervisors was quite new to me. However, these boards have quite a steep price.
There's a new motherboard by SuperMicro, that screams NAS, but that has not yet hit the shops.

So, I ended up with:

Assembly

It starts off with placing the processor and memory on the motherboard. This is best done outside the case:
One of the reasons I love Fractal cases is the disk cages; not only can you replace the cage with hot-plug ones, you may also relocate one or both cages. And the disk frames just slide out - no tools needed.

And, there's room for two SSDs at the back of the motherboard:

The case assembly starts with adding the power supply.
Then, place the motherboard, and attach the power cables. Some cable management is in order, but will be done after all disks have been installed and hooked up.
Time to put these components to the test. Download Memtest86+, or the commercial MemTest86, and let it run for a while.
OK, that'll do pig, that'll do.

To be continued with part two of the storage server: Firmware

Monday, June 19, 2017

Things to do after you cloned a Virtual Machine

Clean up a cloned VM

After you made a clone of your (base) VM, you will need to do some stuff.

MAC-address

First of all, I suspect you have a different MAC-address than the original machine. VMware does that, as long as you have your MAC address assigned automatically. VirtualBox will ask you whether to re-initialize the MAC-address while cloning.
The problem is the udev process, responsible for handling devices. This uses configuration files, located in the /etc/udev/rules.d directory. The file 70-persistent-net.rules will have an entry based on your original machine. An entry looks like:

# PCI device 0x15ad:0x07b0 (vmxnet3) (custom name provided by external tool)
SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="00:0c:29:71:3c:be", ATTR{type}=="1", KERNEL=="eth*", NAME="eth0"
You see the MAC-address, as well as the assigned link (eth0). I already altered this line to reflect the correct MAC-address and link name. The easiest way turns out to be to remove this file completely; it will be re-generated when absent:
cd /etc/udev/rules.d/
rm -f 70-persistent-net.rules
reboot
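If you would rather inspect the file before deleting it, the two fields that matter are ATTR{address} and NAME. A minimal sketch that extracts them from a rule line; the function name is hypothetical and the rule is the example entry from above:

```python
import re

# The example rule from 70-persistent-net.rules shown above.
RULE = ('SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", '
        'ATTR{address}=="00:0c:29:71:3c:be", ATTR{type}=="1", '
        'KERNEL=="eth*", NAME="eth0"')


def parse_rule(rule):
    """Return (mac, interface_name) for a persistent-net rule, or None."""
    mac = re.search(r'ATTR\{address\}=="([0-9a-fA-F:]{17})"', rule)
    name = re.search(r'\bNAME="([^"]+)"', rule)
    if not (mac and name):
        return None
    return mac.group(1).lower(), name.group(1)


print(parse_rule(RULE))
```

Comment lines, such as the "# PCI device" line, contain neither field and simply yield None.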

eth0 or eth1

Another problem may be the fact your assigned link names went wrong; you may have extra rules in your udev file for eth1. And, your existing eth0 may still carry the wrong MAC-address, which may cause errors like
ifup eth0
eth0 does not seem to be present, delaying initialization.
Check if your MAC-addresses are still lingering around with:
/sbin/ifconfig | grep "^eth"
eth0      Link encap:Ethernet  HWaddr 00:0C:29:71:3C:BE
This is a correct output - MAC-addresses match. If not, change the /etc/sysconfig/network-scripts/ifcfg-eth0 file to reflect the correct MAC-address:
DEVICE=eth0
TYPE=Ethernet
ONBOOT=yes
NM_CONTROLLED=yes
BOOTPROTO=none
HWADDR=00:0C:29:71:3C:BE
Depending on the number of NICs, ifcfg-eth1 may require some tweaking as well. You should now successfully be able to start networking services.
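The comparison between the configured HWADDR and the MAC-address the interface actually reports can also be scripted. A sketch, assuming the plain KEY=value layout shown above; the helper names are mine, and the check ignores upper/lower case because ifconfig and the udev rule print the address differently:

```python
# Contents of ifcfg-eth0 as shown above.
IFCFG = """\
DEVICE=eth0
TYPE=Ethernet
ONBOOT=yes
NM_CONTROLLED=yes
BOOTPROTO=none
HWADDR=00:0C:29:71:3C:BE
"""


def parse_ifcfg(text):
    """Parse KEY=value lines, skipping blanks and comments."""
    settings = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip().strip('"')
    return settings


def hwaddr_matches(ifcfg_text, actual_mac):
    """Compare the configured HWADDR with the MAC reported by the NIC."""
    configured = parse_ifcfg(ifcfg_text).get("HWADDR", "")
    return configured.lower() == actual_mac.lower()


print(hwaddr_matches(IFCFG, "00:0c:29:71:3c:be"))
```

A file without an HWADDR line never matches, which is the cautious answer for a freshly cloned machine.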

hostname

Change the hostname: edit /etc/sysconfig/network, and adapt /etc/hosts accordingly.

Sunday, June 18, 2017

Now, here's an idea...

Gaining control

Or rather - regaining control. Over my own data, and what's done with it.

Currently, I use several services, of which I know they are monitored. Several of these services fall under US legislation, although I'm not a US citizen. This allows several agencies to go through my documents, email and other stuff, whether I like that or not (I do not).

Of course, for some of this, I gave permission - blogging on a Google platform undoubtedly allows Google to scan, "in order to enhance services rendered". Or something similar. Using Gmail: ditto. Dropbox: ditto. MS Windows: ditto.

And all of these firms store data on US territory, or are US based, which basically tells me my data is being scanned.

Now, I am aware of this, but not overly comfortable with it. I like my privacy. I like the idea of being innocent until proven otherwise.

So, how about taking matters in one's own hands? How about setting up my own email and cloud services?


Services wanted

Just freewheeling here, but how about: 


  • replace gmail by dovecot 
  • replace dropbox by nextcloud (successor/fork of owncloud) 
  • create some virtual/cloud computing platform to replace ESXi. My own Azure, so to speak. 

I do have a previous (not documented) ESXi server build, and I run some 10 virtual machines on it, one of them being FreeNAS - because of its native ZFS.
This works like a charm, despite the FreeNAS community being.... let's say sceptical, about the idea.
The only problem is that ESXi loses connection, and that is kind of hard to re-establish.
So, I felt the urge to build a dedicated, 24/7 storage server.
This would also have to take care of some laptop storage, and run 24/7, so it had better be energy-efficient. Of course, I'm not able, budget-wise, to go for the ultimate option, SSD-only. A mix of SSD and 2.5" drives should do, and I could probably salvage some 3TB disks from the ESXi build.

Enter Sub-project 1: storage server.