File systems
ZFS
The Zettabyte File System. It is not part of standard Linux distributions, but you can use the user-space version zfs-fuse, which seems to work fine (or use OpenZFS).
iscsi with zfs
You can use zfs-fuse on iscsi disks, however I had problems with the array after rebooting. If all or most disks report corrupt data or missing labels, the most likely cause is that you created the pool with plain device names, like this:
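A minimal sketch of what that looks like (the device letters are just an example; the point is that plain sdX names were used):
zpool create share raidz /dev/sdb /dev/sdc /dev/sdd /dev/sde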
However, iscsi does not guarantee that sdb will still be sdb after a reboot. As a matter of fact, you can just as well have that problem with your BIOS! The solution is to use the alternative device paths: your disks are also listed in /dev/disk by id, path or uuid.
disk id
>ls /dev/disk/by-id
....
ata-SAMSUNG_HD322HJ_S17AJ90Q409427-part1
ata-SAMSUNG_HD322HJ_S17AJ90Q409427-part2
scsi-SATA_HDS728080PLA380_PFDB30S2TEAE8M
scsi-SATA_HDS728080PLA380_PFDB30S2TEAE8M-part1
scsi-SATA_HDS728080PLA380_PFDB30S2TEAE8M-part2
....
disk path
>ls /dev/disk/by-path
pci-0000:00:11.0-scsi-0:0:0:0
pci-0000:00:11.0-scsi-0:0:0:0-part1
pci-0000:00:11.0-scsi-0:0:0:0-part2
pci-0000:00:11.0-scsi-0:0:0:0-part3
pci-0000:00:11.0-scsi-1:0:0:0
disk uuid
these are truly unique ids
>ls /dev/disk/by-uuid
3e9b7634-4769-4188-817a-155d434842d4 a37d8d46-3bcf-4f43-871a-34c306031d39
6b2fc0c6-e83a-4053-bbcd-07f938999a55 ae96e8a3-a8d4-44d6-be07-4f6dda206cf4
disk label
Sometimes disks are also listed by label (/dev/disk/by-label).
I used the by-path names for the iscsi disks; they looked like this:
ip-10.10.1.14:3260-iscsi-iqn.2010-11.net.almende:storage1.disk1-lun-0
ip-10.10.1.14:3260-iscsi-iqn.2010-11.net.almende:storage1.disk1-lun-0-part1
ip-10.10.1.14:3260-iscsi-iqn.2010-11.net.almende:storage1.disk2-lun-1
ip-10.10.1.14:3260-iscsi-iqn.2010-11.net.almende:storage1.disk2-lun-1-part1
ip-10.10.1.15:3260-iscsi-iqn.2010-11.net.almende:storage2.disk1-lun-0
ip-10.10.1.15:3260-iscsi-iqn.2010-11.net.almende:storage2.disk1-lun-0-part1
ip-10.10.1.15:3260-iscsi-iqn.2010-11.net.almende:storage2.disk2-lun-1
...
So that's specific enough not to get switched around.
iscsi
Iscsi is scsi over a network. For Linux admins this phrase is rather self-explanatory: 'you can fdisk a disk on another machine'. And of course you can do almost anything else you would do with anything you can 'fdisk'. I myself used it to create a zfs(-fuse) raidz array over multiple machines ;-). So it might be best to describe how to do that.
setting up iscsi
Of course you need an operating system to run the software, so I chose a simple Debian lenny netinst. From here on I will be very terse and aim for the commands issued, but one thing in advance about installation: iscsitarget is the server-side software, used to provide iscsi disks; open-iscsi is the client side, used to consume them. In iscsi the client is usually called the initiator, and the server the target.
server
The server is the iSCSI Enterprise Target daemon, ietd; the admin command is therefore called ietadm.
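A sketch of the installation on Debian lenny (package name assumed; depending on your kernel a matching module package may be needed as well, which is what the next remark is about):
apt-get install iscsitarget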
The output of that install will give you a clue about which kernel modules to install; the exact message depends on your kernel.
It will warn about not being started, because it is disabled in /etc/default/iscsitarget. First solve that by editing the file, or if you're really lazy:
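A sketch of that one-liner, assuming the enable flag in /etc/default/iscsitarget is called ISCSITARGET_ENABLE (as on Debian); note that it overwrites the whole file:
echo "ISCSITARGET_ENABLE=true" > /etc/default/iscsitarget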
That is at your own risk: it is currently the only line in the file, so it works fine, but you are warned. Next you have to alter /etc/ietd.conf. It has some reasonable defaults, but of course your disks have to be defined; take the example entries and alter/copy them. I did it without authentication, so my only alterations to the default section were:
...
Target iqn.2010-11.net.almende:storage1.disk1
...
Lun 0 Path=/dev/hda1,Type=fileio
...
Target iqn.2010-11.net.almende:storage1.disk2
Lun 1 Path=/dev/hdb1,Type=fileio
The iqn is the iscsi qualified name; it has to be globally and chronologically unique, so it was made to look like this:
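The general format is (with the domain written in reverse):
iqn.yyyy-mm.<reversed domain>:<identifier>
for example: iqn.2010-11.net.almende:storage1.disk1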
The yyyy-mm date is any date at which you owned the domain you use. The idea is that you get a unique identifier: an Internet domain at a certain point in time. The domain could later be sold to someone else, in which case the date keeps the name unique. I just used the current date because I know we still own almende.net. Play around with the options if you like; I did not. Of course you decide which partitions (or even files, it seems?) you want to export as targets. I made two equally sized partitions because they are going to become part of a zfs array later on. Now restart the target and watch for any errors:
/etc/init.d/iscsitarget restart
Removing iSCSI enterprise target devices: succeeded.
Stopping iSCSI enterprise target service: succeeded.
Removing iSCSI enterprise target modules: succeeded.
Starting iSCSI enterprise target service: succeeded.
On the server you can issue some commands like:
cat /proc/net/iet/session
tid:2 name:iqn.2010-11.net.almende:storage1.disk2
tid:1 name:iqn.2010-11.net.almende:storage1.disk1
cat /proc/net/iet/volume
tid:2 name:iqn.2010-11.net.almende:storage1.disk2
lun:1 state:0 iotype:fileio iomode:wt path:/dev/hdb1
tid:1 name:iqn.2010-11.net.almende:storage1.disk1
lun:0 state:0 iotype:fileio iomode:wt path:/dev/hda1
These show the status on the server; or go on to the client section to view things over the network.
client
The client is also called the initiator, and since a machine tends to be either a client or a server, there is a separate package to install:
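On Debian that is the open-iscsi package:
apt-get install open-iscsi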
Now, to spare you some work, alter the startup setting in /etc/iscsi/iscsid.conf to 'automatic' (see the auto startup section below). Then run the discovery command, shown after these option notes, to find iscsi targets:
- -m means mode, so here discovery mode
- -t is the type, sendtargets or st for short
- -p is the portal, so give the IP address of a target
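Putting those options together, the discovery command looks like this (portal address taken from the first storage machine):
iscsiadm -m discovery -t sendtargets -p 10.10.1.14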
It will return :
10.10.1.14:3260,1 iqn.2010-11.net.almende:storage1.disk1
10.10.1.14:3260,1 iqn.2010-11.net.almende:storage1.disk2
After this, the discovered targets are recorded, and running iscsiadm -m discovery without further options lists the known portals. After doing all 3 machines it looks like this:
10.10.1.16:3260 via sendtargets
10.10.1.15:3260 via sendtargets
10.10.1.14:3260 via sendtargets
With 'node' mode you can view all nodes :
iscsiadm -m node
10.10.1.16:3260,1 iqn.2010-11.net.almende:storage3.disk1
10.10.1.15:3260,1 iqn.2010-11.net.almende:storage2.disk1
10.10.1.14:3260,1 iqn.2010-11.net.almende:storage1.disk2
10.10.1.14:3260,1 iqn.2010-11.net.almende:storage1.disk1
10.10.1.16:3260,1 iqn.2010-11.net.almende:storage3.disk2
10.10.1.15:3260,1 iqn.2010-11.net.almende:storage2.disk2
The settings for each node are also made permanent in /etc/iscsi/nodes; there is a directory there for each of the nodes. But we still need to log in to the remote target, so use:
iscsiadm -m node --targetname "iqn.2010-11.net.almende:storage1.disk1" --portal "10.10.1.14:3260" --login
If you use no authentication, like I did above, you can now look at the result:
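For instance (any command that lists block devices will do; this is just one way to check):
fdisk -l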
And hey, you get an extra hard disk, /dev/sdb! You can do this for all targets, but to automate it at boot time see the next section.
auto startup
Alter /etc/iscsi/iscsid.conf:
# To request that the iscsi initd scripts startup a session set to "automatic".
node.startup = automatic
#
# To manually startup the session set to "manual". The default is manual.
# node.startup = manual
Change the default from manual to automatic. This only affects targets discovered after you set it, so for already discovered targets you need to update the node records. You could edit the node files yourself, but this is the correct way of doing it:
iscsiadm -m node --targetname "iqn.2010-11.net.almende:storage2.disk1" --portal "10.10.1.15:3260" --op update -n node.conn[0].startup -v automatic
You may notice there is also a 'node.startup = manual' line in the node files, but that does not seem to affect startup, and nowhere in the documentation or on the Internet could I find the difference between the two, so I gave up. See [[Zfs_linux]] for how to create a useful system out of these. Here is the command I used for 3 machines with 2 disks each:
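A sketch of that command, using the by-path names shown earlier (the pool name 'share' and the use of whole-LUN paths are assumptions; the two storage3 disks on 10.10.1.16 follow the same pattern and are abbreviated here):
zpool create share raidz \
/dev/disk/by-path/ip-10.10.1.14:3260-iscsi-iqn.2010-11.net.almende:storage1.disk1-lun-0 \
/dev/disk/by-path/ip-10.10.1.14:3260-iscsi-iqn.2010-11.net.almende:storage1.disk2-lun-1 \
/dev/disk/by-path/ip-10.10.1.15:3260-iscsi-iqn.2010-11.net.almende:storage2.disk1-lun-0 \
/dev/disk/by-path/ip-10.10.1.15:3260-iscsi-iqn.2010-11.net.almende:storage2.disk2-lun-1 \
... (plus the two storage3 paths)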
rename pool
I named my test pool 'share', which is not very recognizable as a zfs pool, so I wanted to rename it to 'tank' because most examples use that name and it is therefore instantly recognizable. There is no rename command, so this is the fastest way:
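The standard trick is to export the pool and import it under the new name:
zpool export share
zpool import share tank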
restoring a failed disk
Note that the first time I tried this it failed, because I had not created the pool correctly.
zpool status
pool: share
state: ONLINE
scan: scrub repaired 0B in 00:00:01 with 0 errors on Sun Apr 14 00:24:02 2024
config:
NAME STATE READ WRITE CKSUM
share ONLINE 0 0 0
ata-Hitachi_HDP725032GLAT80_GE2330RC1165AB ONLINE 0 0 0
ata-SAMSUNG_HD321KJ_S0ZEJ1MP807222 ONLINE 0 0 0
ata-SAMSUNG_HD322HJ_S17AJ90Q409426 ONLINE 0 0 0
ata-SAMSUNG_HD322HJ_S17AJ90Q409427 ONLINE 0 0 0
ata-SAMSUNG_HD322HJ_S17AJ9BS104161 ONLINE 0 0 0
errors: No known data errors
Note that 'share' does not mention raidz anywhere. If you unhook a disk (or lose one) you cannot reconstruct the data.
An import may show it better:
zpool import
pool: tank
id: 17581964764773003445
state: UNAVAIL
status: One or more devices contains corrupted data.
action: The pool cannot be imported due to damaged devices or data.
see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-5E
config:
tank UNAVAIL insufficient replicas
ata-Hitachi_HDP725032GLAT80_GE2330RC1165AB ONLINE
ata-SAMSUNG_HD321KJ_S0ZEJ1MP807222 ONLINE
ata-SAMSUNG_HD322HJ_S17AJ90Q409426 ONLINE
ata-SAMSUNG_HD322HJ_S17AJ90Q409427 ONLINE
ata-SAMSUNG_HD322HJ_S17AJ9BS104161 UNAVAIL
Note that it says insufficient replicas, because we just created a 'striped' pool with 5 disks. Reboot with the disk attached and you will see that the size is the sum of all five disks: 5 times 320 GB (1600), not 4 times 320 GB (1280), so no capacity is used for parity.
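A quick way to check (output omitted, it depends on your disks):
zpool list
# the SIZE column is the combined capacity of all five disks, so nothing is reserved for parity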
Recreate the pool properly, starting from a clean list of the device names in /dev/disk/by-id.
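A sketch of such a listing (the exact command is an assumption, but it matches the filtering described below):
ls -d /dev/disk/by-id/* | grep -v part | grep -v wwn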
The grep -v excludes all lines having 'part' in them: the partition lines. The 'wwn' lines are filtered out the same way, so this leaves:
/dev/disk/by-id/ata-Hitachi_HDP725032GLAT80_GE2330RC1165AB
/dev/disk/by-id/ata-_NEC_DVD_RW_ND-3520A
/dev/disk/by-id/ata-SAMSUNG_HD321KJ_S0ZEJ1MP807222
/dev/disk/by-id/ata-SAMSUNG_HD322HJ_S17AJ90Q409426
/dev/disk/by-id/ata-SAMSUNG_HD322HJ_S17AJ90Q409427
/dev/disk/by-id/ata-SAMSUNG_HD322HJ_S17AJ9BS104161
/dev/disk/by-id/ata-SanDisk_SDSSDP064G_144632401890
Also remove the root/system disk and the DVD drive, and the complete command becomes:
zpool create tank raidz \
/dev/disk/by-id/ata-Hitachi_HDP725032GLAT80_GE2330RC1165AB \
/dev/disk/by-id/ata-SAMSUNG_HD321KJ_S0ZEJ1MP807222 \
/dev/disk/by-id/ata-SAMSUNG_HD322HJ_S17AJ90Q409426 \
/dev/disk/by-id/ata-SAMSUNG_HD322HJ_S17AJ90Q409427 \
/dev/disk/by-id/ata-SAMSUNG_HD322HJ_S17AJ9BS104161
raidz has 1 disk of redundancy; raidz2 and raidz3 have 2 and 3. The disk space is now correct: one disk's worth of capacity goes to parity.
replace a disk
If you now swap the SATA cables with another disk, it will complain but still say the pool can continue.
zpool status
pool: tank
state: DEGRADED
status: One or more devices could not be used because the label is missing or
invalid. Sufficient replicas exist for the pool to continue
functioning in a degraded state.
action: Replace the device using 'zpool replace'.
see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-4J
config:
NAME STATE READ WRITE CKSUM
tank DEGRADED 0 0 0
raidz1-0 DEGRADED 0 0 0
ata-Hitachi_HDP725032GLAT80_GE2330RC1165AB ONLINE 0 0 0
ata-SAMSUNG_HD321KJ_S0ZEJ1MP807222 ONLINE 0 0 0
ata-SAMSUNG_HD322HJ_S17AJ90Q409426 ONLINE 0 0 0
ata-SAMSUNG_HD322HJ_S17AJ90Q409427 ONLINE 0 0 0
6263933613393558152 UNAVAIL 0 0 0 was /dev/disk/by-id/ata-SAMSUNG_HD322HJ_S17AJ9BS104161-part1
errors: No known data errors
Also note that it now refers to the missing disk by the numeric id 6263933613393558152. Use that id in the zpool replace command:
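A sketch of the replace command; <new-disk-id> is a placeholder for the by-id name of whatever disk takes its place:
zpool replace tank 6263933613393558152 /dev/disk/by-id/<new-disk-id>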
The pool is now working again and has been 'resilvered':
zpool status
pool: tank
state: ONLINE
scan: resilvered 89K in 00:00:02 with 0 errors on Fri Apr 26 12:21:19 2024
config:
...
I just left it that way and now the old disk is the 'spare' disk.
reimport
To be sure we can also reconstruct the pool on another machine, just export and reimport the pool:
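A sketch of that round trip (the -d option just tells import where to scan for devices; run the import on the other machine if you physically move the disks):
zpool export tank
zpool import -d /dev/disk/by-id tank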