[OmniOS-discuss] iSCSI traffic suddenly comes to a halt and then resumes
Matej Zerovnik
matej at zunaj.si
Wed May 27 06:58:16 UTC 2015
Hello Josten,
> On 26 May 2015, at 22:18, Anon <anon at omniti.com> wrote:
>
> Hi Matej,
>
> Do you have sar running on your system? I'd recommend maybe running it at a short interval so that you can get historical disk statistics. You can use this info to rule out if it's the disks or not. You can also use iotop -P to get a real time view of %IO to see if it's the disks. You can also use zpool iostat -v 1.
I didn’t have sar or iotop running, but I did have 'iostat -xn' and 'zpool iostat -v 1' running when things stopped working, and there is nothing unusual in there. Write ops suddenly fall to 0 and that’s it. Reads are still happening, and judging by the network traffic, there is outgoing traffic even while I’m unable to write to the ZFS filesystem (even locally on the server). I created a simple text file, so the next time the system hangs I will be able to check whether the filesystem is still readable (currently I only have iSCSI volumes, so I’m unable to check that locally on the server).
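If it helps anyone else, here is a rough sketch of the write/read probe I have in mind, suitable for running from cron so the next hang gets caught with a timestamp. The path and filename are placeholders; in practice I would point it at the pool mountpoint (e.g. /volumes/data). Note that a truly hung write will block this script in the kernel, so it should be run with some external watchdog if one is available:

```shell
#!/bin/sh
# Minimal probe: does the filesystem still accept writes, and are reads
# still possible? Path defaults to /tmp here for illustration; pass the
# pool mountpoint as the first argument.
probe_fs() {
    mnt=$1
    f="$mnt/.write-probe.$$"

    # Write probe. If writes are stalled, this is where the script blocks.
    if echo "probe $(date -u)" > "$f" 2>/dev/null; then
        echo "write: ok"
    else
        echo "write: FAILED"
    fi

    # Read probe: confirms reads keep working even while writes are stalled.
    if cat "$f" >/dev/null 2>&1; then
        echo "read: ok"
    else
        echo "read: FAILED"
    fi
    rm -f "$f"
}

probe_fs "${1:-/tmp}"
```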
>
> Also, do you have baseline benchmark of performance and know if you're meeting/exceeding it? The baseline should be for random and sequential IO; you can use bonnie++ to get this information.
I can say with 99.99% certainty that I’m exceeding the performance of the pool itself. It’s a single raidz2 vdev with 50 hard drives and 70 connected clients. Some are idling, but 10-20 clients are pushing data to the server at any given time. I know the zpool configuration is very bad, but that’s a legacy setup I can’t change easily. I’m already syncing data to another server with 7 vdevs, but since this server is so busy, the transfers are VERY slow (read: the zfs sync is doing 10MB/s).
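A back-of-envelope check makes the mismatch obvious. The numbers below are assumptions (roughly 150 random IOPS for a single 7200rpm drive, and the rule of thumb that a raidz vdev delivers about the random IOPS of one member disk regardless of width):

```shell
#!/bin/sh
# Rough capacity sanity check. All numbers are assumptions, not
# measurements: ~150 random IOPS per spinning disk, and a raidz/raidz2
# vdev delivering roughly the random IOPS of ONE member disk.
DISK_IOPS=150
VDEVS=1            # the pool is a single raidz2 vdev
ACTIVE_CLIENTS=20  # 10-20 clients actively writing

POOL_IOPS=$((DISK_IOPS * VDEVS))
echo "pool random IOPS: ~$POOL_IOPS"
echo "per active client: ~$((POOL_IOPS / ACTIVE_CLIENTS))"
```

So despite having 50 drives, the pool behaves like one disk for random IO, leaving only a handful of IOPS per active client. This is why the 7-vdev replacement server should behave very differently.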
>
> Are you able to share your ZFS configuration and iSCSI configuration?
Sure! Here are zfs settings:
zfs get all data:
NAME  PROPERTY              VALUE                  SOURCE
data  type                  filesystem             -
data  creation              Fri Oct 25 20:26 2013  -
data  used                  104T                   -
data  available             61.6T                  -
data  referenced            1.09M                  -
data  compressratio         1.08x                  -
data  mounted               yes                    -
data  quota                 none                   default
data  reservation           none                   default
data  recordsize            128K                   default
data  mountpoint            /volumes/data          received
data  sharenfs              off                    default
data  checksum              on                     default
data  compression           off                    received
data  atime                 off                    local
data  devices               on                     default
data  exec                  on                     default
data  setuid                on                     default
data  readonly              off                    local
data  zoned                 off                    default
data  snapdir               hidden                 default
data  aclmode               discard                default
data  aclinherit            restricted             default
data  canmount              on                     default
data  xattr                 on                     default
data  copies                1                      default
data  version               5                      -
data  utf8only              off                    -
data  normalization         none                   -
data  casesensitivity       sensitive              -
data  vscan                 off                    default
data  nbmand                off                    default
data  sharesmb              off                    default
data  refquota              none                   default
data  refreservation        none                   default
data  primarycache          all                    default
data  secondarycache        all                    default
data  usedbysnapshots       0                      -
data  usedbydataset         1.09M                  -
data  usedbychildren        104T                   -
data  usedbyrefreservation  0                      -
data  logbias               latency                default
data  dedup                 off                    local
data  mlslabel              none                   default
data  sync                  standard               default
data  refcompressratio      1.00x                  -
data  written               1.09M                  -
data  logicalused           98.1T                  -
data  logicalreferenced     398K                   -
data  filesystem_limit      none                   default
data  snapshot_limit        none                   default
data  filesystem_count      none                   default
data  snapshot_count        none                   default
data  redundant_metadata    all                    default
data  nms:dedup-dirty       on                     received
data  nms:description       datauporabnikov        received
I’m not sure which iSCSI configuration you want/need? But as far as I figured out during the last 'freeze', iSCSI is not the problem, since I’m unable to write to the ZFS volume even locally on the server itself.
>
> For iSCSI, can you take a look at this: http://docs.oracle.com/cd/E23824_01/html/821-1459/fpjwy.html#fsume
Interesting. I tried running 'iscsiadm list target' but it doesn’t return anything. There is also nothing in /var/adm/messages, as usual :) But the target service is online (according to svcs), and clients are connected and passing traffic.
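If I understand the illumos tooling correctly, that may be expected: 'iscsiadm' administers the iSCSI *initiator* side, while the target on OmniOS is COMSTAR, administered with itadm and stmfadm. A rough sketch of the target-side checks I plan to run next time (wrapped so it skips tools that aren't present on a given host):

```shell
#!/bin/sh
# Target-side status checks for COMSTAR. 'iscsiadm list target' queries
# the initiator side, which would explain empty output on a pure target
# host. Each command is skipped if its tool is not installed here.
run_check() {
    for cmd in "itadm list-target -v" \
               "stmfadm list-state" \
               "stmfadm list-lu -v"; do
        tool=${cmd%% *}
        if command -v "$tool" >/dev/null 2>&1; then
            echo "== $cmd =="
            $cmd
        else
            echo "skipping: $cmd (command not present on this host)"
        fi
    done
}
run_check
```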
>
> Do you have detailed logs for the clients experiencing the issues? If not are you able to enable verbose logging (such as debug level logs)?
I have the client logs, but they mostly just report losing connections and reconnecting:
Example 1:
Apr 29 10:33:53 eee kernel: connection1:0: detected conn error (1021)
Apr 29 10:33:54 eee iscsid: Kernel reported iSCSI connection 1:0 error (1021 - ISCSI_ERR_SCSI_EH_SESSION_RST: Session was dropped as a result of SCSI error recovery) state (3)
Apr 29 10:33:56 eee iscsid: connection1:0 is operational after recovery (1 attempts)
Apr 29 10:36:37 eee kernel: connection1:0: detected conn error (1021)
Apr 29 10:36:37 eee iscsid: Kernel reported iSCSI connection 1:0 error (1021 - ISCSI_ERR_SCSI_EH_SESSION_RST: Session was dropped as a result of SCSI error recovery) state (3)
Apr 29 10:36:40 eee iscsid: connection1:0 is operational after recovery (1 attempts)
Apr 29 10:36:50 eee kernel: sd 3:0:0:0: Device offlined - not ready after error recovery
Apr 29 10:36:51 eee kernel: sd 3:0:0:0: Device offlined - not ready after error recovery
Apr 29 10:36:51 eee kernel: sd 3:0:0:0: Device offlined - not ready after error recovery
Example 2:
Apr 16 08:41:40 vf kernel: connection1:0: pdu (op 0x5e itt 0x1) rejected. Reason code 0x7
Apr 16 08:43:11 vf kernel: connection1:0: pdu (op 0x5e itt 0x1) rejected. Reason code 0x7
Apr 16 08:44:13 vf kernel: connection1:0: pdu (op 0x5e itt 0x1) rejected. Reason code 0x7
Apr 16 08:45:51 vf kernel: connection1:0: detected conn error (1021)
Apr 16 08:45:51 317 iscsid: Kernel reported iSCSI connection 1:0 error (1021 - ISCSI_ERR_SCSI_EH_SESSION_RST: Session was dropped as a result of SCSI error recovery) state (3)
Apr 16 08:45:53 vf iscsid: connection1:0 is operational after recovery (1 attempts)
I’m already in contact with OmniTI regarding our new build, but in the meantime I would love for our clients to be able to use the storage, so I’m trying to resolve the current issue somehow…
Matej