[OmniOS-discuss] Slow Drive Detection and boot-archive

Schweiss, Chip chip at innovates.com
Wed Jul 29 20:50:36 UTC 2015


I have an OmniOS box with all the same hardware except the server and hard
disks.  I would wager this has something to do with the WD disks and
something different happening during device initialization.

This is a stab in the dark, but try adding "power-condition:false" in
/kernel/drv/sd.conf for the WD disks.
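
For reference, the sd-config-list syntax would look something like this.
The vendor/product string below is a guess based on the model you listed;
the vendor field is padded to 8 characters, and you can check what the
drives actually report with "iostat -En":

sd-config-list = "WD      WD4001FYYG", "power-condition:false";

After editing, "update_drv -f sd" (or a reboot) should pick up the change.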

-Chip



On Wed, Jul 29, 2015 at 12:48 PM, Michael Talbott <mtalbott at lji.org> wrote:

> Here are the specs of that server.
>
> Fujitsu RX300S8
>  -
> http://www.fujitsu.com/fts/products/computing/servers/primergy/rack/rx300/
> 128G ECC DDR3 1600 RAM
> 2 x Intel(R) Xeon(R) CPU E5-2630 v2 @ 2.60GHz
> 2 x LSI 9200-8e
> 2 x 10Gb Intel NICs
> 2 x SuperMicro 847E26-RJBOD1 45 bay JBOD enclosures
>  - http://www.supermicro.com/products/chassis/4U/847/SC847E26-RJBOD1.cfm
>
> The enclosures are not currently set up for multipathing. The front and
> rear backplane each have a single independent SAS connection to one of the
> LSI 9200s.
>
> The two enclosures are fully loaded with 45 x 4TB WD4001FYYG-01SL3 drives
> each (90 total).
> http://www.newegg.com/Product/Product.aspx?Item=N82E16822236353
>
> Booting the server up in Ubuntu or CentOS does not have that 8-second
> delay. Each drive is found in a fraction of a second (the activity LEDs on
> the enclosure flash on and off really quickly as the drives are scanned).
> On OmniOS, the drives seem to be scanned in the same order, but instead of
> spending a fraction of a second on each drive, it spends 8 seconds on one
> drive (only that drive's LED rapidly flashing during that process) before
> moving on to the next, and so on through all 90 drives.
>
> Is there anything I can do to get more verbosity in the boot messages that
> might just reveal the root issue?
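>
> Would booting the kernel with -v give more detail here? With GRUB I guess
> that would mean adding it to the kernel$ line for one boot, e.g. something
> like:
>
> kernel$ /platform/i86pc/kernel/amd64/unix -B $ZFS-BOOTFS -v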
>
> Any suggestions appreciated.
>
> Thanks
>
> ________________________
> Michael Talbott
> Systems Administrator
> La Jolla Institute
>
> On Jul 29, 2015, at 7:51 AM, Schweiss, Chip <chip at innovates.com> wrote:
>
>
>
> On Fri, Jul 24, 2015 at 5:03 PM, Michael Talbott <mtalbott at lji.org> wrote:
>
>> Hi,
>>
>> I've downgraded the cards (LSI 9211-8e) to v.19 and disabled their boot
>> BIOS, but I'm still getting the 8-second-per-drive delay after the kernel
>> loads. Any other ideas?
>>
>>
> 8 seconds is way too long.   What JBODs and disks are you using?   Could
> it be that they are powered off, and the delay is the wait for the
> power-on command to complete?   That could be accelerated by using lsiutil
> to send them all power-on commands first.
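>
> In case the power-on theory is right: the underlying SCSI operation is
> START STOP UNIT with the START bit set. A minimal illumos uscsi(7I) sketch
> of sending that to one drive (untested; the device path you pass in is
> whatever /dev/rdsk/...s0 node the drive has):
>
> /* start_unit.c: send SCSI START STOP UNIT (START=1) via uscsi(7I) */
> #include <sys/types.h>
> #include <sys/scsi/impl/uscsi.h>
> #include <fcntl.h>
> #include <stdio.h>
> #include <string.h>
> #include <unistd.h>
>
> int
> main(int argc, char **argv)
> {
>         struct uscsi_cmd ucmd;
>         char cdb[6] = { 0x1B, 0, 0, 0, 0x01, 0 }; /* START=1, IMMED=0 */
>         int fd;
>
>         if (argc != 2) {
>                 (void) fprintf(stderr, "usage: %s /dev/rdsk/...s0\n",
>                     argv[0]);
>                 return (1);
>         }
>         /* O_NDELAY so the open itself doesn't wait on drive spin-up */
>         if ((fd = open(argv[1], O_RDONLY | O_NDELAY)) < 0) {
>                 perror("open");
>                 return (1);
>         }
>         (void) memset(&ucmd, 0, sizeof (ucmd));
>         ucmd.uscsi_cdb = cdb;
>         ucmd.uscsi_cdblen = sizeof (cdb);
>         ucmd.uscsi_timeout = 60;   /* spin-up can take tens of seconds */
>         ucmd.uscsi_flags = USCSI_SILENT;
>         if (ioctl(fd, USCSICMD, &ucmd) < 0) {
>                 perror("USCSICMD");
>                 return (1);
>         }
>         (void) close(fd);
>         return (0);
> }
>
> That only works once the device nodes exist, of course; it's just to show
> the command lsiutil would be sending on the wire.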
>
> My OmniOS systems with LSI HBAs discover about 2 disks per second, which I
> still consider slow.   On systems with LOTS of disks, all multipathed,
> that still adds up to a long time to discover them all.
>
> -Chip
>
>
>>
>> ________________________
>> Michael Talbott
>> Systems Administrator
>> La Jolla Institute
>>
>> > On Jul 20, 2015, at 11:27 PM, Floris van Essen ..:: House of Ancients
>> Amstafs ::.. <info at houseofancients.nl> wrote:
>> >
>> > Michael,
>> >
>> > I know v20 does cause lots of issues.
>> > V19, to the best of my knowledge, doesn't contain any, so I would
>> downgrade to v19.
>> >
>> >
>> > Kr,
>> >
>> >
>> > Floris
>> > -----Original message-----
>> > From: OmniOS-discuss [mailto:omnios-discuss-bounces at lists.omniti.com]
>> On behalf of Michael Talbott
>> > Sent: Tuesday, July 21, 2015 4:57 AM
>> > To: Marion Hakanson <hakansom at ohsu.edu>
>> > CC: omnios-discuss <omnios-discuss at lists.omniti.com>
>> > Subject: Re: [OmniOS-discuss] Slow Drive Detection and boot-archive
>> >
>> > Thanks for the reply. The BIOS for the card is disabled already. The
>> 8-second-per-drive scan happens after the kernel has already loaded and is
>> scanning for devices. I wonder if it's due to running newer firmware. I did
>> update the cards to fw v.20.something before I moved to OmniOS. Is there a
>> particular firmware version I should run on the cards to match OmniOS's
>> drivers?
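>> >
>> > (For checking versions: LSI's sas2flash utility lists the firmware and
>> BIOS on each card, e.g. "sas2flash -listall", which is how I'd confirm
>> what the cards are running now.)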
>> >
>> >
>> > ________________________
>> > Michael Talbott
>> > Systems Administrator
>> > La Jolla Institute
>> >
>> >> On Jul 20, 2015, at 6:06 PM, Marion Hakanson <hakansom at ohsu.edu>
>> wrote:
>> >>
>> >> Michael,
>> >>
>> >> I've not seen this;  I do have one system with 120 drives and it
>> >> definitely does not have this problem.  A couple with 80+ drives are
>> >> also free of this issue, though they are still running OpenIndiana.
>> >>
>> >> One thing I pretty much always do here is to disable the boot option
>> >> in the LSI HBA's config utility (accessible during boot, after
>> >> the BIOS has started up).  I do this because I don't want the BIOS
>> >> thinking it can boot from any of the external JBOD disks, and also
>> >> because I've had some system BIOS crashes when they tried to enumerate
>> >> too many drives.  But this all happens at the BIOS level, before the
>> >> OS has even started up, so in theory it should not affect what you are
>> >> seeing.
>> >>
>> >> Regards,
>> >>
>> >> Marion
>> >>
>> >>
>> >> ================================================================
>> >> Subject: Re: [OmniOS-discuss] Slow Drive Detection and boot-archive
>> >> From: Michael Talbott <mtalbott at lji.org>
>> >> Date: Fri, 17 Jul 2015 16:15:47 -0700
>> >> To: omnios-discuss <omnios-discuss at lists.omniti.com>
>> >>
>> >> Just realized my typo. I'm using these on my 90- and 180-drive
>> >> systems, respectively:
>> >>
>> >> # svccfg -s boot-archive setprop start/timeout_seconds=720
>> >> # svccfg -s boot-archive setprop start/timeout_seconds=1440
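>> >>
>> >> After the setprop, I believe svcadm refresh is needed before the new
>> >> value takes effect, and svcprop can confirm it:
>> >>
>> >> # svcadm refresh svc:/system/boot-archive:default
>> >> # svcprop -p start/timeout_seconds boot-archive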
>> >>
>> >> Seems like 8 seconds to detect each drive is pretty excessive.
>> >>
>> >> Any ideas on how to speed that up?
>> >>
>> >>
>> >> ________________________
>> >> Michael Talbott
>> >> Systems Administrator
>> >> La Jolla Institute
>> >>
>> >>> On Jul 17, 2015, at 4:07 PM, Michael Talbott <mtalbott at lji.org>
>> wrote:
>> >>>
>> >>> I have multiple NAS servers I've moved to OmniOS, and each of them
>> has 90-180 4T disks. Everything has worked out pretty well for the most
>> part, but I've run into an issue where, when I reboot any of them, I get
>> boot-archive service timeouts. I found a workaround of increasing the
>> timeout value, which brings me to the following. As you can see below in
>> the dmesg output, it's taking the kernel about 8 seconds to detect each
>> drive. They're connected via a couple of SAS2008-based LSI cards.
>> >>>
>> >>> Is this normal?
>> >>> Is there a way to speed that up?
>> >>>
>> >>> I've fixed my frustrating boot-archive timeout problem by adjusting
>> the timeout value from the default of 60 seconds (I guess that'll work OK
>> on systems with fewer than 8 drives?) to 8 seconds * 90 drives + a little
>> extra time = 280 seconds (for the 90-drive systems). That means it takes
>> between 12 and 24 minutes to boot those machines.
>> >>>
>> >>> # svccfg -s boot-archive setprop start/timeout_seconds=280
>> >>>
>> >>> I figure I can't be the only one. A little googling also revealed:
>> >>> https://www.illumos.org/issues/4614
>> >>>
>> >>> Jul 17 15:40:15 store2 genunix: [ID 583861 kern.info] sd29 at mpt_sas3: unit-address w50000c0f0401bd43,0: w50000c0f0401bd43,0
>> >>> Jul 17 15:40:15 store2 genunix: [ID 936769 kern.info] sd29 is /pci@0,0/pci8086,e06@2,2/pci1000,3080@0/iport@f/disk@w50000c0f0401bd43,0
>> >>> Jul 17 15:40:16 store2 genunix: [ID 408114 kern.info] /pci@0,0/pci8086,e06@2,2/pci1000,3080@0/iport@f/disk@w50000c0f0401bd43,0 (sd29) online
>> >>> Jul 17 15:40:24 store2 genunix: [ID 583861 kern.info] sd30 at mpt_sas3: unit-address w50000c0f045679c3,0: w50000c0f045679c3,0
>> >>> Jul 17 15:40:24 store2 genunix: [ID 936769 kern.info] sd30 is /pci@0,0/pci8086,e06@2,2/pci1000,3080@0/iport@f/disk@w50000c0f045679c3,0
>> >>> Jul 17 15:40:24 store2 genunix: [ID 408114 kern.info] /pci@0,0/pci8086,e06@2,2/pci1000,3080@0/iport@f/disk@w50000c0f045679c3,0 (sd30) online
>> >>> Jul 17 15:40:33 store2 genunix: [ID 583861 kern.info] sd31 at mpt_sas3: unit-address w50000c0f045712b3,0: w50000c0f045712b3,0
>> >>> Jul 17 15:40:33 store2 genunix: [ID 936769 kern.info] sd31 is /pci@0,0/pci8086,e06@2,2/pci1000,3080@0/iport@f/disk@w50000c0f045712b3,0
>> >>> Jul 17 15:40:33 store2 genunix: [ID 408114 kern.info] /pci@0,0/pci8086,e06@2,2/pci1000,3080@0/iport@f/disk@w50000c0f045712b3,0 (sd31) online
>> >>> Jul 17 15:40:42 store2 genunix: [ID 583861 kern.info] sd32 at mpt_sas3: unit-address w50000c0f04571497,0: w50000c0f04571497,0
>> >>> Jul 17 15:40:42 store2 genunix: [ID 936769 kern.info] sd32 is /pci@0,0/pci8086,e06@2,2/pci1000,3080@0/iport@f/disk@w50000c0f04571497,0
>> >>> Jul 17 15:40:42 store2 genunix: [ID 408114 kern.info] /pci@0,0/pci8086,e06@2,2/pci1000,3080@0/iport@f/disk@w50000c0f04571497,0 (sd32) online
>> >>> Jul 17 15:40:50 store2 genunix: [ID 583861 kern.info] sd33 at mpt_sas3: unit-address w50000c0f042ac8eb,0: w50000c0f042ac8eb,0
>> >>> Jul 17 15:40:50 store2 genunix: [ID 936769 kern.info] sd33 is /pci@0,0/pci8086,e06@2,2/pci1000,3080@0/iport@f/disk@w50000c0f042ac8eb,0
>> >>> Jul 17 15:40:50 store2 genunix: [ID 408114 kern.info] /pci@0,0/pci8086,e06@2,2/pci1000,3080@0/iport@f/disk@w50000c0f042ac8eb,0 (sd33) online
>> >>> Jul 17 15:40:59 store2 genunix: [ID 583861 kern.info] sd34 at mpt_sas3: unit-address w50000c0f04571473,0: w50000c0f04571473,0
>> >>> Jul 17 15:40:59 store2 genunix: [ID 936769 kern.info] sd34 is /pci@0,0/pci8086,e06@2,2/pci1000,3080@0/iport@f/disk@w50000c0f04571473,0
>> >>> Jul 17 15:40:59 store2 genunix: [ID 408114 kern.info] /pci@0,0/pci8086,e06@2,2/pci1000,3080@0/iport@f/disk@w50000c0f04571473,0 (sd34) online
>> >>> Jul 17 15:41:08 store2 genunix: [ID 583861 kern.info] sd35 at mpt_sas3: unit-address w50000c0f042c636f,0: w50000c0f042c636f,0
>> >>> Jul 17 15:41:08 store2 genunix: [ID 936769 kern.info] sd35 is /pci@0,0/pci8086,e06@2,2/pci1000,3080@0/iport@f/disk@w50000c0f042c636f,0
>> >>> Jul 17 15:41:08 store2 genunix: [ID 408114 kern.info] /pci@0,0/pci8086,e06@2,2/pci1000,3080@0/iport@f/disk@w50000c0f042c636f,0 (sd35) online
>> >>> Jul 17 15:41:17 store2 genunix: [ID 583861 kern.info] sd36 at mpt_sas3: unit-address w50000c0f0401bf2f,0: w50000c0f0401bf2f,0
>> >>> Jul 17 15:41:17 store2 genunix: [ID 936769 kern.info] sd36 is /pci@0,0/pci8086,e06@2,2/pci1000,3080@0/iport@f/disk@w50000c0f0401bf2f,0
>> >>> Jul 17 15:41:17 store2 genunix: [ID 408114 kern.info] /pci@0,0/pci8086,e06@2,2/pci1000,3080@0/iport@f/disk@w50000c0f0401bf2f,0 (sd36) online
>> >>> Jul 17 15:41:25 store2 genunix: [ID 583861 kern.info] sd38 at mpt_sas3: unit-address w50000c0f0401bc1f,0: w50000c0f0401bc1f,0
>> >>> Jul 17 15:41:25 store2 genunix: [ID 936769 kern.info] sd38 is /pci@0,0/pci8086,e06@2,2/pci1000,3080@0/iport@f/disk@w50000c0f0401bc1f,0
>> >>> Jul 17 15:41:26 store2 genunix: [ID 408114 kern.info] /pci@0,0/pci8086,e06@2,2/pci1000,3080@0/iport@f/disk@w50000c0f0401bc1f,0 (sd38) online
>> >>>
>> >>>
>> >>> ________________________
>> >>> Michael Talbott
>> >>> Systems Administrator
>> >>> La Jolla Institute
>> >>>
>> >>
>> >> _______________________________________________
>> >> OmniOS-discuss mailing list
>> >> OmniOS-discuss at lists.omniti.com
>> >> http://lists.omniti.com/mailman/listinfo/omnios-discuss
>> >>
>> >>
>> >
>> > _______________________________________________
>> > OmniOS-discuss mailing list
>> > OmniOS-discuss at lists.omniti.com
>> > http://lists.omniti.com/mailman/listinfo/omnios-discuss
>> > ...:: House of Ancients ::...
>> > American Staffordshire Terriers
>> >
>> > +31-628-161-350
>> > +31-614-198-389
>> > Het Perk 48
>> > 4903 RB
>> > Oosterhout
>> > Netherlands
>> > www.houseofancients.nl
>>
>> _______________________________________________
>> OmniOS-discuss mailing list
>> OmniOS-discuss at lists.omniti.com
>> http://lists.omniti.com/mailman/listinfo/omnios-discuss
>>
>
>
>

