[OmniOS-discuss] Slow Drive Detection and boot-archive
Michael Talbott
mtalbott at lji.org
Wed Jul 29 22:02:17 UTC 2015
Gave that a shot. No dice. Still getting the 8-second lag. It reminds me of RAID cards that do staggered spin-up, sequentially spinning up one drive at a time. Only, this is happening after the kernel loads, and of course the LSI 9200s are flashed in IT mode with v.19 firmware and the BIOS disabled.
Jul 29 14:57:12 store2 genunix: [ID 583861 kern.info] sd10 at mpt_sas2: unit-address w50000c0f0401c20f,0: w50000c0f0401c20f,0
Jul 29 14:57:12 store2 genunix: [ID 936769 kern.info] sd10 is /pci@0,0/pci8086,e04@2/pci1000,3080@0/iport@f0/disk@w50000c0f0401c20f,0
Jul 29 14:57:12 store2 genunix: [ID 408114 kern.info] /pci@0,0/pci8086,e04@2/pci1000,3080@0/iport@f0/disk@w50000c0f0401c20f,0 (sd10) online
Jul 29 14:57:20 store2 genunix: [ID 583861 kern.info] sd11 at mpt_sas2: unit-address w50000c0f040075db,0: w50000c0f040075db,0
Jul 29 14:57:20 store2 genunix: [ID 936769 kern.info] sd11 is /pci@0,0/pci8086,e04@2/pci1000,3080@0/iport@f0/disk@w50000c0f040075db,0
Jul 29 14:57:21 store2 genunix: [ID 408114 kern.info] /pci@0,0/pci8086,e04@2/pci1000,3080@0/iport@f0/disk@w50000c0f040075db,0 (sd11) online
Jul 29 14:57:29 store2 genunix: [ID 583861 kern.info] sd12 at mpt_sas2: unit-address w50000c0f042c684b,0: w50000c0f042c684b,0
Jul 29 14:57:29 store2 genunix: [ID 936769 kern.info] sd12 is /pci@0,0/pci8086,e04@2/pci1000,3080@0/iport@f0/disk@w50000c0f042c684b,0
Jul 29 14:57:29 store2 genunix: [ID 408114 kern.info] /pci@0,0/pci8086,e04@2/pci1000,3080@0/iport@f0/disk@w50000c0f042c684b,0 (sd12) online
Jul 29 14:57:38 store2 genunix: [ID 583861 kern.info] sd13 at mpt_sas2: unit-address w50000c0f0457149f,0: w50000c0f0457149f,0
Jul 29 14:57:38 store2 genunix: [ID 936769 kern.info] sd13 is /pci@0,0/pci8086,e04@2/pci1000,3080@0/iport@f0/disk@w50000c0f0457149f,0
Jul 29 14:57:38 store2 genunix: [ID 408114 kern.info] /pci@0,0/pci8086,e04@2/pci1000,3080@0/iport@f0/disk@w50000c0f0457149f,0 (sd13) online
Jul 29 14:57:47 store2 genunix: [ID 583861 kern.info] sd14 at mpt_sas2: unit-address w50000c0f042b1c6f,0: w50000c0f042b1c6f,0
Jul 29 14:57:47 store2 genunix: [ID 936769 kern.info] sd14 is /pci@0,0/pci8086,e04@2/pci1000,3080@0/iport@f0/disk@w50000c0f042b1c6f,0
Jul 29 14:57:47 store2 genunix: [ID 408114 kern.info] /pci@0,0/pci8086,e04@2/pci1000,3080@0/iport@f0/disk@w50000c0f042b1c6f,0 (sd14) online
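
For what it's worth, a rough one-liner to put numbers on the gaps (it assumes the syslog timestamp layout above and ignores midnight rollover):

grep ') online' /var/adm/messages | awk '{
    split($3, t, ":")                  # $3 is the HH:MM:SS syslog timestamp
    s = t[1]*3600 + t[2]*60 + t[3]     # seconds since midnight
    if (prev != "") printf "%s -> %s: %d s\n", pdev, $(NF-1), s - prev
    prev = s; pdev = $(NF-1)
}'

Against the excerpt above, it comes out to 8-9 seconds between each "(sdN) online" and the next drive's attach.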
________________________
Michael Talbott
Systems Administrator
La Jolla Institute
> On Jul 29, 2015, at 1:50 PM, Schweiss, Chip <chip at innovates.com> wrote:
>
> I have an OmniOS box with all the same hardware except the server and hard disks. I would wager this has something to do with the WD disks and something different happening in the init.
>
> This is a stab in the dark, but try adding "power-condition:false" in /kernel/drv/sd.conf for the WD disks.
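>
> Something along these lines (just a sketch -- the vendor/product string below is a guess and has to match the drives' actual SCSI INQUIRY data, with the vendor ID padded to 8 characters):
>
> sd-config-list = "WD      WD4001FYYG", "power-condition:false";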
>
> -Chip
>
>
>
> On Wed, Jul 29, 2015 at 12:48 PM, Michael Talbott <mtalbott at lji.org> wrote:
> Here are the specs of that server.
>
> Fujitsu RX300S8
> - http://www.fujitsu.com/fts/products/computing/servers/primergy/rack/rx300/
> 128G ECC DDR3 1600 RAM
> 2 x Intel(R) Xeon(R) CPU E5-2630 v2 @ 2.60GHz
> 2 x LSI 9200-8e
> 2 x 10Gb Intel NICs
> 2 x SuperMicro 847E26-RJBOD1 45 bay JBOD enclosures
> - http://www.supermicro.com/products/chassis/4U/847/SC847E26-RJBOD1.cfm
>
> The enclosures are not currently set up for multipathing. The front and rear backplane each have a single independent SAS connection to one of the LSI 9200s.
>
> The two enclosures are fully loaded with 45 x 4TB WD4001FYYG-01SL3 drives each (90 total).
> http://www.newegg.com/Product/Product.aspx?Item=N82E16822236353
>
> Booting the same server into Ubuntu or CentOS shows no such 8-second delay. Each drive is found in a fraction of a second (the activity LEDs on the enclosure flash on and off really quickly as the drives are scanned). On OmniOS, the drives seem to be scanned in the same order, but instead of spending a fraction of a second on each drive, it spends 8 seconds on one drive (only that drive's LED rapidly flashing during the process) before moving on to the next, times 90 drives.
>
> Is there anything I can do to get more verbosity in the boot messages that might just reveal the root issue?
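>
> (One thing I may try myself, assuming this release still boots via legacy GRUB: append -v to the kernel$ line in /rpool/boot/grub/menu.lst for a verbose boot, e.g.
>
> kernel$ /platform/i86pc/kernel/amd64/unix -B $ZFS-BOOTFS -v
>
> which should make the kernel report device attachment in more detail.)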
>
> Any suggestions appreciated.
>
> Thanks
>
> ________________________
> Michael Talbott
> Systems Administrator
> La Jolla Institute
>
>> On Jul 29, 2015, at 7:51 AM, Schweiss, Chip <chip at innovates.com> wrote:
>>
>>
>>
>> On Fri, Jul 24, 2015 at 5:03 PM, Michael Talbott <mtalbott at lji.org> wrote:
>> Hi,
>>
>> I've downgraded the cards (LSI 9211-8e) to v.19 and disabled their boot BIOS, but I'm still getting the 8-second-per-drive delay after the kernel loads. Any other ideas?
>>
>>
>> 8 seconds is way too long. What JBODs and disks are you using? Could it be that they are powered off and the delay is in waiting for the power-on command to complete? This could be accelerated by using lsiutil to send them all power-on commands first.
>>
>> While I still consider it slow, my OmniOS systems with LSI HBAs discover about 2 disks per second. On systems with LOTS of disks, all multipathed, that still adds up to a long time to discover them all.
>>
>> -Chip
>>
>>
>> ________________________
>> Michael Talbott
>> Systems Administrator
>> La Jolla Institute
>>
>> > On Jul 20, 2015, at 11:27 PM, Floris van Essen ..:: House of Ancients Amstafs ::.. <info at houseofancients.nl> wrote:
>> >
>> > Michael,
>> >
>> > I know v20 does cause lots of issues.
>> > V19, to the best of my knowledge, doesn't contain any, so I would downgrade to v19.
>> >
>> >
>> > Kr,
>> >
>> >
>> > Floris
>> > -----Original message-----
>> > From: OmniOS-discuss [mailto:omnios-discuss-bounces at lists.omniti.com] On behalf of Michael Talbott
>> > Sent: Tuesday, July 21, 2015 4:57
>> > To: Marion Hakanson <hakansom at ohsu.edu>
>> > CC: omnios-discuss <omnios-discuss at lists.omniti.com>
>> > Subject: Re: [OmniOS-discuss] Slow Drive Detection and boot-archive
>> >
>> > Thanks for the reply. The BIOS for the card is disabled already. The 8-second-per-drive scan happens after the kernel has already loaded and it is scanning for devices. I wonder if it's due to running newer firmware. I did update the cards to fw v.20.something before I moved to OmniOS. Is there a particular firmware version I should run on the cards to match OmniOS's drivers?
>> >
>> >
>> > ________________________
>> > Michael Talbott
>> > Systems Administrator
>> > La Jolla Institute
>> >
>> >> On Jul 20, 2015, at 6:06 PM, Marion Hakanson <hakansom at ohsu.edu> wrote:
>> >>
>> >> Michael,
>> >>
>> >> I've not seen this; I do have one system with 120 drives and it
>> >> definitely does not have this problem. A couple with 80+ drives are
>> >> also free of this issue, though they are still running OpenIndiana.
>> >>
>> >> One thing I pretty much always do here is to disable the boot option
>> >> in the LSI HBA's config utility (accessible during boot, after the
>> >> BIOS has started up). I do this because I don't want the BIOS
>> >> thinking it can boot from any of the external JBOD disks, and also
>> >> because I've had some system BIOS crashes when they tried to
>> >> enumerate too many drives. But this all happens at the BIOS level,
>> >> before the OS has even started up, so in theory it should not affect
>> >> what you are seeing.
>> >>
>> >> Regards,
>> >>
>> >> Marion
>> >>
>> >>
>> >> ================================================================
>> >> Subject: Re: [OmniOS-discuss] Slow Drive Detection and boot-archive
>> >> From: Michael Talbott <mtalbott at lji.org>
>> >> Date: Fri, 17 Jul 2015 16:15:47 -0700
>> >> To: omnios-discuss <omnios-discuss at lists.omniti.com>
>> >>
>> >> Just realized my typo. I'm using this on my 90- and 180-drive systems:
>> >>
>> >> # svccfg -s boot-archive setprop start/timeout_seconds=720
>> >> # svccfg -s boot-archive setprop start/timeout_seconds=1440
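>> >>
>> >> (If I remember right, the new value can be checked with "svccfg -s boot-archive listprop start/timeout_seconds", and an "svcadm refresh svc:/system/boot-archive" makes the running snapshot pick up the change.)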
>> >>
>> >> Seems like 8 seconds to detect each drive is pretty excessive.
>> >>
>> >> Any ideas on how to speed that up?
>> >>
>> >>
>> >> ________________________
>> >> Michael Talbott
>> >> Systems Administrator
>> >> La Jolla Institute
>> >>
>> >>> On Jul 17, 2015, at 4:07 PM, Michael Talbott <mtalbott at lji.org> wrote:
>> >>>
>> >>> I have multiple NAS servers I've moved to OmniOS, and each of them has 90-180 4T disks. Everything has worked out pretty well for the most part, but I've run into an issue where, when I reboot any of them, I get boot-archive service timeouts. I found a workaround of increasing the timeout value, which brings me to the following. As you can see below in a dmesg output, it's taking the kernel about 8 seconds to detect each of the drives. They're connected via a couple of SAS2008-based LSI cards.
>> >>>
>> >>> Is this normal?
>> >>> Is there a way to speed that up?
>> >>>
>> >>> I've fixed my frustrating boot-archive timeout problem by adjusting the timeout value from the default of 60 seconds (I guess that'll work OK on systems with fewer than 8 drives?) to 8 seconds * 90 drives + a little extra time = 280 seconds (for the 90-drive systems), which means it takes between 12 and 24 minutes to boot those machines up.
>> >>>
>> >>> # svccfg -s boot-archive setprop start/timeout_seconds=280
>> >>>
>> >>> I figure I can't be the only one. A little googling also revealed:
>> >>> https://www.illumos.org/issues/4614
>> >>>
>> >>> Jul 17 15:40:15 store2 genunix: [ID 583861 kern.info] sd29 at mpt_sas3: unit-address w50000c0f0401bd43,0: w50000c0f0401bd43,0
>> >>> Jul 17 15:40:15 store2 genunix: [ID 936769 kern.info] sd29 is /pci@0,0/pci8086,e06@2,2/pci1000,3080@0/iport@f/disk@w50000c0f0401bd43,0
>> >>> Jul 17 15:40:16 store2 genunix: [ID 408114 kern.info] /pci@0,0/pci8086,e06@2,2/pci1000,3080@0/iport@f/disk@w50000c0f0401bd43,0 (sd29) online
>> >>> Jul 17 15:40:24 store2 genunix: [ID 583861 kern.info] sd30 at mpt_sas3: unit-address w50000c0f045679c3,0: w50000c0f045679c3,0
>> >>> Jul 17 15:40:24 store2 genunix: [ID 936769 kern.info] sd30 is /pci@0,0/pci8086,e06@2,2/pci1000,3080@0/iport@f/disk@w50000c0f045679c3,0
>> >>> Jul 17 15:40:24 store2 genunix: [ID 408114 kern.info] /pci@0,0/pci8086,e06@2,2/pci1000,3080@0/iport@f/disk@w50000c0f045679c3,0 (sd30) online
>> >>> Jul 17 15:40:33 store2 genunix: [ID 583861 kern.info] sd31 at mpt_sas3: unit-address w50000c0f045712b3,0: w50000c0f045712b3,0
>> >>> Jul 17 15:40:33 store2 genunix: [ID 936769 kern.info] sd31 is /pci@0,0/pci8086,e06@2,2/pci1000,3080@0/iport@f/disk@w50000c0f045712b3,0
>> >>> Jul 17 15:40:33 store2 genunix: [ID 408114 kern.info] /pci@0,0/pci8086,e06@2,2/pci1000,3080@0/iport@f/disk@w50000c0f045712b3,0 (sd31) online
>> >>> Jul 17 15:40:42 store2 genunix: [ID 583861 kern.info] sd32 at mpt_sas3: unit-address w50000c0f04571497,0: w50000c0f04571497,0
>> >>> Jul 17 15:40:42 store2 genunix: [ID 936769 kern.info] sd32 is /pci@0,0/pci8086,e06@2,2/pci1000,3080@0/iport@f/disk@w50000c0f04571497,0
>> >>> Jul 17 15:40:42 store2 genunix: [ID 408114 kern.info] /pci@0,0/pci8086,e06@2,2/pci1000,3080@0/iport@f/disk@w50000c0f04571497,0 (sd32) online
>> >>> Jul 17 15:40:50 store2 genunix: [ID 583861 kern.info] sd33 at mpt_sas3: unit-address w50000c0f042ac8eb,0: w50000c0f042ac8eb,0
>> >>> Jul 17 15:40:50 store2 genunix: [ID 936769 kern.info] sd33 is /pci@0,0/pci8086,e06@2,2/pci1000,3080@0/iport@f/disk@w50000c0f042ac8eb,0
>> >>> Jul 17 15:40:50 store2 genunix: [ID 408114 kern.info] /pci@0,0/pci8086,e06@2,2/pci1000,3080@0/iport@f/disk@w50000c0f042ac8eb,0 (sd33) online
>> >>> Jul 17 15:40:59 store2 genunix: [ID 583861 kern.info] sd34 at mpt_sas3: unit-address w50000c0f04571473,0: w50000c0f04571473,0
>> >>> Jul 17 15:40:59 store2 genunix: [ID 936769 kern.info] sd34 is /pci@0,0/pci8086,e06@2,2/pci1000,3080@0/iport@f/disk@w50000c0f04571473,0
>> >>> Jul 17 15:40:59 store2 genunix: [ID 408114 kern.info] /pci@0,0/pci8086,e06@2,2/pci1000,3080@0/iport@f/disk@w50000c0f04571473,0 (sd34) online
>> >>> Jul 17 15:41:08 store2 genunix: [ID 583861 kern.info] sd35 at mpt_sas3: unit-address w50000c0f042c636f,0: w50000c0f042c636f,0
>> >>> Jul 17 15:41:08 store2 genunix: [ID 936769 kern.info] sd35 is /pci@0,0/pci8086,e06@2,2/pci1000,3080@0/iport@f/disk@w50000c0f042c636f,0
>> >>> Jul 17 15:41:08 store2 genunix: [ID 408114 kern.info] /pci@0,0/pci8086,e06@2,2/pci1000,3080@0/iport@f/disk@w50000c0f042c636f,0 (sd35) online
>> >>> Jul 17 15:41:17 store2 genunix: [ID 583861 kern.info] sd36 at mpt_sas3: unit-address w50000c0f0401bf2f,0: w50000c0f0401bf2f,0
>> >>> Jul 17 15:41:17 store2 genunix: [ID 936769 kern.info] sd36 is /pci@0,0/pci8086,e06@2,2/pci1000,3080@0/iport@f/disk@w50000c0f0401bf2f,0
>> >>> Jul 17 15:41:17 store2 genunix: [ID 408114 kern.info] /pci@0,0/pci8086,e06@2,2/pci1000,3080@0/iport@f/disk@w50000c0f0401bf2f,0 (sd36) online
>> >>> Jul 17 15:41:25 store2 genunix: [ID 583861 kern.info] sd38 at mpt_sas3: unit-address w50000c0f0401bc1f,0: w50000c0f0401bc1f,0
>> >>> Jul 17 15:41:25 store2 genunix: [ID 936769 kern.info] sd38 is /pci@0,0/pci8086,e06@2,2/pci1000,3080@0/iport@f/disk@w50000c0f0401bc1f,0
>> >>> Jul 17 15:41:26 store2 genunix: [ID 408114 kern.info] /pci@0,0/pci8086,e06@2,2/pci1000,3080@0/iport@f/disk@w50000c0f0401bc1f,0 (sd38) online
>> >>>
>> >>>
>> >>> ________________________
>> >>> Michael Talbott
>> >>> Systems Administrator
>> >>> La Jolla Institute
>> >>>
>> >>
>> >> _______________________________________________
>> >> OmniOS-discuss mailing list
>> >> OmniOS-discuss at lists.omniti.com
>> >> http://lists.omniti.com/mailman/listinfo/omnios-discuss
>> >>
>> >>
>> >
>> > _______________________________________________
>> > OmniOS-discuss mailing list
>> > OmniOS-discuss at lists.omniti.com
>> > http://lists.omniti.com/mailman/listinfo/omnios-discuss
>> > ...:: House of Ancients ::...
>> > American Staffordshire Terriers
>> >
>> > +31-628-161-350
>> > +31-614-198-389
>> > Het Perk 48
>> > 4903 RB
>> > Oosterhout
>> > Netherlands
>> > www.houseofancients.nl
>>
>> _______________________________________________
>> OmniOS-discuss mailing list
>> OmniOS-discuss at lists.omniti.com
>> http://lists.omniti.com/mailman/listinfo/omnios-discuss
>>
>
>