[OmniOS-discuss] Slow Drive Detection and boot-archive

Schweiss, Chip chip at innovates.com
Wed Jul 29 22:07:52 UTC 2015


The only other thing that comes to mind is that you mentioned you have only
a single SAS path to these disks.   Have you disabled multipath?  (stmsboot
-d)
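For reference, a minimal sketch of checking and then disabling MPxIO with
stmsboot(1M). This is illustrative only: stmsboot -d updates /etc/vfstab and
takes effect after a reboot, so don't paste it blindly on a production box:

```shell
# List current non-STMS to STMS device name mappings; multipathed
# devices show up here when MPxIO is enabled.
stmsboot -L

# Disable MPxIO on all supported controller ports (prompts before
# updating /etc/vfstab; a reboot is required for it to take effect).
stmsboot -d
```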


-Chip

On Wed, Jul 29, 2015 at 5:02 PM, Michael Talbott <mtalbott at lji.org> wrote:

> Gave that a shot. No dice. Still getting the 8 second lag. It reminds me
> of raid cards that do staggered spinups that sequentially spin up 1 drive
> at a time. Only, this is happening after the kernel loads and of course,
> the LSI 9200s are flashed in IT mode with v.19 firmware and bios disabled.
>
>
> Jul 29 14:57:12 store2 genunix: [ID 583861 kern.info] sd10 at mpt_sas2: unit-address w50000c0f0401c20f,0: w50000c0f0401c20f,0
> Jul 29 14:57:12 store2 genunix: [ID 936769 kern.info] sd10 is /pci@0,0/pci8086,e04@2/pci1000,3080@0/iport@f0/disk@w50000c0f0401c20f,0
> Jul 29 14:57:12 store2 genunix: [ID 408114 kern.info] /pci@0,0/pci8086,e04@2/pci1000,3080@0/iport@f0/disk@w50000c0f0401c20f,0 (sd10) online
> Jul 29 14:57:20 store2 genunix: [ID 583861 kern.info] sd11 at mpt_sas2: unit-address w50000c0f040075db,0: w50000c0f040075db,0
> Jul 29 14:57:20 store2 genunix: [ID 936769 kern.info] sd11 is /pci@0,0/pci8086,e04@2/pci1000,3080@0/iport@f0/disk@w50000c0f040075db,0
> Jul 29 14:57:21 store2 genunix: [ID 408114 kern.info] /pci@0,0/pci8086,e04@2/pci1000,3080@0/iport@f0/disk@w50000c0f040075db,0 (sd11) online
> Jul 29 14:57:29 store2 genunix: [ID 583861 kern.info] sd12 at mpt_sas2: unit-address w50000c0f042c684b,0: w50000c0f042c684b,0
> Jul 29 14:57:29 store2 genunix: [ID 936769 kern.info] sd12 is /pci@0,0/pci8086,e04@2/pci1000,3080@0/iport@f0/disk@w50000c0f042c684b,0
> Jul 29 14:57:29 store2 genunix: [ID 408114 kern.info] /pci@0,0/pci8086,e04@2/pci1000,3080@0/iport@f0/disk@w50000c0f042c684b,0 (sd12) online
> Jul 29 14:57:38 store2 genunix: [ID 583861 kern.info] sd13 at mpt_sas2: unit-address w50000c0f0457149f,0: w50000c0f0457149f,0
> Jul 29 14:57:38 store2 genunix: [ID 936769 kern.info] sd13 is /pci@0,0/pci8086,e04@2/pci1000,3080@0/iport@f0/disk@w50000c0f0457149f,0
> Jul 29 14:57:38 store2 genunix: [ID 408114 kern.info] /pci@0,0/pci8086,e04@2/pci1000,3080@0/iport@f0/disk@w50000c0f0457149f,0 (sd13) online
> Jul 29 14:57:47 store2 genunix: [ID 583861 kern.info] sd14 at mpt_sas2: unit-address w50000c0f042b1c6f,0: w50000c0f042b1c6f,0
> Jul 29 14:57:47 store2 genunix: [ID 936769 kern.info] sd14 is /pci@0,0/pci8086,e04@2/pci1000,3080@0/iport@f0/disk@w50000c0f042b1c6f,0
> Jul 29 14:57:47 store2 genunix: [ID 408114 kern.info] /pci@0,0/pci8086,e04@2/pci1000,3080@0/iport@f0/disk@w50000c0f042b1c6f,0 (sd14) online
>
>
> ________________________
> Michael Talbott
> Systems Administrator
> La Jolla Institute
>
> On Jul 29, 2015, at 1:50 PM, Schweiss, Chip <chip at innovates.com> wrote:
>
I have an OmniOS box with all the same hardware except the server and hard
disks.  I would wager this has something to do with the WD disks and something
different happening in the init.
>
> This is a stab in the dark, but try adding "power-condition:false" in
> /kernel/drv/sd.conf for the WD disks.
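For reference, an sd-config-list entry in /kernel/drv/sd.conf would look
something like the sketch below. The vendor/product inquiry string is a guess
for these WD RE SAS drives; check the drive's actual inquiry data first (e.g.
with iostat -En), since the vendor ID field must match exactly and is padded
to 8 characters:

```
# Hypothetical entry -- match against the drive's real SCSI inquiry strings.
sd-config-list = "WD      WD4001FYYG", "power-condition:false";
```

A reboot (or an sd driver reload) is needed for sd.conf changes to take effect.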
>
> -Chip
>
>
>
> On Wed, Jul 29, 2015 at 12:48 PM, Michael Talbott <mtalbott at lji.org>
> wrote:
>
>> Here's the specs of that server.
>>
>> Fujitsu RX300S8
>>  -
>> http://www.fujitsu.com/fts/products/computing/servers/primergy/rack/rx300/
>> 128G ECC DDR3 1600 RAM
>> 2 x Intel(R) Xeon(R) CPU E5-2630 v2 @ 2.60GHz
>> 2 x LSI 9200-8e
>> 2 x 10Gb Intel NICs
>> 2 x SuperMicro 847E26-RJBOD1 45 bay JBOD enclosures
>>  - http://www.supermicro.com/products/chassis/4U/847/SC847E26-RJBOD1.cfm
>>
>> The enclosures are not currently set up for multipathing. The front and
>> rear backplane each have a single independent SAS connection to one of the
>> LSI 9200s.
>>
>> The two enclosures are fully loaded with 45 x 4TB WD4001FYYG-01SL3 drives
>> each (90 total).
>> http://www.newegg.com/Product/Product.aspx?Item=N82E16822236353
>>
>> Booting the server up in Ubuntu or CentOS does not have that 8 second
>> delay. Each drive is found in a fraction of a second (activity LEDs on the
>> enclosure flash on and off really quickly as the drives are scanned). On
>> OmniOS, the drives seem to be scanned in the same order, but instead of
>> spending a fraction of a second on each drive, it spends 8 seconds on one
>> drive (the LED of only that drive rapidly flashing during the process)
>> before moving on to the next, for all 90 drives.
>>
>> Is there anything I can do to get more verbosity in the boot messages
>> that might just reveal the root issue?
>>
>> Any suggestions appreciated.
>>
>> Thanks
>>
>> ________________________
>> Michael Talbott
>> Systems Administrator
>> La Jolla Institute
>>
>> On Jul 29, 2015, at 7:51 AM, Schweiss, Chip <chip at innovates.com> wrote:
>>
>>
>>
>> On Fri, Jul 24, 2015 at 5:03 PM, Michael Talbott <mtalbott at lji.org>
>> wrote:
>>
>>> Hi,
>>>
>>> I've downgraded the cards (LSI 9211-8e) to v.19 and disabled their boot
>>> bios. But I'm still getting the 8 second per drive delay after the kernel
>>> loads. Any other ideas?
>>>
>>>
>> 8 seconds is way too long.   What JBODs and disks are you using?   Could
>> it be that they are powered off and the delay is in waiting for the
>> power-on command to complete?   This could be accelerated by using lsiutil
>> to send them all power-on commands first.
>>
>> While I still consider it slow, my OmniOS systems with LSI HBAs discover
>> about 2 disks per second.   On systems with LOTS of disks, all
>> multipathed, that still adds up to a long time to discover them all.
>>
>> -Chip
>>
>>
>>>
>>> ________________________
>>> Michael Talbott
>>> Systems Administrator
>>> La Jolla Institute
>>>
>>> > On Jul 20, 2015, at 11:27 PM, Floris van Essen ..:: House of Ancients
>>> Amstafs ::.. <info at houseofancients.nl> wrote:
>>> >
>>> > Michael,
>>> >
>>> > I know v20 does cause lots of issues.
>>> > V19, to the best of my knowledge, doesn't contain any, so I would
>>> downgrade to v19.
>>> >
>>> >
>>> > Kr,
>>> >
>>> >
>>> > Floris
>>> > -----Original Message-----
>>> > From: OmniOS-discuss [mailto:omnios-discuss-bounces at lists.omniti.com]
>>> On behalf of Michael Talbott
>>> > Sent: Tuesday, 21 July 2015 4:57
>>> > To: Marion Hakanson <hakansom at ohsu.edu>
>>> > CC: omnios-discuss <omnios-discuss at lists.omniti.com>
>>> > Subject: Re: [OmniOS-discuss] Slow Drive Detection and boot-archive
>>> >
>>> > Thanks for the reply. The bios for the card is disabled already. The 8
>>> second per drive scan happens after the kernel has already loaded and it is
>>> scanning for devices. I wonder if it's due to running newer firmware. I did
>>> update the cards to fw v.20.something before I moved to omnios. Is there a
>>> particular firmware version on the cards I should run to match OmniOS's
>>> drivers?
>>> >
>>> >
>>> > ________________________
>>> > Michael Talbott
>>> > Systems Administrator
>>> > La Jolla Institute
>>> >
>>> >> On Jul 20, 2015, at 6:06 PM, Marion Hakanson <hakansom at ohsu.edu>
>>> wrote:
>>> >>
>>> >> Michael,
>>> >>
>>> >> I've not seen this;  I do have one system with 120 drives and it
>>> >> definitely does not have this problem.  A couple with 80+ drives are
>>> >> also free of this issue, though they are still running OpenIndiana.
>>> >>
>>> >> One thing I pretty much always do here is to disable the boot option
>>> >> in the LSI HBA's config utility (accessible during boot, after the
>>> >> BIOS has started up).  I do this because I don't want the BIOS
>>> >> thinking it can boot from any of the external JBOD disks;  And also
>>> >> because I've had some system BIOS crashes when they tried to enumerate
>>> >> too many drives.  But, this all happens at the BIOS level, before the
>>> >> OS has even started up, so in theory it should not affect what you are
>>> >> seeing.
>>> >>
>>> >> Regards,
>>> >>
>>> >> Marion
>>> >>
>>> >>
>>> >> ================================================================
>>> >> Subject: Re: [OmniOS-discuss] Slow Drive Detection and boot-archive
>>> >> From: Michael Talbott <mtalbott at lji.org>
>>> >> Date: Fri, 17 Jul 2015 16:15:47 -0700
>>> >> To: omnios-discuss <omnios-discuss at lists.omniti.com>
>>> >>
>>> >> Just realized my typo. I'm using this on my 90 and 180 drive systems:
>>> >>
>>> >> # svccfg -s boot-archive setprop start/timeout_seconds=720
>>> >> # svccfg -s boot-archive setprop start/timeout_seconds=1440
>>> >>
>>> >> Seems like 8 seconds to detect each drive is pretty excessive.
>>> >>
>>> >> Any ideas on how to speed that up?
>>> >>
>>> >>
>>> >> ________________________
>>> >> Michael Talbott
>>> >> Systems Administrator
>>> >> La Jolla Institute
>>> >>
>>> >>> On Jul 17, 2015, at 4:07 PM, Michael Talbott <mtalbott at lji.org>
>>> wrote:
>>> >>>
>>> >>> I have multiple NAS servers I've moved to OmniOS, and each of them
>>> has 90-180 4T disks. Everything has worked out pretty well for the most
>>> part, but I've run into an issue where, when I reboot any of them, I get
>>> boot-archive service timeouts. I found a workaround of increasing the
>>> timeout value, which brings me to the following. As you can see below in
>>> the dmesg output, it's taking the kernel about 8 seconds to detect each
>>> of the drives. They're connected via a couple of SAS2008-based LSI cards.
>>> >>>
>>> >>> Is this normal?
>>> >>> Is there a way to speed that up?
>>> >>>
>>> >>> I've fixed my frustrating boot-archive timeout problem by adjusting
>>> the timeout value from the default of 60 seconds (I guess that'll work ok
>>> on systems with less than 8 drives?) to 8 seconds * 90 drives + a little
>>> extra time = 280 seconds (for the 90 drive systems). Which means it takes
>>> between 12-24 minutes to boot those machines up.
>>> >>>
>>> >>> # svccfg -s boot-archive setprop start/timeout_seconds=280
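The arithmetic above can be sketched as follows (a hypothetical helper, just
to make the per-drive math explicit; the 60-second margin stands in for the
"little extra time" and is my own choice):

```python
# Boot-archive timeout estimate: drive count times the observed
# per-drive detection delay (8 s in the dmesg output), plus headroom.
def boot_archive_timeout(drives, seconds_per_drive=8, margin=60):
    """Seconds to pass to: svccfg -s boot-archive setprop start/timeout_seconds=N"""
    return drives * seconds_per_drive + margin

print(boot_archive_timeout(90))   # 90-drive system
print(boot_archive_timeout(180))  # 180-drive system
```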
>>> >>>
>>> >>> I figure I can't be the only one. A little googling also revealed:
>>> >>> https://www.illumos.org/issues/4614
>>> >>> <https://www.illumos.org/issues/4614>
>>> >>>
>>> >>> Jul 17 15:40:15 store2 genunix: [ID 583861 kern.info] sd29 at mpt_sas3: unit-address w50000c0f0401bd43,0: w50000c0f0401bd43,0
>>> >>> Jul 17 15:40:15 store2 genunix: [ID 936769 kern.info] sd29 is /pci@0,0/pci8086,e06@2,2/pci1000,3080@0/iport@f/disk@w50000c0f0401bd43,0
>>> >>> Jul 17 15:40:16 store2 genunix: [ID 408114 kern.info] /pci@0,0/pci8086,e06@2,2/pci1000,3080@0/iport@f/disk@w50000c0f0401bd43,0 (sd29) online
>>> >>> Jul 17 15:40:24 store2 genunix: [ID 583861 kern.info] sd30 at mpt_sas3: unit-address w50000c0f045679c3,0: w50000c0f045679c3,0
>>> >>> Jul 17 15:40:24 store2 genunix: [ID 936769 kern.info] sd30 is /pci@0,0/pci8086,e06@2,2/pci1000,3080@0/iport@f/disk@w50000c0f045679c3,0
>>> >>> Jul 17 15:40:24 store2 genunix: [ID 408114 kern.info] /pci@0,0/pci8086,e06@2,2/pci1000,3080@0/iport@f/disk@w50000c0f045679c3,0 (sd30) online
>>> >>> Jul 17 15:40:33 store2 genunix: [ID 583861 kern.info] sd31 at mpt_sas3: unit-address w50000c0f045712b3,0: w50000c0f045712b3,0
>>> >>> Jul 17 15:40:33 store2 genunix: [ID 936769 kern.info] sd31 is /pci@0,0/pci8086,e06@2,2/pci1000,3080@0/iport@f/disk@w50000c0f045712b3,0
>>> >>> Jul 17 15:40:33 store2 genunix: [ID 408114 kern.info] /pci@0,0/pci8086,e06@2,2/pci1000,3080@0/iport@f/disk@w50000c0f045712b3,0 (sd31) online
>>> >>> Jul 17 15:40:42 store2 genunix: [ID 583861 kern.info] sd32 at mpt_sas3: unit-address w50000c0f04571497,0: w50000c0f04571497,0
>>> >>> Jul 17 15:40:42 store2 genunix: [ID 936769 kern.info] sd32 is /pci@0,0/pci8086,e06@2,2/pci1000,3080@0/iport@f/disk@w50000c0f04571497,0
>>> >>> Jul 17 15:40:42 store2 genunix: [ID 408114 kern.info] /pci@0,0/pci8086,e06@2,2/pci1000,3080@0/iport@f/disk@w50000c0f04571497,0 (sd32) online
>>> >>> Jul 17 15:40:50 store2 genunix: [ID 583861 kern.info] sd33 at mpt_sas3: unit-address w50000c0f042ac8eb,0: w50000c0f042ac8eb,0
>>> >>> Jul 17 15:40:50 store2 genunix: [ID 936769 kern.info] sd33 is /pci@0,0/pci8086,e06@2,2/pci1000,3080@0/iport@f/disk@w50000c0f042ac8eb,0
>>> >>> Jul 17 15:40:50 store2 genunix: [ID 408114 kern.info] /pci@0,0/pci8086,e06@2,2/pci1000,3080@0/iport@f/disk@w50000c0f042ac8eb,0 (sd33) online
>>> >>> Jul 17 15:40:59 store2 genunix: [ID 583861 kern.info] sd34 at mpt_sas3: unit-address w50000c0f04571473,0: w50000c0f04571473,0
>>> >>> Jul 17 15:40:59 store2 genunix: [ID 936769 kern.info] sd34 is /pci@0,0/pci8086,e06@2,2/pci1000,3080@0/iport@f/disk@w50000c0f04571473,0
>>> >>> Jul 17 15:40:59 store2 genunix: [ID 408114 kern.info] /pci@0,0/pci8086,e06@2,2/pci1000,3080@0/iport@f/disk@w50000c0f04571473,0 (sd34) online
>>> >>> Jul 17 15:41:08 store2 genunix: [ID 583861 kern.info] sd35 at mpt_sas3: unit-address w50000c0f042c636f,0: w50000c0f042c636f,0
>>> >>> Jul 17 15:41:08 store2 genunix: [ID 936769 kern.info] sd35 is /pci@0,0/pci8086,e06@2,2/pci1000,3080@0/iport@f/disk@w50000c0f042c636f,0
>>> >>> Jul 17 15:41:08 store2 genunix: [ID 408114 kern.info] /pci@0,0/pci8086,e06@2,2/pci1000,3080@0/iport@f/disk@w50000c0f042c636f,0 (sd35) online
>>> >>> Jul 17 15:41:17 store2 genunix: [ID 583861 kern.info] sd36 at mpt_sas3: unit-address w50000c0f0401bf2f,0: w50000c0f0401bf2f,0
>>> >>> Jul 17 15:41:17 store2 genunix: [ID 936769 kern.info] sd36 is /pci@0,0/pci8086,e06@2,2/pci1000,3080@0/iport@f/disk@w50000c0f0401bf2f,0
>>> >>> Jul 17 15:41:17 store2 genunix: [ID 408114 kern.info] /pci@0,0/pci8086,e06@2,2/pci1000,3080@0/iport@f/disk@w50000c0f0401bf2f,0 (sd36) online
>>> >>> Jul 17 15:41:25 store2 genunix: [ID 583861 kern.info] sd38 at mpt_sas3: unit-address w50000c0f0401bc1f,0: w50000c0f0401bc1f,0
>>> >>> Jul 17 15:41:25 store2 genunix: [ID 936769 kern.info] sd38 is /pci@0,0/pci8086,e06@2,2/pci1000,3080@0/iport@f/disk@w50000c0f0401bc1f,0
>>> >>> Jul 17 15:41:26 store2 genunix: [ID 408114 kern.info] /pci@0,0/pci8086,e06@2,2/pci1000,3080@0/iport@f/disk@w50000c0f0401bc1f,0 (sd38) online
>>> >>>
>>> >>>
>>> >>> ________________________
>>> >>> Michael Talbott
>>> >>> Systems Administrator
>>> >>> La Jolla Institute
>>> >>>
>>> >>
>>> >> _______________________________________________
>>> >> OmniOS-discuss mailing list
>>> >> OmniOS-discuss at lists.omniti.com
>>> >> http://lists.omniti.com/mailman/listinfo/omnios-discuss
>>> >>
>>> >>
>>> >
>>> > ...:: House of Ancients ::...
>>> > American Staffordshire Terriers
>>> >
>>> > +31-628-161-350
>>> > +31-614-198-389
>>> > Het Perk 48
>>> > 4903 RB
>>> > Oosterhout
>>> > Netherlands
>>> > www.houseofancients.nl
>>>
>>>
>>
>>
>>
>
>