[OmniOS-discuss] NFS baulks under load

Tom Robinson tom.robinson at motec.com.au
Wed Dec 11 05:33:26 UTC 2013


Hi,

It would be good to know how to work around this without having to reboot the server.

Anyway, after some time the network/nfs/server service timed out:

==> /var/svc/log/network-nfs-server:default.log <==
[ Dec 11 12:05:08 Method or service exit timed out.  Killing contract 123. ]

==> /var/svc/log/svc.startd.log <==
Dec 11 12:05:08/3: svc:/network/nfs/server:default: Method or service exit timed out.  Killing contract 123.
Dec 11 12:05:08/366: network/nfs/server:default timed out: transitioned to maintenance (see 'svcs -xv' for details)

==> /var/adm/messages <==
Dec 11 12:05:08 monza.motec.com.au svc.startd[10]: [ID 122153 daemon.warning] svc:/network/nfs/server:default: Method or service exit timed out.  Killing contract 123.
Dec 11 12:05:08 monza.motec.com.au svc.startd[10]: [ID 748625 daemon.error] network/nfs/server:default timed out: transitioned to maintenance (see 'svcs -xv' for details)
Dec 11 12:05:08 monza.motec.com.au fmd: [ID 377184 daemon.error] SUNW-MSG-ID: SMF-8000-YX, TYPE: defect, VER: 1, SEVERITY: major
Dec 11 12:05:08 monza.motec.com.au EVENT-TIME: Wed Dec 11 12:05:08 EST 2013
Dec 11 12:05:08 monza.motec.com.au PLATFORM: X9DR3-F, CSN: 1234567890, HOSTNAME: monza.motec.com.au
Dec 11 12:05:08 monza.motec.com.au SOURCE: software-diagnosis, REV: 0.1
Dec 11 12:05:08 monza.motec.com.au EVENT-ID: ae3c39b1-7f6c-e39e-bef1-977913c867ce
Dec 11 12:05:08 monza.motec.com.au DESC: A service failed - a start, stop or refresh method failed.
Dec 11 12:05:08 monza.motec.com.au   Refer to http://illumos.org/msg/SMF-8000-YX for more information.
Dec 11 12:05:08 monza.motec.com.au AUTO-RESPONSE: The service has been placed into the maintenance state.
Dec 11 12:05:08 monza.motec.com.au IMPACT: svc:/network/nfs/server:default is unavailable.
Dec 11 12:05:08 monza.motec.com.au REC-ACTION: Run 'svcs -xv svc:/network/nfs/server:default' to determine the generic reason why the service failed, the location of any logfiles, and a list of other services impacted.

# svcs -xv svc:/network/nfs/server:default
svc:/network/nfs/server:default (NFS server)
 State: maintenance since 11 December 2013 12:05:08 PM EST
Reason: Start method died on Killed (9).
   See: http://illumos.org/msg/SMF-8000-KS
   See: man -M /usr/share/man -s 1M nfsd
   See: /var/svc/log/network-nfs-server:default.log
Impact: This service is not running.

# svcs -vp svc:/network/nfs/server:default
STATE          NSTATE        STIME    CTID   FMRI
maintenance    -             12:05:08      - svc:/network/nfs/server:default

# ps -elf | grep share
 0 S     root  1684  1055   0  50 20        ?    576        ? 12:36:52 pts/1       0:00 grep share
 0 S     root  1134     1   0  40 20        ?   1607        ? 10:05:08 ?           0:00 /usr/sbin/sharemgr stop -P nfs -a
 0 S     root  1485     1   0  40 20        ?   1623        ? 11:05:08 ?           0:00 /usr/sbin/sharemgr start -P nfs -a

The network/nfs/server service couldn't be started, so I tried, unsuccessfully, to get rid of the
sharemgr processes. Maybe my assumption was wrong: are these processes related only to the NFS server?
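Both leftover processes were started as `sharemgr ... -P nfs -a`, so they are operating on the NFS protocol, and in the quoted message's `ps` output PID 1134 was a child of the `/lib/svc/method/nfs-server` method script. A minimal sketch for picking out such processes, assuming a POSIX shell and awk, and input in the `PID PPID ARGS` layout printed by `ps -eo pid,ppid,args`; `list_nfs_sharemgr` is a hypothetical helper name, not an OmniOS command:

```sh
# Hypothetical helper: print the "PID PPID ARGS" lines (read from stdin) whose
# command is a sharemgr binary invoked with an "-P nfs" argument pair.
list_nfs_sharemgr() {
    awk '
        # Field 3 is the command path; match only the sharemgr binary.
        $3 ~ /(^|\/)sharemgr$/ {
            # Look for "-P" immediately followed by "nfs" among the arguments.
            for (i = 4; i < NF; i++)
                if ($i == "-P" && $(i + 1) == "nfs") { print; break }
        }
    '
}

# Usage on a live system:
#   ps -eo pid,ppid,args | list_nfs_sharemgr
```

It only identifies the processes; it does not make them any easier to kill.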

No joy again, so I rebooted.

What am I not understanding here?

Also, before rebooting, I had set the NFS properties to:

servers=512
lockd_listen_backlog=256
lockd_servers=128
lockd_retransmit_timeout=5
grace_period=90
server_versmin=2
server_versmax=3
client_versmin=2
client_versmax=3
server_delegation=on
nfsmapid_domain=
max_connections=-1
protocol=ALL
listen_backlog=32
device=
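Each of the properties listed above is changed with `sharectl set -p property=value nfs`. A minimal dry-run sketch, assuming the desired values are saved in a plain `key=value` file in the same format that `sharectl get nfs` prints; it only prints the commands so they can be reviewed before running them by hand. `print_sharectl_cmds` is a hypothetical name, and properties with empty values (such as `device=`) are skipped rather than guessed at:

```sh
# Dry run: read "key=value" lines from stdin and print one "sharectl set"
# command per property that has a non-empty value. Nothing is executed.
print_sharectl_cmds() {
    proto=${1:-nfs}                     # share protocol; defaults to nfs
    while IFS= read -r line || [ -n "$line" ]; do
        case $line in
            ''|\#*) continue ;;         # skip blank lines and comments
            *=*) ;;                     # keep key=value lines
            *) continue ;;              # skip anything else
        esac
        key=${line%%=*}                 # text before the first "="
        value=${line#*=}                # text after the first "="
        [ -n "$value" ] || continue     # leave empty properties alone
        printf 'sharectl set -p %s=%s %s\n' "$key" "$value" "$proto"
    done
}

# Usage (the file name is hypothetical):
#   print_sharectl_cmds nfs < nfs-settings.txt
```

Printing instead of executing keeps a typo in the settings file from being applied to a live server.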

I've moved back to NFSv3, and both client and server seem a lot happier. I've just about
completed a full load test, and it all looks pretty stable.
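Moving back to NFSv3 was one of several changes made at the same time; the quoted message lists the `sharectl get nfs` output before and after. A minimal sketch for comparing two such listings key by key, assuming each has been saved to a file; `diff_nfs_props` and the file names in the usage line are hypothetical:

```sh
# Print "key: old -> new" for each property whose value differs between two
# "key=value" listings. A key missing from one file is shown as "(unset)".
diff_nfs_props() {
    awk '
        {
            i = index($0, "=")
            if (i == 0) next                  # ignore lines without "="
            key = substr($0, 1, i - 1)
            val = substr($0, i + 1)
            if (FNR == NR) {                  # first file: old values
                oldv[key] = val; order[++n] = key
            } else {                          # second file: new values
                newv[key] = val
                if (!(key in oldv)) order[++n] = key
            }
        }
        END {
            for (j = 1; j <= n; j++) {
                k = order[j]
                o = (k in oldv) ? oldv[k] : "(unset)"
                v = (k in newv) ? newv[k] : "(unset)"
                if (o != v) printf "%s: %s -> %s\n", k, o, v
            }
        }
    ' "$1" "$2"
}

# Usage:
#   diff_nfs_props nfs-before.txt nfs-after.txt
```

For the two quoted listings this reports five changes: servers, lockd_listen_backlog, lockd_servers, server_versmax and client_versmax.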

Kind regards,
Tom

Tom Robinson
IT Manager/System Administrator

MoTeC Pty Ltd

121 Merrindale Drive
Croydon South
3136 Victoria
Australia

T: +61 3 9761 5050
F: +61 3 9761 5051   
E: tom.robinson at motec.com.au

On 11/12/13 11:01, Tom Robinson wrote:
> OmniOS v11 r151006
>
> Hi,
>
> I'm having many stability/performance issues with NFS. Server end is OmniOS; client end is CentOS 5.
>
> When the server end is functioning, I can mount OK, but there are really long waits on simple things
> like listing a directory, and I often get an I/O error. I have been using NFSv4, but I'm thinking I
> should just configure the server/client maximum to NFSv3.
>
> Currently the NFS server is hosed. This happened yesterday as well, and the only way I could bring it
> back to life was to reboot the hardware, which is something I want to avoid. Is there a way to tidy
> up network/nfs/server without rebooting?
>
> I had these settings:
> # sharectl get nfs
> servers=16
> lockd_listen_backlog=32
> lockd_servers=20
> lockd_retransmit_timeout=5
> grace_period=90
> server_versmin=2
> server_versmax=4
> client_versmin=2
> client_versmax=4
> server_delegation=on
> nfsmapid_domain=
> max_connections=-1
> protocol=ALL
> listen_backlog=32
> device=
>
> But after reading this article: http://virtuallyhyper.com/2013/04/installing-and-configuring-omnios/
> I changed to these settings to try to improve responsiveness under load:
>
> # sharectl get nfs
> servers=512
> lockd_listen_backlog=256
> lockd_servers=128
> lockd_retransmit_timeout=5
> grace_period=90
> server_versmin=2
> server_versmax=3
> client_versmin=2
> client_versmax=3
> server_delegation=on
> nfsmapid_domain=
> max_connections=-1
> protocol=ALL
> listen_backlog=32
> device=
>
> The problem is that I can't re-enable the service, and I don't want to have to reboot to fix this.
>
> # svcs -a | grep -e nfs -e rpc
> disabled       17:24:56 svc:/network/nfs/cbd:default
> disabled       17:24:56 svc:/network/nfs/client:default
> disabled       17:24:57 svc:/network/nfs/log:default
> disabled       17:27:55 svc:/network/rpc/meta:default
> disabled       17:27:55 svc:/network/rpc/metamh:default
> disabled       17:27:55 svc:/network/rpc/rex:default
> disabled       17:27:55 svc:/network/rpc/metamed:default
> disabled       17:27:55 svc:/network/rpc/mdcomm:default
> online         17:25:55 svc:/network/rpc/bind:default
> online         17:25:56 svc:/network/rpc/keyserv:default
> online         17:25:56 svc:/network/nfs/status:default
> online         17:27:55 svc:/network/nfs/mapid:default
> online         17:27:56 svc:/network/rpc/gss:default
> online         17:27:56 svc:/network/rpc/smserver:default
> online         17:27:56 svc:/network/nfs/rquota:default
> online*        10:05:07 svc:/network/nfs/server:default
> online         10:32:20 svc:/network/nfs/nlockmgr:default
>
>
> # svcs -xv network/nfs/server
> svc:/network/nfs/server:default (NFS server)
>  State: online since 11 December 2013 10:05:07 AM EST
>    See: man -M /usr/share/man -s 1M nfsd
>    See: /var/svc/log/network-nfs-server:default.log
> Impact: None.
>
> # svcs -vl network/nfs/server
> fmri         svc:/network/nfs/server:default
> name         NFS server
> enabled      true
> state        online
> next_state   offline
> state_time   11 December 2013 10:05:07 AM EST
> logfile      /var/svc/log/network-nfs-server:default.log
> restarter    svc:/system/svc/restarter:default
> contract_id  96
> dependency   require_any/error svc:/milestone/network (online)
> dependency   require_all/error svc:/network/nfs/nlockmgr (online)
> dependency   optional_all/error svc:/network/nfs/mapid (online)
> dependency   require_all/restart svc:/network/rpc/bind (online)
> dependency   optional_all/none svc:/network/rpc/keyserv (online)
> dependency   optional_all/none svc:/network/rpc/gss (online)
> dependency   optional_all/none svc:/network/shares/group (multiple)
> dependency   optional_all/none svc:/system/filesystem/reparse (online)
> dependency   require_all/error svc:/system/filesystem/local (online)
>
> # svcs -vp network/nfs/server
> STATE          NSTATE        STIME    CTID   FMRI
> online         offline       10:05:07     96 svc:/network/nfs/server:default
>                17:27:56      692 nfsd
>                10:05:07     1123 nfs-server
>                10:05:07     1134 sharemgr
>
> # ps -elf | grep -e share -e nfs -e rpc
>  0 S   daemon   465     1   0  40 20        ?    796        ? 17:25:56 ?           0:00 /usr/sbin/rpcbind
>  0 S   daemon   582     1   0  40 20        ?   1554        ? 17:27:56 ?           0:00 /usr/lib/nfs/nfsmapid
>  0 S   daemon   545     1   0  40 20        ?    758        ? 17:25:57 ?           0:00 /usr/lib/nfs/statd
>  0 S   daemon   692     1   0  39  0        ?    732        ? 17:27:56 ?           5:27 /usr/lib/nfs/nfsd
>  0 S     root  1123    10   0  40 20        ?    946        ? 10:05:08 ?           0:00 /sbin/sh /lib/svc/method/nfs-server
>  0 S     root  1134  1123   0  40 20        ?   1607        ? 10:05:08 ?           0:00 /usr/sbin/sharemgr stop -P nfs -a
>  0 S     root  1462  1318   0  50 20        ?    578        ? 11:00:43 pts/4       0:00 grep -e share -e nfs -e rpc
>  0 S   daemon  1274     1   0  39  0        ?    713        ? 10:32:20 ?           0:00 /usr/lib/nfs/lockd
>
> /var/svc/log/network-nfs-server:default.log
> [ Dec 11 10:05:07 Stopping because service restarting. ]
> [ Dec 11 10:05:07 Executing stop method ("/lib/svc/method/nfs-server stop 96"). ]
>
> I'm really stuck. Any assistance is much appreciated.
>
> Kind regards,
> Tom
>
