Discussion:
Multi instance over NETAPP NFS NAS
Jordi Saltiveri Jovells
2013-10-17 10:51:18 UTC
Permalink
Hello,

We have an issue and would appreciate some advice on what could be causing it.
We are implementing a multi-instance queue manager solution.

The environment has these specs:


· Virtual server on a VMware ESX 4 host

· OS: Red Hat Enterprise Linux 5.9

· MQ Server 7.5.0.1

· NFSv4 file system on NetApp Release 8.0.2P4 7-Mode: Tue Nov 15 16:16:47 PST 2011

· NFS mount options: nfs4 bg,soft,nointr,timeo=300,retrans=5,rsize=65940,wsize=65940,noac 0 0


The issue occurs when the storage team moves the service to the other active NetApp node using the "takeover" or "giveback" option: our MQ server loses the file system and switches to the standby instance.

The following FDCs were generated on the MQ server:


AMQ11708.0.FDC 2013/10/14 23:33:28.593536+2 Installation1 amqzxma0 11708 9 ZX155001 zxcFileLockMonitorThread lrcE_S_Q_MGR_UNRESPONSIVE OK

!AMQ11740.0.FDC 2013/10/14 23:33:31.073928+2 Installation1 PS006049 psiWaitPSModeChange

AMQ11571.0.FDC 2013/10/14 23:33:37.426462+2 Installation1 amqzxma0 11571 9 ZX155001 zxcFileLockMonitorThread lrcE_S_Q_MGR_UNRESPONSIVE OK

!AMQ11740.0.FDC 2013/10/14 23:33:42.944821+2 Installation1 PS058054 psiStopAllTasks

AMQ11740.0.FDC 2013/10/14 23:33:42.958556+2 Installation1 amqzmuf0 11740 9 PS058099 psiStopAllTasks frcW_QUIESCE_FAILED OK

!AMQ11740.0.FDC 2013/10/14 23:33:43.246717+2 Installation1 PS058054 psiStopAllTasks

AMQ11740.0.FDC 2013/10/14 23:33:43.261180+2 Installation1 amqzmuf0 11740 10 PS058099 psiStopAllTasks frcW_QUIESCE_FAILED OK

!AMQ11740.0.FDC 2013/10/14 23:33:43.382625+2 Installation1 PS058054 psiStopAllTasks

AMQ11740.0.FDC 2013/10/14 23:33:43.396977+2 Installation1 amqzmuf0 11740 11 PS058099 psiStopAllTasks frcW_QUIESCE_FAILED OK

AMQ11602.0.FDC 2013/10/14 23:35:01.528132+2 Installation1 amqzmuf0 11602 9 PS017094 psiProcessProxySubs MQRC_CONNECTION_BROKEN OK

!AMQ11602.0.FDC 2013/10/14 23:35:01.546072+2 Installation1 PS000035 psiReceivePublications

AMQ11708.0.FDC 2013/10/14 23:35:01.576255+2 Installation1 amqzxma0 11708 1 ZX005025 zxcProcessChildren zrcX_PROCESS_MISSING OK

AMQ11571.0.FDC 2013/10/14 23:35:27.463749+2 Installation1 amqzxma0 11571 1 ZX005025 zxcProcessChildren zrcX_PROCESS_MISSING OK

AMQ11735.0.FDC 2013/10/14 23:35:37.640100+2 Installation1 amqzmur0 11735 1 ZX086131 amqzmur0 zrcE_UNEXPECTED_ERROR OK

AMQ11735.0.FDC 2013/10/14 23:35:37.651677+2 Installation1 amqzmur0 11735 2 ZX086131 amqzmur0 OK OK

We know that the file lock monitor thread stops the qmgr if it has not responded for 20 seconds:

http://www-01.ibm.com/support/docview.wss?uid=swg21592501

The MQ queue manager runs a file lock monitor thread, which, every 10 seconds, checks that it has access to the resources it needs over the network. Another health-check thread monitors this one to determine whether it has hung, which is a sign that there is a network problem. The FDC with probe id ZX155001 is from this health-check thread which detected that the file lock monitor thread had not responded for 20 seconds.
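To make that timing concrete, the check described above can be roughly imitated with flock(1): try to take (and immediately release) an exclusive lock on a file in the shared directory, giving up after the same 20 seconds. This is my own sketch, not the queue manager's actual code path; probe_lock and MQHA_DIR are invented names, and the default directory is /tmp only so the snippet runs anywhere.

```shell
#!/bin/sh
# Hypothetical probe, not what the qmgr actually runs: acquire and release
# an exclusive lock, giving up after 20 seconds (the ZX155001 threshold).
probe_lock() {
    lockfile="$1/lock.probe"
    # flock locks file descriptor 9, opened on $lockfile by the redirection;
    # the lock is released as soon as flock exits and the descriptor closes.
    if flock -x -w 20 9 9>"$lockfile"; then
        echo "lock ok"
    else
        echo "lock timed out (>20s)"
        return 1
    fi
}

# Point MQHA_DIR at the shared qmgr directory, e.g. /MQHA/qmgrs/COI01E01.
probe_lock "${MQHA_DIR:-/tmp}"
```

If the NetApp takeover stalls lock acquisition for longer than 20 seconds, a probe like this should time out at roughly the moment the qmgr decides to fail over.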


I would appreciate it if anyone could help me with this.

Thanks!

Jordi Saltiveri
Avda.Diagonal, 605, Planta 4ª - 08028 Barcelona
E-mail: jsaltive-GH7bCjKZcdLQT0dZR+***@public.gmane.org



________________________________

CONFIDENTIALITY WARNING.
This message and the information contained in or attached to it are private and confidential and intended exclusively for the addressee. everis informs to whom it may receive it in error that it contains privileged information and its use, copy, reproduction or distribution is prohibited. If you are not an intended recipient of this E-mail, please notify the sender, delete it and do not read, act upon, print, disclose, copy, retain or redistribute any portion of this E-mail.

To unsubscribe, write to LISTSERV-0lvw86wZMd9k/bWDasg6f+***@public.gmane.org and,
in the message body (not the subject), write: SIGNOFF MQSERIES
Instructions for managing your mailing list subscription are provided in
the Listserv General Users Guide available at http://www.lsoft.com
Archive: http://listserv.meduniwien.ac.at/archives/mqser-l.html
T.Rob
2013-10-17 13:09:54 UTC
Permalink
I'm not familiar with NETAPP so I have to ask – does it really promise to
preserve the locks during failover? As for the 20 seconds, the heartbeat is
probably a blocking call that hangs for a short bit during the NETAPP
failover, yes?



-- T.Rob





Hatcher Jeter
2013-10-17 14:34:53 UTC
Permalink
I tried a NetApp appliance at a customer and the file system check (amqmfsck) never worked. Is amqmfsck working for you?


Hatcher H. Jeter, Jr.
Domain Architect - Integration Technologies
IBM Practice
Avnet Services

Mobile: 804-334-4750
***@atech.com
www.atech.com

This message, including files attached to it, may contain confidential
information that is intended only for the use of the ADDRESSEES(S) named
above. If you are not an intended recipient, you are hereby notified that
any dissemination or copying of the information contained in this message,
or the taking of any action in reliance upon the information, is strictly
prohibited. If you have received this message in error, please notify the
sender immediately and delete the message from your system. Thank You.






Jordi Saltiveri Jovells
2013-10-17 15:57:17 UTC
Permalink
Hi T.Rob, I have sent your comment to the storage team.

Hatcher, amqmfsck works OK:

1.
amqmfsck /MQHA/qmgrs/COI01E01
The tests on the directory completed successfully.


2.
[SERVER1].mqmadmin:/opt/mqm/bin > ./amqmfsck -c /MQHA/qmgrs/COI01E01
Start a second copy of this program with the same parameters on another server.
Writing to test file. This will normally complete within about 60 seconds.
.......................................
The tests on the directory completed successfully.


[SERVER2].mqmadmin:/opt/mqm/bin > ./amqmfsck -c /MQHA/qmgrs/COI01E01
Writing to test file. This will normally complete within about 60 seconds.
.......................
The tests on the directory completed successfully.


3.
[SERVER1].mqmadmin:/opt/mqm/bin > ./amqmfsck -w /MQHA/qmgrs/COI01E01
Start a second copy of this program with the same parameters on another server.
File lock acquired.
Press Enter or terminate this process to release the lock.

File lock released.
The tests on the directory completed successfully.

[SERVER2].mqmadmin:/opt/mqm/bin > ./amqmfsck -w /MQHA/qmgrs/COI01E01
Waiting for the file lock.
Waiting for the file lock.
Waiting for the file lock.
Waiting for the file lock.
Waiting for the file lock.
Waiting for the file lock.
File lock acquired.
Press Enter or terminate this process to release the lock.

File lock released.
The tests on the directory completed successfully.
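One caveat on those checks: amqmfsck exercises steady-state locking, not the takeover window itself. A crude way to time the outage would be to run the basic test in a loop while the storage team performs the takeover. This is my own sketch, not an IBM procedure; watch_fs, AMQMFSCK and PROBE_INTERVAL are invented names (the override variables exist only so the loop can be dry-run), and the directory is just the one from the tests above.

```shell
#!/bin/sh
# watch_fs DIR [PROBES]: run the basic amqmfsck test repeatedly and log
# when the shared filesystem stops (and resumes) being usable.
watch_fs() {
    dir="$1"
    probes="${2:-30}"
    i=0
    while [ "$i" -lt "$probes" ]; do
        # AMQMFSCK defaults to the real binary; override it to dry-run.
        if "${AMQMFSCK:-amqmfsck}" "$dir" >/dev/null 2>&1; then
            echo "$(date '+%H:%M:%S') ok"
        else
            echo "$(date '+%H:%M:%S') FAILED: filesystem not usable"
        fi
        i=$((i + 1))
        sleep "${PROBE_INTERVAL:-5}"
    done
}

# During the takeover test, run e.g.:
#   watch_fs /MQHA/qmgrs/COI01E01 60
```

Comparing the FAILED window in this log against the 20-second threshold should show whether the failover to the standby instance was inevitable.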


Thanks for everything.


Jordi Saltiveri
Avda.Diagonal, 605, Planta 4ª - 08028 Barcelona
E-mail: ***@everis.com


AJ Aronoff
2013-10-17 16:47:52 UTC
Permalink
Hello!

May I ask why you chose those mount options? (soft, nointr)
All the instructions I have seen recommend "hard,intr" as a mount option
For example, mount the file system with: mount -t nfs4 -o hard,intr 9.122.163.105:/ /HA
http://www.ibm.com/developerworks/websphere/library/techarticles/1006_sampige/1006_sampige.html

From the infocenter:
http://pic.dhe.ibm.com/infocenter/wmqv7/v7r5/index.jsp?topic=%2Fcom.ibm.mq.con.doc%2Fq018150_.htm
On UNIX and Linux systems, configure the shared file system on networked storage with a hard, interruptible, mount rather than a soft mount. A hard interruptible mount forces the queue manager to hang until it is interrupted by a system call. Soft mounts do not guarantee data consistency after a server failure.
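Applied to the options Jordi posted, that guidance might look like the fragment below. The server name and export path are placeholders, not anything from this thread; the remaining options are carried over unchanged from the original mount.

```shell
# Example only: 'netapp-filer:/vol/mqha' and '/MQHA' are placeholder names.
# Replacing 'soft,nointr' with 'hard,intr' follows the infocenter
# recommendation quoted above; everything else is the original option set.
#
#   mount -t nfs4 -o hard,intr,bg,timeo=300,retrans=5,rsize=65940,wsize=65940,noac \
#       netapp-filer:/vol/mqha /MQHA
#
# Equivalent /etc/fstab entry:
#   netapp-filer:/vol/mqha  /MQHA  nfs4  hard,intr,bg,timeo=300,retrans=5,rsize=65940,wsize=65940,noac  0 0
```

With a hard mount the NFS client retries indefinitely during the takeover instead of returning an error to MQ, which is exactly the behaviour the infocenter text asks for.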

Best Wishes and Cheers!
A.J. Aronoff
Connectivity Practice Director
Prolifics
Office: 646-201-4943 US
email: aj-***@public.gmane.org
IBM Award Winner for Technical Excellence, BPM, SOA, Portal and Rational
________________________________________
From: MQSeries List [MQSERIES-0lvw86wZMd9k/bWDasg6f+***@public.gmane.org] On Behalf Of T.Rob [t.rob-CkT6zf+urXSzW/GOMZKyElesiRL1/***@public.gmane.org]
Sent: Thursday, October 17, 2013 9:09 AM
To: MQSERIES-0lvw86wZMd9k/bWDasg6f+***@public.gmane.org
Subject: Re: Multi instance over NETAPP NFS NAS

I'm not familiar with NETAPP so I have to ask – does it really promise to preserve the locks during failover? As for the 20 seconds, the heartbeat is probably a blocking call that hangs for a short bit during the NETAPP failover, yes?

-- T.Rob


From: MQSeries List [mailto:MQSERIES-0lvw86wZMd9k/bWDasg6f+***@public.gmane.org] On Behalf Of Jordi Saltiveri Jovells
Sent: Thursday, October 17, 2013 6:51 AM
To: MQSERIES-0lvw86wZMd9k/bWDasg6f+***@public.gmane.org
Subject: Multi instance over NETAPP NFS NAS

Hello,

We have an issue and would appreciate some advice on what could be causing it.
We are implementing Multi instance queuemanager solution.

The environment have this specs:


• Virtual server on vmware host ESX 4 server

• OS: Red hat 5.9 Enterprise

• MQServer 7.5.0.1

• NFS FileSystem V4 on NetApp Release 8.0.2P4 7-Mode: Tue Nov 15 16:16:47 PST 2011

• NFS mount options: nfs4 bg,soft,nointr,timeo=300,retrans=5,rsize=65940,wsize=65940,noac 0 0


This issue occurs when the storage team wants to move the service at the other NETAPP active node using “takeover” option or "giveback" option, our MQ server loses the file System and switch to standby instance.

The next FDC was generated in the MQServer:


AMQ11708.0.FDC 2013/10/14 23:33:28.593536+2 Installation1 amqzxma0 11708 9 ZX155001 zxcFileLockMonitorThread lrcE_S_Q_MGR_UNRESPONSIVE OK

!AMQ11740.0.FDC 2013/10/14 23:33:31.073928+2 Installation1 PS006049 psiWaitPSModeChange

AMQ11571.0.FDC 2013/10/14 23:33:37.426462+2 Installation1 amqzxma0 11571 9 ZX155001 zxcFileLockMonitorThread lrcE_S_Q_MGR_UNRESPONSIVE OK

!AMQ11740.0.FDC 2013/10/14 23:33:42.944821+2 Installation1 PS058054 psiStopAllTasks

AMQ11740.0.FDC 2013/10/14 23:33:42.958556+2 Installation1 amqzmuf0 11740 9 PS058099 psiStopAllTasks frcW_QUIESCE_FAILED OK

!AMQ11740.0.FDC 2013/10/14 23:33:43.246717+2 Installation1 PS058054 psiStopAllTasks

AMQ11740.0.FDC 2013/10/14 23:33:43.261180+2 Installation1 amqzmuf0 11740 10 PS058099 psiStopAllTasks frcW_QUIESCE_FAILED OK

!AMQ11740.0.FDC 2013/10/14 23:33:43.382625+2 Installation1 PS058054 psiStopAllTasks

AMQ11740.0.FDC 2013/10/14 23:33:43.396977+2 Installation1 amqzmuf0 11740 11 PS058099 psiStopAllTasks frcW_QUIESCE_FAILED OK

AMQ11602.0.FDC 2013/10/14 23:35:01.528132+2 Installation1 amqzmuf0 11602 9 PS017094 psiProcessProxySubs MQRC_CONNECTION_BROKEN OK

!AMQ11602.0.FDC 2013/10/14 23:35:01.546072+2 Installation1 PS000035 psiReceivePublications

AMQ11708.0.FDC 2013/10/14 23:35:01.576255+2 Installation1 amqzxma0 11708 1 ZX005025 zxcProcessChildren zrcX_PROCESS_MISSING OK

AMQ11571.0.FDC 2013/10/14 23:35:27.463749+2 Installation1 amqzxma0 11571 1 ZX005025 zxcProcessChildren zrcX_PROCESS_MISSING OK

AMQ11735.0.FDC 2013/10/14 23:35:37.640100+2 Installation1 amqzmur0 11735 1 ZX086131 amqzmur0 zrcE_UNEXPECTED_ERROR OK

AMQ11735.0.FDC 2013/10/14 23:35:37.651677+2 Installation1 amqzmur0 11735 2 ZX086131 amqzmur0 OK OK

We know that file lock monitor thread stops the qmgr if it not responded for 20 seconds:

http://www-01.ibm.com/support/docview.wss?uid=swg21592501

The MQ queue manager runs a file lock monitor thread, which, every 10 seconds, checks that it has access to the resources it needs over the network. Another health-check thread monitors this one to determine whether it has hung, which is a sign that there is a network problem. The FDC with probe id ZX155001 is from this health-check thread which detected that the file lock monitor thread had not responded for 20 seconds.


I would appreciate if anyone can help me with this.

Thanks!

[cid:image001.gif-9m4K1VY6HDM+***@public.gmane.org] Jordi Saltiveri
Avda.Diagonal, 605, Planta 4ª - 08028 Barcelona
E-mail: jsaltive-GH7bCjKZcdLQT0dZR+***@public.gmane.org<mailto:jsaltive-GH7bCjKZcdLQT0dZR+***@public.gmane.org>


________________________________

AVISO DE CONFIDENCIALIDAD.
Este correo y la información contenida o adjunta al mismo es privada y confidencial y va dirigida exclusivamente a su destinatario. everis informa a quien pueda haber recibido este correo por error que contiene información confidencial cuyo uso, copia, reproducción o distribución está expresamente prohibida. Si no es Vd. el destinatario del mismo y recibe este correo por error, le rogamos lo ponga en conocimiento del emisor y proceda a su eliminación sin copiarlo, imprimirlo o utilizarlo de ningún modo.

CONFIDENTIALITY WARNING.
This message and the information contained in or attached to it are private and confidential and intended exclusively for the addressee. everis informs to whom it may receive it in error that it contains privileged information and its use, copy, reproduction or distribution is prohibited. If you are not an intended recipient of this E-mail, please notify the sender, delete it and do not read, act upon, print, disclose, copy, retain or redistribute any portion of this E-mail.

________________________________
List Archive<http://listserv.meduniwien.ac.at/archives/mqser-l.html> - Manage Your List Settings<http://listserv.meduniwien.ac.at/cgi-bin/wa?SUBED1=mqser-l&A=1> - Unsubscribe<mailto:LISTSERV-0lvw86wZMd9k/bWDasg6f+***@public.gmane.org?subject=Unsubscribe&BODY=signoff%20mqseries>

Instructions for managing your mailing list subscription are provided in the Listserv General Users Guide available at http://www.lsoft.com<http://www.lsoft.com/resources/manuals.asp>

d***@public.gmane.org
2013-10-17 20:05:28 UTC
Permalink
amqmfsck worked OK for me too, but the NetApp NFS still failed with a locking error, causing my multi-instance queue managers to "flip" back and forth across my two servers (and giving me an amusing story to tell at conferences). So I still have the queue managers defined as multi-instance, but I start them manually on the "other" server instead of relying on the automatic failover option.
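For reference, that manual approach can be sketched with the standard multi-instance controls (strmqm -x and endmqm -s). The queue manager name is borrowed from this thread, and the script is guarded so it is harmless on a host without MQ installed:

```shell
QM=COI01E01   # queue manager name taken from this thread; substitute your own
if command -v strmqm >/dev/null 2>&1; then
    strmqm -x "$QM"      # start an instance that permits a standby elsewhere
    # To move it by hand later, on the currently active node:
    #   endmqm -s "$QM"  # end this instance, switching over to the standby
else
    echo "MQ not installed on this host; commands shown for reference only"
fi
```

Starting the standby manually on the other server (again with strmqm -x) completes the hand-operated pair.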

----- Original Message -----

From: "Jordi Saltiveri Jovells" <jordi.saltiveri.jovells-***@public.gmane.org>
To: MQSERIES-0lvw86wZMd9k/bWDasg6f+***@public.gmane.org
Sent: Thursday, October 17, 2013 11:57:17 AM
Subject: Re: Multi instance over NETAPP NFS NAS



Hi T.Rob, I sent your comment to the storage team.



Hatcher, amqmfsck works OK:



1.

amqmfsck /MQHA/qmgrs/COI01E01
The tests on the directory completed successfully.

2.

[SERVER1].mqmadmin:/opt/mqm/bin > ./amqmfsck -c /MQHA/qmgrs/COI01E01
Start a second copy of this program with the same parameters on another server.
Writing to test file. This will normally complete within about 60 seconds.
.......................................
The tests on the directory completed successfully.

[SERVER2].mqmadmin:/opt/mqm/bin > ./amqmfsck -c /MQHA/qmgrs/COI01E01
Writing to test file. This will normally complete within about 60 seconds.
.......................
The tests on the directory completed successfully.

3.

[SERVER1].mqmadmin:/opt/mqm/bin > ./amqmfsck -w /MQHA/qmgrs/COI01E01
Start a second copy of this program with the same parameters on another server.
File lock acquired.
Press Enter or terminate this process to release the lock.

File lock released.
The tests on the directory completed successfully.

[SERVER2].mqmadmin:/opt/mqm/bin > ./amqmfsck -w /MQHA/qmgrs/COI01E01
Waiting for the file lock.
Waiting for the file lock.
Waiting for the file lock.
Waiting for the file lock.
Waiting for the file lock.
Waiting for the file lock.
File lock acquired.
Press Enter or terminate this process to release the lock.

File lock released.
The tests on the directory completed successfully.





Thanks to everyone.





Jordi Saltiveri
Avda.Diagonal, 605, Planta 4ª - 08028 Barcelona
E-mail: jsaltive-GH7bCjKZcdLQT0dZR+***@public.gmane.org





From: MQSeries List [mailto:MQSERIES-0lvw86wZMd9k/bWDasg6f+***@public.gmane.org] On Behalf Of Hatcher Jeter
Sent: Thursday, 17 October 2013 16:35
To: MQSERIES-0lvw86wZMd9k/bWDasg6f+***@public.gmane.org
Subject: Re: Multi instance over NETAPP NFS NAS



I tried a NetApp appliance at a customer site and the file-system check (amqmfsck) never worked. Is amqmfsck working for you?




Hatcher H. Jeter, Jr.
Domain Architect - Integration Technologies
IBM Practice
Avnet Services

Mobile: 804-334-4750
hatcher.jeter-***@public.gmane.org
www.atech.com


This message, including files attached to it, may contain confidential information that is intended only for the use of the ADDRESSEES(S) named above. If you are not an intended recipient, you are hereby notified that any dissemination or copying of the information contained in this message, or the taking of any action in reliance upon the information, is strictly prohibited. If you have received this message in error, please notify the sender immediately and delete the message from your system. Thank You.







From: "T.Rob" <t.rob-CkT6zf+urXSzW/GOMZKyElesiRL1/***@public.gmane.org>
To: MQSERIES-0lvw86wZMd9k/bWDasg6f+***@public.gmane.org
Date: 10/17/2013 10:32 AM
Subject: Re: Multi instance over NETAPP NFS NAS
Sent by: MQSeries List <MQSERIES-0lvw86wZMd9k/bWDasg6f+***@public.gmane.org>

I'm not familiar with NETAPP so I have to ask – does it really promise to preserve the locks during failover? As for the 20 seconds, the heartbeat is probably a blocking call that hangs for a short bit during the NETAPP failover, yes?

-- T.Rob


George Carey
2013-10-22 04:38:55 UTC
Permalink
Try changing the NFS mount option to 'hard', as noted by A.J. Aronoff, and see if you have better luck. With 'hard', the client just keeps retrying the connection.

http://linux.die.net/man/5/nfs

"soft/hard
Determines the recovery behavior of the NFS client after an NFS request
times out. If neither option is specified (or if the hard option is
specified), NFS requests are retried indefinitely. If the soft option is
specified, then the NFS client fails an NFS request after retrans
retransmissions have been sent, causing the NFS client to return an error to
the calling application.

NB: A so-called "soft" timeout can cause silent data corruption in certain
cases. As such, use the soft option only when client responsiveness is more
important than data integrity. Using NFS over TCP or increasing the value of
the retrans option may mitigate some of the risks of using the soft option."
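Applied to the mount used in this thread, a 'hard' variant of the /etc/fstab entry might look like the following. This is a sketch only: the server name and export path are placeholders, and the share must be unmounted and remounted for the new options to take effect.

```
netapp01:/vol/mqha  /MQHA  nfs4  rw,hard,intr,proto=tcp,timeo=600,retrans=3,rsize=65536,wsize=65536  0 0
```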

If multi-instance works normally and fails only when a service move is made, then the Data ONTAP configuration of the HA node pairs and their connected controllers is likely set up in a way that disrupts the MQ file system on the partner node during takeover or giveback. The configuration should allow these service moves to be transparent to the MQ applications using the NFS client on the NetApp shared mount. The point being, your storage team needs to make this so; it is not an MQ issue or an NFS issue. See the Data ONTAP documentation for possible insights on disruptive versus non-disruptive takeover, aggregate relocations, etc., for what might be causing the impact.

https://library.netapp.com/ecm/ecm_download_file/ECMP1196905

GTC



Jordi Saltiveri Jovells
2013-10-22 08:50:20 UTC
Permalink
Hi to all,

The storage team will change the following parameters to try to resolve the crashes:

options nfs.v4.lease_seconds 20
options locking.grace_lease_seconds 30

And we will change the NFS mount options to:

rw,vers=4,rsize=65536,wsize=65536,hard,intr,proto=tcp,timeo=600,retrans=3,sec=sys
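Once the share has been remounted, it is worth confirming the options actually in effect on the client. A hedged check: the grep pattern matches the mount point used in this thread, and nfsstat comes from the Linux nfs-utils package.

```shell
# Show the mount and its effective options; prints a note if absent.
mount | grep -i mqha || echo "no MQHA mount found on this host"
# Per-mount NFS options as negotiated by the client (nfs-utils).
nfsstat -m 2>/dev/null || true
```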

I hope these changes resolve this particular incident.

Thanks for all!


Jordi Saltiveri
Avda.Diagonal, 605, Planta 4ª - 08028 Barcelona
E-mail: jsaltive-GH7bCjKZcdLQT0dZR+***@public.gmane.org
Web: www.everis.cat

