Tim Zielke
2014-05-12 19:54:09 UTC
Hello,
I was curious if anyone else has run across something similar to what we experienced last week with a client application connecting to a remote queue manager, or had any insight into the issue.
Configuration:
We have a client application at 7.0.1 on a Solaris 10 non-global zone (server A) connecting into a remote queue manager at 7.1.0.2 on a Solaris 10 non-global zone (server B).
Issue:
Between 5/4 and 5/11, the client application on A would time out with a 2059 (MQRC_Q_MGR_NOT_AVAILABLE) while trying to connect to the queue manager on B. The connection issue was traced with client app and queue manager traces, and the traces were reviewed by IBM level 2 support. They found from the client trace that the client app was correctly resolving the IP address of the host system, sending an initial TSH, going into receive mode, and then timing out with no response from the queue manager. On the queue manager side, there was no evidence of any activity for a connection request on this client channel. This is consistent with what I was seeing with DIS CHSTATUS on the queue manager side during a connection attempt, where nothing was being reported.
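For anyone who hasn't seen this failure pattern, here is a minimal generic-socket sketch (plain Python, not MQ client code; the port number and payload bytes are made up) of what the client trace showed: the TCP connect succeeds, the client sends its initial bytes, then blocks in receive until its own timeout fires because the listener never answers.

```python
import socket
import threading
import time

def silent_listener(port):
    """Stand-in for the unresponsive side: accepts the TCP
    connection but never sends anything back."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind(("127.0.0.1", port))
    srv.listen(1)
    conn, _ = srv.accept()
    time.sleep(5)        # hold the connection open, say nothing
    conn.close()
    srv.close()

def client_attempt(port, timeout=2):
    """Connect, send an initial payload (standing in for the TSH),
    then wait for a reply. Returns 'timed out' when no response
    arrives before the timeout expires."""
    s = socket.create_connection(("127.0.0.1", port), timeout=timeout)
    s.settimeout(timeout)
    try:
        s.sendall(b"initial handshake bytes")  # hypothetical payload
        s.recv(4096)
        return "got reply"
    except socket.timeout:
        return "timed out"
    finally:
        s.close()

threading.Thread(target=silent_listener, args=(19414,), daemon=True).start()
time.sleep(0.3)                # let the listener bind first
result = client_attempt(19414)
print(result)                  # "timed out", since the listener never replies
```

The point of the sketch is that the handshake and send both succeed, so nothing looks wrong at the TCP level; the failure only surfaces as a receive timeout on the client side, which matches the trace.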
A TCP stack investigation of servers A and B during a client connection attempt showed an established connection A -> B:1414 on the TCP stacks of both A and B. What was odd was that on the TCP stack on B, we would also see two more sockets for A -> B:1414 in a FIN_WAIT_2 state. A socket goes into FIN_WAIT_2 when the application actively closes its end of the connection, so presumably this was being initiated by the queue manager. During the client connection attempt, neither the TCP receive nor send queue of the established A -> B:1414 connection showed any bytes pending on either TCP stack. So there was no evidence of ACK packets being blocked, for example.
We were also able to successfully telnet from A -> B:1414.
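For the record, the telnet check only proves that the three-way handshake completes against B:1414; it says nothing about whether the listener will ever answer. The same probe can be sketched in Python (the hostname and port in the comment are placeholders for our servers):

```python
import socket

def port_reachable(host, port, timeout=3):
    """Attempt a plain TCP connect, as the telnet test did.
    Success only proves the handshake completes -- it does not
    prove the listener behind the port will respond."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# e.g. port_reachable("serverB", 1414)  -- placeholder hostname
```

In our case this probe would have returned True throughout the outage, which is exactly why the connectivity tests were misleading.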
We also recycled the queue manager on server B once this issue started, and this did not correct the issue.
When I checked on this client app this morning, everything was magically working again. Neither server A nor B was rebooted. The odd FIN_WAIT_2 sockets are now gone from the TCP stack on server B, and I see only the one expected established connection A -> B:1414 in the server B TCP stack.
We suspect some kind of underlying network issue, but I doubt we will be able to uncover what it was. Has anyone else seen this type of behavior before, or have any insight into what could have been happening here?
Thanks,
Tim
To unsubscribe, write to LISTSERV-0lvw86wZMd9k/bWDasg6f+***@public.gmane.org and,
in the message body (not the subject), write: SIGNOFF MQSERIES
Instructions for managing your mailing list subscription are provided in
the Listserv General Users Guide available at http://www.lsoft.com
Archive: http://listserv.meduniwien.ac.at/archives/mqser-l.html