Connection, WebSocket, and I/O errors on Jitterbit private agents using Azure VMs
Overview
This page provides instructions on troubleshooting a Linux or Windows private agent installed on a Microsoft Azure virtual machine (VM). (See Private agent performance tuning for general performance tuning information.)
Troubleshoot lost connections
When using a private agent installed on a Microsoft Azure VM, you may experience lost connections. Azure sets the WebSocket idle timeout to 4 minutes, but by default the private agent pings Harmony only every 5 minutes, so the connection can be dropped as idle between pings. To resolve this issue, reduce the interval for the agent heartbeat:
- Open the jitterbit-agent-config.properties file in a text editor. This file can be found in these directories:
  - Linux: <JITTERBIT_HOME>/Resources/
  - Windows: C:\Program Files\Jitterbit Agent\Resources
- Find the agent.heart.beat.interval setting:
    #Agent heart beat interval (IN MINUTES)
    agent.heart.beat.interval=5
- Change the setting to agent.heart.beat.interval=3.
- Save the changes and restart the agent.
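If the same change has to be made on several agents, the edit can be scripted. The following is a minimal sketch, not a Jitterbit-provided tool: the function name set_heartbeat_interval is hypothetical, and it assumes the setting appears in the file as a plain agent.heart.beat.interval=<number> line, as in the excerpt above. It works on the file's text, so reading and writing the file (and restarting the agent) is left to the caller.

```python
import re

# Matches a line such as "agent.heart.beat.interval=5" and captures everything
# up to the number, so the surrounding formatting is preserved.
_HEARTBEAT_LINE = re.compile(
    r"^([ \t]*agent\.heart\.beat\.interval[ \t]*=[ \t]*)\d+[ \t]*(?=\r?$)",
    re.MULTILINE,
)


def set_heartbeat_interval(properties_text: str, minutes: int) -> str:
    """Return properties_text with agent.heart.beat.interval set to `minutes`.

    Raises ValueError if the setting is not present, rather than guessing
    where it should be added.
    """
    updated, count = _HEARTBEAT_LINE.subn(
        lambda match: f"{match.group(1)}{minutes}", properties_text
    )
    if count == 0:
        raise ValueError("agent.heart.beat.interval not found in properties text")
    return updated


# Example with the lines shown in the step above.
sample = "#Agent heart beat interval (IN MINUTES)\nagent.heart.beat.interval=5\n"
print(set_heartbeat_interval(sample, 3))
```

The value 3 is used because it is below Azure's 4-minute idle timeout; any other value passed to the function should satisfy the same condition.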
Troubleshoot WebSocket and I/O errors
Important
Plan for the following steps to take over 30 minutes to complete.
Errors related to WebSocket and I/O can be resolved with updates to the VM's associated IP idle timeout, network address translation (NAT) gateway TCP idle timeout, and virtual network (VNET) flow timeout settings.
The IP idle timeout, NAT gateway TCP idle timeout, and VNET flow timeout values must all be set to 15 minutes.
Identify relevant errors
WebSocket and I/O errors can be identified by referencing the operation logs and the jitterbit-agent.log file. This log file can be found in one of the following locations:
- For Windows: C:\Program Files (x86)\Jitterbit Agent\log\jitterbit-agent.log
- For Linux: /opt/jitterbit/log/jitterbit-agent.log
Operation log errors
If present, any of the following messages in the operation log details for an operation with an Error status can be indicative of a WebSocket or I/O error:
The operation "Example Operation" completed successfully.
No message found while removing message in cache for: Message Info: AgentId: 000001 AgentGroupId: 000001 MessageId: XXX Message Version (Agent): XXXX Message Version (Harmony): XXX Counter (Harmony): 1 Submitted Timestamp (Harmony):2024-01-20 11:55:00.700 , message will be retried later OperationInstanceGUID: XXX
Run message could not reach the agent.
Agent log file errors
If present, any of the following messages in the jitterbit-agent.log file can be indicative of a WebSocket or I/O error:
2024-01-20 12:00:00 request handler thread #10642 INFO org.jitterbit.integration.server.api.util.AgentRetryExecutor:53 - Agent Message Receipt (OperationInstanceGUID: XXX) failed. Retrying....
2024-01-20 12:00:00 request handler thread #10642 ERROR org.jitterbit.integration.server.api.util.AgentRetryExecutor:55 - org.springframework.web.client.ResourceAccessException: I/O error on PUT request for "https://na-east.jitterbit.com/jitterbit-cloud-restful-service/agent/ackmsgreceipt": Read timed out; nested exception is java.net.SocketTimeoutException: Read timed out
E:2024-01-20 12:00:00 request handler thread #884 ERROR org.jitterbit.integration.server.messaging.agent.listener.AgentMessageListener:231 - No message found while removing message in cache for: Message Info: AgentId: 000001 AgentGroupId: 000001 MessageId: XXX Message Version (Agent): XXXX Message Version (Harmony): XXX Counter (Harmony): 1 Submitted Timestamp (Harmony):2024-01-20 11:55:00.700 , message will be retried later OperationInstanceGUID: XXX
Important
Continue only if a WebSocket or I/O error was identified in either the operation logs or agent logs based on the above criteria.
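Searching a large jitterbit-agent.log by hand is tedious, so the check can be approximated with a short script. This is a rough sketch under stated assumptions: the patterns below are substrings taken from the example messages in this section, the names ERROR_PATTERNS, find_suspect_lines, and scan_log are hypothetical, and real log lines may be worded differently. The generic "completed successfully" operation log message is left out because it is not distinctive enough to search for.

```python
import re

# Substrings drawn from the example messages listed in this section.
ERROR_PATTERNS = [
    re.compile(r"I/O error on \w+ request for"),
    re.compile(r"Read timed out"),
    re.compile(r"No message found while removing message in cache"),
    re.compile(r"Agent Message Receipt .* failed\. Retrying"),
    re.compile(r"Run message could not reach the agent"),
]


def find_suspect_lines(lines):
    """Yield (line_number, line) for each line matching any pattern."""
    for number, line in enumerate(lines, start=1):
        if any(pattern.search(line) for pattern in ERROR_PATTERNS):
            yield number, line.rstrip("\r\n")


def scan_log(path):
    """Return the matching lines of the log file at `path`."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return list(find_suspect_lines(handle))
```

For example, scan_log("/opt/jitterbit/log/jitterbit-agent.log") returns the matching lines of the Linux agent log. An empty result only means that none of these particular patterns were found, so the operation logs should still be checked.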
Drain stop the agent
Drain stop the agent before updating any timeout settings. If you have more than one agent in the affected agent group, do the same for all of them.
Isolate agent resources
It is recommended that the agent's VM and its associated resources are separated into their own resource group in Azure. This includes its VNET, IP, NAT gateway, network interface (NIC), and network security group (NSG), if present.
Update the IP's idle timeout
- In the Azure portal, navigate to the resource group associated with the agent's VM.
- Identify and click the IP item associated with the VM.
- Click Configuration and change the Idle timeout (minutes) value to 15 minutes.
Update the NAT gateway's TCP idle timeout
- In the Azure portal, navigate to the resource group associated with the agent's VM.
- Identify and click the NAT gateway item associated with the VM and IP, if present. An associated NAT gateway will also be listed in the IP item's Overview next to the Associated to field.
- Click Configuration and change the TCP idle timeout (minutes) value to 15 minutes.
Update the VNET's flow timeout
- In the Azure portal, navigate to the resource group associated with the agent's VM.
- Identify and click the VNET item associated with the VM.
- In Overview, click Configure next to Flow timeout.
- In the Flow timeout pane, enable the Enable flow timeout setting and change the Flow timeout (minutes) value to 15 minutes.
- Click Save.
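Before restarting, it can help to confirm that every timeout now has the required value. The snippet below is only an illustrative check, not part of any Azure or Jitterbit tooling: the dictionary keys and the function name find_mismatched_timeouts are hypothetical, and the values are assumed to be read manually from the Azure portal pages used in the steps above.

```python
REQUIRED_MINUTES = 15  # value this page requires for all three settings


def find_mismatched_timeouts(settings, required=REQUIRED_MINUTES):
    """Return {name: value} for every setting that is not `required` minutes."""
    return {
        name: value for name, value in settings.items() if value != required
    }


# Values as read from the portal; None stands for "flow timeout not enabled".
# Omit the NAT gateway entry if the VM has no associated NAT gateway.
observed = {
    "ip_idle_timeout": 15,
    "nat_gateway_tcp_idle_timeout": 4,
    "vnet_flow_timeout": None,
}

for name, value in find_mismatched_timeouts(observed).items():
    print(f"{name}: {value!r} (expected {REQUIRED_MINUTES})")
```

An empty result means all listed settings match and the restart can proceed.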
Restart the agent
- In the Azure portal, restart the agent's VM.
- Restart the stopped agent. See either Restart a Windows agent or Restart a Linux agent for detailed information.