
[QB-3842] How to trace the agent load history
Created: 15/Mar/22  Updated: 08/Jan/23

Status: Closed
Project: QuickBuild
Component/s: None
Affects Version/s: 11.0.26
Fix Version/s: None

Type: New Feature Priority: Major
Reporter: Bin Wu Assigned To: Robin Shen
Resolution: Won't Fix Votes: 0
Remaining Estimate: Unknown Time Spent: Unknown
Original Estimate: Unknown


 Description   
We regularly run into system hangs that are not caused by QB itself.
They may be caused by a particular order of builds.
We would therefore like to query the agent load history to identify the builds involved,
but we could not find this information in either the server log or the agent log.
Can you point us to a way to capture it?

 Comments   
Comment by Robin Shen [ 15/Mar/22 02:23 PM ]
The load history is not available yet. You may however check the agent log to see if there is any clue.
Comment by Bin Wu [ 16/Mar/22 01:01 AM ]
From the system log I can tell that a test was running, but not which test.
The QB agent log only shows a few errors, such as QB connection errors.
Comment by Robin Shen [ 16/Mar/22 10:47 AM ]
Can you please show me the error?
Comment by Bin Wu [ 16/Mar/22 11:02 AM ]
The errors look like the following. Their timestamps do not match the test build time, so they may be caused by the network
or by a failed cancel:

========
2022-02-28 13:34:27,385 [Thread-12] ERROR com.pmease.quickbuild.Quickbuild - Error connecting server.
    com.caucho.hessian.client.HessianRuntimeException: com.caucho.hessian.client.HessianRuntimeException: Error connecting 'http://servername:8810/service/connect'
        at com.caucho.hessian.client.HessianProxy.sendRequest(HessianProxy.java:285)
        at com.caucho.hessian.client.HessianProxy.invoke(HessianProxy.java:171)
        at com.sun.proxy.$Proxy19.connect(Unknown Source)
        at com.pmease.quickbuild.grid.AgentConnectivityTask.run(AgentConnectivityTask.java:62)
        at java.lang.Thread.run(Thread.java:748)
    Caused by: com.caucho.hessian.client.HessianRuntimeException: Error connecting 'http://servername:8810/service/connect'
        at com.caucho.hessian.client.HessianURLConnection.getOutputStream(HessianURLConnection.java:103)
        at com.caucho.hessian.client.HessianProxy.sendRequest(HessianProxy.java:283)
        ... 4 more
    Caused by: java.net.SocketTimeoutException: connect timed out
        at java.net.PlainSocketImpl.socketConnect(Native Method)
        at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
        at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
        at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
        at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
        at java.net.Socket.connect(Socket.java:607)
        at sun.net.NetworkClient.doConnect(NetworkClient.java:175)
        at sun.net.www.http.HttpClient.openServer(HttpClient.java:463)
        at sun.net.www.http.HttpClient.openServer(HttpClient.java:558)
        at sun.net.www.http.HttpClient.<init>(HttpClient.java:242)
        at sun.net.www.http.HttpClient.New(HttpClient.java:339)
        at sun.net.www.http.HttpClient.New(HttpClient.java:357)
        at sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:1223)
        at sun.net.www.protocol.http.HttpURLConnection.plainConnect0(HttpURLConnection.java:1162)
        at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:1056)
        at sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:990)
        at sun.net.www.protocol.http.HttpURLConnection.getOutputStream0(HttpURLConnection.java:1337)
        at sun.net.www.protocol.http.HttpURLConnection.getOutputStream(HttpURLConnection.java:1312)
        at com.caucho.hessian.client.HessianURLConnection.getOutputStream(HessianURLConnection.java:99)
        ... 5 more

=======
2022-03-15 13:02:09,837 [pool-2-thread-1282] WARN com.pmease.quickbuild.grid.NodeServiceImpl - testGridJob is cancelling job '41ce1f76-80e9-4b55-9baf-91f57379facf'...
2022-03-15 13:02:09,838 [pool-2-thread-1282] WARN com.pmease.quickbuild.grid.GridTaskFuture - Job still exists on job node and cancel command is issued (job id: 41ce1f76-80e9-4b55-9baf-91f57379facf, build id: 661422, job node: test-agent-1:8812)...
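
Since the cancel warning above carries both a timestamp and a build id, one way to see which build was active around a hang is to pull those fields out of the agent log. The following is a minimal sketch, assuming the log lines keep exactly the layout quoted above; the function name `find_cancelled_builds` and the regular expression are illustrative and not part of QuickBuild.

```python
import re

# Pattern modelled on the GridTaskFuture WARN line quoted above.
# Assumption: other QuickBuild versions may format this line differently.
CANCEL_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d{3} .*?"
    r"\(job id: (?P<job>[0-9a-f-]+), build id: (?P<build>\d+), "
    r"job node: (?P<node>[^)]+)\)"
)


def find_cancelled_builds(lines):
    """Return (timestamp, build id, job node) for each matching cancel line."""
    hits = []
    for line in lines:
        match = CANCEL_RE.search(line)
        if match:
            hits.append(
                (match.group("ts"), int(match.group("build")), match.group("node"))
            )
    return hits


# The two log lines from this comment, used as sample input.
sample = [
    "2022-03-15 13:02:09,837 [pool-2-thread-1282] WARN "
    "com.pmease.quickbuild.grid.NodeServiceImpl - testGridJob is cancelling "
    "job '41ce1f76-80e9-4b55-9baf-91f57379facf'...",
    "2022-03-15 13:02:09,838 [pool-2-thread-1282] WARN "
    "com.pmease.quickbuild.grid.GridTaskFuture - Job still exists on job node "
    "and cancel command is issued (job id: 41ce1f76-80e9-4b55-9baf-91f57379facf, "
    "build id: 661422, job node: test-agent-1:8812)...",
]
```

Calling `find_cancelled_builds(sample)` yields one entry for build 661422 on `test-agent-1:8812`; the build id can then be looked up on the server to see which configuration it belongs to.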
Comment by Robin Shen [ 16/Mar/22 11:28 PM ]
This looks like a network issue to me. When it happens, please log in to the agent machine and run the command below to see if it works:
telnet servername 8810
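
If telnet is not installed on the agent machine, the same TCP reachability check can be sketched in Python. This is only an illustration, not a QuickBuild tool: the helper names `can_connect` and `probe_line` and the 5-second timeout are assumptions, while the host and port are the ones from the command above.

```python
import socket
from datetime import datetime


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


def probe_line(host, port, timeout=5.0):
    """Format one timestamped result line, suitable for appending to a log file."""
    status = "ok" if can_connect(host, port, timeout) else "FAILED"
    stamp = datetime.now().isoformat(timespec="seconds")
    return f"{stamp} {host}:{port} {status}"
```

For example, `print(probe_line("servername", 8810))` run from the agent machine at regular intervals (for instance from cron) would leave a timestamped record that can be compared with the `connect timed out` errors in the agent log.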
Generated at Fri Apr 19 08:48:25 UTC 2024 using JIRA 189.