
[QB-2708] Builds are cancelled randomly
Created: 03/May/16  Updated: 01/Dec/16

Status: Closed
Project: QuickBuild
Component/s: None
Affects Version/s: 6.1.0
Fix Version/s: None

Type: Bug Priority: Major
Reporter: Erki Männiste Assigned To: Robin Shen
Resolution: Cannot Reproduce Votes: 0
Remaining Estimate: Unknown Time Spent: Unknown
Original Estimate: Unknown
Environment: Linux 3.2.0-4-amd64
Java HotSpot(TM) 64-Bit Server VM 1.8.0_45


 Description   
Lately we have been seeing a lot of our builds get CANCELLED. We don't have a reliable repro, but it happens quite often, usually under heavy link/compilation load.
The build log says:

21:00:23,539 ERROR - Step 'master>Compile' is failed.
    java.lang.RuntimeException: java.lang.InterruptedException: Composite step 'Compile' is cancelled.
        at com.pmease.quickbuild.stepsupport.CompositeStep.run(CompositeStep.java:121)
        at com.pmease.quickbuild.stepsupport.Step.execute(Step.java:539)
        at com.pmease.quickbuild.stepsupport.StepExecutionJob.executeStepAwareJob(StepExecutionJob.java:30)
        at com.pmease.quickbuild.stepsupport.StepAwareJob.executeBuildAwareJob(StepAwareJob.java:45)
        at com.pmease.quickbuild.BuildAwareJob.execute(BuildAwareJob.java:60)
        at com.pmease.quickbuild.grid.GridJob.run(GridJob.java:106)
        at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
        at java.util.concurrent.FutureTask.run(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
        at java.lang.Thread.run(Unknown Source)
    Caused by: java.lang.InterruptedException: Composite step 'Group Compile' is cancelled.
        ... 11 more
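The trace shows `GridJob.run` executing on a `ThreadPoolExecutor` worker and the composite step failing with a wrapped `InterruptedException`. In plain Java, that shape typically appears when something cancels the running task with `Future.cancel(true)`, which interrupts the worker thread. The sketch below reproduces only that generic mechanism. `CancelDemo` and `runCompositeStep` are hypothetical names, and it is an assumption, not something confirmed here, that QuickBuild cancels its jobs this way.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

public class CancelDemo {

    // Hypothetical stand-in for a composite step that waits for its child steps.
    // An interrupt while waiting is reported as an InterruptedException wrapped
    // in a RuntimeException, the same shape as the trace above.
    static void runCompositeStep(String name, CountDownLatch childrenDone) {
        try {
            childrenDone.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // keep the interrupt flag set
            throw new RuntimeException(
                    new InterruptedException("Composite step '" + name + "' is cancelled."));
        }
    }

    // Runs the step on a pool thread, cancels it with an interrupt
    // (assumed cancellation path), and returns the exception the step produced.
    public static String demo() {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        CountDownLatch childrenDone = new CountDownLatch(1); // never counted down
        CountDownLatch started = new CountDownLatch(1);
        CountDownLatch finished = new CountDownLatch(1);
        AtomicReference<String> observed = new AtomicReference<>();
        try {
            Future<?> job = pool.submit(() -> {
                started.countDown();
                try {
                    runCompositeStep("Compile", childrenDone);
                } catch (RuntimeException e) {
                    observed.set(e.toString());
                } finally {
                    finished.countDown();
                }
            });
            started.await();
            job.cancel(true); // interrupts the worker thread running the step
            finished.await(5, TimeUnit.SECONDS);
            return observed.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Running `main` prints `java.lang.RuntimeException: java.lang.InterruptedException: Composite step 'Compile' is cancelled.`, matching the first line of the trace. If this is the mechanism, the useful question is which component issued the cancel, not which step failed.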

I can't find anything with matching timestamps in the agent log. We have investigated possible causes in our build infrastructure but didn't find anything. Do you have any idea what might cause this?
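One way to check for matching timestamps systematically is to filter the agent log for ERROR entries that fall within a window before a cancellation. This is a hypothetical sketch: `LogCorrelator`, `errorsBefore`, and the 5-minute window are illustrative choices, not QuickBuild features. It assumes agent log lines start with a timestamp such as `2016-05-02 20:57:24,024`, that server and agent clocks are synchronized and in the same time zone, and that you supply the build's date yourself, since the build log shows only the time of day.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class LogCorrelator {

    // Assumed agent log timestamp format: the first 23 characters of each entry,
    // e.g. "2016-05-02 20:57:24,024".
    private static final DateTimeFormatter AGENT_TS =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss,SSS");

    // Returns agent log ERROR lines logged within `window` before (or at) the
    // cancellation time. Lines without a parseable timestamp, such as
    // stack-trace continuation lines, are skipped.
    public static List<String> errorsBefore(List<String> agentLog,
                                            LocalDateTime cancelledAt,
                                            Duration window) {
        List<String> hits = new ArrayList<>();
        for (String line : agentLog) {
            if (line.length() < 23 || !line.contains(" ERROR ")) {
                continue;
            }
            LocalDateTime ts;
            try {
                ts = LocalDateTime.parse(line.substring(0, 23), AGENT_TS);
            } catch (DateTimeParseException e) {
                continue;
            }
            Duration gap = Duration.between(ts, cancelledAt);
            if (!gap.isNegative() && gap.compareTo(window) <= 0) {
                hits.add(line);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> agentLog = Arrays.asList(
                "2016-05-02 20:57:24,024 [Thread-14] ERROR com.pmease.quickbuild.Quickbuild - Error connecting server.",
                "    com.caucho.hessian.client.HessianRuntimeException: java.net.SocketException: Software caused connection abort: recv failed");
        // Hypothetical: assumes the build whose log shows 21:00:23,539 ran on 2016-05-02.
        LocalDateTime cancelledAt = LocalDateTime.of(2016, 5, 2, 21, 0, 23, 539_000_000);
        errorsBefore(agentLog, cancelledAt, Duration.ofMinutes(5)).forEach(System.out::println);
    }
}
```

Under that hypothetical date, `main` prints only the 20:57:24 ERROR line, about three minutes before 21:00:23,539; the continuation line is skipped because it carries no timestamp.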

Not sure if it's relevant, but the build agents do log the following error every now and then:

2016-05-02 20:57:24,024 [Thread-14] ERROR com.pmease.quickbuild.Quickbuild - Error connecting server.
    com.caucho.hessian.client.HessianRuntimeException: java.net.SocketException: Software caused connection abort: recv failed
        at com.caucho.hessian.client.HessianProxy.sendRequest(HessianProxy.java:285)
        at com.caucho.hessian.client.HessianProxy.invoke(HessianProxy.java:171)
        at com.sun.proxy.$Proxy19.connect(Unknown Source)
        at com.pmease.quickbuild.grid.AgentConnectivityTask.run(AgentConnectivityTask.java:51)
        at java.lang.Thread.run(Unknown Source)
    Caused by: java.net.SocketException: Software caused connection abort: recv failed
        at java.net.SocketInputStream.socketRead0(Native Method)
        at java.net.SocketInputStream.socketRead(Unknown Source)
        at java.net.SocketInputStream.read(Unknown Source)
        at java.net.SocketInputStream.read(Unknown Source)
        at sun.security.ssl.InputRecord.readFully(Unknown Source)
        at sun.security.ssl.InputRecord.read(Unknown Source)
        at sun.security.ssl.SSLSocketImpl.readRecord(Unknown Source)
        at sun.security.ssl.SSLSocketImpl.performInitialHandshake(Unknown Source)
        at sun.security.ssl.SSLSocketImpl.startHandshake(Unknown Source)
        at sun.security.ssl.SSLSocketImpl.startHandshake(Unknown Source)
        at sun.net.www.protocol.https.HttpsClient.afterConnect(Unknown Source)
        at sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.connect(Unknown Source)
        at sun.net.www.protocol.http.HttpURLConnection.getOutputStream0(Unknown Source)
        at sun.net.www.protocol.http.HttpURLConnection.getOutputStream(Unknown Source)
        at sun.net.www.protocol.https.HttpsURLConnectionImpl.getOutputStream(Unknown Source)
        at com.caucho.hessian.client.HessianURLConnection.getOutputStream(HessianURLConnection.java:101)
        at com.caucho.hessian.client.HessianProxy.sendRequest(HessianProxy.java:283)
        ... 4 more

We didn't see such behavior in 5.1.30.




 Comments   
Comment by Erki Männiste [ 03/May/16 09:21 AM ]
To clarify: these SocketExceptions are not always present in the agent log when a build is cancelled.
Comment by Robin Shen [ 03/May/16 11:50 PM ]
The server might be overloaded. You may consider appointing some agents as artifact servers via the artifact storage setting in the advanced settings of the relevant configurations. This reduces the I/O load on the server; then see whether the situation improves.
Comment by Robin Shen [ 01/Dec/16 08:22 AM ]
Reopen it if there are more clues.
Generated at Fri May 17 01:29:46 UTC 2024 using JIRA 189.