
[QB-3840] InterruptedException
Created: 08/Mar/22  Updated: 09/Mar/22

Status: Closed
Project: QuickBuild
Component/s: None
Affects Version/s: 12.0.0
Fix Version/s: None

Type: Bug Priority: Major
Reporter: Martin Falkner Assigned To: Robin Shen
Resolution: Won't Fix Votes: 0
Remaining Estimate: Unknown Time Spent: Unknown
Original Estimate: Unknown
Environment: QB running on Windows 10 delegating to Ubuntu


 Description   
Since updating to version 12.0, my build breaks for no obvious reason.
What could be wrong?
What is the meaning of 'InterruptedException'?

09:56:23,692 ERROR - Step 'master>Build YOCTO>Build LDK Bitbake?QB_Project=gpr3-root-image>Build Bitbake' is failed.
    java.lang.RuntimeException: java.lang.InterruptedException
        at com.pmease.quickbuild.execution.Commandline.execute(Commandline.java:395)
        at com.pmease.quickbuild.plugin.basis.CommandBuildStep.run(CommandBuildStep.java:238)
        at com.pmease.quickbuild.plugin.basis.CommandBuildStep$$EnhancerByCGLIB$$70f7be85.CGLIB$run$14(<generated>)
        at com.pmease.quickbuild.plugin.basis.CommandBuildStep$$EnhancerByCGLIB$$70f7be85$$FastClassByCGLIB$$86ab0a2d.invoke(<generated>)
        at net.sf.cglib.proxy.MethodProxy.invokeSuper(MethodProxy.java:228)
        at com.pmease.quickbuild.DefaultScriptEngine$Interpolator.intercept(DefaultScriptEngine.java:267)
        at com.pmease.quickbuild.plugin.basis.CommandBuildStep$$EnhancerByCGLIB$$70f7be85.run(<generated>)
        at com.pmease.quickbuild.stepsupport.Step.doExecute(Step.java:677)
        at com.pmease.quickbuild.stepsupport.Step.execute(Step.java:577)
        at com.pmease.quickbuild.stepsupport.StepExecutionJob.executeStepAwareJob(StepExecutionJob.java:31)
        at com.pmease.quickbuild.stepsupport.StepAwareJob.executeBuildAwareJob(StepAwareJob.java:56)
        at com.pmease.quickbuild.BuildAwareJob.execute(BuildAwareJob.java:77)
        at com.pmease.quickbuild.grid.GridJob.run(GridJob.java:131)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
    Caused by: java.lang.InterruptedException
        at java.lang.Object.wait(Native Method)
        at java.lang.Object.wait(Object.java:502)
        at java.lang.UNIXProcess.waitFor(UNIXProcess.java:395)
        at com.pmease.quickbuild.execution.Commandline.execute(Commandline.java:357)
        ... 17 more

 Comments   
Comment by Robin Shen [ 08/Mar/22 10:45 PM ]
Is this happening all the time, or occasionally?
Comment by Martin Falkner [ 09/Mar/22 06:26 AM ]
On this build, it's happening every time.
Comment by Robin Shen [ 09/Mar/22 07:18 AM ]
Can you please help to reproduce this on a blank QB instance, and send me the database backup for diagnostics?
Comment by Martin Falkner [ 09/Mar/22 08:47 AM ]
Sure, I will.

Meanwhile, I've restarted all the build machines.
I also ran ping from both sides; there was no interruption at all.
I also looked at the agent log (see below), but it probably will not help.
It always breaks 5 minutes into the build.
I'll just create a build that sleeps for 6 minutes and we will see.

jvm 1 | 2022-03-09 09:38:25,057 INFO Workspace: /home/fusionuser/dev/buildagent/workspace/Fusion/GPR/GPR3/OS/BRCM/GPR3_OS Branch
jvm 1 | 2022-03-09 09:43:25,590 WARN testGridJob is cancelling job 'ac43330f-76c5-47f3-bcae-dfa1aa23f661'...
jvm 1 | 2022-03-09 09:43:25,591 WARN Job still exists on job node and cancel command is issued (job id: ac43330f-76c5-47f3-bcae-dfa1aa23f661, build id: 102982, job node: DB-FUSION-YOCTO:8811)...
jvm 1 | 2022-03-09 09:43:25,592 WARN testGridJob is cancelling job 'fd76e1dc-6c8f-4aa0-a327-e6423a5722d1'...
jvm 1 | 2022-03-09 09:43:25,592 WARN Job still exists on job node and cancel command is issued (job id: fd76e1dc-6c8f-4aa0-a327-e6423a5722d1, build id: 102982, job node: DB-FUSION-YOCTO:8811)...
jvm 1 | 2022-03-09 09:43:25,592 WARN testGridJob is cancelling job 'a221849a-e509-4a3c-aec4-ff821af30219'...
jvm 1 | 2022-03-09 09:43:25,592 WARN Job still exists on job node and cancel command is issued (job id: a221849a-e509-4a3c-aec4-ff821af30219, build id: 102982, job node: DB-FUSION-YOCTO:8811)...
jvm 1 | 2022-03-09 09:43:25,592 WARN testGridJob is cancelling job '1b884602-cf6a-40e2-9fc6-59e66e0a2214'...
jvm 1 | 2022-03-09 09:43:25,593 WARN Job still exists on job node and cancel command is issued (job id: 1b884602-cf6a-40e2-9fc6-59e66e0a2214, build id: 102982, job node: DB-FUSION-YOCTO:8811)...
jvm 1 | 2022-03-09 09:43:25,593 WARN testGridJob is cancelling job '423baf8b-8c03-42f3-8915-a7294f9f79db'...
jvm 1 | 2022-03-09 09:43:25,593 WARN Job still exists on job node and cancel command is issued (job id: 423baf8b-8c03-42f3-8915-a7294f9f79db, build id: 102982, job node: DB-FUSION-YOCTO:8811)...
jvm 1 | 2022-03-09 09:43:25,605 INFO Killing process 15112...
jvm 1 | 2022-03-09 09:43:25,605 INFO Killing process 15077...
jvm 1 | 2022-03-09 09:43:25,606 INFO Killing process 15076...
jvm 1 | 2022-03-09 09:43:25,627 INFO Killing process 15123...
jvm 1 | 2022-03-09 09:43:25,627 INFO Killing process 15125...
jvm 1 | 2022-03-09 09:43:25,627 INFO Killing process 15126...
jvm 1 | 2022-03-09 09:43:25,627 INFO Killing process 15127...
jvm 1 | 2022-03-09 09:43:25,628 INFO Killing process 15128...
jvm 1 | 2022-03-09 09:43:25,628 INFO Killing process 15129...
jvm 1 | 2022-03-09 09:43:25,628 INFO Killing process 15130...
jvm 1 | 2022-03-09 09:43:25,628 INFO Killing process 15131...
jvm 1 | 2022-03-09 09:43:25,629 INFO Killing process 15132...
jvm 1 | 2022-03-09 09:43:25,629 INFO Killing process 15120...
jvm 1 | 2022-03-09 09:43:25,629 INFO Killing process 15123...
jvm 1 | 2022-03-09 09:43:25,629 INFO Killing process 15125...
jvm 1 | 2022-03-09 09:43:25,630 INFO Killing process 15126...
jvm 1 | 2022-03-09 09:43:25,630 INFO Killing process 15127...
jvm 1 | 2022-03-09 09:43:25,630 INFO Killing process 15128...
jvm 1 | 2022-03-09 09:43:25,631 INFO Killing process 15129...
jvm 1 | 2022-03-09 09:43:25,632 INFO Killing process 15130...
jvm 1 | 2022-03-09 09:43:25,632 INFO Killing process 15131...
jvm 1 | 2022-03-09 09:43:25,632 INFO Killing process 15132...
Comment by Robin Shen [ 09/Mar/22 08:52 AM ]
The testGridJob periodically checks connectivity to the node running the job by checking the agent port. It kills the job if it detects a disconnection. You may also edit the parent step of the failed step and specify a network disconnection tolerance value (in the advanced settings of the step) to see if it helps.
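
For illustration only, the check described above could be sketched as follows. This is not QuickBuild's implementation: the class name, method names, and the millisecond unit are assumptions made for this example. The sketch probes a port and reports that the job should be cancelled only once the node has been unreachable for longer than the tolerance.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.function.BooleanSupplier;

// Hypothetical sketch: all names here are invented for illustration.
public class DisconnectToleranceSketch {
    private final BooleanSupplier probe;   // returns true if the node is reachable
    private final long toleranceMillis;    // how long a disconnection is tolerated
    private long unreachableSince = -1;    // -1 means "currently reachable"

    public DisconnectToleranceSketch(BooleanSupplier probe, long toleranceMillis) {
        this.probe = probe;
        this.toleranceMillis = toleranceMillis;
    }

    /** Runs one periodic check; returns true if the job should be cancelled. */
    public boolean check(long nowMillis) {
        if (probe.getAsBoolean()) {
            unreachableSince = -1;         // connection is back, reset the clock
            return false;
        }
        if (unreachableSince < 0) {
            unreachableSince = nowMillis;  // first failed probe of this outage
        }
        return nowMillis - unreachableSince >= toleranceMillis;
    }

    /** A simple TCP probe: true if a connection to host:port can be opened. */
    public static boolean portReachable(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            return false;
        }
    }
}
```

With a tolerance of 0 the first failed probe already cancels the job; a larger tolerance lets short network outages pass.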
Comment by Martin Falkner [ 09/Mar/22 09:06 AM ]
The sleep task went through just fine.
Would it help to run a build with increased logging on the server?
Comment by Martin Falkner [ 09/Mar/22 09:08 AM ]
Actually, I already had the disconnection tolerance set to 180 on the master step (this is where the delegation to Ubuntu is done).
Or is this not the correct place?
Comment by Robin Shen [ 09/Mar/22 10:05 AM ]
This setting should be specified in the parent step of the step running on node "DB-FUSION-YOCTO:8811".
Comment by Martin Falkner [ 09/Mar/22 10:26 AM ]
The master step is already configured to run on "DB-FUSION-YOCTO:8811", so there is no parent, is there?
Comment by Robin Shen [ 09/Mar/22 10:40 AM ]
Can you please attach a screenshot of the overview of the failed build?
Comment by Martin Falkner [ 09/Mar/22 11:21 AM ]
I feel so stupid!
While experimenting with the "Disconnect Tolerance", I must have accidentally entered the value into the wrong field in one of the "Sequential" steps: "Timeout" instead of "Disconnect Tolerance". As we had network problems at that time, I did not notice it immediately, and later I forgot that I had changed the "Disconnect Tolerance" in two places.
So the "Timeout" was set to 5 minutes, and this caused the 'InterruptedException'!
Because I did the update to version 12.0 during the same maintenance and only looked at the timeouts later, I wrongly put it down to the update: the 'new problem' was still there even after the network problems were fixed.
Sorry for having wasted your time.
Comment by Martin Falkner [ 09/Mar/22 11:27 AM ]
Just out of curiosity: In earlier versions, the build was marked as timed out when this happened. Has this been changed?
This is actually the reason I did not notice it in the beginning.
Comment by Robin Shen [ 09/Mar/22 11:44 AM ]
A step timeout is always marked as a build failure. If the timeout is specified at the configuration level, the build is marked as timed out. This is the same in all versions. I will close this issue and file a separate bug:

https://track.pmease.com/browse/QB-3841
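
On the original question of what 'InterruptedException' means: Java throws it when a thread that is blocked (for example, waiting for an external process to finish) is interrupted by another thread. The sketch below is an illustration only and is not QuickBuild code; the class and method names are invented, and `Thread.sleep` stands in for the blocking wait on the external command. It shows how a timeout that cancels a running job surfaces as a `RuntimeException` wrapping an `InterruptedException`, the same shape as the stack trace in the description.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch: all names here are invented for illustration.
public class StepTimeoutSketch {

    /** Stand-in for a command step; blocks the way a wait on a process would. */
    static void runCommand(long durationMillis) {
        try {
            Thread.sleep(durationMillis);
        } catch (InterruptedException e) {
            // Wrap the checked exception, as seen in the reported stack trace.
            throw new RuntimeException(e);
        }
    }

    /** Runs the command on a worker thread and interrupts it after the timeout. */
    public static String runWithTimeout(long commandMillis, long timeoutMillis)
            throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        AtomicReference<Throwable> failure = new AtomicReference<>();
        CountDownLatch finished = new CountDownLatch(1);
        Future<?> job = pool.submit(() -> {
            try {
                runCommand(commandMillis);
            } catch (RuntimeException e) {
                failure.set(e);            // remember why the step failed
            } finally {
                finished.countDown();
            }
        });
        try {
            job.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return "succeeded";
        } catch (TimeoutException e) {
            job.cancel(true);              // interrupts the blocked worker thread
            finished.await(5, TimeUnit.SECONDS);
            return "failed: " + failure.get();
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(runWithTimeout(50, 2000));   // finishes before the timeout
        System.out.println(runWithTimeout(5000, 100));  // interrupted by the timeout
    }
}
```

In the second call the worker never gets to finish: the timeout fires first, the worker thread is interrupted, and the step reports a failure rather than a normal exit.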
Generated at Thu Apr 18 22:35:47 UTC 2024 using JIRA 189.