Just out of curiosity: In earlier versions, the build was marked as timed out when this happened. Has this been changed?
This is actually the reason I did not notice it in the beginning. I feel so stupid!
While experimenting with the "Disconnect Tolerance", I must have accidentally entered the value for one of the "Sequential" steps into the wrong field, "Timeout", instead of "Disconnect Tolerance". As we had network problems at that time, I did not notice it immediately and later forgot that I had changed the "Disconnect Tolerance" in two places. So the "Timeout" was set to 5 minutes, and this is what caused the 'InterruptedException'! While doing maintenance, I also updated to version 12.0 at the same time and only later looked at the timeouts, so I wrongly attributed the 'new problem' to this update, as it was still there even after fixing the network problems. Sorry for having wasted your time.

Can you please attach a screenshot of the overview of the failed build?
The master step is already configured to run on "DB-FUSION-YOCTO:8811", so there is no parent, is there?
This setting should be specified in the parent step of the step running on node "DB-FUSION-YOCTO:8811".
Actually, I already had the disconnect tolerance set to 180 on the master step (this is where the delegation to Ubuntu is done).
Or is this not the correct place? The sleep task went through just fine.
Would it help to run a build with increased logging on the server? The testGridJob periodically checks connectivity of the node running the job by probing the agent port. It kills the job if it detects a disconnection. You may also edit the parent step of the failed step to specify a network disconnection tolerance value (in the advanced settings of the step) to see if it helps.
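QuickBuild's actual implementation is not shown in this thread; the following is only a minimal sketch of how a periodic connectivity check with a disconnect tolerance might behave, to illustrate why a too-small tolerance (or a misconfigured timeout) kills jobs after a fixed interval. All names, the probe interval, and the tolerance value are illustrative assumptions, not QuickBuild internals:

```python
import socket


def probe_agent(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to the agent port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def should_cancel(probe_results, tolerance_seconds, interval_seconds):
    """Decide whether to cancel the job, given a sequence of probe outcomes.

    The job is cancelled once the node has been continuously unreachable
    for longer than the configured disconnect tolerance.
    """
    consecutive_failures = 0
    for reachable in probe_results:
        if reachable:
            consecutive_failures = 0  # any successful probe resets the clock
        else:
            consecutive_failures += 1
            if consecutive_failures * interval_seconds > tolerance_seconds:
                return True
    return False
```

With a 60-second probe interval and a 180-second tolerance, three consecutive failed probes are still within tolerance, but a fourth exceeds it and triggers cancellation; a single successful probe in between resets the count.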
Sure, I will.
Meanwhile I've restarted all the build machines. I also ran ping on both sides; there was absolutely no interruption. I also looked at the agent log (see below), but probably this will not help. It always breaks 5 minutes into the build. I will just create a build that sleeps for 6 minutes and we will see.

jvm 1 | 2022-03-09 09:38:25,057 INFO Workspace: /home/fusionuser/dev/buildagent/workspace/Fusion/GPR/GPR3/OS/BRCM/GPR3_OS Branch
jvm 1 | 2022-03-09 09:43:25,590 WARN testGridJob is cancelling job 'ac43330f-76c5-47f3-bcae-dfa1aa23f661'...
jvm 1 | 2022-03-09 09:43:25,591 WARN Job still exists on job node and cancel command is issued (job id: ac43330f-76c5-47f3-bcae-dfa1aa23f661, build id: 102982, job node: DB-FUSION-YOCTO:8811)...
jvm 1 | 2022-03-09 09:43:25,592 WARN testGridJob is cancelling job 'fd76e1dc-6c8f-4aa0-a327-e6423a5722d1'...
jvm 1 | 2022-03-09 09:43:25,592 WARN Job still exists on job node and cancel command is issued (job id: fd76e1dc-6c8f-4aa0-a327-e6423a5722d1, build id: 102982, job node: DB-FUSION-YOCTO:8811)...
jvm 1 | 2022-03-09 09:43:25,592 WARN testGridJob is cancelling job 'a221849a-e509-4a3c-aec4-ff821af30219'...
jvm 1 | 2022-03-09 09:43:25,592 WARN Job still exists on job node and cancel command is issued (job id: a221849a-e509-4a3c-aec4-ff821af30219, build id: 102982, job node: DB-FUSION-YOCTO:8811)...
jvm 1 | 2022-03-09 09:43:25,592 WARN testGridJob is cancelling job '1b884602-cf6a-40e2-9fc6-59e66e0a2214'...
jvm 1 | 2022-03-09 09:43:25,593 WARN Job still exists on job node and cancel command is issued (job id: 1b884602-cf6a-40e2-9fc6-59e66e0a2214, build id: 102982, job node: DB-FUSION-YOCTO:8811)...
jvm 1 | 2022-03-09 09:43:25,593 WARN testGridJob is cancelling job '423baf8b-8c03-42f3-8915-a7294f9f79db'...
jvm 1 | 2022-03-09 09:43:25,593 WARN Job still exists on job node and cancel command is issued (job id: 423baf8b-8c03-42f3-8915-a7294f9f79db, build id: 102982, job node: DB-FUSION-YOCTO:8811)...
jvm 1 | 2022-03-09 09:43:25,605 INFO Killing process 15112...
jvm 1 | 2022-03-09 09:43:25,605 INFO Killing process 15077...
jvm 1 | 2022-03-09 09:43:25,606 INFO Killing process 15076...
jvm 1 | 2022-03-09 09:43:25,627 INFO Killing process 15123...
jvm 1 | 2022-03-09 09:43:25,627 INFO Killing process 15125...
jvm 1 | 2022-03-09 09:43:25,627 INFO Killing process 15126...
jvm 1 | 2022-03-09 09:43:25,627 INFO Killing process 15127...
jvm 1 | 2022-03-09 09:43:25,628 INFO Killing process 15128...
jvm 1 | 2022-03-09 09:43:25,628 INFO Killing process 15129...
jvm 1 | 2022-03-09 09:43:25,628 INFO Killing process 15130...
jvm 1 | 2022-03-09 09:43:25,628 INFO Killing process 15131...
jvm 1 | 2022-03-09 09:43:25,629 INFO Killing process 15132...
jvm 1 | 2022-03-09 09:43:25,629 INFO Killing process 15120...
jvm 1 | 2022-03-09 09:43:25,629 INFO Killing process 15123...
jvm 1 | 2022-03-09 09:43:25,629 INFO Killing process 15125...
jvm 1 | 2022-03-09 09:43:25,630 INFO Killing process 15126...
jvm 1 | 2022-03-09 09:43:25,630 INFO Killing process 15127...
jvm 1 | 2022-03-09 09:43:25,630 INFO Killing process 15128...
jvm 1 | 2022-03-09 09:43:25,631 INFO Killing process 15129...
jvm 1 | 2022-03-09 09:43:25,632 INFO Killing process 15130...
jvm 1 | 2022-03-09 09:43:25,632 INFO Killing process 15131...
jvm 1 | 2022-03-09 09:43:25,632 INFO Killing process 15132...

Can you please help to reproduce this on a blank QB instance, and send me the database backup for diagnostics?
Is this happening all the time, or occasionally?
https://track.pmease.com/browse/QB-3841