Comment by Robin Shen [ 14/May/15 11:25 PM ]
|
QB has to cancel jobs when it detects that an agent is offline; otherwise the queue may fill up and block other builds. In your case, you may specify a sufficiently large agent timeout (via "administration / system setting") so that the QB server does not kick the agent out during a short outage.
|
Comment by Phong Trinh [ 18/May/15 07:30 PM ]
|
There is also an issue with setting a large timeout. When the machine/node has a network hiccup (or becomes unreachable for some other reason), QB thinks the machine is still available and assigns a job to it. The server then tries to run the job on the machine/node but is unable to connect to it. QB then cancels the job, so we lose the job. We think QB should not cancel jobs that are assigned to a node which goes offline, and we would like QB to have a way to manage these jobs. Perhaps there could be an option/configuration to tell QB whether or not to cancel these jobs. This issue is critical to us, and we need to discuss it with you.
|
Comment by Robin Shen [ 18/May/15 11:20 PM ]
|
Handling such cases is very cumbersome and error-prone, as the build can fail at any stage. I'd suggest reducing agent/server load or improving network bandwidth instead of having the application recover from these low-level errors.
|
Generated at Sun May 05 17:55:34 UTC 2024 using JIRA 189.