|
|
|
Robin Shen [18/Sep/24 01:42 AM]
Agent timeout can be configured in the system settings. However, instead of changing the agent timeout, please make sure to stop the agent service gracefully before reverting the system, so that the agent has a chance to notify the server that it is going down.
I am concerned that if I stop the service before reverting the system, there is a chance that the QB server drops the node before the reversion happens.
You may run a command step to stop service, for instance:
    # sleep a while to allow the build to finish
    sleep 10
    service buildagent stop
Make sure to untick the option "wait for finish" in advanced setting of the step.
Thank you very much for your suggestion. I am going to give it a try and will keep you informed.
Regarding my first issue: "QuickBuild server drops all of my tests when all of the nodes in the resource disappeared." How do I avoid the server dropping the tests? Some nodes have a very heavy load and need more time to respond to the server. Can I override the agent timeout at their level?
What do you mean by "when all of the nodes in the resource disappeared"? Is this because all your nodes are running under heavy load and the QB server thinks they are dead? If so, what do you mean by "QB drops all tests"?
I have a QB configuration for my tests, such as root/RegressionTest, which is set up to allow concurrent test execution. This configuration runs tests on nodes within the resource 'Eligible for Regression Tests,' which contains 5 nodes. After each test completes, even if it fails, the node/system is reverted. When I run 15 tests, 5 are executed on the available nodes, while the remaining tests wait for nodes to free up. If all 5 tests finish simultaneously, the nodes are reverted and go offline, effectively disappearing from the 'Eligible for Regression Tests' resource. When this occurs, QB drops the remaining tests.
How are you triggering builds of configuration "root/RegressionTest"? Is it manually, via the RESTful API, or via a trigger build step from another configuration? How do you distinguish different tests when triggering the configuration?
Also, have you tried to gracefully shut down the agents? Taking down an agent forcibly can cause job loss due to incorrect agent state.
I created a step in another configuration that triggers root/RegressionTest. It is similar to the following:
    import com.pmease.quickbuild.*;

    productVersions = "1.0.0,1.1.0,1.2.0,1.3.0,1.4.0,2.0.0,2.1.0" // Up to 15 versions
    String[] arrayProductVersions = productVersions.split(",");
    def configurationIdToTrigger = system.configurationManager.get("root/RegressionTest").id;
    def productVersion
    for (int nLoop = 0; nLoop < arrayProductVersions.length; nLoop++) {
        def newRequest = new BuildRequest();
        productVersion = arrayProductVersions[nLoop]
        newRequest.configurationId = configurationIdToTrigger;
        newRequest.variables = ["version":productVersion];
        system.buildEngine.requestBuild(Context.getUser(), false, newRequest);
    }
I stop the QB agent service as you suggested:
    # sleep a while to allow the build to finish
    sleep 10
    service buildagent stop
Make sure to untick the option "wait for finish" in advanced setting of the step.
Thanks for the elaboration. This happens because the QB server sends the next job to the build agent immediately after the current job finishes, before the build agent is signaled to shut down. To work around this issue:
1. Edit user attributes of each of your build agents to add an attribute, say "ready", with initial value set to "true".
2. For the grid resource you are using, change its node selection setting to only select build agents with attribute "ready" equals "true".
3. Edit pre-execute action of master step (or the step using above resource) of configuration "root/RegressionTest" to execute below script:
    groovy:
    def userAttributes = node.userAttributes;
    userAttributes["ready"] = "false";
    node.setUserAttributes(userAttributes, true);
4. After reverting the build agent, change property "ready" to "true" in file "<build agent dir>/conf/attributes.properties". This has to be done before starting agent service.
Thank you very much for your suggestion, Robin!
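Step 4 of the workaround above could be scripted as part of the revert process. The following is only a minimal sketch, assuming that attributes.properties holds plain `key=value` lines and that the attribute key contains no regular-expression metacharacters. The function name `set_agent_attribute` and the install path in the usage comment are hypothetical and not part of QuickBuild.

```shell
#!/bin/sh
# Sketch (assumption: the file uses simple key=value lines).
# set_agent_attribute FILE KEY VALUE
# Replaces an existing KEY entry in FILE, or appends one if KEY is absent.
set_agent_attribute() {
    file=$1
    key=$2
    value=$3

    # Refuse to guess if the properties file is not where we expect it.
    [ -f "$file" ] || return 1

    tmp="${file}.tmp.$$"

    # Copy every line except an existing entry for this key.
    # grep exits non-zero when nothing is left to print, which is fine here.
    grep -v "^${key}=" "$file" > "$tmp" || true

    # Append the new value for the key.
    printf '%s=%s\n' "$key" "$value" >> "$tmp"

    # Write back in place so the original file's owner and mode are kept.
    cat "$tmp" > "$file"
    rm -f "$tmp"
}

# Example usage (hypothetical agent directory). Run after the revert and
# before the agent service is started:
#   set_agent_attribute "/opt/buildagent/conf/attributes.properties" ready true
#   service buildagent start
```

Writing the flag before the service starts matters because, per the steps above, the resource only selects agents whose attribute is "true"; an agent that comes online with the old "false" value would stay excluded.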
I think I did something very similar to the suggestion. I created a step to set the user attribute at the beginning of the process (the first step in the process) as follows:
    // I got the node that the test is running on and assigned it to that variable.
    def agentName = "${var.getValue('RunningNode')}";
    def agent = grid.getNode(agentName);
    userAttributes = agent.userAttributes;
    userAttributes.put("CleanMachine", "0");
    agent.setUserAttributes(userAttributes,true);
The grid resource I am using is set to only select build agents with attribute "CleanMachine" equals "1". I reset the value of the attribute to 1 after the reversion. However, I still get the same issue.
Is each of the agents running a single test at a time? If so, the approach should work. You may check the user attribute of the agent running the test to see if its value is changed.
Yes, that is correct. Each of the agents is running a single test at the same time.
So I guess the master step of configuration "root/RegressionTest" is set to use the resource. If so, please use the logic I suggested in the pre-execute action of the master step to see if it works.
When one of the tests is running on a node, none of the other tests can jump on this node. I think setting the user attribute at pre-execution or in the first step is the same. However, I will try it and keep you informed.
Thank you for your help!
- I updated the reversion process to reset the flag/user attribute CleanMachine=1. It works for me. Thank you very much, Robin!
- Regarding the issue of dropping tests, I configured the resource group to keep at least one node available. Hopefully, QB can be made configurable so that it drops builds/tests only if there are no available nodes in the group after a period of time.
Even if no nodes are available, QB will not drop tests at my side with this approach. If you can reproduce this issue with a sample database, please send me the database backup for investigation.
Thank you for your help, Robin. Please close this request. If this happens again, I will create a new ticket.
|