[QB-2906] Quickbuild Server Queue hangs
Created: 22/Feb/17  Updated: 17/Jan/18

Status: Closed
Project: QuickBuild
Component/s: None
Affects Version/s: 6.1.9
Fix Version/s: None

Type: Bug
Priority: Blocker
Reporter: Todd Scholl
Assigned To: Robin Shen
Resolution: Cannot Reproduce
Votes: 0
Remaining Estimate: Unknown
Time Spent: Unknown
Original Estimate: Unknown
Environment: Linux 3.0.101-71-default #1 SMP Thu Mar 3 12:56:15 UTC 2016 (7bdad2e) x86_64 x86_64 x86_64 GNU/Linux

File Attachments: Issue_2_21_2017.JPG (JPEG), Issue_2_21_2017_2.JPG (JPEG), qb_issues_02212017_1.tar.gz (GZip archive), qb_issues_02212017_2.tar.gz (GZip archive), quickbuild_issues_2_28_console.log.1.zip (Zip archive), quickbuild_issues_2_28_console.zip (Zip archive), quickbuild_issues_2_28_quickbuild.zip (Zip archive)

 Description   
On our QuickBuild server today we are seeing an odd set of circumstances. This has occurred 3 times in the last 12 hours:
1. Jobs currently in progress stop progressing: jobs collecting metrics stick in that state, jobs checking the build condition stick there, and jobs running steps stick there.
2. All newly triggered jobs stay queued with Waiting_Process.

The only thing that seems to resolve the issue is a QuickBuild service restart, which interrupts our production job runs.

Can you have a look at the logs and thread dumps we have and see if you can pinpoint what is going on?

 Comments   
Comment by Robin Shen [ 23/Feb/17 12:33 AM ]
It looks like SSL is misconfigured on two of your build agents, "cv-web02:8811" and "cv-yup02:8811", as the log is full of SSL handshake errors. You need to either stop these two build agents or fix the SSL issue.
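One way to confirm whether an agent still fails the handshake is to attempt a TLS connection to it from the server host. The following is a minimal sketch, not a QuickBuild tool: the helper names parse_agent and check_tls_handshake are hypothetical, and it assumes the agent port accepts a plain TLS handshake and may present a self-signed certificate (so certificate verification is turned off).

```python
import socket
import ssl


def parse_agent(address):
    """Split an 'host:port' agent address into (host, port)."""
    host, _, port = address.rpartition(":")
    return host, int(port)


def check_tls_handshake(address, timeout=5.0):
    """Attempt a TLS handshake with the agent; return (ok, detail)."""
    host, port = parse_agent(address)
    context = ssl.create_default_context()
    # Assumption: agents may use self-signed certificates, so verification is
    # disabled. This only checks whether a handshake completes at all.
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=host) as tls:
                return True, tls.version()
    except (ssl.SSLError, OSError) as exc:
        # Covers handshake failures, refused connections, timeouts, DNS errors.
        return False, f"{type(exc).__name__}: {exc}"


if __name__ == "__main__":
    # Agent addresses taken from the comment above.
    for agent in ("cv-web02:8811", "cv-yup02:8811"):
        ok, detail = check_tls_handshake(agent)
        print(agent, "OK" if ok else "FAILED", detail)
```

A failed result does not by itself prove an SSL misconfiguration, since an unreachable or stopped agent fails the same way; the detail string distinguishes the cases.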
Comment by Todd Scholl [ 24/Feb/17 02:46 PM ]
I have fixed one of the agents and am working on the other. Does that SSL issue hang enough of the server threads to force the jobs to hang?
Comment by Robin Shen [ 25/Feb/17 12:31 AM ]
All jobs running on these agents will hang. Jobs running on other agents should not be affected (unless they depend on resources used by the hanging jobs).
Comment by Todd Scholl [ 28/Feb/17 05:06 PM ]
Robin,
The issue returned today. I have cleaned up the SSL issues that you mentioned above, but the queue still froze. Is there anything else you can see that could be causing this? I am uploading today's logs.
Comment by Todd Scholl [ 28/Feb/17 06:53 PM ]
Upload complete. As a second note, there are 3 thread dumps in the "quickbuild_issues_2_28_console.zip" if that helps.
Comment by Robin Shen [ 01/Mar/17 12:37 AM ]
The log is full of SSL errors, and there is nothing odd in the thread dumps. I'd suggest fixing the SSL errors or removing the build agent to see if the build still hangs.
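To see which agents the remaining SSL errors point at, the log can be tallied by agent address. This is a rough sketch under stated assumptions: the log format is not shown in this issue, so the code simply counts "host:port" tokens on any line that mentions SSL. The function name tally_ssl_errors and the sample lines are hypothetical, and addresses given as bare IPs are not matched.

```python
import re
from collections import Counter

# Assumption: agent addresses appear as "hostname:port", with the hostname
# starting with a letter (this avoids matching timestamps such as 12:33:01).
ADDRESS = re.compile(r"\b[A-Za-z][A-Za-z0-9.-]*:\d{2,5}\b")


def tally_ssl_errors(lines):
    """Count agent addresses seen on log lines that mention SSL."""
    counts = Counter()
    for line in lines:
        if "ssl" not in line.lower():
            continue
        for address in ADDRESS.findall(line):
            counts[address] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical sample lines; real QuickBuild log lines may look different.
    sample = [
        "ERROR SSLHandshakeException connecting to cv-web02:8811",
        "INFO build finished on cv-ok01:8811",
        "12:33:01 ssl handshake failed cv-web02:8811",
    ]
    for address, count in tally_ssl_errors(sample).most_common():
        print(address, count)
```

If an address that was supposedly fixed still shows up in logs written after the fix, that agent is worth rechecking before looking for other causes.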
Generated at Sun May 05 11:53:58 UTC 2024 using JIRA 189.