
Key: QB-2724
Type: Bug
Status: Closed
Resolution: Cannot Reproduce
Priority: Blocker
Assignee: Robin Shen
Reporter: Todd Scholl
Votes: 0
Watchers: 1
QuickBuild

Job frozen in QB server

Created: 20/May/16 07:53 PM   Updated: 01/Dec/16 08:13 AM
Component/s: None
Affects Version/s: 6.0.9
Fix Version/s: None

Original Estimate: Unknown Remaining Estimate: Unknown Time Spent: Unknown
File Attachments: 1. File jstack_qb_5_20.out (227 kb)

Image Attachments:
1. QB JOB -1.PNG (154 kb)
2. QB_Overview_2.JPG (95 kb)
Environment: Production


Description
We are seeing a job that has been building for almost 2 days. The job normally completes in seconds, and it appears to have frozen in the master step on our QB server. Other jobs appear to be completing successfully, however. The last time this happened, it locked up a whole bunch of jobs and caused a production outage.

Can someone tell us how we fix this short of rebooting the server?

This is the log as it exists:
20:40:26,409 INFO - Executing pre-execute action...
20:40:26,409 INFO - Running step...
20:40:26,416 INFO - Checking step execute condition...
20:40:26,417 INFO - Step execute condition satisfied, executing...

Comments
Robin Shen [21/May/16 12:37 AM]
Please get me a stack trace by running the command below on the QB server:
/path/to/jdk/bin/jstack <QB server JVM process id>

If possible, please run this command while no other builds are running.
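
A minimal sketch of capturing the stack trace to a file so that it can be attached to this issue. The `JDK_HOME` default, the function names, and the output file naming are assumptions made for illustration and are not part of QuickBuild. The process id has to be looked up separately, and jstack generally needs to run as the same OS user that owns the JVM process.

```shell
#!/bin/sh
# Sketch only: JDK_HOME and the file naming below are placeholders, not QuickBuild conventions.
JDK_HOME="${JDK_HOME:-/path/to/jdk}"

# Build an output file name of the form jstack_<pid>_<timestamp>.out
dump_file_name() {
  pid="$1"
  stamp="$2"
  printf 'jstack_%s_%s.out\n' "$pid" "$stamp"
}

# Write a thread dump of the given JVM process id to a file and print the file name.
capture_dump() {
  pid="$1"
  out="$(dump_file_name "$pid" "$(date +%Y%m%d_%H%M%S)")"
  "$JDK_HOME/bin/jstack" "$pid" > "$out" && printf '%s\n' "$out"
}

# Usage (replace 12345 with the QB server JVM process id):
#   capture_dump 12345
```

Taking two or three dumps a few seconds apart can make it easier to tell a thread that is truly stuck from one that is merely busy.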

Todd Scholl [21/May/16 03:30 AM]
Requested jstack output attached.

Todd Scholl [21/May/16 03:31 AM]
Robin,
Unfortunately, in our environment there is always something running.

Robin Shen [21/May/16 05:42 AM]
I checked the stack trace and found nothing odd there. Are you able to cancel the build?

Todd Scholl [21/May/16 11:32 AM]
Cancel does not work on this job. It says the build is stopping, but it never does. It is also worth noting that this job passed its build timeout without stopping.

Other info I have gathered is that this job was running in an environment that had network issues around the time the job originally kicked off. I don't know if that is relevant to you.

Robin Shen [21/May/16 11:32 PM]
Please attach a screenshot of the overview page of the stuck build.

Todd Scholl [23/May/16 07:22 AM]
Screenshot attached

Robin Shen [24/May/16 12:27 AM]
Sorry for not being clear. I meant the build overview page, which can be displayed by clicking the version link of the build in question.

Todd Scholl [24/May/16 03:22 PM]
File attached. Let me know if it's still not right.

Dom Bellardo [24/May/16 04:38 PM]
Hello Robin,

I will be helping out with this issue as Todd is on vacation. We were going to attempt a restart of the system to clear this issue. We put the system into pause so that the currently running jobs could complete and no others could start. However, we did not go forward, because we noticed that the currently running jobs actually paused after their current step. The documentation is light in this area. What should we expect during a pause of the system? Will all jobs that are added to the queue be stored in the database and re-added to the queue after a restart? Is there any way to pause the system but allow all running jobs to complete all steps in their configuration?

Also, let me know the next steps on the issue the ticket was opened for.

Robin Shen [25/May/16 12:59 AM]
Please create a maintenance configuration that runs the script below on the server to see if the build can be stopped.

groovy:
import com.pmease.quickbuild.stepsupport.StepAwareJob;

// Look through the grid jobs for the "master" step job of build 9854748
for (job in grid.jobs) {
  if (job instanceof StepAwareJob) {
    if (job.build.id == 9854748 && job.stepPath.toString() == "master") {
      logger.info("Found step job");
      def taskFuture = grid.getTaskFuture(job.taskId);
      if (taskFuture != null) {
        logger.info("Found step task");
        // Report the step job as finished to see if the build can then be stopped
        taskFuture.jobFinished(job, false);
      }
    }
  }
}

If this script does not work, you may proceed to restart the QB server as follows:
1. Issue "service quickbuild stop".
2. Upon stop, QB will continue to run current builds. Please check the queue until only the stuck build is left.
3. Get a stack trace of the QB server process and attach it to this issue.
4. Now forcibly stop the QB server by running "service quickbuild stop" again. QB should stop immediately; if not, please kill the QB server process.
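
The restart steps above could be scripted roughly as follows. This is a sketch under stated assumptions, not a tested procedure: only `service quickbuild stop` and the use of jstack come from this thread, while `DRY_RUN`, `WAIT_SECONDS`, `JDK_HOME`, the function names, and the output file name are hypothetical. Step 2 (watching the queue until only the stuck build is left) remains a manual check. By default the script only prints the commands it would run.

```shell
#!/bin/sh
# Sketch only. DRY_RUN defaults to 1, so commands are printed but not executed.
DRY_RUN="${DRY_RUN:-1}"
WAIT_SECONDS="${WAIT_SECONDS:-30}"     # hypothetical grace period before killing
JDK_HOME="${JDK_HOME:-/path/to/jdk}"   # placeholder path, as in the jstack command above

# Print a command; execute it only when DRY_RUN=0.
run() {
  printf '+ %s\n' "$*"
  if [ "$DRY_RUN" = "0" ]; then
    "$@"
  fi
}

# Step 1: request a stop. QB keeps running the current builds.
request_stop() {
  run service quickbuild stop
}

# Step 3: take a stack trace of the QB server process for attaching to the issue.
take_stack_trace() {
  pid="$1"
  out="jstack_before_forced_stop.out"
  printf '+ %s/bin/jstack %s > %s\n' "$JDK_HOME" "$pid" "$out"
  if [ "$DRY_RUN" = "0" ]; then
    "$JDK_HOME/bin/jstack" "$pid" > "$out"
  fi
}

# Step 4: stop again; kill the process only if it is still alive after waiting.
force_stop() {
  pid="$1"
  run service quickbuild stop
  if [ "$DRY_RUN" = "0" ]; then
    sleep "$WAIT_SECONDS"
    if kill -0 "$pid" 2>/dev/null; then
      run kill "$pid"
    fi
  fi
}

# Usage (replace 12345 with the QB server JVM process id):
#   request_stop
#   ... wait until only the stuck build is left in the queue ...
#   take_stack_trace 12345
#   force_stop 12345
```

Setting `DRY_RUN=0` makes the functions actually execute the commands, so a dry run is worth reviewing first.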

Robin Shen [01/Dec/16 08:13 AM]
Please reopen it if there are more clues.