
[QB-2724] Job frozen in QB server
Created: 20/May/16  Updated: 01/Dec/16

Status: Closed
Project: QuickBuild
Component/s: None
Affects Version/s: 6.0.9
Fix Version/s: None

Type: Bug
Priority: Blocker
Reporter: Todd Scholl
Assigned To: Robin Shen
Resolution: Cannot Reproduce
Votes: 0
Remaining Estimate: Unknown
Time Spent: Unknown
Original Estimate: Unknown
Environment: Production

File Attachments: jstack_qb_5_20.out (File), QB JOB -1.PNG (PNG File), QB_Overview_2.JPG (JPEG File)

 Description   
We are seeing a job that has been building for almost 2 days. The job normally completes in seconds, and it has frozen in what appears to be the master step on our QB server. Other jobs appear to be completing successfully, however. The last time this happened, it locked up a whole bunch of jobs and caused a production outage.

Can someone tell us how to fix this short of rebooting the server?

This is the log as it exists:
20:40:26,409 INFO - Executing pre-execute action...
20:40:26,409 INFO - Running step...
20:40:26,416 INFO - Checking step execute condition...
20:40:26,417 INFO - Step execute condition satisfied, executing...

 Comments   
Comment by Robin Shen [ 21/May/16 12:37 AM ]
Please get me a stack trace by running the command below on the QB server:
/path/to/jdk/bin/jstack <QB server JVM process id>

If possible, please run this command while no other builds are running.
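
A single dump can miss an intermittent stall, so it may help to capture a few dumps some seconds apart. The sketch below is one way to do that. The function name dump_threads, the JSTACK variable, and the output file naming are assumptions made for illustration, not part of QuickBuild, and /path/to/jdk is the same placeholder used in the command above.

```shell
#!/bin/sh
# Sketch only: dump_threads and JSTACK are illustrative names, not QuickBuild tooling.
# /path/to/jdk is a placeholder; point JSTACK at the real jstack binary.
JSTACK="${JSTACK:-/path/to/jdk/bin/jstack}"

# dump_threads PID [COUNT] [INTERVAL_SECONDS]
# Writes COUNT thread dumps to jstack_<PID>_<n>.out in the current directory.
dump_threads() {
  pid="$1"
  count="${2:-3}"
  interval="${3:-10}"
  i=1
  while [ "$i" -le "$count" ]; do
    # Keep stderr in the same file so jstack errors are not lost.
    "$JSTACK" "$pid" > "jstack_${pid}_${i}.out" 2>&1
    if [ "$i" -lt "$count" ]; then
      sleep "$interval"
    fi
    i=$((i + 1))
  done
}
```

It would be called as: dump_threads <QB server JVM process id> 3 10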
Comment by Todd Scholl [ 21/May/16 03:30 AM ]
requested jstack attached
Comment by Todd Scholl [ 21/May/16 03:31 AM ]
Robin,
Unfortunately in our environment there is always something running.
Comment by Robin Shen [ 21/May/16 05:42 AM ]
Checked the stack trace and found nothing odd there. Are you able to cancel the build?
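
For anyone reviewing a dump such as the attached jstack_qb_5_20.out, a quick first pass is to count threads by state; a large number of BLOCKED threads would be one possible hint of lock contention. This is only a rough sketch: the function name count_thread_states is an assumption for illustration, and it relies on the "java.lang.Thread.State:" lines that jstack prints for each Java thread.

```shell
#!/bin/sh
# Sketch only: count_thread_states is an illustrative name.
# count_thread_states FILE
# Prints "<count> <state>" for each thread state found in a jstack dump,
# most frequent state first.
count_thread_states() {
  awk '/java\.lang\.Thread\.State:/ { n[$2]++ }
       END { for (s in n) print n[s], s }' "$1" | sort -rn
}
```

It would be called as: count_thread_states jstack_qb_5_20.out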
Comment by Todd Scholl [ 21/May/16 11:32 AM ]
Cancel does not work on this job. It says the build is stopping, but it never does. It is also worth noting that this job passed its build timeout without stopping.

Other info I have gathered is that this job was running in an environment that had network issues around the time the job originally kicked off. I don't know if that is relevant to you.
Comment by Robin Shen [ 21/May/16 11:32 PM ]
Please attach the screenshot of the stuck build overview page.
Comment by Todd Scholl [ 23/May/16 07:22 AM ]
Screenshot attached
Comment by Robin Shen [ 24/May/16 12:27 AM ]
Sorry for not being clear. I meant the build overview page, which can be displayed by clicking the version link of the build in question.
Comment by Todd Scholl [ 24/May/16 03:22 PM ]
File attached. Let me know if it's still not right.
Comment by Dom Bellardo [ 24/May/16 04:38 PM ]
Hello Robin,

I will be helping out with this issue while Todd is on vacation. We were going to restart the system to clear this issue. However, when we paused the system so that the currently running jobs could complete and no others could start, we did not go forward, because we noticed that the running jobs actually paused after their current step. The documentation is light in this area. What should we expect during a pause of the system? Will all jobs that are added to the queue be stored in the database and re-added to the queue after a restart? Is there any way to pause the system but allow all running jobs to complete all steps in their configuration?

Also, let me know the next steps on the issue this ticket was opened for.
Comment by Robin Shen [ 25/May/16 12:59 AM ]
Please create a maintenance configuration that runs the script below on the server to see if the build can be stopped.

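groovy:
import com.pmease.quickbuild.stepsupport.StepAwareJob;

// Look through all jobs currently known to the grid
for (job in grid.jobs) {
  if (job instanceof StepAwareJob) {
    // Match the stuck build (id 9854748) and its "master" step
    if (job.build.id == 9854748 && job.stepPath.toString()=="master") {
      logger.info("Found step job");
      def taskFuture = grid.getTaskFuture(job.taskId);
      if (taskFuture != null) {
        logger.info("Found step task");
        // Tell the task future the job has finished (intended to release the stuck build)
        taskFuture.jobFinished(job, false);
      }
    }
  }
}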

If this script does not work, you may proceed to restart the QB server as follows:
1. Issue "service quickbuild stop".
2. Upon stop, QB will continue to run current builds. Please check the queue until only the stuck build is left.
3. Get a stack trace of the QB server process and attach it to this issue.
4. Now forcibly stop the QB server by running "service quickbuild stop" again. QB should stop immediately; if not, please kill the QB server process.
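
The last step above (wait for the server to stop, and kill the process only if it does not) could be scripted roughly as follows. This is a sketch under assumptions: the function name wait_then_kill and the 60-second default are made up for illustration and are not part of QuickBuild, and the script must run as a user allowed to signal the QB server process.

```shell
#!/bin/sh
# Sketch only: wait_then_kill is an illustrative name, not a QuickBuild command.
# wait_then_kill PID [TIMEOUT_SECONDS]
# Waits up to TIMEOUT_SECONDS for PID to exit. If it is still alive after
# that, sends SIGKILL. Prints "exited" or "killed".
wait_then_kill() {
  pid="$1"
  timeout="${2:-60}"
  waited=0
  # kill -0 sends no signal; it only checks that the process can be signalled.
  while kill -0 "$pid" 2>/dev/null && [ "$waited" -lt "$timeout" ]; do
    sleep 1
    waited=$((waited + 1))
  done
  if kill -0 "$pid" 2>/dev/null; then
    kill -9 "$pid"
    echo "killed"
  else
    echo "exited"
  fi
}
```

After the second "service quickbuild stop", it would be called as: wait_then_kill <QB server JVM process id> 60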
Comment by Robin Shen [ 01/Dec/16 08:13 AM ]
Please reopen this issue if there is any new information.
Generated at Sun May 05 09:05:58 UTC 2024 using JIRA 189.