
[QB-3650] There was a deadlock in QuickBuild that caused the memory spike
Created: 17/Dec/20  Updated: 08/Jan/22

Status: Closed
Project: QuickBuild
Component/s: None
Affects Version/s: 9.0.46
Fix Version/s: None

Type: Bug Priority: Major
Reporter: Phong Trinh Assigned To: Robin Shen
Resolution: Incomplete Votes: 0
Remaining Estimate: Unknown Time Spent: Unknown
Original Estimate: Unknown

File Attachments: PNG File QB Server - DB Config.PNG     Text File QuickBuild_Server_Deadlock.txt    

 Description   
 Recently our QuickBuild server went down by itself several times. We checked the server logs and found the deadlock shown below, which caused the memory spike. We set wrapper.java.maxmemory=4096, which I think should be sufficient. I have attached the server log file for your reference.

 =======================================================================================================================================

 2020-12-17 10:17:22,470 [C3P0PooledConnectionPoolManager[identityToken->z8kfltaesj634k1iyllxx|58cae5cb)-AdminTaskTimer] WARN com.mchange.v2.async.ThreadPoolAsynchronousRunner - com.mchange.v2.async.ThreadPoolAsynchronousRunner$DeadlockDetector@44547507 -- APPARENT DEADLOCK!!! Creating emergency threads for unassigned pending tasks!
2020-12-17 10:20:56,673 [pool-1-thread-21150] ERROR com.pmease.quickbuild.grid.GridJob - Error notifying task node of job finishing (job class: 86efdbe6-673b-4b9b-8d92-373f930e71aa, job id: com.pmease.quickbuild.stepsupport.StepProcessJob, task node: QB_SERVER.com:8810)
    java.lang.OutOfMemoryError: Java heap space
=======================================================================================================================================

 Please look into the issue and advise.

 Thank you in advance,
ptrinh

 Comments   
Comment by Robin Shen [ 18/Dec/20 01:49 AM ]
When the OutOfMemory error occurs, please get me a memory dump by running the command below as the same user that runs the QB server process:

/path/to/jdk/bin/jmap -dump:format=b,file=/path/to/dump <QB server JVM process id>

Then please upload the memory dump to Dropbox and send me the link.
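
Since the dump has to be taken while the process is still in the failed state, it may be easier to let the JVM write it automatically. A minimal sketch, assuming a HotSpot JVM and that extra JVM options are passed through the Java Service Wrapper configuration (the same place wrapper.java.maxmemory is set). The index <n> and the dump directory are placeholders, not values from this issue:

```
# <n> must continue the numbering of any existing wrapper.java.additional entries
wrapper.java.additional.<n>=-XX:+HeapDumpOnOutOfMemoryError
wrapper.java.additional.<n+1>=-XX:HeapDumpPath=/path/to/dump/dir
```

With these options the JVM writes an .hprof file to the given directory when the first OutOfMemoryError is thrown. The directory needs enough free space for a file roughly the size of the heap.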
Comment by Phong Trinh [ 22/Dec/20 03:14 PM ]
It occurred again today, but we forgot to take the memory dump before rebooting the server. We will do that the next time it happens.
Thanks, Robin
Comment by Robin Shen [ 22/Dec/20 11:17 PM ]
In the meantime, please get a stack trace of the server when it happens:
/path/to/jdk/bin/jstack <QB server JVM process id>
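
The two diagnostics requested in this thread can be captured together before the server is restarted. The following is only a sketch: the function name capture_diagnostics, the JDK_BIN and OUT_DIR variables, and the output file names are hypothetical. It takes the thread dump first because it is quick, then the heap dump, which can take longer.

```shell
#!/bin/sh
# Hypothetical helper: capture a thread dump and a heap dump from a JVM.
# JDK_BIN and OUT_DIR are assumptions; adjust them to your installation.
JDK_BIN="${JDK_BIN:-/path/to/jdk/bin}"
OUT_DIR="${OUT_DIR:-/tmp}"

capture_diagnostics() {
    pid="$1"
    stamp="$(date +%Y%m%d-%H%M%S)"
    # Thread dump first: fast, and shows which threads are blocked.
    "$JDK_BIN/jstack" "$pid" > "$OUT_DIR/qb-threads-$stamp.txt"
    # Heap dump second: can be slow on a large heap.
    "$JDK_BIN/jmap" "-dump:format=b,file=$OUT_DIR/qb-heap-$stamp.hprof" "$pid"
}
```

Run it as the same user that runs the QB server process, for example: capture_diagnostics <QB server JVM process id>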
Comment by Phong Trinh [ 13/Jan/21 03:54 AM ]
The server went down again on 12/24/2020 because the Java heap ran out of memory. Unfortunately, the system admin was not available to take the memory dump at that time. The issue has not occurred again since that day. Looking into the logs, it seems that a number of build servers had network hiccups or went down in the middle of builds, and the server kept trying to connect to them. That might have caused the deadlock shown in the log file, and the deadlock consumed the memory and brought the server down. What do you think?
 We are still keeping an eye on the server and the build servers, and will take a memory dump as you advised if the issue occurs again.

 Thanks,
ptrinh
Comment by Robin Shen [ 13/Jan/21 04:29 AM ]
How many database connections have you configured in conf/hibernate.properties?
Comment by Phong Trinh [ 13/Jan/21 03:23 PM ]
We are using a PostgreSQL database. The number of database connections is set to 25, as shown in the attached file. Thanks
Comment by Robin Shen [ 13/Jan/21 11:16 PM ]
Please increase it to 50 if you have many builds to run per day.
Comment by Phong Trinh [ 14/Jan/21 03:20 AM ]
Thanks, Robin. We are going to increase it to 50 as you advised, since there are many builds to run per day.
Comment by Phong Trinh [ 18/Jan/21 12:58 AM ]
We made the change to our server yesterday. Is there any reason you think the configuration of the database connections might cause the issue? Thanks.
Comment by Robin Shen [ 18/Jan/21 01:03 AM ]
Increasing database connections may not solve the issue completely, but it should reduce the likelihood of the issue, especially when the site is busy.
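
For context, the "APPARENT DEADLOCK" warning in the description is logged by c3p0's ThreadPoolAsynchronousRunner when its helper threads all appear stuck on pending tasks. It does not by itself prove a Java-level lock deadlock. The change discussed above might look like the fragment below. The property name is an assumption based on the standard Hibernate c3p0 settings and was not confirmed in this issue, so check it against the attached DB config screenshot:

```
# conf/hibernate.properties
# Assumed property name; the value was raised from 25 to 50 as discussed in this issue.
hibernate.c3p0.max_size=50
```

The PostgreSQL server's own connection limit (max_connections) needs to be at least as large as the pool size, otherwise the extra pool connections cannot be opened.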
Generated at Fri Apr 19 22:55:36 UTC 2024 using JIRA 189.