<< Back to previous view

[QB-3311] Quickbuild Server Outage
Created: 17/Dec/18  Updated: 17/Jan/19

Status: Closed
Project: QuickBuild
Component/s: None
Affects Version/s: 7.0.16
Fix Version/s: None

Type: Bug Priority: Major
Reporter: Todd Scholl Assigned To: Robin Shen
Resolution: Cannot Reproduce Votes: 0
Remaining Estimate: Unknown Time Spent: Unknown
Original Estimate: Unknown
Environment: Operating System Linux 4.4.114-94.14-default, amd64
OS User Name root
JVM Java HotSpot(TM) 64-Bit Server VM 1.7.0_79, Oracle Corporation
QuickBuild Version 7.0.16 - Fri May 12 09:57:15 EDT 2017
Total Memory 1.99 GB
Used Memory 1.46 GB
Configurations 7717

File Attachments: Zip Archive console.log.zip     GZip Archive cv-snapav-p3_QB_Issue_12_17.tra.gz     GZip Archive db_connection_gathering.gz     Zip Archive issue_12_17_2018.zip    

 Description   
Support,
On this morning 12/17 at ~ 1 am the GUI for our QB server became non responsive. The logs for the time show a sharp drop in activity, but do not mention any crash. The logs show jobs still launching, but no jobs reached the agents after this time and nothing appears to have gone into the QB database after ~1:06 am. I had to initiate a server restart at 2:34 am to try and recover the server and after that information and jobs started to flow. I executed the "./server.sh dump" command twice be fore restarting.

After recovery, several of the agents displayed errors and would not reconnect to the server. I had to do a series on manual restarts on them until about ~3 am when the remaining troublesome agents just started working without intervention.

I need to know what was causing these issues and what we can do to fix it.

I am attaching logs for the issue though some of them may be too large. Is there a place I can sftp them too for the big ones?

 Comments   
Comment by Robin Shen [ 17/Dec/18 11:32 PM ]
From the dump, almost all jobs are waiting for database connection. Looks like that there are connection issue to database server. Also the server memory should be increased to at least 4G in "conf/wrapper.conf" via property "wrapper.java.maxmemory" considering you have more than 7000 configurations.
Comment by Todd Scholl [ 18/Dec/18 03:12 PM ]
Thank you Robin, we are investigating our DB.
Comment by Todd Scholl [ 03/Jan/19 06:31 PM ]
Robin,
We believe we are seeing the issue, it appears that connections from the App are being stuck in a CLOSE_WAIT state and are not actually closing so connections just hang out. Originally we had a max of 10 DB connections in the hibernate config which we believe was getting full of these close wait connections, yesterday we adjusted the connection pool up to 70 and this morning we see the following connections from QB:

10.200.255.100:47442 10.200.255.110:3306 CLOSE_WAIT 22975/java tcp 1 0 10.200.255.100:46472 10.200.255.110:3306 CLOSE_WAIT 22975/java tcp 0 0 10.200.255.100:46470 10.200.255.110:3306 ESTABLISHED 22975/java tcp 0 0 10.200.255.100:47500 10.200.255.110:3306 ESTABLISHED 22975/java tcp 1 0 10.200.255.100:47408 10.200.255.110:3306 CLOSE_WAIT 22975/java tcp 0 0 10.200.255.100:46466 10.200.255.110:3306 ESTABLISHED 22975/java tcp 0 0 10.200.255.100:47440 10.200.255.110:3306 ESTABLISHED 22975/java tcp 0 0 10.200.255.100:52938 10.200.255.110:3306 ESTABLISHED 22975/java tcp 1 0 10.200.255.100:46468 10.200.255.110:3306 CLOSE_WAIT 22975/java tcp 1 0 10.200.255.100:47434 10.200.255.110:3306 CLOSE_WAIT 22975/java tcp 0 0 10.200.255.100:47508 10.200.255.110:3306 ESTABLISHED 22975/java tcp 0 0 10.200.255.100:47498 10.200.255.110:3306 ESTABLISHED 22975/java tcp 1 0 10.200.255.100:40118 10.200.255.110:3306 CLOSE_WAIT 22975/java tcp 1 0 10.200.255.100:47502 10.200.255.110:3306 CLOSE_WAIT 22975/java tcp 0 0 10.200.255.100:47512 10.200.255.110:3306 ESTABLISHED 22975/java tcp 0 0 10.200.255.100:40116 10.200.255.110:3306 ESTABLISHED 22975/java tcp 1 0 10.200.255.100:40114 10.200.255.110:3306 CLOSE_WAIT 22975/java tcp 1 0 10.200.255.100:47430 10.200.255.110:3306 CLOSE_WAIT 22975/java tcp 0 0 10.200.255.100:47472 10.200.255.110:3306 ESTABLISHED 22975/java tcp 1 0 10.200.255.100:47410 10.200.255.110:3306 CLOSE_WAIT 22975/java tcp 0 0 10.200.255.100:47394 10.200.255.110:3306 ESTABLISHED 22975/java tcp 0 0 10.200.255.100:47488 10.200.255.110:3306 ESTABLISHED 22975/java tcp 0 0 10.200.255.100:47432 10.200.255.110:3306 ESTABLISHED 22975/java tcp 1 0 10.200.255.100:47474 10.200.255.110:3306 CLOSE_WAIT 22975/java tcp 1 0 10.200.255.100:47476 10.200.255.110:3306 CLOSE_WAIT 22975/java tcp 0 0 10.200.255.100:47486 10.200.255.110:3306 ESTABLISHED 22975/java tcp 0 0 10.200.255.100:47482 10.200.255.110:3306 ESTABLISHED 22975/java tcp 1 0 10.200.255.100:46474 10.200.255.110:3306 CLOSE_WAIT 22975/java tcp 1 0 10.200.255.100:47484 10.200.255.110:3306 CLOSE_WAIT 22975/java

Is there a setting or log entry that can explain this?
Comment by Todd Scholl [ 03/Jan/19 09:16 PM ]
Robin,
After further analysis, we may have found something of interest and would like to know the meaning of it.

jvm 1 | 2019-01-03 12:00:26,809 WARN - [c3p0] A PooledConnection that has already signalled a Connection error is still in use!
jvm 1 | 2019-01-03 12:00:26,809 WARN - [c3p0] Another error has occurred [ java.sql.SQLException: Cannot do an operation on a closed statement ] which will not be reported to listeners!
jvm 1 | java.sql.SQLException: Cannot do an operation on a closed statement
jvm 1 | at org.mariadb.jdbc.MariaDbStatement.checkClose(MariaDbStatement.java:1189)
jvm 1 | at org.mariadb.jdbc.MariaDbStatement.getWarnings(MariaDbStatement.java:675)
jvm 1 | at com.mchange.v2.c3p0.impl.NewProxyPreparedStatement.getWarnings(NewProxyPreparedStatement.java:1031)
jvm 1 | at org.hibernate.engine.jdbc.spi.SqlExceptionHelper.handleAndClearWarnings(SqlExceptionHelper.java:320)
jvm 1 | at org.hibernate.engine.jdbc.spi.SqlExceptionHelper.logAndClearWarnings(SqlExceptionHelper.java:273)
jvm 1 | at org.hibernate.engine.jdbc.internal.JdbcCoordinatorImpl.close(JdbcCoordinatorImpl.java:529)
jvm 1 | at org.hibernate.engine.jdbc.internal.JdbcCoordinatorImpl.cleanup(JdbcCoordinatorImpl.java:509)
jvm 1 | at org.hibernate.engine.jdbc.internal.JdbcCoordinatorImpl.close(JdbcCoordinatorImpl.java:204)
jvm 1 | at org.hibernate.engine.transaction.internal.TransactionCoordinatorImpl.close(TransactionCoordinatorImpl.java:297)
jvm 1 | at org.hibernate.internal.SessionImpl.close(SessionImpl.java:369)
jvm 1 | at com.pmease.quickbuild.persistence.SessionInterceptor.invoke(SessionInterceptor.java:67)
jvm 1 | at com.google.inject.internal.InterceptorStackCallback$InterceptedMethodInvocation.proceed(InterceptorStackCallback.java:72)
jvm 1 | at com.google.inject.internal.InterceptorStackCallback.intercept(InterceptorStackCallback.java:52)
jvm 1 | at com.pmease.quickbuild.entitymanager.impl.DefaultBuildManager$$EnhancerByGuice$$370b989.getStatus(<generated>)
jvm 1 | at com.pmease.quickbuild.log.DefaultLivelogManager.execute(DefaultLivelogManager.java:68)
jvm 1 | at com.pmease.quickbuild.taskschedule.DefaultTaskScheduler$HelperTask.execute(DefaultTaskScheduler.java:103)
jvm 1 | at org.quartz.core.JobRunShell.run(JobRunShell.java:216)
jvm 1 | at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:549)

It is included in the latest log I have attached.
Comment by Robin Shen [ 04/Jan/19 12:20 AM ]
Which database are you using?
Comment by Todd Scholl [ 04/Jan/19 05:21 AM ]
MYSQL
Comment by Robin Shen [ 05/Jan/19 12:09 AM ]
I noticed that you are using MariaDB driver, please use MySQL official jdbc driver to see if the problem still exists.
Comment by Todd Scholl [ 07/Jan/19 02:14 PM ]
Robin,
I believe we are using the MYSQL drivers in the hibernate.properties:

hibernate.dialect=org.hibernate.dialect.MySQLDialect
hibernate.connection.driver_class=com.mysql.jdbc.Driver
hibernate.connection.url=jdbc:mysql://&lt;ip&gt;:3306/qb
hibernate.connection.username=<username>
hibernate.connection.password=<password>

I don't see any reference for maria in the config files. Where would I find this?

Comment by Robin Shen [ 07/Jan/19 11:24 PM ]
The mariadb driver has the same class name as mysql driver (tries to be compatible). Please remove file "mariadb-java-client-1.5.2.jar" from "<QB server install dir>/plugins/com.pmease.quickbuild.libs", then restart QB.
Generated at Thu Apr 18 12:11:20 UTC 2024 using JIRA 189.