History | Log In     View a printable version of the current page.  
Issue Details (XML | Word | Printable)

Key: QB-3311
Type: Bug Bug
Status: Closed Closed
Resolution: Cannot Reproduce
Priority: Major Major
Assignee: Robin Shen
Reporter: Todd Scholl
Votes: 0
Watchers: 1
Operations

If you were logged in you would be able to see more operations.
QuickBuild

Quickbuild Server Outage

Created: 17/Dec/18 02:12 PM   Updated: 17/Jan/19 06:01 AM
Component/s: None
Affects Version/s: 7.0.16
Fix Version/s: None

Original Estimate: Unknown Remaining Estimate: Unknown Time Spent: Unknown
File Attachments: 1. Zip Archive console.log.zip (1012 kb)
2. GZip Archive cv-snapav-p3_QB_Issue_12_17.tra.gz (41 kb)
3. GZip Archive db_connection_gathering.gz (3.30 Mb)
4. Zip Archive issue_12_17_2018.zip (2.04 Mb)

Environment:
Operating System Linux 4.4.114-94.14-default, amd64
OS User Name root
JVM Java HotSpot(TM) 64-Bit Server VM 1.7.0_79, Oracle Corporation
QuickBuild Version 7.0.16 - Fri May 12 09:57:15 EDT 2017
Total Memory 1.99 GB
Used Memory 1.46 GB
Configurations 7717


 Description  « Hide
Support,
On this morning 12/17 at ~ 1 am the GUI for our QB server became non responsive. The logs for the time show a sharp drop in activity, but do not mention any crash. The logs show jobs still launching, but no jobs reached the agents after this time and nothing appears to have gone into the QB database after ~1:06 am. I had to initiate a server restart at 2:34 am to try and recover the server and after that information and jobs started to flow. I executed the "./server.sh dump" command twice be fore restarting.

After recovery, several of the agents displayed errors and would not reconnect to the server. I had to do a series on manual restarts on them until about ~3 am when the remaining troublesome agents just started working without intervention.

I need to know what was causing these issues and what we can do to fix it.

I am attaching logs for the issue though some of them may be too large. Is there a place I can sftp them too for the big ones?

 All   Comments   Work Log   Change History      Sort Order:
Robin Shen [07/Jan/19 11:24 PM]
The mariadb driver has the same class name as mysql driver (tries to be compatible). Please remove file "mariadb-java-client-1.5.2.jar" from "<QB server install dir>/plugins/com.pmease.quickbuild.libs", then restart QB.

Todd Scholl [07/Jan/19 02:14 PM]
Robin,
I believe we are using the MYSQL drivers in the hibernate.properties:

hibernate.dialect=org.hibernate.dialect.MySQLDialect
hibernate.connection.driver_class=com.mysql.jdbc.Driver
hibernate.connection.url=jdbc:mysql://&lt;ip&gt;:3306/qb
hibernate.connection.username=<username>
hibernate.connection.password=<password>

I don't see any reference for maria in the config files. Where would I find this?


Robin Shen [05/Jan/19 12:09 AM]
I noticed that you are using MariaDB driver, please use MySQL official jdbc driver to see if the problem still exists.

Todd Scholl [04/Jan/19 05:21 AM]
MYSQL

Robin Shen [04/Jan/19 12:20 AM]
Which database are you using?

Todd Scholl [03/Jan/19 09:16 PM]
Robin,
After further analysis, we may have found something of interest and would like to know the meaning of it.

jvm 1 | 2019-01-03 12:00:26,809 WARN - [c3p0] A PooledConnection that has already signalled a Connection error is still in use!
jvm 1 | 2019-01-03 12:00:26,809 WARN - [c3p0] Another error has occurred [ java.sql.SQLException: Cannot do an operation on a closed statement ] which will not be reported to listeners!
jvm 1 | java.sql.SQLException: Cannot do an operation on a closed statement
jvm 1 | at org.mariadb.jdbc.MariaDbStatement.checkClose(MariaDbStatement.java:1189)
jvm 1 | at org.mariadb.jdbc.MariaDbStatement.getWarnings(MariaDbStatement.java:675)
jvm 1 | at com.mchange.v2.c3p0.impl.NewProxyPreparedStatement.getWarnings(NewProxyPreparedStatement.java:1031)
jvm 1 | at org.hibernate.engine.jdbc.spi.SqlExceptionHelper.handleAndClearWarnings(SqlExceptionHelper.java:320)
jvm 1 | at org.hibernate.engine.jdbc.spi.SqlExceptionHelper.logAndClearWarnings(SqlExceptionHelper.java:273)
jvm 1 | at org.hibernate.engine.jdbc.internal.JdbcCoordinatorImpl.close(JdbcCoordinatorImpl.java:529)
jvm 1 | at org.hibernate.engine.jdbc.internal.JdbcCoordinatorImpl.cleanup(JdbcCoordinatorImpl.java:509)
jvm 1 | at org.hibernate.engine.jdbc.internal.JdbcCoordinatorImpl.close(JdbcCoordinatorImpl.java:204)
jvm 1 | at org.hibernate.engine.transaction.internal.TransactionCoordinatorImpl.close(TransactionCoordinatorImpl.java:297)
jvm 1 | at org.hibernate.internal.SessionImpl.close(SessionImpl.java:369)
jvm 1 | at com.pmease.quickbuild.persistence.SessionInterceptor.invoke(SessionInterceptor.java:67)
jvm 1 | at com.google.inject.internal.InterceptorStackCallback$InterceptedMethodInvocation.proceed(InterceptorStackCallback.java:72)
jvm 1 | at com.google.inject.internal.InterceptorStackCallback.intercept(InterceptorStackCallback.java:52)
jvm 1 | at com.pmease.quickbuild.entitymanager.impl.DefaultBuildManager$$EnhancerByGuice$$370b989.getStatus(<generated>)
jvm 1 | at com.pmease.quickbuild.log.DefaultLivelogManager.execute(DefaultLivelogManager.java:68)
jvm 1 | at com.pmease.quickbuild.taskschedule.DefaultTaskScheduler$HelperTask.execute(DefaultTaskScheduler.java:103)
jvm 1 | at org.quartz.core.JobRunShell.run(JobRunShell.java:216)
jvm 1 | at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:549)

It is included in the latest log I have attached.

Todd Scholl [03/Jan/19 06:31 PM]
Robin,
We believe we are seeing the issue, it appears that connections from the App are being stuck in a CLOSE_WAIT state and are not actually closing so connections just hang out. Originally we had a max of 10 DB connections in the hibernate config which we believe was getting full of these close wait connections, yesterday we adjusted the connection pool up to 70 and this morning we see the following connections from QB:

10.200.255.100:47442 10.200.255.110:3306 CLOSE_WAIT 22975/java tcp 1 0 10.200.255.100:46472 10.200.255.110:3306 CLOSE_WAIT 22975/java tcp 0 0 10.200.255.100:46470 10.200.255.110:3306 ESTABLISHED 22975/java tcp 0 0 10.200.255.100:47500 10.200.255.110:3306 ESTABLISHED 22975/java tcp 1 0 10.200.255.100:47408 10.200.255.110:3306 CLOSE_WAIT 22975/java tcp 0 0 10.200.255.100:46466 10.200.255.110:3306 ESTABLISHED 22975/java tcp 0 0 10.200.255.100:47440 10.200.255.110:3306 ESTABLISHED 22975/java tcp 0 0 10.200.255.100:52938 10.200.255.110:3306 ESTABLISHED 22975/java tcp 1 0 10.200.255.100:46468 10.200.255.110:3306 CLOSE_WAIT 22975/java tcp 1 0 10.200.255.100:47434 10.200.255.110:3306 CLOSE_WAIT 22975/java tcp 0 0 10.200.255.100:47508 10.200.255.110:3306 ESTABLISHED 22975/java tcp 0 0 10.200.255.100:47498 10.200.255.110:3306 ESTABLISHED 22975/java tcp 1 0 10.200.255.100:40118 10.200.255.110:3306 CLOSE_WAIT 22975/java tcp 1 0 10.200.255.100:47502 10.200.255.110:3306 CLOSE_WAIT 22975/java tcp 0 0 10.200.255.100:47512 10.200.255.110:3306 ESTABLISHED 22975/java tcp 0 0 10.200.255.100:40116 10.200.255.110:3306 ESTABLISHED 22975/java tcp 1 0 10.200.255.100:40114 10.200.255.110:3306 CLOSE_WAIT 22975/java tcp 1 0 10.200.255.100:47430 10.200.255.110:3306 CLOSE_WAIT 22975/java tcp 0 0 10.200.255.100:47472 10.200.255.110:3306 ESTABLISHED 22975/java tcp 1 0 10.200.255.100:47410 10.200.255.110:3306 CLOSE_WAIT 22975/java tcp 0 0 10.200.255.100:47394 10.200.255.110:3306 ESTABLISHED 22975/java tcp 0 0 10.200.255.100:47488 10.200.255.110:3306 ESTABLISHED 22975/java tcp 0 0 10.200.255.100:47432 10.200.255.110:3306 ESTABLISHED 22975/java tcp 1 0 10.200.255.100:47474 10.200.255.110:3306 CLOSE_WAIT 22975/java tcp 1 0 10.200.255.100:47476 10.200.255.110:3306 CLOSE_WAIT 22975/java tcp 0 0 10.200.255.100:47486 10.200.255.110:3306 ESTABLISHED 22975/java tcp 0 0 10.200.255.100:47482 10.200.255.110:3306 ESTABLISHED 22975/java tcp 1 0 10.200.255.100:46474 10.200.255.110:3306 CLOSE_WAIT 22975/java tcp 1 0 10.200.255.100:47484 10.200.255.110:3306 CLOSE_WAIT 22975/java

Is there a setting or log entry that can explain this?

Todd Scholl [18/Dec/18 03:12 PM]
Thank you Robin, we are investigating our DB.

Robin Shen [17/Dec/18 11:32 PM]
From the dump, almost all jobs are waiting for database connection. Looks like that there are connection issue to database server. Also the server memory should be increased to at least 4G in "conf/wrapper.conf" via property "wrapper.java.maxmemory" considering you have more than 7000 configurations.