Glassfish – JDBC Connection Validation explained
Advanced properties are not being called “advanced” for nothing. Playing with some parameters hoping that it will improve some behaviour, without really knowing all underlying effects can have really bad side effects.
I came across the topic of connection validation while studying a very confusing error, with a very low level of reproducing. Actually, if i think it over, we were not able to reproduce it again once we restarted the application server (that was still before adding the connection validation)
So, let’s go to the roots, and have a sound understanding of what Connection Validation can do for us.
The error we were receiving showed us that the database was closing the connection, as soon as we were trying to do something over it.
Associate with Thread:
Thread-1;|RAR5031:System Exception javax.resource.spi.LocalTransactionException: Closed Connection
Trust me, there isn’t too much information out there regarding this. On the Oracle side, you will get something like this in the trace files:
ORA-600  – The error may have a number of different root causes. For example, a multi-threaded Client-side application sent an OCI call out of order, or the data in the network buffer may have been overwritten.
Since there are many potential causes of this error, it is essential to have a reproducible testcase to correctly diagnose the underlying cause.
Nice, isn’t it ?:)
Since we were working in a distributed environment, with many application servers and one single database, each application server had a pool of some 200 dedicated connections to the database. Depending on the server where the user was landing, this problem could be happening after one or after 1000 transactions. Since the jdbc connections are served from the pool, there is no guarantee that the same connection will be used by the same thread over and over again ( of course, you are able to set the advanced property “Associate with Thread”, but that brings some side effects, which are outside the topic discussed here)
At first i (after having a fresh read about the PGA and SGA on shared vs dedicated Oracle connections the previous night) supposed that the problem could be because of the number of sessions, and somehow because of the change we did a couple of days ago, when we decided to switch off the ASSM in Oracle. I thought that because we forgot setting some manual values for the SGA, a problem like this could happen, where too many connections could cause such a behaviour. So i decided to follow the well known “divide et impera”, by gradually reducing the number of connections on each of the application servers, and trying to reproduce the error on the application server in cause.
Since we were working with a cluster of 6 Glassfish Application Servers, we started with the last one.
- Reducing the number of connections in the pool
When trying to reproduce the problem, it occured to me that it was not only still happening, but it was happening more often. So reducing the number of connections increased the rate of reproducing…interesting
- Reducing the number of connections up to the last application server
After doing this for all servers, and reproducing the problem on all servers, i was now running with a number of 20 connections / pool / application server. Clearly, there was no SGA problem on the Database side.
- Faulty JDBC Connections
After carefully reviewing the log output of the error, reasearching the internet, and discussing this with our DBA, we soon came to the conclusion that the problem was being caused by a faulty jdbc connection. This can happen because of several causes, leading to some inconsistency on JDBC level, for example a connection that is trying to access a cursor that it still thinks it owns, but does no longer, network problems, etc. The main point here is that a JDBC connection CAN go corrupt. And if it does, there is nobody (when using the default configuration) to check on it and heal/remove it.
So, we isolated the problem up to the point that we knew that some of the connections were going corrupt. Since we cannot control which jdbc connection goes to which thread, the probability of the faulty connection to be used is pretty much the same as for any other connection to be used: HIGH
Connection Validation: Validation Method – Table
Armed with the information that JDBC connection can be corrupted, we needed a solution for a cleaning process. So i came across JDBC Connection Validation. Basically, this was designed with a purpose, and that is to make sure that faulty connections are being used as less as possible before being removed from the pool.
Unfortunately, there is not too much information out there on the topic, and if there is, none of it is really “complete”. After starting on Jagadish’s Blog, i was soon on the right track of configuring the connection validation. The idea is as following:
- Before using a connection from the pool, a test query is sent over the connection to the database. The table to be queried is chosen by you
- If the connection is faulty, it will be removed from the pool, and the next one will be used (removal of the connection will not add a new one instead – only in the case that the pool is empty)
- As long as the connections are faulty, each of the connections will be tested the same way as in step 2.
- Once the last connection has been marked as faulty, a fresh new connection will be added to the pool
In the worse case scenario, dealing with 200 faulty connections, you would go over 200 connections to run a statement, which of course, brings a lot of overhead. Therefore, you can always enable the following parameter:
Connection Validation – On Any Failure: Close all connections
By doing this, whenever a connection validation fails, all remaining connections, regardless of their state (valid or corrupt, we cannot know until we do a check) will be DROPPED. Now, don’t jump on setting this immediately. Of course this comes at some cost, specially if you have prepared your Glassfish Server for Production by following some of the awesome hints described here: http://weblogs.java.net/blog/sdo/archive/2007/12/a_glassfish_tun.html
Speaking of databases, it’s quite important in glassfish to use JDBC drivers that perform statement caching; this allows the appserver to reuse prepared statements and is a huge performance win. The JDBC drivers that come bundled with the Sun Java Systems Application Server provide such caching; Oracle’s standard JDBC drivers do as well, as do recent drivers for Postgres and MySQL. Whichever driver you use, make sure to configure the properties to use statement caching when you set up the JDBC connection pool — e.g., for Oracle’s JDBC drivers, include the propertiesImplicitCachingEnabled=true MaxStatements=200
When dropping a connection, you will drop it’s entire cache, which, in a highly performant environment can be costly. Another thing you are dropping, is the dedicated connection to the database server, ergo the cursors you were using, and all optimization that was done for specific queries under those cursors. So think twice if you afford dropping all conections!
Connection Validation – Table name
Since we were using ORACLE, the fastest table to use for validation would be DUAL. I have tracked the v$sql views of my Oracle instance to see which query is being run for validation, and came to the conclusion that the validation is done by issuing a:
select count(*) from DUAL
Connection Settings – Validate at most once
Now this was the tricky part. Do not let yourself get fooled by “validate at most once 0 (zero) seconds” Not enabled does not mean not enabled. It means that the validation will be done every time the connection will be used. In a system under heavy load, this could be happening several times/second/connection. I specifically checked on that by doing a trace on the v$sql view, and following the number of executions.
Now, in order to avoid this, make sure you set the value to a relevant (to your business case) number. Setting it to 60 seconds for example, has the following meaning:
- The first time the connection will be used, a validation will take place. If the validation fails, the next one will be used, as described above.
- After the connection has been validated, it will be used by all requesting threads for as long as 60 seconds, regardless if inbetween the connection goes corrupt. For 60 seconds, this connection will be marked as VALID
- After 60 seconds, regardless who requests the connection (it could be the same thread that started the validation in the first place), a new validation will take place.
We decided to go with the following parameters, which could be the ones you need (or not) depending on your business cases:
Glassfish Connection Validation Settings ( all to be found in the administration console>connection pools> advanced)
Statement Cache Size: 200 (that means 200 statements will be cached / thread – watch out with this one, don’t set it too high if you are Storage bound)
Wrap JDBC Objects: Enabled
Validate at most once: 60 seconds
Connection validation: required
Validation method: table
Table name: dual
And this is how the connection parameter looks in the domain.xml file, if you want to set it manually:
<jdbc-connection-pool validation-table-name=”DUAL” max-pool-size=”200″ datasource-classname=”oracle.jdbc.pool.OracleDataSource” pool-resize-quantity=”10″ res-type=”javax.sql.DataSource” steady-pool-size=”50″ statement-cache-size=”200″ name=”myPool” is-connection-validation-required=”true” validate-atmost-once-period-in-seconds=”60″>
I loved dealing with this one!