When Java object allocation becomes a bottleneck, the CPU starts screaming
There is no typical distribution graph and no typical explanation for bad performance. It is so application- and technology-specific that you could probably write a small encyclopedia on the subject. One thing that is available, though, are the starting points. Those are pretty much the same everywhere, and they give you hints about where to look for your performance problem.
I would organize them as follows, in descending order of priority:
- JVM Configuration and Garbage Collection Strategy
- Thread Pool Configuration
- JDBC Pool Configuration
- Database Server
- I/O usage
The list could of course continue, but anything further would only be a refinement of the points above.
As most performance experts in the market keep saying, the JVM is responsible for 80% of an application's performance problems. It is not called a machine (Java Virtual Machine) for nothing… This thing is so complex, so powerful, and, with the right tools and knowledge, so tunable that it can REALLY make the difference between topping out at 30 transactions per second and reaching 300 transactions per second.
Let me get slowly to the point.
While running a load test over the previous days, I noticed that my users were successfully calling a method with good response times up to a limit of ~40 transactions per second. Everything above this limit showed longer response times, and at 55 transactions per second my server would start refusing further incoming requests.
My first thought, after fighting for some time with tuning the Oracle server, was the database. Each transaction queried the database for a specific object, and then once more for the children of that object, and that 10 times per request. That translated into roughly 100 queries per transaction… so I thought I might have a bottleneck in my DB. After monitoring the CPU usage and the I/O usage, I did not notice any waiting times in servicing requests, so that took Oracle out of the discussion.
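Just to make that access pattern concrete, here is a minimal JDBC sketch of what such a request roughly does. The class, method, and table names are hypothetical; they only illustrate the parent-plus-children query pattern, not the actual application code:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import javax.sql.DataSource;

// Hypothetical sketch of the per-request access pattern: for each of the
// ~10 objects touched by a request, one query loads the parent row and a
// second query loads its children. Table and column names are made up.
public class RequestDao {

    private final DataSource dataSource;

    public RequestDao(DataSource dataSource) {
        this.dataSource = dataSource;
    }

    public void loadObjectsForRequest(long[] objectIds) throws Exception {
        try (Connection con = dataSource.getConnection()) {
            for (long id : objectIds) { // ~10 iterations per request
                try (PreparedStatement parent = con.prepareStatement(
                        "SELECT * FROM business_object WHERE id = ?")) {
                    parent.setLong(1, id);
                    try (ResultSet rs = parent.executeQuery()) {
                        // map the parent row ...
                    }
                }
                try (PreparedStatement children = con.prepareStatement(
                        "SELECT * FROM business_object_child WHERE parent_id = ?")) {
                    children.setLong(1, id);
                    try (ResultSet rs = children.executeQuery()) {
                        // map the child rows ...
                    }
                }
            }
        }
    }
}
```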
Ok, what could be next?
I took a look at my JDBC connection pool, which was sized proportionally to my HTTP thread pool on GlassFish. I ran the test once again, closely monitoring the JDBC pool usage and the thread pool usage. The peak usage was somewhere around 40 threads in use at once (and 40 JDBC connections), without any queued requests in the connector queue. That clearly told me that the problem was not in handling and processing the requests.
Therefore, contrary to my usual approach (where I start by monitoring the JVM first), I only now took a closer look at the garbage collection. I noticed that at the top of my test (when all users were logged in and firing requests), the heap was filling up amazingly fast, causing a lot of young garbage collections. Young collections were not necessarily the problem in themselves, but I was pretty worried about the rate at which my young generation was filling.
I decided to increase my young generation, so I allocated 3 GB to my JVM and 1 GB to the new generation, in order to see exactly what was happening.
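For reference, on a HotSpot JVM that kind of sizing is done with startup flags along these lines (the GC logging options are the ones I would typically add to capture the collection behaviour; the log file name is just an example):

```
java -Xms3g -Xmx3g -Xmn1g \
     -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:gc.log \
     ...
```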
I was a little surprised to see that at the top of the test (the peak), my 1 GB young generation was filling up in less than 3 seconds! That is an allocation rate of more than 200 MB per second. Taking a parallel look at the CPU, I noticed that it was constantly hitting the 80% line, with an HTTP thread usage of less than 10%.
This machine was so busy allocating objects…
I decided to analyse the garbage collection behaviour using the HP Garbage Collection Tool, in order to have a closer, more detailed look at the object allocation rate. This is how it looked:
This test was run 10 times. You can see how it gracefully starts and ends, and the ~200-second peak period. Now take any peak period and look at the left axis: the allocation rate is higher than 200 MB! That means my CPU was busy allocating 200 MB per second, only for it all to be cleared right away by a garbage collection that started at most 4 seconds later.
Monitoring the GlassFish server returned this report:
In the left panel you can see how the CPU was 80% busy at the peak. The lower points, the blue ones, are the garbage collections. Lots of them… In the right panel you can see how the heap behaved.
And for those of you who are interested, this is how the response times graph looks:
- the left y-axis represents the average transactions per second
- the first right axis is the average response time per request, per second
- the second right axis represents the maximum response time of a request within each second
- the x-axis is the time axis
This is a sample taken from the main report, showing the peak usage; it returned the following results:
And this is how the main report looks, where you can see how at 55 transactions per second the system hits its limit:
This is what the Java world calls object cycling, and it is defined as follows:
“objects are rapidly created and destroyed”
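As a purely hypothetical illustration (this is not code from my application), the pattern typically looks like this: perfectly harmless-looking code that rebuilds the same short-lived objects on every single call, which at hundreds of calls per second turns into a massive allocation rate:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical example of object cycling: every call rebuilds the same data
// from scratch, so each request produces a burst of short-lived objects that
// die in the very next young generation collection.
public class PriceListService {

    public List<String> formatPriceList(List<Double> prices) {
        List<String> formatted = new ArrayList<String>(); // a new list on every call
        for (Double price : prices) {
            // a new String (plus formatting garbage) on every iteration
            formatted.add(String.format("%.2f EUR", price));
        }
        return formatted; // discarded by the caller shortly afterwards
    }
}
```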
Object cycling can be detected in the heap either through a visual tool that samples memory very frequently (once a second or more often), in which case you will observe very narrow, choppy marks in the heap, or through garbage collection logs. The key indicator to look for when reading garbage collection logs is frequent minor collections that reclaim small amounts of memory. Consider that the performance impact of garbage collection on your application is the result of two factors:
* The frequency of garbage collection
* The duration of each garbage collection
Object cycling drives up the frequency of garbage collection and can therefore be detected by calculating the frequency of garbage collection.
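If you do not want to dig through GC logs by hand, both factors can also be sampled from inside the JVM through the standard GarbageCollectorMXBean API. Here is a minimal sketch (the one-second sampling interval is just an example):

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.List;

// Minimal sketch: periodically print how many collections each collector has
// performed and how much time they have consumed, so that the frequency and
// duration of garbage collection can be watched while the load test runs.
public class GcWatcher {

    public static void main(String[] args) throws InterruptedException {
        List<GarbageCollectorMXBean> gcBeans =
                ManagementFactory.getGarbageCollectorMXBeans();
        while (true) {
            for (GarbageCollectorMXBean gc : gcBeans) {
                System.out.printf("%s: %d collections, %d ms total%n",
                        gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
            }
            Thread.sleep(1000); // sample once per second
        }
    }
}
```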
Next step? Identifying the cycling objects and analysing them for improvement (maybe caching?). But that is another story.
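Just to hint at the direction, here is a hypothetical sketch of the caching idea applied to the example above; whether something like this is actually applicable depends, of course, on whether the data can safely be reused between requests:

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: if the formatted data rarely changes, compute it once
// per key and reuse it, instead of re-allocating it on every request.
public class CachedPriceListService {

    private final Map<String, List<String>> cache =
            new ConcurrentHashMap<String, List<String>>();
    private final PriceListService delegate = new PriceListService();

    public List<String> formatPriceList(String priceListKey, List<Double> prices) {
        List<String> cached = cache.get(priceListKey);
        if (cached == null) {
            cached = Collections.unmodifiableList(delegate.formatPriceList(prices));
            cache.put(priceListKey, cached); // recompute only on a cache miss
        }
        return cached;
    }
}
```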
So there it was, explained: how crazy object allocation can have a huge impact on Java performance, regardless of how powerful your server is.
Hope you enjoyed my post,