Hi Foglight.ORGers.
I want to talk about our philosophy for performance management of Java applications. This is different to how some other vendors do Java monitoring, but we believe that our approach is fundamentally superior and I’d like to explain why.
This post is in two parts. The first (this one) describes an alternative approach to monitoring Java, one that some large vendors use. We don’t think it’s the best way, and I explain why.
In the second part of this post (it’ll appear in a couple of days), I talk about our approach and show the advantages of it.
Enjoy.
The ‘Other’ Way of Monitoring Production Java Applications
The fundamental requirements of production performance management are as follows:
- 24x7 monitoring with long-term data retention.
- Extremely rapid Mean Time To Resolution (MTTR) of issues.
- Low overhead (you don’t want your monitor to be part of the performance problem).
- Context – which issues matter most? Which ones do I focus on?
The first three of these are clearly vital but it’s the last I want to concentrate on here.
One way to do Java performance management in production is to focus on ‘let’s monitor the methods’. So you go into your application servers and instrument some methods, EJBs, Servlets, whatever you like, restart the app server (usually required for such approaches) and start collecting data. This approach has a number of major issues which might make such a toolset useful for pre-production monitoring but render it peculiar at best for production monitoring.
1. Unless you know in advance what the problem is, you’re probably going to take a few goes to figure out what to instrument. Each time you change what you instrument, you have to restart your app servers. OK for a development system – but for production? I don’t think so.
- “We fixed the slowdown by restarting the app server 4 times. Is that OK?”. Er, no.
2. There is no context. That EJB might be slow, but why? Who is calling it? Of what business services is it a part? What’s the end-user experience of those using it (indirectly)?
- My point is that the performance of an EJB only matters to the extent that it affects business services that matter.
3. Servlet, EJB and method-centric monitoring tends to focus the admin/monitoring user on which things take longest. This can mislead: isn’t it more important that the key business processes/transactions function properly (and are fast enough)?
- Maybe a particular method is only called once (say on application startup). It still might be important to improve its performance, but it might not. Importance is a function of something other than execution time.
- Maybe there’s no SLA associated with the transaction which calls a particular, slow method. So maybe you don’t care if it’s slow. Or maybe you do care, but fixing it isn’t a priority if other, more important stuff is going on.
- Maybe the method is, indeed, associated with a key transaction, but that transaction is still running fast ‘enough’ according to the SLA/OLA (I’m not going to get too hung up here about correct ITIL terminology). Or maybe there are other methods which each take less time but are called more often, and so add up to more of the total time for that transaction.
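To make that last point concrete, here is a minimal Java sketch of the arithmetic. It is purely illustrative: the class, the method names (`loadConfiguration`, `lookupPrice`) and the timings are invented for this example and are not taken from any real monitoring tool. It ranks the methods in one transaction by total time contributed (call count multiplied by average time per call) rather than by how slow a single call is.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class TransactionTimeBreakdown {

    /** Timing data for one instrumented method (hypothetical data shape). */
    public static final class MethodStats {
        public final String name;
        public final long callCount;
        public final double avgMillisPerCall;

        public MethodStats(String name, long callCount, double avgMillisPerCall) {
            this.name = name;
            this.callCount = callCount;
            this.avgMillisPerCall = avgMillisPerCall;
        }

        /** Total time this method contributes: calls multiplied by average time per call. */
        public double totalMillis() {
            return callCount * avgMillisPerCall;
        }
    }

    /** Returns a new list ordered by total time contributed, largest first. */
    public static List<MethodStats> rankByTotalTime(List<MethodStats> stats) {
        List<MethodStats> ranked = new ArrayList<>(stats);
        ranked.sort((a, b) -> Double.compare(b.totalMillis(), a.totalMillis()));
        return ranked;
    }

    public static void main(String[] args) {
        // Invented numbers: one slow call at startup versus many fast calls.
        List<MethodStats> stats = Arrays.asList(
                new MethodStats("loadConfiguration", 1, 900.0),
                new MethodStats("lookupPrice", 4000, 0.5));

        for (MethodStats m : rankByTotalTime(stats)) {
            System.out.printf("%s: %d calls, %.1f ms total%n",
                    m.name, m.callCount, m.totalMillis());
        }
    }
}
```

With these made-up figures, the single 900 ms call contributes 900 ms in total, while the 0.5 ms method called 4000 times contributes 2000 ms. A view sorted purely by per-call time would put the wrong method at the top.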
OK, so I’ve criticised one possible viewpoint. So what does Foglight do differently?
Well, that’s the subject of the next post (can you tell I’m trying to build tension?).
While you’re waiting for the sequel, please let me know if you have any comments/complaints/objections/positive feedback on this post. I can be reached on:
hugh.mcevoy@quest.com.
Cheers,
Hugh