Jan 27

Written by: Foglight R&D Team
1/27/2009 12:57 PM

My name is Geoff Vona. I am the director of development for Foglight core. I've been working on Foglight in various capacities for the last 5 years.

Foglight is full of great features. Unfortunately, some of those features are easy to miss. We add so many little things that it is hard to get the word out. In this article, I'll be talking about some Foglight "dark corners" related to performance. These features are handy for monitoring the monitor.

Tidbits on the Administration Dashboard

The Administration dashboard is the front end for all Foglight administration. You can access it from the Homes section, or using the cleverly named Administration->Administration path under Dashboards.

The Current Statistics view has a few key bits of information, specifically:

  • The number of current agents. This gives you a rough idea how much data is coming into your server. If you have 0 active agents, you probably have a connectivity issue between agents and server. If you have fewer agents than you think you should, then you should click on the number to figure out which agents aren't connecting.
  • The number of rules. This gives you an idea how much business logic is running on your server. This number is proportional to the number of cartridges you deploy, so there isn't much you can do to change it. But it is a good general indicator of environment complexity. The more rules you have, the more work the server is doing to analyze data and fire alarms.
  • The number of users. This tells you how many users are currently logged in, and how many have logged in total. This gives you an idea how many people are accessing Foglight. If users are complaining about slow response, and this number is large, then you might have an issue with the number of users. You might want to consider using a federation to separate users from data processing.

In addition, you can tell at a glance if your license has expired, or is about to expire. You can also tell whether you're looking at a federation master or child.

Licenses that expire can cause odd behaviour. As for federation, a federation master has a different performance profile from a child. Child units focus on data processing as well as serving user requests. Federation masters synchronize with child models, and return data on demand. As a result, masters require more memory than children. Knowing whether you're looking at a master or child is critical!

Foglight Configuration Without the Hassle

The Administration page contains a link to the current Foglight configuration. It can be found under Setup and Support.

As the dwell label indicates, this page contains all the details of Foglight configuration - ports, mail, JVM settings, database. The same information is available in the foglight.config file in FGLHOME/config, but this page gives you access to the configuration when you do not have access to the local file system.

Worth noting: the JVM settings that appear in the configuration dashboard and foglight.config are not the full set. By default, the Foglight launcher sets a number of JVM settings. To see what is actually running on a JVM, you can look in the log file and see the settings (since 5.2.2). Here is an example from my local Foglight install. These options are all defaults; I have nothing set in my foglight.config.

2008-12-31 11:15:34.255 VERBOSE [main] com.quest.nitro.startup.FoglightServer - System Information: windows ia32 5.1, Java 1.6.0_06 (32 bit).
JVM: Sun Microsystems Inc. Java HotSpot(TM) Server VM 10.0-b22
Host name: tor017820.prod.quest.corp
Host architecture: ia32
Process ID: 5408
Running as daemon: false, Running as service: false
VM Options:
    -dsa
    -da
    -Djava.net.preferIPv4Stack=true
    -Xrs
    -XX:+UseAltSigs
    exit
    -Dsun.java.command=org.jboss.Main
    -Xms1024m
    -Xmx1024m
    -XX:MaxPermSize=256m
    -Djava.endorsed.dirs=c:\Work\Sources\catalyst\fglcore\build\dist\lib\endorsed
    -Djava.awt.headless=true
    -Dsun.rmi.dgc.client.gcInterval=86400000
    -Dsun.rmi.dgc.server.gcInterval=86400000
    -XX:+UseConcMarkSweepGC
    -XX:+CMSClassUnloadingEnabled

Take note of the -Xms/-Xmx settings, the ConcMarkSweepGC setting, and the sun.rmi settings. Users frequently add these options themselves, even though there has been no need to do so since 5.2.0.
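If you want to check the effective JVM settings across several servers, the "VM Options:" block in the log is easy to pull out with a short script. The following is a minimal Python sketch, not a Foglight tool. It assumes the block looks like the excerpt above: a "VM Options:" header followed by indented option lines, ending at the first blank or unindented line. The function names are made up for illustration.

```python
import re


def parse_vm_options(log_text):
    """Return the option lines that follow a 'VM Options:' header."""
    options = []
    in_block = False
    for line in log_text.splitlines():
        stripped = line.strip()
        if stripped == "VM Options:":
            in_block = True
            continue
        if in_block:
            # Assumption: the block ends at the first blank or unindented line.
            if not stripped or not line.startswith((" ", "\t")):
                break
            options.append(stripped)
    return options


def heap_settings(options):
    """Extract the -Xms / -Xmx values, keyed by 'ms' and 'mx'."""
    heap = {}
    for opt in options:
        match = re.fullmatch(r"-X(ms|mx)(\d+[kKmMgG]?)", opt)
        if match:
            heap[match.group(1)] = match.group(2)
    return heap


# A shortened copy of the log excerpt shown above.
SAMPLE = """\
Running as daemon: false, Running as service: false
VM Options:
    -dsa
    -Xrs
    -Xms1024m
    -Xmx1024m
    -XX:+UseConcMarkSweepGC

"""

options = parse_vm_options(SAMPLE)
print(heap_settings(options))
```

Comparing the parsed list with what is in foglight.config shows which options come from the launcher defaults and which were added by hand.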

Foglight has Performance Dashboards

The Administration dashboard has a link to a Server Performance Overview under Tooling and Diagnostics.

The Server Performance Dashboard set is fairly big - too big for a humble blog entry. :)

Take note of the pulldown. You can use this dashboard for every server in a federation. This means if you start at the federation master, you can review performance for every FMS. Handy!

Briefly, here's what is available on the performance dashboards:

  • Overview: Intended as a starting point for performance analysis. This dashboard answers the questions What is the data load (inserts per 5 minutes, data service activity graphs), Is there enough memory (JVM memory graph), and Is the load on the server too high (server load graph). Spikes on the data service graph indicate some kind of data processing lull - too much data, or insufficient resources to process incoming data. A sawtooth on the JVM memory graph is normal, but if the amount of memory freed by a garbage collect (i.e., the height of the sawtooth) is small, you may need more memory on your system.
  • Agents: Designed to answer the question What agents am I running. By selecting agent types in the tree, a graph showing the number of agents as a function of time is plotted. Look here for surprises - maybe someone added a bunch of agents that you don't know about, or maybe old agents haven't been properly removed.
  • Connectivity: Summarizes database connection state from the perspective of the FMS. The main concern here is to make sure that there are enough JDBC connections to service the requests.
  • Database: Works for MySQL only. Summarizes database performance, answering the question What is the database doing.
  • Java Virtual Machine: Full details on what is happening with the JVM. This dashboard is not useful unless you are performing detailed JVM tuning for the different memory spaces.
  • Topology and Agent Manager: This is a two-part dashboard. The first part answers the questions How big is my model and Is my model stable by showing the number of objects in memory, and the number of changes on those objects. Model changes should be correlated with agent changes. Frequent model changes could be a sign of model instability. A large or growing number of objects could indicate a need to expand the amount of memory on the FMS. The second part of this dashboard answers the question Can the server keep up with the data by showing how much incoming data is being skipped.
  • Server Load: This isn't much different from the bottom of the Overview tab. Can be safely ignored.
  • Rulette and Topology: This shows how many rules are running (rulette means rule instance). A lot of rulettes, especially as a function of the number of objects, might indicate you're trying to do too much on your server. Maybe some old rules are still running?
  • Messages and Data: Intended to show details on the server's data handling. It shows the number of skipped messages and number of discarded metrics. The graph in the bottom right is key, as it shows how long data processing is expected to take. If that graph grows over time, the server simply cannot keep up with the load from the agents.
  • Derivation and Query: This is a mix of graphs intended to answer the question Are Metrics Expensive. It shows how many derived metrics are running in the system, how many metric evaluations have occurred, and whether we're finding metrics in memory. Metrics are expensive if there are a lot of derived metrics compared to the number of topology objects, or if the memory settings are such that no metric history is being kept in memory.
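The sawtooth check described under Overview can also be done on exported numbers. The following Python sketch is only an illustration: it assumes you have a list of used-heap samples in MB, treats every drop between consecutive samples as a garbage collect, and uses an arbitrary 10% threshold for "small". None of these names or values come from Foglight itself.

```python
def gc_drops(samples_mb):
    """Return the size of each drop in used heap between consecutive samples."""
    return [prev - cur for prev, cur in zip(samples_mb, samples_mb[1:]) if cur < prev]


def mean_freed_fraction(samples_mb, max_heap_mb):
    """Average drop size as a fraction of the maximum heap (0.0 if no drops)."""
    drops = gc_drops(samples_mb)
    if not drops:
        # No drop observed in this window is treated as "nothing freed".
        return 0.0
    return sum(drops) / len(drops) / max_heap_mb


def looks_memory_starved(samples_mb, max_heap_mb, min_fraction=0.10):
    """True when the average sawtooth height is below min_fraction of the heap.

    min_fraction is an assumed threshold; tune it for your environment.
    """
    return mean_freed_fraction(samples_mb, max_heap_mb) < min_fraction


# Hypothetical samples for a 1024 MB heap (matching -Xmx1024m above).
healthy = [400, 600, 800, 300, 500, 700, 250]
tight = [950, 980, 1000, 960, 990, 1010, 970]

print(looks_memory_starved(healthy, 1024))
print(looks_memory_starved(tight, 1024))
```

The first series frees several hundred MB per collection, so it is not flagged. The second frees only about 40 MB each time, which is the flat sawtooth that suggests the server needs more memory.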

Foglight has a Performance Report

Obviously there is a lot to learn about Foglight performance. The good news is that you can share all this detail with others, including Quest, using the performance report. In Foglight 5.2.4, you can use the Run Reports option to run a report immediately.

If you're using a version earlier than 5.2.4, you'll need to set up a schedule and run the report. I recommend the Frequent schedule, and limiting the number of results to 1. The report can be found in Foglight->Diagnostic->Performance, and is either called Management Server Performance Report (5.2.4 and later), or *Performance Report (5.2.3 and earlier).

This report contains all the information in the dashboards. If you are having a performance issue, I highly recommend that you include this PDF along with the support bundle.

Under the Covers - Foglight Self-Monitoring Models

All of the data in the performance dashboards and performance report is created using data that is gathered by Foglight. Foglight monitors its own performance. The easiest way to look at the Foglight self-monitoring models is to create a new dashboard and look under Foglight in the Data tab, as shown below:

There is data on the database, JVM, and on each service running inside Foglight. This is a fairly broad topic that will be explored in a future blog. Feel free to explore the self-monitoring model yourself and create your own performance graphs. The dashboard I've built below shows a more traditional stacking area graph for JVM memory use, and highlights alarm counts and processing times from the alarm service.

Foglight Runs Rules on Itself

Foglight has a small set of rules that it runs against its data model to determine whether it has a problem. The set of rules can be found by going to the Manage Rules dashboard (Administration->Rules) and typing Core-Monitoring in the Cartridge filter field, as shown below:

There are eight rules in Foglight by default:

  1. Agent Health State: Fires if there are any agents in a broken state.
  2. Catalyst Data Service Discarding Data: Fires if the data service starts to discard data. This alarm indicates the server is overloaded and cannot keep up with the data coming in.
  3. Catalyst Database Space Checking: Checks to see whether the database is getting too big. The database size target is set using the DBSMon.MaxDatabaseSize registry variable, which by default is set to 2 GB. You'll probably want to set your target size larger.
  4. Catalyst Free Space Checking for Oracle tablespace: For Oracle databases, this fires when the tablespace size is too large. The relevant thresholds are in registry variables called DBSMon.WarningFreeTablespaceSize, DBSMon.CriticalFreeTablespaceSize and DBSMon.FatalFreeTablespaceSize.
  5. Catalyst Memory Usage Checking: Fires if Catalyst is in danger of running out of memory, defined as 95% memory utilization. If this alarm fires and clears occasionally, you don't have a problem. However, if the alarm stays fired, or if it fires and clears frequently, then you need more memory.
  6. Foglight Agent Type License Checking: Fires if there are more hosts connected to Foglight than should be allowed by the license.
  7. Foglight Topology Size Limit Reached: By default, Foglight doesn't allow more than 10,000 of any individual object type. This is to protect against volatile, untuned topologies. This happens most often with JavaEE Request URL tuning. If this rule fires, then there is likely some agent tuning required to make the data less volatile. However, it is possible that there are more than 10,000 of a particular type. If that is the case, an override can be specified in the foglight.limit.instances registry variable. Best to specify an override for the 10,000 default per type, at least in the beginning.
  8. Remote Agent Managers State per Host: I honestly don't know what this does. Some dark corners remain dark. :)
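To make the Topology Size Limit rule (number 7) concrete, here is a small Python sketch of the same idea: count objects per type and flag any type that has reached its limit, with optional per-type overrides. This is not how Foglight implements the rule. The type names are hypothetical, and the overrides dictionary is only a stand-in for the foglight.limit.instances registry variable, whose real format is not shown here.

```python
from collections import Counter

DEFAULT_LIMIT = 10_000  # default per-type cap described in rule 7


def types_over_limit(object_types, overrides=None, default_limit=DEFAULT_LIMIT):
    """Return {type_name: (count, limit)} for every type at or above its limit.

    object_types is an iterable with one type name per topology object.
    overrides maps a type name to a larger (or smaller) limit for that type.
    """
    overrides = overrides or {}
    counts = Counter(object_types)
    flagged = {}
    for type_name, count in counts.items():
        limit = overrides.get(type_name, default_limit)
        if count >= limit:
            flagged[type_name] = (count, limit)
    return flagged


# Hypothetical model: a volatile request-URL type and a small host type.
model = ["RequestURL"] * 10_000 + ["Host"] * 50

print(types_over_limit(model))
print(types_over_limit(model, overrides={"RequestURL": 20_000}))
```

With the default limit the request-URL type is flagged. With the override in place nothing is flagged, which mirrors the choice described in the rule: tune the agent so the data is less volatile, or raise the limit for a type that legitimately has that many instances.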

Summary

Foglight collects a lot of data on itself. Many options are available for self-monitoring and performance analysis. I hope this blog has shed a little light on some of the performance-related dark corners in Foglight. 

2 comments so far...

Re: Blog - Foglight Dark Corners - Part One - Performance

Geoff,

Great Blog - really useful information here, especially JVM settings/config. Look forward to your next entry!

Kevin

By on   1/28/2009 4:45 AM

Re: Welcome to the Dark Corners of Foglight - Part One - Performance

The "Management Server Performance Report" is awesome! One tip: by default, generated reports will be monochrome for laser printers. These reports are easier to interpret in color. To adjust the default, go to Configuration/User Preferences and ensure that the Themes-> Print: "Report (Color)" is set.

By BrianW on   1/31/2009 8:59 AM

© 2008 Quest Software, Inc. All rights reserved.