Traditionally, analyzing Performance Monitor data has not been an easy task. It often involved viewing Performance Monitor logs in graph view, seeing what was happening, and hoping that your knowledge and background in SQL Server and the Windows OS provided you with enough information in order to help you to be able to pinpoint specific performance-related issues.
Thanks to SQL Server MVP Jonathan Kehayias, I recently learned about a tool that can greatly benefit DBAs who want to automatically analyze Performance Monitor logs. The tool is called “Performance Analysis of Logs”, or “PAL” for short. It was developed by Clint Huffman of Microsoft, and can be downloaded free from Codeplex at www.codeplex.com/PAL. The tool is free and is Open Source code.
In brief, the tool takes a Performance Monitor log file (.blg), analyzes it, and provides you with an HTML report that presents a graphical analysis of the data, including alerts for those counters that exceed a specified threshold. The thresholds for the counters are based on those recommended by various Microsoft product teams and Microsoft support.
The tool includes many features, and is customizable, allowing you to configure it to meet your specific needs. While it won’t help resolve all of your performance problems, it provides a great start, and I highly recommend that all DBAs download it and give it a try.
Below is a very brief overview of the tool.
The first step is to create a Performance Monitor log file, as this will be the source of the data that will be analyzed by PAL. In the example below, I am using the Reliability and Performance Monitor, and I have created a data collector called the “PAL Data Collector.” You can include as many counters as you want, but PAL won’t analyze all Performance Counters, only those that it knows about. More on this later. In addition, you will want to collect data over a representative period of time, such as one hour, one day, and so on. Keep in mind that collecting Performance Monitor logs can become resource intensive if you collect a lot of counters and collect the data for long periods of time. Once you have collected data, copy that log file onto a local desktop computer, this way, when PAL is running, it won’t negatively affect the performance of the server you are analyzing.
Now, it’s time to start PAL. The tool is wizard driven, so once the Welcome screen appears, click “Next.”
The next step is to tell PAL the location of the Performance Monitor log file you want to analyze. Notice (below) that you can analyze an entire log file or a partial log file (based on time and data).
The “Threshold File” screen is probably the most important screen to complete correctly. Notice at near the top of the screen below, “Microsoft SQL Server 2005” listed next to “Threshold File Title.” This is actually a drop-down box where you can select different analysis templates.
Below is example of the analysis templates that are available. As you can see, PAL is a multipurpose tool and can be used for analyzing the performance of many different Microsoft products. Currently, there is both a Microsoft SQL Server 2000 and 2005 analysis template. Currently, there isn’t a SQL Server 2008 template, although you can use some of the templates with SQL Server 2008.
Once you have selected an analysis template, you can optionally click on the “Edit” button, and a screen similar to the one below appears. This screen lists every Performance Counter that is analyzed by this analysis template, and in addition, you have the ability to edit it. What I really find useful about this screen is that it lists all of the threshold figures used by Microsoft to determine if a particular counter is exceeding its recommended threshold or not. When you create your Performance Monitor logs, I highly suggest you collect all of the counters that are listed in the analysis template that you want to use, so you can take full advantage of the analysis ability of PAL.
Once you have selected an analysis template (SQL Server 2005 in my case), the next step is to answer some questions about your server. The questions that are presented depend on which analysis template you have selected. Below, notice that there are five questions that you need to answer. This data is used when the analysis report is created later.
Next, the wizard presents us with the Analysis Interval screen. Here, we tell PAL how we want to slice and dice the data. For example, if we want to see Performance Monitor counter analysis for every 60 second time period, or for every hour time period, we can. This way, you can see what was happening during the time interval you specify. If you choose “AUTO,” PAL will automatically slice all of your data into 30 time slices. As this is not a great choice in most cases, you will probably want to pick your own time period.
The Output Options screen (below) allows you to specify where the reports will be outputted to. You can use the default values, or change them as you desire.
The Queue screen shows you (see below) the script file that it will run in order to analyze your data. In most cases, you will just click on “Next” to continue.
Now we are ready to analyze the Performance Monitor log file. In most cases, you will want to use the default “Execute: Execute what currently is in the queue” option, and the analysis begins. If you have collected a lot of data, it might take some time for the analysis to complete.
Once the analysis is over, a single-page (and very long) HTML report is produced. The top part of the report (shown below) is a table of contents and allows you to quickly drill down into the area(s) you want to focus on. It also shows you those areas where alerts have been produced. As you might imagine, an alert is not a good thing.
If you scroll past the table of contents screen (assuming you don’t want to drill down right away), you come to the “Alert” section of the report (see below). This is one of the most powerful parts of the report. For whatever Analysis Interval you have selected, the Performance Monitor counters are summarized. For example, I set my Analysis Interval to 5 seconds, so in the figure below, you can see three 5-second intervals (the last interval is cut off). During each of these intervals you can see the alerts that were created. For example, in the interval starting at 6:26:35 PM, there were three alerts. In the next interval, there were 2 alerts, and in the last interval, 7 alerts. In other words, what PAL has done is to summarize the counters for the Analysis Interval, and based on the thresholds set inside of PAL, it has produced various Performance Monitor counter warnings, in addition to providing us with various data for each counter. If a performance counter was fine during this time period (as defined by the predefined counter thresholds, then it won’t appear in this report.
Blue alerts are informational, yellow alerts are warnings, and red alerts are critical. So if you see a lot of yellow or red alerts, this is where you want to focus your time, as these are strong indications where your server may be having a problem. In addition, you can drill down into this information by clicking on the alert (it is a hyperlink) and it will move you to the location on the page where there is detailed data about the alert that provides additional information you can use to help you identify and resolve the specific problem.
For example, let’s say that I want to find out more about the red alert for the first interval for “Page Life Expectancy < 300 Seconds,” which is shown in the above screen. To find out more about this alert, I click on the hyperlink and the following detail screen appears.
In the above detail section of the report, I can see a graph of this counter over time (in this case, just a few seconds), along with the minimum, average, maximum, hourly trend (mine is negative because I only captured a few minutes of data), and other statistical data.
If you don’t know what this counter means to you as a DBA, there is an explanation at the top of the screen that tells you that if this value is below 300 (which it is), then adding more memory to your SQL Server instance would most likely boost performance. Each alert in the report provides similar information, providing you with a “short course” in Performance Monitor counter analysis.
PAL can take a Performance Monitor log file, analyze it, and highlight those counters that exceed commonly accepted thresholds, helping you to determine what, if any, problems your server is experiencing. This information, along with other information you gather, can be used to help you determine what is happening with your server, and hopefully, providing you with enough information to correct it.
This short blog entry just shows you a very small part of what PAL can do for you. Hopefully, you see the potential power this tool has, and you will download it and give it a try.