Don’t Make Dumb Mistakes by Duplicating Indexes

Just today, I was reviewing the indexing of a database used by a third-party application. As part of my review, I ran a script against it that is used to identify duplicate indexes. The script I used was written by Paul Nielsen, the author of the SQL Server Bible, and Itzik Ben-Gan. The script is available here.

After running Paul’s script against the database to find duplicate indexes, I discovered that almost every table had a duplicate index. As I delved into this a little more closely, I discovered what the designer of this database had done.

First, it appeared that he had created the primary keys for most of the tables using non-clustered indexes on an identity column. Second, he had then added a clustered index on the same identity column for each of those tables. By doing so, he ended up with two identical indexes on the same identity column, one clustered and one non-clustered. Instead, the designer should have created his primary keys as clustered indexes. This would have produced the same end result, but with a single index rather than two. By carrying duplicate indexes on most of his tables, he was forcing SQL Server to do a lot of additional work maintaining extra indexes that were not needed.
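
Here is a simplified sketch of the pattern I found, and of the fix, using a hypothetical Orders table (the table and index names are mine, not the vendor’s):

-- The pattern found in the third-party database: a non-clustered primary key
-- plus a separate clustered index on the same identity column.
CREATE TABLE dbo.Orders_AsFound
(
    OrderID   INT IDENTITY(1,1) NOT NULL,
    OrderDate DATETIME          NOT NULL,
    CONSTRAINT PK_Orders_AsFound PRIMARY KEY NONCLUSTERED (OrderID)
);
CREATE CLUSTERED INDEX IX_Orders_AsFound_OrderID
    ON dbo.Orders_AsFound (OrderID);  -- duplicates the primary key's index

-- What the designer should have done: a single clustered primary key.
CREATE TABLE dbo.Orders_Corrected
(
    OrderID   INT IDENTITY(1,1) NOT NULL,
    OrderDate DATETIME          NOT NULL,
    CONSTRAINT PK_Orders_Corrected PRIMARY KEY CLUSTERED (OrderID)
);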

If you have never checked your databases for duplicate indexes before, you should do so, as you might find duplicate indexes that can be removed, helping to boost the performance of your existing databases.
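
If you don’t have the full script handy, a rough sketch along the following lines will at least flag tables where more than one index starts on the same key column (it is no substitute for a complete comparison of key and included-column lists):

-- Simplified duplicate-index check: tables with more than one index
-- that starts on the same key column. This is only a rough first pass.
SELECT  t.name   AS table_name,
        c.name   AS leading_key_column,
        COUNT(*) AS index_count
FROM    sys.indexes AS i
JOIN    sys.index_columns AS ic
          ON  ic.object_id   = i.object_id
          AND ic.index_id    = i.index_id
          AND ic.key_ordinal = 1              -- leading key column only
JOIN    sys.columns AS c
          ON  c.object_id = ic.object_id
          AND c.column_id = ic.column_id
JOIN    sys.tables AS t
          ON  t.object_id = i.object_id
WHERE   i.type IN (1, 2)                      -- clustered and non-clustered
GROUP BY t.name, c.name
HAVING  COUNT(*) > 1
ORDER BY t.name, c.name;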

Do You Verify Your Database Backups?

In the past, I have written and spoken a lot about SQL Server “best practices”. One that I emphasize over and over again is the need for DBAs to verify, regularly, the integrity of their database backups. However, I sometimes feel my advice is falling on deaf ears. When giving a presentation, I often ask how many DBAs in the audience regularly verify their backups. Invariably, few hands are raised. When I visit a SQL Server shop to review their database maintenance plans, I always ask the same question. Very few even know what I am talking about.

Why is this? Do some DBAs have a false belief that database backups are always perfect? Perhaps these DBAs don’t know the mechanics of how to verify a backup? Maybe they just don’t have the time to perform the verification? Most likely, they haven’t given the issue any thought because they have not – yet – had a restore fail and had to explain to an irate manager why their data is lost.
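
For anyone in the second group, here is a minimal sketch of the mechanics (the database name and backup path are placeholders). Keep in mind that VERIFYONLY only checks that the backup is complete and readable; the only definitive test is still periodically restoring the backup to a test server.

-- 1. Take the backup with a checksum so that page-level corruption
--    is detected at backup time.
BACKUP DATABASE AdventureWorks
TO DISK = N'D:\Backups\AdventureWorks.bak'
WITH CHECKSUM, INIT;

-- 2. Verify that the backup file is complete and readable, without
--    actually restoring it.
RESTORE VERIFYONLY
FROM DISK = N'D:\Backups\AdventureWorks.bak'
WITH CHECKSUM;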

Backup verification is a vital part of a well-designed database maintenance plan. Fortunately, I learned this lesson very early in my career. Back in 1982, a time before hard disks were available for desktop computers, I worked for a computer retail store that performed its accounting on a desktop computer, storing the data on floppy disks. Every day, the owner of the company would religiously make backups of the floppy disks and then set these backups on the floor, next to her desk. Also, every night, she would vacuum her office, keeping it nice and tidy for meeting with customers. One day, one of the main floppy disks became corrupted, so she went to her backup floppy disks to restore her data. The only problem was that the backup floppy disks had all been corrupted by the vacuum cleaner’s strong magnetic field. None of her backups were good. She had to start from scratch, reentering all of the lost data by hand.

It was a very painful lesson and ever since then I have been a fanatic when it comes to making lots of backups, and regularly verifying that I can restore them if needed.

I’d be interested to hear how regularly you test the integrity of your database backups, how often you’ve encountered environments where test restores were never performed, and what excuses were given for this.

Check Out the Free Performance Analysis of Logs (PAL) Tool

Traditionally, analyzing Performance Monitor data has not been an easy task. It often involved viewing Performance Monitor logs in graph view, seeing what was happening, and hoping that your knowledge of SQL Server and the Windows OS was enough to pinpoint specific performance-related issues.

Thanks to SQL Server MVP Jonathan Kehayias, I recently learned about a tool that can greatly benefit DBAs who want to automatically analyze Performance Monitor logs. The tool is called “Performance Analysis of Logs,” or “PAL” for short. It was developed by Clint Huffman of Microsoft and can be downloaded from CodePlex at www.codeplex.com/PAL. The tool is free and open source.

In brief, the tool takes a Performance Monitor log file (.blg), analyzes it, and provides you with an HTML report that presents a graphical analysis of the data, including alerts for those counters that exceed a specified threshold. The thresholds for the counters are based on those recommended by various Microsoft product teams and Microsoft support.

The tool includes many features, and is customizable, allowing you to configure it to meet your specific needs. While it won’t help resolve all of your performance problems, it provides a great start, and I highly recommend that all DBAs download it and give it a try.

Below is a very brief overview of the tool.

Using PAL

The first step is to create a Performance Monitor log file, as this will be the source of the data that PAL analyzes. In the example below, I am using the Reliability and Performance Monitor, and I have created a data collector called the “PAL Data Collector.” You can include as many counters as you want, but PAL won’t analyze every Performance Monitor counter, only those it knows about (more on this later). In addition, you will want to collect data over a representative period of time, such as one hour, one day, and so on. Keep in mind that collecting Performance Monitor logs can become resource intensive if you collect a lot of counters over long periods of time. Once you have collected the data, copy the log file onto a local desktop computer; this way, when PAL is running, it won’t negatively affect the performance of the server you are analyzing.

[Screenshot: the “PAL Data Collector” defined in the Reliability and Performance Monitor]

Now, it’s time to start PAL. The tool is wizard driven, so once the Welcome screen appears, click “Next.”

[Screenshot: the PAL Welcome screen]

The next step is to tell PAL the location of the Performance Monitor log file you want to analyze. Notice (below) that you can analyze an entire log file or just a portion of it, restricted to a date and time range.

[Screenshot: selecting the Performance Monitor log file to analyze]

The “Threshold File” screen is probably the most important screen to complete correctly. Notice, near the top of the screen below, “Microsoft SQL Server 2005” listed next to “Threshold File Title.” This is actually a drop-down box from which you can select different analysis templates.

[Screenshot: the Threshold File screen]

Below is an example of the analysis templates that are available. As you can see, PAL is a multipurpose tool and can be used to analyze the performance of many different Microsoft products. There are analysis templates for both Microsoft SQL Server 2000 and SQL Server 2005. Currently, there isn’t a SQL Server 2008 template, although you can use some of the existing templates with SQL Server 2008.

[Screenshot: the list of available analysis templates]

Once you have selected an analysis template, you can optionally click on the “Edit” button, and a screen similar to the one below appears. This screen lists every Performance Monitor counter that the analysis template analyzes, and it also lets you edit the template. What I find really useful about this screen is that it lists all of the threshold figures Microsoft uses to determine whether a particular counter is exceeding its recommended threshold. When you create your Performance Monitor logs, I highly suggest you collect all of the counters listed in the analysis template you intend to use, so you can take full advantage of PAL’s analysis capabilities.

[Screenshot: editing the counters and thresholds in an analysis template]

Once you have selected an analysis template (SQL Server 2005 in my case), the next step is to answer some questions about your server. The questions that are presented depend on which analysis template you have selected. Below, notice that there are five questions that you need to answer. This data is used when the analysis report is created later.

[Screenshot: the questions about your server for the SQL Server 2005 template]

Next, the wizard presents the Analysis Interval screen. Here, we tell PAL how we want to slice and dice the data. For example, we can have PAL analyze the counters for every 60-second period, or for every hour, so that we can see what was happening during whatever time interval we specify. If you choose “AUTO,” PAL will automatically slice your data into 30 time slices. As this is not a great choice in most cases, you will probably want to pick your own time period.

[Screenshot: the Analysis Interval screen]

The Output Options screen (below) allows you to specify where the reports will be written. You can use the default values or change them as you like.

[Screenshot: the Output Options screen]

The Queue screen (see below) shows you the script that PAL will run to analyze your data. In most cases, you will just click “Next” to continue.

[Screenshot: the Queue screen]

Now we are ready to analyze the Performance Monitor log file. In most cases, you will want to use the default “Execute: Execute what currently is in the queue” option, and the analysis begins. If you have collected a lot of data, it might take some time for the analysis to complete.

[Screenshot: the Execute screen]

Once the analysis is over, a single-page (and very long) HTML report is produced. The top part of the report (shown below) is a table of contents and allows you to quickly drill down into the area(s) you want to focus on. It also shows you those areas where alerts have been produced. As you might imagine, an alert is not a good thing.

[Screenshot: the table of contents at the top of the HTML report]

If you scroll past the table of contents (assuming you don’t want to drill down right away), you come to the “Alert” section of the report (see below). This is one of the most powerful parts of the report. For whatever Analysis Interval you selected, the Performance Monitor counters are summarized. For example, I set my Analysis Interval to 5 seconds, so in the figure below you can see three 5-second intervals (the last interval is cut off). For each of these intervals you can see the alerts that were created. For example, in the interval starting at 6:26:35 PM, there were three alerts; in the next interval, two alerts; and in the last interval, seven alerts. In other words, PAL has summarized the counters for each Analysis Interval and, based on the thresholds set inside PAL, produced various Performance Monitor counter warnings, in addition to providing various data for each counter. If a performance counter was fine during the time period (as defined by the predefined counter thresholds), it won’t appear in this report.

Blue alerts are informational, yellow alerts are warnings, and red alerts are critical. So if you see a lot of yellow or red alerts, this is where you want to focus your time, as these are strong indications of where your server may be having a problem. In addition, you can drill down into this information by clicking on an alert (each alert is a hyperlink), which takes you to the location on the page with detailed data about that alert, providing additional information you can use to identify and resolve the specific problem.

[Screenshot: the Alert section of the report]

For example, let’s say that I want to find out more about the red alert for the first interval for “Page Life Expectancy < 300 Seconds,” which is shown in the above screen. To find out more about this alert, I click on the hyperlink and the following detail screen appears.

[Screenshot: the detail section for the Page Life Expectancy alert]

In the above detail section of the report, I can see a graph of this counter over time (in this case, just a few seconds), along with the minimum, average, maximum, hourly trend (mine is negative because I only captured a few minutes of data), and other statistical data.

If you don’t know what this counter means to you as a DBA, there is an explanation at the top of the screen that tells you that if this value is below 300 (which it is), then adding more memory to your SQL Server instance would most likely boost performance. Each alert in the report offers similar guidance, providing you with a “short course” in Performance Monitor counter analysis.
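
As a side note, you can also check this particular counter directly from inside SQL Server, without Performance Monitor. A minimal sketch (the LIKE filter is used because named instances report the Buffer Manager object under an instance-specific object name):

-- Read Page Life Expectancy straight from the DMVs.
SELECT  object_name,
        counter_name,
        cntr_value AS page_life_expectancy_seconds
FROM    sys.dm_os_performance_counters
WHERE   object_name LIKE '%Buffer Manager%'
  AND   counter_name = N'Page life expectancy';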

Summary

PAL can take a Performance Monitor log file, analyze it, and highlight those counters that exceed commonly accepted thresholds, helping you determine what problems, if any, your server is experiencing. This information, along with other information you gather, can be used to help you determine what is happening with your server and, hopefully, will give you enough information to correct it.

This short blog entry just shows you a very small part of what PAL can do for you. Hopefully, you see the potential power this tool has, and you will download it and give it a try.

How to Script a Profiler Trace for Use as a SQL Trace Data Collection Set for the SQL Server 2008 Performance Data Collector

In SQL Server 2008, a new feature called the Performance Data Collector was added. It has the ability to collect SQL Server-related performance data, store it in a database called the Management Data Warehouse, and then produce various performance reports based on the collected data.

By default, the Performance Data Collector includes three data collection sets (Disk Usage, Query Statistics, and Server Activity) that are used to collect a wide variety of SQL Server 2008 performance data. One of the features of the Performance Data Collector is that you can create your own data collection sets, which allow you to decide what kind of data you want to collect and store in the Performance Data Collector’s Management Data Warehouse. For example, if you want, you can create a custom data collection set to collect and store Profiler trace data. Once the data has been collected and stored in the Management Data Warehouse, you can then create queries or Reporting Services reports to analyze it.

There are two ways to create your own custom data collection set to collect and store Profiler trace data. You can either write a custom Transact-SQL script from scratch that creates the custom data collector, or you can let Profiler write the script for you. Let’s see how Profiler can do this for us.

First, ensure that the SQL Server 2008 Data Collector has been properly configured and is running. Next, start SQL Server 2008 Profiler and load an existing Profiler trace definition, or create a new one. Keep in mind that the SQL Trace Collection Set will be based on your trace definition, so choose an appropriate one.


Figure 1: Export a Profiler Trace to a Performance Data Collector SQL Trace Collection Set.

The next step is to export the Profiler trace definition to a Transact-SQL file that can be executed later to create the SQL Trace Collection Set. To do this, from SQL Server 2008 Profiler, select “File|Export|Script Trace Definition|For SQL Trace Collection Set,” and a “Save As” dialog box appears, allowing you to name the file and save it in any folder you prefer. Once the script has been saved, it can be opened in SSMS. The Transact-SQL script that is created looks similar to the one below (it has been truncated because of its length).


Figure 2: Profiler can create a SQL Trace Collection Set for you automatically.

Before you can run this script and create the SQL Trace Collection Set for the Performance Data Collector, you need to make two changes to the code. If you look through the generated code, you will see these placeholders:

‘SqlTrace Collection Set Name Here’

‘SqlTrace Collection Item Name Here’

You will need to replace these placeholders with descriptive names. Once the code is changed and executed, you will see the new SQL Trace Collection Set in the Data Collection portion of SSMS, along with the three default collection sets that come with SQL Server 2008. At this point, you can enable the new custom collection set, and it will begin collecting Profiler trace data based on the criteria you specified in the Profiler template you used as the basis for creating it. If you like, you can create as many different SQL Trace Collection Sets as you want, each based on a different Profiler template.
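
The new collection set can also be started from Transact-SQL rather than from SSMS. A minimal sketch, assuming you named the collection set “My Profiler Trace Collection Set” when you replaced the placeholder:

-- Start the custom SQL Trace Collection Set by name.
EXEC msdb.dbo.sp_syscollector_start_collection_set
     @name = N'My Profiler Trace Collection Set';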

Unfortunately, there are no built-in reports available to view the Profiler data you have collected. Because of this, you will need to write your own Transact-SQL queries, or create Reporting Services reports, to view and analyze the Profiler trace data.
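
If you want a starting point for such a query, the sketch below simply pulls back a sample of the raw rows. The table names are an assumption on my part (my understanding is that the Generic SQL Trace collector type stores its data in the snapshots.trace_info and snapshots.trace_data tables), and the Management Data Warehouse database name is a placeholder, so verify both against your own installation.

USE MDW;  -- placeholder: substitute the name of your Management Data Warehouse
GO

-- Assumed table for the Generic SQL Trace collector type; verify the
-- name in your own Management Data Warehouse before relying on it.
SELECT TOP (100) *
FROM   snapshots.trace_data;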

One warning about creating your own custom SQL Trace Collection Sets: they can use a lot of SQL Server resources and disk space when enabled. To minimize this impact, ensure that the Profiler trace definitions you use to create your SQL Trace Collection Sets collect only the minimum number of events and data columns you need. In addition, only enable SQL Trace Collection Sets as needed; don’t run them all the time. If this is the first time you have tried this, you might want to test it on a test SQL Server before implementing it on a production server.