Upgrading the SSC SQL Servers: Part 2

Since I wrote the first part of this blog series, SQLServerCentral.com (SSC) SQL Servers have been upgraded, and performance is much better than it was on the older hardware. In fact, all hardware bottlenecks have gone away. Before the move to the new hardware, CPU and disk I/O were constantly pegged, and now, both resources are more than comfortably keeping up with their load. Memory and network utilization had never been a problem with the old hardware.

As I mentioned in Part 1, we used to have two SQL Server instances running multiple databases on two different servers. These older servers had identical specs, as follows:

  • Two, 2GHz AMD Dual Core CPUs (a total of four cores)
  • 4GB RAM (using the /3GB switch)
  • Three locally-attached RAID 1 arrays (with system files, SQL Server binaries, MDF, LDF, BAK, and TRN files scattered among them)
  • Windows 2003 (32-bit)
  • SQL Server 2005 (32-bit)

While most of the databases ran on one of the two servers, one of the larger, and more problematic databases (due to performance issues) ran on the second server by itself. In essence, we were dividing up the total database load among two different physical servers. While we had two servers, they ran independently and were not clustered.

When the Red Gate IT staff selected the new hardware (with my input), they were limited to what hardware (and configurations) they had to choose from because the SSC servers are hosted with a large ISP. After much consideration, the new hardware selected included two new servers with the following identical hardware:

  • Two, 2.26Hz Quad Core Intel CPUs (a total of eight cores)
  • 24GB RAM
  • Two, locally-attached RAID 1 arrays (array 1 holds the system files and SQL Server binaries, and array 2 is used to make local backups, before they are moved off the server)
  • Two, RAID 10 SAN arrays (array 1 holds the MDF files, and array 2 holds the LDF files)
  • One, RAID 1 SAN array (used exclusively for tempdb)
  • Windows 2008 (64-bit)
  • SQL Server 2008 (64-bit)

Unlike the previous two SQL Server setup, the two new servers have been configured as an active/active cluster. Currently, all of the databases from both of the older server instances are running on the first instance of SQL Server on node 1 of the cluster, while the second instance of SQL Server, running on node 2 of the cluster, is not currently being used (other than for the MDW database used by the SQL Server 2008 Data Collector). This second instance will soon be put to use to support the Simple-Talk website, which currently is still running on older hardware and software, with a different ISP. This move will be made later this year. So, right now, all of the SSC databases are running on a single instance of SQL Server, instead of two instances, as before. Even with the extra load of running all of the SSC databases on a single instance, the new hardware can more than handle the total load.

The CPU and disk I/O issues we had with the older servers have been eased greatly with the new hardware. The doubling of the number of cores (plus their addition speed), has eliminated the CPU bottleneck. The addition of more and faster spindles, and the isolation of MDF files, LDF files, and tempdb, has substantially reduced disk I/O problems, eliminating that bottleneck.

In the part 3 of this blog series, I will discuss the type of database maintenance we do on the databases.

Advertisements

2 thoughts on “Upgrading the SSC SQL Servers: Part 2

  1. Brad,

    I think you might have overlooked a part of your CPU and DiskIO issues. Your old server had 4GB memory so with the /3GB switch your BPool would have been limited to roughly 2.6GB. If your workload routinely used more than 2.6GB of data, you would constantly be hitting disk and rotating data through cache. This increased DiskIO would also increase CPU utilization. The bump in memory to 24GB would certainly reduce your total IO to disk because you have a much larger cache in memory. More memory can be a relief for some IO bottlenecking if that bottleneck is related to constantly reading information that is being rotated in and out of cache.

    Also having a RAID 1 for data is good protection against a drive failure, but it isn’t good performance wise for handling the physical IO needs. Coupled with smaller memory, it would definitely increase the CPU utilization, and I’d expect that your performance counters for disk would show significant queuing under the older configuration.

    I am not trying to be critical, just pointing out things from my own experience that aren’t covered above.

  2. Thanks for the feedback.

    On the old server, memory was never a problem. In fact, the servers never used all of its available memory, or even close to it. The servers never experienced any issues with running out of plan cache, and forcing unnecessary query compiliations.

    On the new server, it still isn’t using much memory (relatively speaking), although it is using somewhat more than it did before. The upgrade to 24GB for the new servers was overkill, but I didn’t make that decision. Most of it is sitting there unused.

    The reason we went with RAID 1 in some cases, was because of limitations of the available hardware from our ISP (and our budget).

Comments are closed.