Online T-SQL Scripts and Copyright

There are thousands of Transact-SQL scripts available on the Internet. They provide a great way to learn how other people approach and solve the same kinds of problems you have and, in many cases, will save you from having to “reinvent the wheel”. When I need a new script, the first thing I do is check out the sizeable script libraries on SQLServerCentral.Com, Codeplex.Com, and other SQL Server-related websites. If I can’t find what I need, I turn to a Google search. Only if that turns up nothing do I resort to writing the script from scratch.

While these T-SQL script archives are a wonderful resource, they also have a few drawbacks. The issue of variable code quality has been well discussed, and you should always carefully evaluate and test any online script before using it. A lesser-known, but equally important, issue is copyright. Sometimes the situation is clear-cut. I have seen a few scripts that carry a copyright notice explicitly stating that you can’t use the script without first obtaining permission from the author. In a few ideal cases, the script author will have included a notice stating that the code has been published under a free public license, such as the one described at www.fsf.org. In this case, you know that using the code won’t get you into legal trouble.

However, in the vast majority of cases, the scripts are published without any form of copyright notice. Most people assume, incorrectly, that these scripts are freely available for reuse. In the United States, by law, any creative work is automatically copyrighted by its author, even if the author does not specifically include a copyright notice within the work. In other words, you need to get express permission from the author to use their script.

In theory, if you reuse a script that does not have any copyright notice, you leave yourself, and the organization you work for, open to potential legal action from the script author, or from whoever owns the copyright for the script. Furthermore, making a small modification to a script does not necessarily protect you from copyright issues. If you base the code you write on someone else’s code, you still may be violating copyright law.

The issue of copyright can get particularly sticky if the script author does not actually “own” the script. For example, it is not uncommon, especially at larger organizations, for employee contracts to stipulate that any creative work that an employee produces, while on the job, belongs to the organization. Any DBA subject to such an agreement should not share scripts on the Internet without specific permission from their organization. The problem is that, in the absence of a clear copyright statement, there is no way of telling who owns the script or even if the script author had permission to post it.

Because there are so many grey areas in copyright law, I would like to encourage all T-SQL script writers to:

  1. Verify that you own the code you write and, if not, that you have permission to share it.
  2. Include a free public license with any code you publish.
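For example, a header comment along these lines makes the terms explicit. This is only a hypothetical sketch; the script name, author, date, and choice of license (here the permissive MIT license) are placeholders you would replace with your own:

    -- =====================================================================
    -- Script:  usp_SampleMaintenance.sql   (hypothetical example)
    -- Author:  Jane Doe
    -- Date:    2009-03-20
    -- License: Released under the MIT License. Permission is granted to
    --          use, copy, modify, and distribute this script, provided
    --          this notice is retained in all copies.
    -- =====================================================================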

This way, people using your script can rest assured they are complying with copyright laws. I’d like to hear what you think. What is the best source of online scripts? How do you deal with the dilemma of scripts without any form of copyright notice? Do you think T-SQL script writers should include a free public license with the code they share?

Summary of What PASS Members Want from the 2009 PASS Community Summit

PASS recently conducted a survey of its members, asking them what topics they would like to see covered at the 2009 PASS Community Summit. A total of 304 people responded, and the full results can be viewed at the PASS Speaker’s Resource Page (you have to log on to the website to see them). Given that the report exceeds 20 pages, I thought I would summarize the key results.

They include:

  • Database Administrator topics were the most sought after, with Database Developer topics coming in second.
  • Most attendees were seeking Advanced-level topics: sessions that assume a solid knowledge of the technology, strong coding skills, and 4-6 years of DBA experience.
  • Sessions on SQL Server 2008 were the most desired, with SQL Server 2005 a distant second.
  • Best Practices- and Tips & Tricks-type sessions were highly rated.
  • The three most popular “Database Administration Features in SQL Server 2008” topics included: Performance Data Collection, SQL Audit, and Policy-Based Management.
  • The three most popular “general database administration” topics included: Performance Monitoring and Tuning, DMVs, and Scalability.
  • The three most popular “high availability” topics included: Minimizing System Downtime, Supporting Mission Critical Applications, and Database Mirroring.
  • The three most popular “Database Architecture/Deployment” topics included: Virtualization, Database Storage Solutions, and Clustering.
  • The three most popular “Database and Application Development Features in SQL Server 2008” included: T-SQL Improvements, Table-Valued Parameters, and Filtered Indexes.
  • The three most popular “General Database Design & Development” topics included: Optimizing Queries and Execution Plans, Database Modeling/Database Design, and Application Development Best Practices.
  • The three most popular “Data Warehousing and Business Intelligence Features in SQL Server 2008” included: Reporting Services Enhancements, Report Design Enhancements in Business Intelligence Development Studio, and Integration Services.
  • The three most popular “General Data Warehousing and Business Intelligence” topics included: BI/DW Best Practices using the Microsoft BI Framework, Data Modeling and Schema Design for BI, and Planning, Developing, and Administrating Your BI Environment.
  • The three most popular “Analysis Services” topics included: Advanced Techniques (Tips & Tricks) Using Analysis Services 2005/2008, Building and Deploying Solutions Using Analysis Services in SQL Server 2005/2008, and MDX.
  • The three most popular “SSIS” topics included: Advanced Techniques for SSIS Package Development, Maximizing Your SSIS Performance, and Building and Deploying Solutions Using Integration Services in SQL Server 2005/2008.
  • The three most popular “Reporting Services” topics included: Report Deployment and Administration, Report Access and Delivery Features, and Designing and Creating Reports.
  • The three most popular “Professional Development” topics included: Leadership and Team Management Skills, Strategic IT Planning, and Project Management.
  • The three most popular “Communication” topics included: Technical Presentations, Project Documentation, and Technical Writing.
  • The three most popular “Career Development” topics included: Developing and Improving Your Career, Making Yourself More Marketable, and Certification.

Like all summaries, this one leaves out a lot of detail. If you are serious about submitting a speaker’s abstract, I highly recommend you review the entire report, as you will want to take this information into account when creating your abstracts.

If you are interested in becoming a speaker at the 2009 PASS Community Summit, be sure you get your abstracts in no later than April 10, 2009. After that date, the call for speakers will be closed.

Tips for Submitting Speaking Abstracts to the 2009 PASS Community Summit

As most of you know by now, the Call for Speakers for the 2009 PASS Community Summit is open from March 20, 2009 through April 10, 2009. Each year, the PASS Program Committee, which is responsible for selecting the sessions offered at the Summit, receives many more speaking abstracts than it can accept. If you want to increase your odds of being selected, please consider the following suggestions:

  1. If you are not already a member of PASS, join right away. Membership is free and gives you access to the full contents of the www.sqlpass.org website.
  2. If you have not already done so, complete your PASSPort profile. Complete it as fully as possible, filling in the Profile, Community Activity, and Work Experience sections. Not only will you need the URL of your profile when you submit a speaking abstract, but your profile will also be used extensively as part of the abstract selection process. When you go to the Call for Speakers website, you will have to re-enter some of your biographical information in the Profile pages, which may seem a little redundant, but it is important that you do so.
  3. Be sure you visit the Call for Speakers Resource page and read it carefully. It provides sample biographies, sample abstracts, tips for submitting your abstracts, and, most importantly, the 2009 PASS survey that tells you exactly what type of content PASS attendees are looking for. There is no point in submitting an abstract on a topic that is of little interest to PASS members. The survey is long, so take some time to review it carefully.
  4. Besides the Call for Speakers Resource page, you will also want to check out the Resources for Speakers page, which lists many different resources for helping you become a better speaker.
  5. Next, visit the Call for Speakers website and begin the submission process by filling out your speaker’s profile. Complete the profile accurately and completely.
  6. Something that has been added to the Call for Speakers website this year is the ability to view abstracts that have already been submitted. Before you add your own abstracts, you may want to see what others have submitted. If you find that ten people have already submitted similar abstracts, you may want to find a more original topic.
  7. As you select your topics, ensure that they are topics of interest to PASS members (see the survey), fall in your area of expertise, and most of all, are topics that you have a passion for.
  8. As you complete each abstract, be sure that you select the proper Core Track and Target Audience. In addition, as part of your abstract, be sure you clearly state what the learning objectives are for your session, and how you intend to meet these objectives. The more case studies and demos you can include, generally the better.
  9. After you enter your abstracts, you have the opportunity to go back and “fine-tune” them if you like. In fact, this is probably a good idea, as a fresh look at an abstract a few days after you originally entered it may help you identify weaknesses you can fix. But don’t wait too long: once the Call for Speakers ends on April 10, 2009, you won’t be able to make any changes.

Disclaimer: I am a PASS volunteer in charge of Speaker Management for the PASS Program Committee. While I have been involved in helping create the Call for Speakers website and the Call for Speakers Resource page, I don’t have any direct involvement in selecting abstracts. That task is performed by other PASS Program Committee volunteers.

Reasons Why You May Not Want to Use a Heap

I have started compiling a list of reasons why heaps aren’t usually a good choice when designing a database. Now, I didn’t say that heaps are always bad, but in most cases, I think they should be avoided (read the list to see why).

In some cases, I have had the opportunity to verify that these reasons are true; in other cases, I have not had the time to do so. I would like your input on the list. Is there anything on the list that is not true, or only partially true (under some circumstances, but not others)? And what other reasons to avoid using heaps have I left out?

I look forward to your feedback.

  1. If non-clustered indexes are not added to a heap, then all queries against a heap will require table scans to retrieve the requested data. If the heap is large, then these queries will be very resource intensive and hurt SQL Server’s overall performance.
  2. Since the data in a heap is unordered, performing a table scan on a heap can cause a lot of extra I/O activity, because inefficient random reads, rather than efficient sequential reads, are the norm.
  3. While a non-clustered index can be added to a heap to speed up some queries, when the non-clustered index is not covering, a RID bookmark lookup is required. A RID lookup means that once the row(s) to be returned by the query are identified in the non-clustered index, additional reads (the RID bookmark lookups) must be made against the related rows in the heap so that all of the data requested by the query can be returned. This is not very I/O efficient, especially if many rows are returned; at some point, it may be faster for SQL Server to do a table scan than to use the non-clustered index. On the other hand, if the non-clustered index is covering, then the index can be used to return the data immediately, without having to look anything up in the heap (a sketch of a covering index appears after this list).
  4. If you want to create an XML index on an XML data column, a clustered index must exist on the table.
  5. If you want to create a spatial index on a spatial data column (GEOMETRY or GEOGRAPHY), a clustered index must exist on that table.
  6. If a heap has a non-clustered index on it (as the primary key), and data is inserted into the table, two writes have to occur: one write for inserting the row, and one write for updating the non-clustered index. On the other hand, if a table has a clustered index as the primary key, inserts take only one write, because a clustered index and its data are one and the same. Because of this, it is faster to insert rows into a table with a clustered index as the primary key than into a heap with a non-clustered index as its primary key. This is true whether or not the primary key is monotonically increasing.
  7. When data is updated in a heap, and the updated row is larger than the old row and can’t fit into the old space, a forwarding record is inserted into the original location, pointing to the new location of the row. If this happens a lot, a lot of space is wasted maintaining the forwarding records, and extra I/O is incurred because both the pointer and the row have to be read.
  8. Even if the updated data in a heap is not larger than the old row (it is smaller than or the same size as the original data), updating a heap with a non-clustered primary key is slower than updating the same table with a clustered index as the primary key, because updating a table with a clustered index is less write intensive than updating a heap with a non-clustered index as its primary key.
  9. If a row is deleted from a heap with a non-clustered index as its primary key, it is slower than deleting the same row from the same table with a clustered index as its primary key. This is because it takes more I/O to perform this task on a heap than on a clustered index.
  10. When data is deleted from a heap, the space on the page is not compacted (reclaimed). And should all of the rows on a heap page be deleted, often the entire page cannot be reclaimed. This not only wastes space, it contributes to fragmentation of the data pages within a database.
  11. If you take two otherwise identical tables, one a heap with a non-clustered index as its primary key and the other with a clustered index as its primary key, the heap with the non-clustered index will be substantially larger, wasting valuable space and increasing disk I/O.
  12. The ALTER INDEX rebuild and reorganize options cannot be used to defragment and reclaim space in a heap (although they can be used to defragment non-clustered indexes on a heap). If you want to defragment a heap in SQL Server 2005, you have three options: 1) create a clustered index on the heap, then drop the clustered index; 2) use SELECT INTO to copy the old table to a new table; or 3) use BCP or SSIS to move the data from the old table to a new table. In SQL Server 2008, the ALTER TABLE command has been changed so that it now has the ability to rebuild a heap (a sketch of both approaches appears after this list).
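To illustrate item 3 above, here is a minimal sketch of a covering non-clustered index on a hypothetical heap; the table and column names are invented for the example. Because the index INCLUDEs every column the query touches, SQL Server can answer the query from the index alone and skip the RID bookmark lookup back into the heap:

    -- Hypothetical heap: a non-clustered primary key, but no clustered index.
    CREATE TABLE dbo.OrderHeader
    (
        OrderID    INT      NOT NULL CONSTRAINT PK_OrderHeader PRIMARY KEY NONCLUSTERED,
        CustomerID INT      NOT NULL,
        OrderDate  DATETIME NOT NULL,
        TotalDue   MONEY    NOT NULL
    );

    -- A covering index for queries that filter on CustomerID and return only
    -- OrderDate and TotalDue; no RID bookmark lookup into the heap is needed.
    CREATE NONCLUSTERED INDEX IX_OrderHeader_CustomerID
        ON dbo.OrderHeader (CustomerID)
        INCLUDE (OrderDate, TotalDue);

    -- This query can be answered entirely from the covering index.
    SELECT OrderDate, TotalDue
    FROM dbo.OrderHeader
    WHERE CustomerID = 42;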
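And to illustrate item 12, below is a rough sketch of the two defragmentation approaches, continuing with the hypothetical table from the previous example. The ALTER TABLE ... REBUILD syntax applies to SQL Server 2008; creating and then dropping a clustered index is the SQL Server 2005 workaround:

    -- SQL Server 2008: rebuild the heap in place (its non-clustered indexes
    -- are rebuilt as well, since the row locations change).
    ALTER TABLE dbo.OrderHeader REBUILD;

    -- SQL Server 2005: create a temporary clustered index to compact and
    -- reorder the data, then drop it to turn the table back into a heap.
    CREATE CLUSTERED INDEX IX_OrderHeader_Defrag ON dbo.OrderHeader (OrderID);
    DROP INDEX IX_OrderHeader_Defrag ON dbo.OrderHeader;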

Do You Verify Your Database Backups?

In the past, I have written and spoken a lot about SQL Server “best practices”. One that I emphasize over and over again is the need for DBAs to verify, regularly, the integrity of their database backups. However, I sometimes feel my advice is falling on deaf ears. When giving a presentation, I often ask how many DBAs in the audience regularly verify their backups. Invariably, few hands are raised. When I visit a SQL Server shop to review their database maintenance plans, I always ask the same question. Very few even know what I am talking about.

Why is this? Do some DBAs have a false belief that database backups are always perfect? Perhaps these DBAs don’t know the mechanics of how to verify a backup? Maybe they just don’t have the time to perform the verification? Most likely, they haven’t even given the issue any thought, because they have not yet had a restore fail and had to explain to an irate manager why their data was lost.
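For DBAs in the “don’t know the mechanics” camp, here is a minimal sketch of one common approach; the database name, file paths, and logical file names are placeholders. RESTORE VERIFYONLY confirms the backup file is complete and readable (and, if the backup was taken WITH CHECKSUM, validates the checksums), but the only true test is to restore the backup to a test server and run DBCC CHECKDB against the restored copy:

    -- Take the backup with a checksum so it can be validated later.
    BACKUP DATABASE MyDatabase
        TO DISK = N'D:\Backups\MyDatabase.bak'
        WITH CHECKSUM, INIT;

    -- Quick check: is the backup file complete and readable?
    RESTORE VERIFYONLY
        FROM DISK = N'D:\Backups\MyDatabase.bak'
        WITH CHECKSUM;

    -- The real test: restore to a test server under a different name,
    -- then check the restored database for corruption.
    RESTORE DATABASE MyDatabase_Verify
        FROM DISK = N'D:\Backups\MyDatabase.bak'
        WITH MOVE 'MyDatabase'     TO N'D:\TestRestore\MyDatabase_Verify.mdf',
             MOVE 'MyDatabase_log' TO N'D:\TestRestore\MyDatabase_Verify_log.ldf';

    DBCC CHECKDB (MyDatabase_Verify) WITH NO_INFOMSGS;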

Backup verification is a vital part of a well-designed database maintenance plan. Fortunately, I learned this lesson very early in my career. Back in 1982, a time before hard disks were available for desktop computers, I worked for a computer retail store that performed its accounting on a desktop computer, storing the data on floppy disks. Every day, the owner of the company would religiously make backups of the floppy disks and then set these backups on the floor, next to her desk. Every night, she would also vacuum her office, keeping it nice and tidy for meeting with customers. One day, one of the main floppy disks became corrupted, so she went to her backup floppy disks to restore her data. The only problem was that the backup floppy disks had all been corrupted by the vacuum cleaner’s strong magnetic field. None of her backups were good, and she had to start from scratch, re-entering all of the lost data by hand.

It was a very painful lesson and ever since then I have been a fanatic when it comes to making lots of backups, and regularly verifying that I can restore them if needed.

I’d be interested to hear how regularly you test the integrity of your database backups, how often you’ve encountered environments where test restores were never performed, and what excuses were given for this.

Do You Run Antivirus Software on Your SQL Servers?

I have been to some IT departments where the standard procedure is to install antivirus/antispyware software on all servers, no matter what applications are running on them. In addition, the default settings are left untouched, which means that virtually every byte of data is examined, in real time, before it is read from or written to disk. This virus checking is in addition to any regularly scheduled scans of most of the files on the server’s drives.

If you have ever examined the amount of memory, CPU, and disk I/O resources used by antivirus/antispyware software, you will know that it is a performance hog and, on a busy SQL Server, can cause huge performance issues for your users.

So what do you do? Depending on your situation, you have three options:

  1. Run the AV software using its default settings, and buy big enough hardware to overcome any resource contention issues.
  2. Remove the AV software entirely, but take other measures to help ensure that viruses can’t affect your server.
  3. Compromise. Some DBAs remove the AV software from their SQL Servers, but scan them remotely from another server on a scheduled basis, during a time when the server is not too busy. Other DBAs leave the AV software on their SQL Servers, but change the default settings so that the scans exclude .mdf, .ldf, .ndf, .bak, and .trn files, full-text catalog files, and any folders that include Analysis Services data (the query after this list shows one way to identify these file paths).
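If you go the exclusion route, a simple query such as the sketch below can help you build the exclusion list; it lists the physical path of every data and log file the instance knows about (backup folders, full-text catalogs, and Analysis Services folders still need to be added by hand):

    -- List the physical data and log files for every database on the instance;
    -- these paths are candidates for antivirus scan exclusions.
    SELECT DB_NAME(database_id) AS DatabaseName,
           type_desc            AS FileType,
           physical_name        AS FilePath
    FROM sys.master_files
    ORDER BY DatabaseName, FileType;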

If your SQL Server is running on a web server that is accessible from the Internet, then the first option might be the right choice. If your SQL Server is running on a closed network, then the second option might work fine. If your SQL Server is in a mixed environment, combined with other servers and end-users connected to the same network, then the third option might be best.

So, here’s my question to you. Have you given any serious thought as to the level of antivirus/antispyware protection you need for your SQL Server instances, and to the performance implications of the strategy you have chosen? If you have, tell us what you have done, and why you have made the choices you did.