Home About Us SQL Interview Book Contact Us RSS
Articles
Tools
Code Snippets
Links
Tips & Tricks
FAQ
Resources
Articles
Code Snippets
Links
FAQ
Resources

Saravana Kumar
Vinod Kumar


Database Storage with SQL Server – Part I 

Database storage is one of the primary failure point in the database environment. Hence, making the storage architecture consistent and at the same time giving it maximum redundancy must be embedded within the design itself. When I call redundancy, I don’t specifically say implementing anything specific to RAID or SAN or other mechanisms. Redundancy must be achieved given your hardware and budget currently in hand – if it even means making just multiple backup copies at the simplest level.

When we talk about storage, the one thing that gets missed out is the need to plan for your system database like TempDB. More often than not we see DBA’s using the default configuration without understanding the consequences. Typically, tempDB handles large number of operations like row-versioning, online index operation, generating worktables while sort and other query operation, temp table creation etc. Hence, damage to this critical database will bring our whole instance down. Hence understanding the storage strategies from a high-performance and high-fault tolerance angle is very critical. This article will try to address some of these concerns if not all.

Storage requirement for Databases

To start with, in SSMS, we see a separate note as System Databases. Now there are databases created as part of adding additional database installation and configurations. The standard system databases include: master, model, msdb, tempdb and resource. Then we have the distribution (with replication), ReportServer, ReportServerTempDB (with RS installation) databases that get added based on the additional configuration we make to the system.

Although system databases are critical to the operation of SQL Server, they tend to be very small databases. Therefore, there is no need to determine their precise storage requirements. You SHOULD NOT configure the default database sizes and options – Change them based on your environment. Typically, larger enterprise environments have a set of standards for the location of system databases for all SQL Server installations. This set of standards provides a number of benefits, including the ability to run administrative scripts consistently across multiple installations.

Typically, the performance of system databases is not an issue because these databases are small and their active portion resides in memory. Reliability is the most important consideration for system databases because a catastrophic failure of these databases can bring our whole server instance down. All system databases are critical to the operation of SQL Server, not all of them will cause a catastrophic failure of your SQL Server in the event of corruption or other failures. MASTER, you are guaranteed to have a server downtime if this database goes corrupt. Should ensure that it resides on a redundant drive array and you should back up the master database regularly to ensure minimal data loss. Moving to MODEL database, though SQL Server runs without much problem, creating of new databases will not be possible if this database is corrupt. MSDB, another interesting database which doesn’t bring the server down in event of failure, but there are a lot of dependant services to this like the SSIS packages and replication via the SQL Server Agent service. These will seize to function properly.

Though we are talking about the various system databases and High-availability solutions for them a design typically involves a RAID or a storage area network (SAN) solution. However, you should avoid RAID 0 because it does not provide redundancy. RAID 1 is sufficient for the system databases and keeping them separate would help.

TempDB World!!!

TempDB as the name suggests is for temporary storage and is often overlooked aspect of configuration. As we quoted before this database will come to use when we have hash joins, cursors, table variables, temp tables, snapshot isolation, row versioning etc. Hence it is important for us to decide the space, location and performance of this database more than any other system database.

Now, there is no rule-of-thumb where we can give a one size fits all. This very much depends on whether you are using operations such as temporary tables, table variables, cursors, row versioning, Multiple Active Result Sets (MARS), and the SORT_IN_TEMPDB index option. Initially, you need to keep track on the growth of the tempDB database during peak hours and it is best to pre-allocate enough space for the tempdb database to prevent it from automatically growing too frequently and adversely affecting performance.

We can use the sys.dm_db_file_space_usage dynamic management view to monitor the disk space usage of the tempdb database files. We can also use the sys.dm_db_session_space_usage and sys.dm_db_task_space_usage dynamic management views to identify queries and structures that use a large amount of disk space.

From size to location, this is purely dependant on the budget, hardware and the database activity. TempDB is a write-intensive database and even when we are considering RAID scenarios we need to be careful. Like RAID 5 is particularly not suited for write performance and RAID 0 is bad because of its no redundancy option. Contrarily, if we are using a SAN solution, we must decide whether to store the tempdb database files on the server’s local drives or on the SAN solution. Heavy utilization of the tempdb database will generate a lot of network traffic; therefore, we should store the database files locally. However, there might be local storage limitations that might force us otherwise.

Some of the general high-level recommendations include: storing tempDB on its own dedicated solid-state drives, separating data and log files, creating multiple data files to take advantage of the I/O and multi-CPU machines.

Though with TempDB, we have just touched the surface, there is an interesting whitepaper on the TechNet site: http://www.microsoft.com/technet/prodtechnol/sql/bestpractice/tempdb_capacity_planning_index.mspx

Storage requirement for User Databases

Configuring the best possible options for the user databases is vital for performance, scalable and highly-available systems. But before even we get started it is critical to understand the difference between the various disk drive technologies, such as enhanced integrated drive electronics (EIDE), Serial Advanced Technology Attachment (SATA), and small computer system interface (SCSI), Storage Area Network (SAN) and select the technology that is best suited to your requirement.

Another important consideration is the size of the transaction log. We must ensure that the log does not run out of free space because that will have an adverse effect on availability of the database. The size of the transaction log is affected by the type and volume of data modifications in our database. In general, the size of user databases depends on factors such as the structure of the tables and the number of rows in them, the number of data rows that can physically fit on a SQL Server page, the number of indexes, the fill factor used by indexes, and whether database snapshots are being taken etc.

It is often very difficult to change our storage design without incurring substantial downtime once we get them locked and system is in production. Therefore, we should carefully plan whether you are going to take advantage of database engine features, such as filegroups, partitions, the number of database files, and various database options. Therefore, you should take advantage of instant database file initialization, if possible, to improve the performance of auto-grow operations.

Note: Instant database file initialization reclaims the used disk space without filling that space with zeros; instead, the disk content is overwritten as new data is written to the files. As a result, instant database file initialization allows fast execution of file operations, such as creating a database file and increasing the size of an existing file. And this is available with Windows XP or Windows Server 2003 or later versions.

Other Storage requirements

The SQL Server 2005 setup process installs components in a separate directory that is based on the next available instance ID like MSSQL.1 and so-on. Although the physical location may vary depending on your configuration, we should install the various components using the default location on the system drive as it becomes easy from an administration point of view. The HDD requirement depends on our configuration we selected during install. The various components take approx space like: Database engine with data files and FTS (~130 MB), Analysis Services and data files (~35MB), Reporting server and Reporting Manager (~10MB), Integration Services (~10MB) and so-on.

Hence when we say installing on the default location, it must be taken into consideration when we talk about the various HDD requirements. What we have not mentioned is the pre-requisite installations like .NET Framework 2.0 and other pre-req files. SQL Server 2005 includes several full-text indexing enhancements that are mainly related to performance and integration with databases. Full-text indexing now supports multiple instances, and the database engine has been optimized for faster indexing and querying. Full-text catalogs are natively part of backup files and detached databases. Additional new features include thesaurus support, XML data type support, and accent sensitivity/insensitivity.

The full-text indexing process is I/O intensive because it consists of reading data from SQL Server databases and then writing to the index in the file system. Therefore, it is a good practice to create full-text catalogs on their own physical drive arrays. Although full-text indexes are backed up as part of a database backup, you should provide an appropriate level of disk drive redundancy to ensure availability and to minimize the impact on your high-availability solution in the event of hardware failure.

Trace and event log files are critical for monitoring and identifying trends in performance, and troubleshooting potential failures. Therefore, you should consider their requirements when designing your storage solution. SQL Server Profiler uses space from the operating system temporary directory. This default setting is usually sufficient. If you are running a trace for a long period of time, you should ensure that you have sufficient space on the operating system drive. To avoid problems, you can place the temporary directory on a drive that is not a system drive. To do this, you need to change the value for the TEMP environment variable. SQL Server Profiler requires at least 10 MB of free space. If the free space falls below 10 MB while you are using SQL Server Profiler, all SQL Server Profiler functions will stop.

Other log file includes the Windows Error Logs, SQL Server Agent Logs, Database Mail Logs, SAL Server Engine Error Logs etc. The size of the default Windows Event Log files is 512 KB, with the files being overwritten as required. I have seen this been changed to 2-3MB based on the requirement of retention period for these log data.

Next important component that needs attention from the storage engine is the backup operation. You need to determine the correct storage requirement for your backup files because it directly affects the speed of your backup and restore operations, and might impact the performance of online operations. Most enterprises back up data to a disk before archiving it to a tape device. This configuration has a minimal impact on the performance of your high-availability solution and it ensures that backups are completed more quickly. You can also back up databases to network shares or mapped drives, but you must evaluate the impact that this will have on your network and its reliability. You can have a separate network segment for such network traffic.

To calculate the space requirement, you must determine the number and size of the databases that are part of your backup strategy. You must also plan for future growth so that your backup storage location does not run out of free space. The amount of space required for your backups will also be affected by factors such as your database backup strategy, backup rotation, and database mode. You should not back up databases to compressed NTFS volumes. It is a common practice to keep the latest database backups on the high-availability solution so that you can recover your system in a minimal amount of time. Your DRP (Disaster Recovery Process) will outline the types of backup strategy you have and how to execute in event of a failure or natural calamity.

Ideally, your backup storage should be on a separate disk array, on a channel that is separate from your database files. Backup performance can also be improved by backing up to multiple disk drives, an appropriate RAID array, or even multiple backup devices on a single drive. With SQL Server 2005, we allocate a reader thread to each disk device that is used to store the database. A writer thread is allocated to each backup device. SQL Server 2005 can back up to 64 backup devices. This fundamentally improves the speed of the backup operation as we are doing the operations in parallel. The backup topic in SQL Server is quite interesting and will get into a separate article later.

Conclusion

In this article we just took a sneak preview to what and where all one needs to channelize energy when working with the storage system. Infact, we have just got started on this. I will follow this with a Part II where we will get into some of the more specifics of this topic.

Here are some interesting links for your reference:

INF: How to Move Tempdb to a Different Device - http://support.microsoft.com/kb/187824/

BOL: Moving System Databases - http://msdn2.microsoft.com/en-us/library/ms345408.aspx

BOL: Database File Initialization - http://msdn2.microsoft.com/en-us/library/ms175935.aspx

Webcast:  Understanding IO and Storage system with SQL Server 2005