Which Tool for a Large Database?

Hi all!

I've inherited a database that has grown hugely bloated and needs to be shrunk. How big? Well, it's over 4 TB on disk, and more than 45% of that is bloat / empty pages — i.e., more than 2 TB could be reclaimed ... which would really help with our AWS costs.

Note: This problem is compounded several times over, since several copies of this "starter" database were made, so there's actually about 12 TB of disk space in use and almost 7 TB that could be freed up across all of the servers.

Running a shrink is a four-day procedure on a cloned / copied version of the system with no reads/writes, which we obviously cannot do in production. Some of the tables are full of binary data (that's a whole different problem).

I do have an extended 12-hour maintenance window in which I can shut down the server, export / copy / whatever to a new database, and drop the old one.

Any suggestions? Is there a Red Gate tool to help here?

-
A

Answers

  • Hi @AMoney,

    Unfortunately, we don't have a tool that would be able to help in this scenario.

    Sarah Beckett | Customer Support | Redgate Software
    Have you visited our Help Center?
  • fhanlon Posts: 17 Bronze 2
    If this is still an issue, check where the bloat is. Indexes often contribute to it. You can try reorganizing or rebuilding indexes to see if that helps. Reorganizing an index is a process that can be stopped and restarted later. A rebuild rolls back if interrupted, which is different, but at least the work can be done bit by bit. Start by finding some of your large tables and checking their index sizes.
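    For example (a minimal sketch — the table and index names below are placeholders, not from the original post):

    ```sql
    -- Reorganize: online, lightweight, and safe to stop and resume later
    ALTER INDEX IX_BigTable_SomeKey ON dbo.BigTable REORGANIZE;

    -- Rebuild: rolls back if interrupted, but fully rebuilds the index;
    -- ONLINE = ON (Enterprise Edition) keeps the table available meanwhile
    ALTER INDEX IX_BigTable_SomeKey ON dbo.BigTable REBUILD WITH (ONLINE = ON);
    ```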
    Francis
  • AMoney said:
    I've inherited a database which has grown to a huge bloat, and needs to be shrunk. [...] Any suggestions? Is there a Red Gate tool to help here?

    -
    A
    There are few such tools, and they are paid. One option is to convert your data into a format that is both comfortable to work with and takes up less space; this can be done either manually or with a professional tool.
  • fhanlon Posts: 17 Bronze 2
    Check why there is so much space. Are there particular tables with very large indexes containing lots of unused space? If so, you can try reorganizing those indexes, or even dropping unused indexes, to free up space. There are lots of scripts available that will show you the space used by each table and by each index. Look at some of your larger tables first, but also look at tables whose indexes have large amounts of unused space.
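    A minimal sketch of such a script, using sys.dm_db_partition_stats (which reports sizes in 8 KB pages):

    ```sql
    -- Approximate reserved vs. used space per index, largest first.
    -- A big gap between reserved_mb and used_mb suggests bloat.
    SELECT  OBJECT_SCHEMA_NAME(ps.object_id)           AS schema_name,
            OBJECT_NAME(ps.object_id)                  AS table_name,
            i.name                                     AS index_name,
            SUM(ps.reserved_page_count) * 8 / 1024     AS reserved_mb,
            SUM(ps.used_page_count)     * 8 / 1024     AS used_mb
    FROM    sys.dm_db_partition_stats AS ps
    JOIN    sys.indexes AS i
            ON  i.object_id = ps.object_id
            AND i.index_id  = ps.index_id
    GROUP BY ps.object_id, i.name
    ORDER BY reserved_mb DESC;
    ```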
    Francis
  • Kashyapi Posts: 1 New member

     The first thing you'll have to check in this case is whether there really is a lot of unused space. If there are tables whose indexes hold large amounts of unused space, that might be one of the reasons behind the issue you are facing.

    You can simply reorganize those indexes in order to free up the required space.


  • atupuxi USA Posts: 2 New member
    edited June 18, 2021 8:12AM
    Today's market is flooded with an array of big-database tools and technologies. They can bring cost efficiency and better time management to data-analysis tasks, but you need to choose the right tool wisely, according to your project's needs.
  • Jeff Moden Posts: 7 Bronze 1
    The full answer is a bit more complicated but you could start with the following... make sure you read this whole thing before making any decisions and, most certainly, before you take any action.

    1. Identify where the "bulk" is. You'll have two types of tables (and I'm speaking specifically about their indexes... hopefully, none of them are heaps): those that you would consider large compared to all the other tables and, of course, the rest.
    2. Pick the top 10 or so of the larger tables.
    3. Determine the total size required for all the indexes of each table. Generally, this is the page_count of a given index divided by 128.0, which gives the size in MB (pages are 8 KB, so there are 128 pages per MB). You can get more sophisticated/accurate answers by taking things like the fill factor and current page density into account, but that would take a bit longer to explain, especially when it comes to things like "trapped short rows".
    4. Create a new file group and file (can be on the same drive letter or different drive letter) for each of the larger tables. Of course, there has to be enough room left on whatever drive you select according to Step 3 above.
    5. For each of the indexes of a given table, do a CREATE INDEX using the DROP_EXISTING option with a different ON that points to the new file group and file.
    6. Once step 5 has been completed for all the larger tables that you identified in Step 2, REBUILD "all" the other indexes  from smallest (by page count) to largest.  DO NOT USE REORGANIZE HERE.
    7. Do a "shrink" where you just recover free space without brute forcing the size.  It's sometimes amazing how fast this will go once step 6 is done.
    8. If step 7 wasn't very successful, then do your normal brute-force shrinks 10 GB at a time, so that if the process gets interrupted, you still have something to show for it.
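    In T-SQL, steps 4, 5, and 7 might look something like the following for one table — a sketch only; the database, filegroup, file path, table, and index names are all placeholders:

    ```sql
    -- Step 4: new filegroup and file for one of the large tables
    ALTER DATABASE MyDb ADD FILEGROUP FG_BigTable;
    ALTER DATABASE MyDb ADD FILE
        (NAME = 'BigTableData', FILENAME = 'E:\Data\BigTableData.ndf', SIZE = 200GB)
    TO FILEGROUP FG_BigTable;

    -- Step 5: rebuild the index onto the new filegroup in a single pass
    CREATE UNIQUE CLUSTERED INDEX PK_BigTable
        ON dbo.BigTable (BigTableId)
        WITH (DROP_EXISTING = ON)
        ON FG_BigTable;

    -- Step 7: recover free space without forcing data movement
    DBCC SHRINKFILE (N'MyDb_Data', TRUNCATEONLY);
    ```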

    Ok... so there you are with a nice shrunken PRIMARY file group and possibly several other file groups for the big tables. You can do one of two things, or possibly a mix of both:

    A. Do the CREATE INDEX with the DROP_EXISTING thing to move everything back to the PRIMARY file group and then drop those temporary files and file groups
    ...OR...
    B. Leave the data in the new file groups.

    There's a serious advantage to leaving the new file groups alone, which is why you need to make sure that you've identified the correct drive and directory in Step 4 above. That advantage is that, in the future, when you decide you need to rebuild the indexes on those large tables again, you can repeat the same steps... create yet another file group/file for each table, do the "rebuild" by doing a CREATE INDEX with DROP_EXISTING onto the new file groups, and drop the old file groups/files. It's like doing an instant shrink with no worries.

    Like I said, there are some nuances to all of this because the data IS large but I thought I'd at least introduce the overarching concept for you to consider.  You might also want to look into some things like partitioning (I actually like partitioned views better than partitioned tables for a whole lot of reasons) so that when you do need to defrag something, it's not a huge, multi-TB monolithic monster that's going to require some serious downtime to pull off.  This is especially true if you have anything that depends on the log file such as replication, AG, etc.  Clustered Instances don't seem to have a problem with this but you might want to keep an eye on your quorum size the first couple of times you do something like this.

    Last but not least, consider moving your large history/audit tables to a different database.  In most cases, you don't immediately need such tables to "get back in business" in a DR situation.  This is also why I like Partitioned Views instead of Partitioned Tables... you don't actually need to restore every bloody partition before you can start taking backups again in a DR situation.  It will also allow you to make much smaller copies of your databases when you need such a copy for training, development, or whatever.
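    A bare-bones sketch of the partitioned-view idea over yearly audit tables — every name and date range here is made up for illustration:

    ```sql
    -- Yearly member tables; the CHECK constraints let the optimizer
    -- touch only the relevant table for a given date predicate.
    CREATE TABLE dbo.Audit_2020 (
        AuditId   bigint   NOT NULL,
        AuditDate datetime NOT NULL
            CONSTRAINT CK_Audit_2020
            CHECK (AuditDate >= '20200101' AND AuditDate < '20210101'),
        Payload   varbinary(max) NULL,
        CONSTRAINT PK_Audit_2020 PRIMARY KEY (AuditId, AuditDate)
    );
    CREATE TABLE dbo.Audit_2021 (
        AuditId   bigint   NOT NULL,
        AuditDate datetime NOT NULL
            CONSTRAINT CK_Audit_2021
            CHECK (AuditDate >= '20210101' AND AuditDate < '20220101'),
        Payload   varbinary(max) NULL,
        CONSTRAINT PK_Audit_2021 PRIMARY KEY (AuditId, AuditDate)
    );
    GO
    -- The view unions the member tables; each table can live in its
    -- own filegroup or even its own database.
    CREATE VIEW dbo.Audit
    AS
    SELECT AuditId, AuditDate, Payload FROM dbo.Audit_2020
    UNION ALL
    SELECT AuditId, AuditDate, Payload FROM dbo.Audit_2021;
    ```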
  • Jeff Moden Posts: 7 Bronze 1
    All you folks who claim that the market is flooded with tools to handle what the OP has requested... yeah... how about you post the names of some of the more popular ones, and maybe even provide a link or two, since you already know so much about them.
  • montykit Posts: 1 New member

    Hi,

    I'm planning a new DWH project and I'm interested in SSDT as a change-management tool that deploys my changes easily from development to test to production. But I'm a bit worried about some blog posts and articles I've read. As you may know, a data warehouse is a huge database with lots of tables and data. I've read that DACPACs (generated with SSDT?) result in an empty copy of the database (structure only?) being created, the changes applied, the data copied over, and finally the database renamed (?).

    This is not desirable behaviour for a huge data-warehouse project.

    Are my conclusions right? Or has this been changed in the latest release?



  • Justify Posts: 17 New member
     Look at some of your larger tables first