Exclude Offline database from SQL Monitor

BillFinch · August 4, 2021 5:23PM

I have a database that is only online for one day a month to act as a source for finance reporting. The rest of the time it is offline. It appears that SQL Monitor is trying to access it every few minutes, and raises an error in the SQL Server log, even though I have disabled all DB specific alerting for that database. Is there some SQL Monitor process that tries to enumerate all of the databases on an instance?

Alex B · August 6, 2021 1:02PM

Hi @BillFinch,

Yes there are several processes that do this I believe. Disabling the alerting doesn't actually disable the data collection (and so doesn't disable the queries that are attempting to run against that database).

Is the error a Severity 21 error that you are seeing? If not can you let me know what error it is you are seeing in the SQL Server errorlog?

Kind regards,
Alex

BillFinch · August 6, 2021 1:07PM

Hi @Alex B ,

Yes, it is a Severity 21 error. It fires pretty much every 10 minutes. Here is the text of the two consecutive errors as logged:

Error 1

Date        8/6/2021 9:03:39 AM
Log        SQL Server (Current - 8/6/2021 9:03:00 AM)

Source        spid80

Message
Error: 615, Severity: 21, State: 1.

Error 2

Date        8/6/2021 9:03:39 AM
Log        SQL Server (Current - 8/6/2021 9:03:00 AM)

Source        spid80

Message
Could not find database ID 9, name '<REDACTED>'. The database may be offline. Wait a few minutes and try again.

Alex B · August 10, 2021 8:57AM

Hi @BillFinch,

Righto, that's something we've seen before where the new database configuration options sampler (which runs every 10 minutes) is having that error occur for some offline databases in some occasions.

As I understand it, from what the developers have commented on the internal issue SRP-13822 where we are tracking this, it is a SQL Server issue that is causing that error to appear (as it doesn't happen on all offline databases / in all cases) and they are considering whether and/or how to try and work around it in code.

In the meantime, to prevent this sampler from running you will need to use the ConfigurationFileEditor.exe (located in the SQL Monitor install directory, C:\Program Files\Red Gate\SQL Monitor\BaseMonitor\Tools by default).

Run the exe explicitly as an administrator (right click > run as administrator)
Click "load a configuration file" (this should make a backup copy, but you may want to also make a copy yourself)
Choose the RedGate.SqlMonitor.Engine.Alerting.Base.Service.exe.settings.config (located in %ProgramData%\Red Gate\SQL Monitor)
Scroll down to nearly the bottom and find the [Cluster].[SqlServer].[Database].[ConfigurationOptions] entry and check the box next to it, then click on the dropdown box that appears and choose "Disabled"
Then click the Disk icon at the top left to save the configuration file modification
Finally restart the SQL Monitor Base Monitor service

If you then wanted to revert the change you could either replace the modified file with the backup of it (after renaming the backup file appropriately) or you can use the ConfigurationFileEditor and go back and uncheck the box next to the sampler which will make it use the default setting (of collection every 10 minutes).

I will update here again when I have further information on the internal issue SRP-13822.

Kind regards,
Alex

BillFinch · August 10, 2021 6:44PM

Hi @Alex B ,

Thanks for the detailed explanation and workaround. I made the config change on the base monitoring instance responsible for the SQL Server where the offline DB lives and the errors stopped immediately. I'll keep and eye out for your updates

Thanks again!

Bill

Alex B · September 30, 2021 11:50AM

Hi @BillFinch,

The team have now modified the query for the database configuration options sampler to exclude offline databases which should avoid this error occurring and this is available in version 11.2.16 which they've just released. You can download it here: https://download.red-gate.com/checkforupdates/SQLMonitorWeb/SQLMonitorWeb_11.2.16.16124.exe

Kind regards,
Alex

pooyanism · June 13, 2022 6:01AM

Hi,
I have a same problem.

Here is the RedGate Monitor Alert sending every 10 Minutes:

Image: https://us.v-cdn.net/6029854/uploads/editor/pn/mr32pelbogj4.jpg

However, I don't have any database_Id = 16

Image: https://us.v-cdn.net/6029854/uploads/editor/ku/5whruld42r1t.jpg

Also, I don't want to make disable any alert from configuration.
I reset RedGate Monitor Service, but Alert message still comming!

Alex B · June 20, 2022 10:28AM

Hi @pooyanism,

The team have released version 12.1.22 this morning that makes a change to the cursor used in the query that is seeing this error, which will hopefully address this issue. You can download here: https://download.red-gate.com/checkforupdates/SQLMonitorWeb/SQLMonitorWeb_12.1.22.29294.exe

Kind regards,
Alex

JoeHarkins · August 15, 2022 6:46PM

I just upgraded to 12.1.28 and this error is back.
I dropped and restored a database and now every 10 minutes SQLMonitor throws the error similar to the above, referring to the database as though the name and id were the same.

marclen · August 18, 2022 1:50PM

I have the same Problem again.

We do daily Clones of Databases and before delete the old ones.

Now we got the same Error for many Clones (40Databases at Night) each 10 Minutes.

Is there a Workaround for this?

We use SQL-Monitor Version 12.1.29.33207

owain_gwynedd · September 7, 2022 11:01AM

Currently running 12.1.15.26810 and also seeing this. We have an AG cluster in pre-live and I have dropped and recreated the db's which has spawned these errors. Will 12.1.22 address this? The experience of the folks above suggests otherwise...

Alex B · September 8, 2022 2:05PM

Hi @owain_gwynedd,

It's possible updating will help as it has helped some users, but there are other users who describe a similar series of events (with AG and mismatch dbid) that are still seeing the issue even on 12.1.30, though we have not been able to reproduce the issue here locally yet. What OS and SQL Server versions is this occurring with?

Kind regards,
Alex

owain_gwynedd · September 13, 2022 12:51PM

Alex B said:

Hi @owain_gwynedd,

It's possible updating will help as it has helped some users, but there are other users who describe a similar series of events (with AG and mismatch dbid) that are still seeing the issue even on 12.1.30, though we have not been able to reproduce the issue here locally yet. What OS and SQL Server versions is this occurring with?

Kind regards,
Alex

Hi Alex, it's SQL2019 15.0.4236.7 (x64) on Windows Server 2019 Standard 10.0 (x64) Build 17763.

Ken_B · September 23, 2022 6:32AM

HI @Alex B

I am currently experiancing the same issue.

I have recently upgraded to 12.1.32.1452. Today we had dropped two databases and recreated them on the same instance. Since doing so we are recieving the above error message for both databases every 10 minutes. It looks like its trying to find the database by id, but the id has changed when we restored it. Is there a way to stop it looking for this ID?

I tried the above mentioned config change but it continued to notify us. I have changed the alert to ignore severity 21 in the meantime, this has stop those notifications while i find a solution.

Thank you
Ken

Alex B · September 27, 2022 3:23PM

Hi @Ken_B,

Were your databases a part of an AG? And what SQL Server version is the instance they are on? If it's not the SQL 2019 version Owain mentioned then updating may help.

Restarting the target SQL Server instance seems to work. We also had one customer who just said said:

I now tried to rebuild and join the database with deleting both copies of the database before restore instead of restore with overwrite and the error is gone even within Redgate Monitor. It seems to be a bug in SQL

I'm not sure if their prior case is similar to what anyone else had done and whether this may help. This seems likely since the cursor used should be recreated each time the sampler runs.

Kind regards,
Alex

Sire · November 22, 2022 1:35PM

Has there been any progress on this? We got this error now when we deleted and then restored a database (with a gap of perhaps a minute or two). We cannot restart the SQL Server because it's running in production.

jpomfret7 · December 10, 2022 9:15PM

Hi folks,
This is happening for me on 12.1.46.6959 - any updates on when this will be resolved. I've had to create a dummy database which stops the errors.

Alex B · December 14, 2022 3:52PM

Hi all,

We are still looking into this issue but have still unfortunately not been able to reproduce it. It does appear to now only be happening for SQL Server 2019.

Currently there are a couple options depending on whether you don't want the error raised in the SQL errorlog at all, whether the problem is just the alert getting raised every 10 minutes and whether you still want the sampling for other entities to continue

If you don't want the error raised at all and you don't mind other entities not being sampled, then disable the sampler (shown above in this thread)
If you do want other entities to be sampled still, you can perform the above, but then modify the XML in the base monitor config file similar to what is shown here in step 4b by adding the clusterName and sqlServerName properties to it to specify the entity you want to disable it for, which will allow the other entities to continue being sampled.
If the alert is the only problem, you can add an regex exclusion to the root alert configuration for the "SQL Server error log entry" alert to ignore it. Depending on the specific error entry you are getting it would be like:
```
Could not find database ID \d+, name '\d+'. The database may be offline. Wait a few minutes and try again.
```

We are still looking into it trying to understand why it is happening. SQL Monitor doesn't reference the databases by ID it does it by name, so it seems there's potentially something getting mixed up in the system tables we're sampling, but as I said we haven't been able to reproduce the issue.

Kind regards,
Alex

Alex B · December 15, 2022 12:33PM

Hi all,

Could you please run the following T-SQL from SSMS on an entity that is currently seeing the issue and let me know if it causes the error to occur?

DECLARE @dbinfo TABLE
(
    [database_name] NVARCHAR(450),
    [name] NVARCHAR(100),
    [value_in_use] SQL_VARIANT,
    [type] SQL_VARIANT
);

INSERT INTO @dbinfo
SELECT DB_NAME(database_id) database_name,
       'mirroring_role_desc' name,
       mirroring_role_desc value_in_use,
       SQL_VARIANT_PROPERTY(mirroring_role_desc, 'BaseType')
FROM sys.database_mirroring;

SELECT * FROM @dbinfo

Kind regards,
Alex

marclen · December 19, 2022 7:41AM

Hi Alex,

On my Instance, there is no Error showing after running your Query.

But, the Error shows only while deleting and creating the same Database. Maby that helps you.

We drop and clone (Netapp SnapCenter) the Database with the same Name after some seconds.

Alex B · December 19, 2022 4:27PM

Hi @marclen

Thank you for confirming that, I've passed this back to the team and will update here when I have more to share.

Kind regards,
Alex

ROliva · December 23, 2022 10:03PM

Hello,
@Alex B We are currently experiencing this with one of our servers. The SQL Server Version is 15.0.4236.7, SQL Monitor version is 12.1.48.7553. Running the query you posted above did not return an error. This started happening immediately after renaming a database. I had to create a dummy database in order for the error to stop. SELECT * FROM sys.databases showed a gap in database_id #s (1-9, 11). The error messages were for database ID 10. Is it possible the application is polling sequentially from the MIN to the MAX DBID but it really should be grabbing the ID list first?

Alex B · January 5, 2023 9:48AM

Hi @ROliva ,

There are two parts of the sampler that is involved in the issue.

The first part is a CURSOR, which runs before the above, and only references the database name from a list of databases, this is a cut down version of what the that is doing:

DECLARE @db_name VARCHAR(256) -- database name
DECLARE db_cursor CURSOR LOCAL STATIC READ_ONLY FORWARD_ONLY FOR
SELECT name FROM master.sys.databases dbs
                    left join sys.dm_hadr_availability_replica_states replica_states
                    on dbs.replica_id = replica_states.replica_id
                    left join sys.availability_replicas as availability_replica
                    on dbs.replica_id = availability_replica.replica_id
                    where source_database_id IS NULL AND (replica_states.role <> 2 OR replica_states.role IS NULL
                    OR availability_replica.secondary_role_allow_connections > 0
OR availability_replica.secondary_role_allow_connections IS NULL) AND collation_name IS NOT NULL AND state = 0
OPEN db_cursor
FETCH NEXT FROM db_cursor INTO @db_name

@FETCH_STATUS = 0
BEGIN
      SELECT DB_ID(@db_name) as database_id, @db_name as database_name

      SELECT name, physical_database_name, database_id, * from sys.databases
      WHERE name = @db_name
      OR physical_database_name = @db_name

      DECLARE @sql NVARCHAR(256)
      SET @sql = 'SELECT TOP 1 * FROM ' + @db_name + '.sys.database_scoped_configurations'
      EXEC sys.sp_executesql @sql

      FETCH NEXT FROM db_cursor INTO @db_name
END

CLOSE db_cursor
DEALLOCATE db_cursor

After that, the query in my comment above is run, so our queries are not specifically trying to poll in sequential database_id order. It also doesn't seem that is the issue since the problem can be "fixed" by restarting the target SQL Server instance, which wouldn't change the database_id's. There just seems to be something that is keeping the list of database_name and/or database_id from when the queries were first run so that when the database is deleted and recreated or renamed and that is now out of date, it is still trying to access the wrong database_id somehow. It seeems to only happen in that specific SQL Server version 15.0.4236.7 as well now (previously there were other versions, but changes we made to the CURSOR to have it recreate every time have fixed those).

Extra logging has been added in version 12.1.49 at the DEBUG level, so that will need to be enabled to get further information on what may be going on here. You can download version 12.1.49 here: https://download.red-gate.com/checkforupdates/SQLMonitorWeb/SQLMonitorWeb_12.1.49.7862.exe

To enbable the DEBUG loggign, please edit the following entry found in the MinimumLevel -> Override section of RedGate.SqlMonitor.Engine.Alerting.Base.Service.logging.jsonc (located at C:\Program Files\Red Gate\SQL Monitor\BaseMonitor by default), changing the value from "Information" to "Debug" as seen here and then restart the base monitor service:

"RedGate.SqlMonitor.Common.Utilities.SqlServer.ConnectionProviders.SqlConnectionProvider": "Debug",

It will look like this:

Image: https://redgatesupport.zendesk.com/attachments/token/y36PB5nliJOHJ8LHvxABrKQXK/?name=image.png

Then restart the base monitor service and let the issue recur a few times and then either find the logging entries and post them here or raise a ticket with us including the log files zip (from Configuration > Retrieve all log files) referencing problem ticket 280016 and the error "Error: 615, Severity: 21, State: 1. Could not find database ID ##, name '##'."

Kind regards,

Alex

Alex B · January 10, 2023 12:12PM

Hi all,

My apologies, when adding the DEBUG logging, there is already an entry for the logger we're looking for set to "Information" like this in the Minimum Level > Override section, so it will need to be edited from:

"RedGate.SqlMonitor.Common.Utilities.SqlServer.ConnectionProviders.SqlConnectionProvider": "Information",

To the new Debug level:

"RedGate.SqlMonitor.Common.Utilities.SqlServer.ConnectionProviders.SqlConnectionProvider": "Debug",

So instead of what I had instructed which added a new entry at the top:

Image: https://redgatesupport.zendesk.com/attachments/token/kRqh5zF0o3fnGhFZ5aB6l4Vf4/?name=image.png

You will need to edit the existing one from "Information" to "Debug" (removing the extra entry):

If you have already done the above before the post time of this comment, then please do this instead, I am just about to edit the above comment to reflect this so if doing this afterward this has been posted, the above should be correct.

Kind regards,
Alex

DonFerguson · March 22, 2023 12:31AM

I spent some time chasing this error by setting up an extended event to capture it.

Turns out that the command that seemed to trigger the 615 error was the following just before the Error was logged.

  BEGIN TRY
        IF DB_ID('AnExistingDB') IS NOT NULL
            SELECT @dbName [database_name],
                   name,
                   value [value_in_use],
                   SQL_VARIANT_PROPERTY(value, 'BaseType') [type]
            FROM [master].sys.database_scoped_configurations;

        INSERT INTO @dbinfo
        EXEC sp_executesql @per_db_query;
    END TRY

So that error made absolutely no sense; nevertheless I kept on getting the Error: 615, Severity: 21 logged every 10 minutes like clock work.

I then tried to reproduce these errors by executing the same sets of commands in SSMS. To no avail, I could not reproduce it in SSMS. I then found a post online from someone else who was experiencing these Error: 615, Severity: 21 errors and they mentioned that they stopped after running DBCC FREEPROCCACHE on the server that produced them.

I went ahead and gave that a try and ran DBCC FREEPROCACHE on the server that was throwing these errors. Low and behold the error stopped.

So what does that mean? It does sort of explain why I couldn't reproduce the issue in an SSMS window, as it was likely running a cached procedure. Earlier in this thread it was mentioned that restarting the SQL Server instance would stop the problem, and I did indeed observe that as well. But while running DBCC FREEPROCACHE is not something we like to run in production, it's far less disruptive than restarting the SQL Server instance.

While this doesn't necessarily fully explain why this is a problem in the first place, I would surmise that there is likely some sort of Microsoft bug, may be triggered from dropping databases combined with the application accessing a cached procedure that may potentially be trying to access a DB_ID that no longer exists.

I hope this helps.

Alex B · March 22, 2023 1:55PM

Hi @DonFerguson,

Thank you for that! I have passed the information on to the developers.

Kind regards,
Alex

GDodd · March 22, 2023 9:59PM

We dropped a database yesterday and then started getting these alerts. We were able to run dbcc freeproccache and alerts have stopped (fortunately not much in the cache on that server anyway). Thanks @DonFerguson for the suggestion.

DonFerguson · March 22, 2023 10:08PM

I'm glad to hear that it worked for you as well @GDodd .

MWalton · March 27, 2023 4:04PM

Hi,

I ran into a similar issue this past week after restoring a database from Veeam to our SQL Server 2019 Availability Group. We are using RedGate SQL Monitor Version 12.1.60. Similar to others, I saw the error every 10 minutes. I temporarily disabled my alerting for this error until today when I stumbled on this thread.

Here is how I fixed it with help from this thread:

1) I used the SQL Server error log to identify the spid for the last "Could not find database" error.

For example,
"

Source spid85

Message

Could not find database ID 84, name '84'. The database may be offline. Wait a few minutes and try again.
"

2) I looked at the activity monitor from SSMS and identified the current login for that SPID, which unsurprisingly was the SQL Monitor domain login used to connect to the SQL Server AG. More useful was that I could identify the database that was in use as well, which in this case was 'tempdb'. (I assume this is the default database used by the application to gather stats for the instance...pardon me if this is common knowledge).

3) I then executed the following database-only equivalent to they system-wide DBCC FREEPROCACHE Note: the number in parentheses is the tempd's dbid from sys.sysdatabases:

DBCC FLUSHPROCINDB (2)

4) I waited for the next 10 minute interval to come and I no longer saw the error message.

5) I re-enabled my alert for the error 21.

I hope this helps others who have seen this with SQL Server 2019.

Good luck,
Matt

DonFerguson · March 27, 2023 4:54PM

DBCC FLUSHPROCINDB (2) is definitely a better option than DBCC FREEPROCCACHE. Great tip @MWalton. I'm waiting for this to happen again on my end so that I can capture the plan handle, and run a SELECT * from FN_GET_SQL (Handle); to confirm the actual SQL causing the issue.

Alex B · March 29, 2023 2:13PM

Hi all,

The team have just released version 13.0.2 where they added OPTION(RECOMPILE) to the sampler query which should make it avoid using cached versions of the proc, which in theory will avoid this issue.

You can download the latest version here: https://download.red-gate.com/checkforupdates/SQLMonitorWeb/SQLMonitorWeb_13.0.2.12294.exe

Kind regards,
Alex

Exclude Offline database from SQL Monitor

Answers

Product Learning

Community Forums

Events & Friends

Simple Talk