Database clone lost when vhd not accessible

melucasmelucas Posts: 1 New member
Hello,

We recently encountered a problem during which we lost two database clones. Just before losing the clones, we can see the following error in the SQL Clone server event log. It seems to me that the agent lost the connection to it's vhd and then couldn't get the database back. Does this make sense? How could we prevent losing a database clone when this error occurs?

Log Name:      Application
Source:        SqlCloneServer
Date:          6/7/2018 11:45:05 AM
Event ID:      30855
Task Category: None
Level:         Error
Keywords:      Classic
User:          N/A
Computer:      SQLCloneSRV01
Description:
An error was reported by "AgentSvr01" (AGENT): RaygunErrorMessage { InnerError: null, InnerErrors: null, Data: [KeyValuePair`2 { Key: "HelpLink.ProdName", Value: "Microsoft SQL Server" }, KeyValuePair`2 { Key: "HelpLink.EvtSrc", Value: "MSSQLServer" }, KeyValuePair`2 { Key: "HelpLink.EvtID", Value: "5159" }, KeyValuePair`2 { Key: "HelpLink.BaseHelpUrl", Value: "http://go.microsoft.com/fwlink" }, KeyValuePair`2 { Key: "HelpLink.LinkId", Value: "20476" }], ClassName: "SqlException", Message: "Operating system error 21(The device is not ready.) on file \"D:\data\MSSQL12.MSSQLSERVER\MSSQL\DATA\clones\MyDB_00010379_dyr\VhdMount\Data\MyDB.ldf\" during SQLServerLogMgr::CheckLogBlockReadComplete.
Could not open new database 'MyDB'. CREATE DATABASE is aborted.", StackTrace: [RaygunErrorStackTraceLineMessage { LineNumber: 94, ClassName: "System.Data.SqlClient.SqlConnection", FileName: null, MethodName: "OnError(SqlException exception, Boolean breakConnection, Action`1 wrapCloseInAction)", Raw: null }, RaygunErrorStackTraceLineMessage { LineNumber: 380, ClassName: "System.Data.SqlClient.MyDBParser", FileName: null, MethodName: "ThrowExceptionAndWarning(MyDBParserStateObject stateObj, Boolean callerHasConnectionLock, Boolean asyncClose)", Raw: null }, RaygunErrorStackTraceLineMessage { LineNumber: 2684, ClassName: "System.Data.SqlClient.MyDBParser", FileName: null, MethodName: "TryRun(RunBehavior runBehavior, SqlCommand cmdHandler, SqlDataReader dataStream, BulkCopySimpleResultSet bulkCopyHandler, MyDBParserStateObject stateObj, Boolean& dataReady)", Raw: null }, RaygunErrorStackTraceLineMessage { LineNumber: 228, ClassName: "System.Data.SqlClient.SqlCommand", FileName: null, MethodName: "InternalEndExecuteNonQuery(IAsyncResult asyncResult, String endMethod, Boolean isInternal)", Raw: null }, RaygunErrorStackTraceLineMessage { LineNumber: 92, ClassName: "System.Data.SqlClient.SqlCommand", FileName: null, MethodName: "EndExecuteNonQueryInternal(IAsyncResult asyncResult)", Raw: null }, RaygunErrorStackTraceLineMessage { LineNumber: 77, ClassName: "System.Data.SqlClient.SqlCommand", FileName: null, MethodName: "EndExecuteNonQueryAsync(IAsyncResult asyncResult)", Raw: null }, RaygunErrorStackTraceLineMessage { LineNumber: 15, ClassName: "System.Threading.Tasks.TaskFactory`1", FileName: null, MethodName: "FromAsyncCoreLogic(IAsyncResult iar, Func`2 endFunction, Action`1 endAction, Task`1 promise, Boolean requiresSynchronization)", Raw: null }, RaygunErrorStackTraceLineMessage { LineNumber: 12, ClassName: "System.Runtime.ExceptionServices.ExceptionDispatchInfo", FileName: null, MethodName: "Throw()", Raw: null }, RaygunErrorStackTraceLineMessage { LineNumber: 40, ClassName: "System.Runtime.CompilerServices.TaskAwaiter", FileName: null, MethodName: "HandleNonSuccessAndDebuggerNotification(Task task)", Raw: null }, RaygunErrorStackTraceLineMessage { LineNumber: 408, ClassName: "Dapper.SqlMapper+<ExecuteImplAsync>d__37", FileName: null, MethodName: "MoveNext()", Raw: null }, RaygunErrorStackTraceLineMessage { LineNumber: 12, ClassName: "System.Runtime.ExceptionServices.ExceptionDispatchInfo", FileName: null, MethodName: "Throw()", Raw: null }, RaygunErrorStackTraceLineMessage { LineNumber: 40, ClassName: "System.Runtime.CompilerServices.TaskAwaiter", FileName: null, MethodName: "HandleNonSuccessAndDebuggerNotification(Task task)", Raw: null }, RaygunErrorStackTraceLineMessage { LineNumber: 321, ClassName: "RedGate.SqlClone.Core.Sql.SqlDatabaseAttacher+<Attach>d__2", FileName: null, MethodName: "MoveNext()", Raw: null }, RaygunErrorStackTraceLineMessage { LineNumber: 12, ClassName: "System.Runtime.ExceptionServices.ExceptionDispatchInfo", FileName: null, MethodName: "Throw()", Raw: null }, RaygunErrorStackTraceLineMessage { LineNumber: 40, ClassName: "System.Runtime.CompilerServices.TaskAwaiter", FileName: null, MethodName: "HandleNonSuccessAndDebuggerNotification(Task task)", Raw: null }, RaygunErrorStackTraceLineMessage { LineNumber: 592, ClassName: "RedGate.SqlClone.Core.Sql.SqlCloneAttacher+<AttachDb>d__4", FileName: null, MethodName: "MoveNext()", Raw: null }, RaygunErrorStackTraceLineMessage { LineNumber: 12, ClassName: "System.Runtime.ExceptionServices.ExceptionDispatchInfo", FileName: null, MethodName: "Throw()", Raw: null }, RaygunErrorStackTraceLineMessage { LineNumber: 40, ClassName: "System.Runtime.CompilerServices.TaskAwaiter", FileName: null, MethodName: "HandleNonSuccessAndDebuggerNotification(Task task)", Raw: null }, RaygunErrorStackTraceLineMessage { LineNumber: 115, ClassName: "RedGate.SqlClone.Agent.CloneRestorer+<AttachDb>d__9", FileName: null, MethodName: "MoveNext()", Raw: null }] }

Event Xml:
  <System>
    <Provider Name="SqlCloneServer" />
    <EventID Qualifiers="0">30855</EventID>
    <Level>2</Level>
    <Task>0</Task>
    <Keywords>0x80000000000000</Keywords>
    <TimeCreated SystemTime="2018-06-07T15:45:05.387327300Z" />
    <EventRecordID>15764</EventRecordID>
    <Channel>Application</Channel>
    <Computer>SQLCloneSRV01</Computer>
    <Security />
  </System>
  <EventData>
    <Data>An error was reported by "AgentSvr01" (AGENT): RaygunErrorMessage { InnerError: null, InnerErrors: null, Data: [KeyValuePair`2 { Key: "HelpLink.ProdName", Value: "Microsoft SQL Server" }, KeyValuePair`2 { Key: "HelpLink.EvtSrc", Value: "MSSQLServer" }, KeyValuePair`2 { Key: "HelpLink.EvtID", Value: "5159" }, KeyValuePair`2 { Key: "HelpLink.BaseHelpUrl", Value: "http://go.microsoft.com/fwlink" }, KeyValuePair`2 { Key: "HelpLink.LinkId", Value: "20476" }], ClassName: "SqlException", Message: "Operating system error 21(The device is not ready.) on file \"D:\data\MSSQL12.MSSQLSERVER\MSSQL\DATA\clones\MyDB_00010379_dyr\VhdMount\Data\MyDB.ldf\" during SQLServerLogMgr::CheckLogBlockReadComplete.
Could not open new database 'MyDB'. CREATE DATABASE is aborted.", StackTrace: [RaygunErrorStackTraceLineMessage { LineNumber: 94, ClassName: "System.Data.SqlClient.SqlConnection", FileName: null, MethodName: "OnError(SqlException exception, Boolean breakConnection, Action`1 wrapCloseInAction)", Raw: null }, RaygunErrorStackTraceLineMessage { LineNumber: 380, ClassName: "System.Data.SqlClient.MyDBParser", FileName: null, MethodName: "ThrowExceptionAndWarning(MyDBParserStateObject stateObj, Boolean callerHasConnectionLock, Boolean asyncClose)", Raw: null }, RaygunErrorStackTraceLineMessage { LineNumber: 2684, ClassName: "System.Data.SqlClient.MyDBParser", FileName: null, MethodName: "TryRun(RunBehavior runBehavior, SqlCommand cmdHandler, SqlDataReader dataStream, BulkCopySimpleResultSet bulkCopyHandler, MyDBParserStateObject stateObj, Boolean&amp; dataReady)", Raw: null }, RaygunErrorStackTraceLineMessage { LineNumber: 228, ClassName: "System.Data.SqlClient.SqlCommand", FileName: null, MethodName: "InternalEndExecuteNonQuery(IAsyncResult asyncResult, String endMethod, Boolean isInternal)", Raw: null }, RaygunErrorStackTraceLineMessage { LineNumber: 92, ClassName: "System.Data.SqlClient.SqlCommand", FileName: null, MethodName: "EndExecuteNonQueryInternal(IAsyncResult asyncResult)", Raw: null }, RaygunErrorStackTraceLineMessage { LineNumber: 77, ClassName: "System.Data.SqlClient.SqlCommand", FileName: null, MethodName: "EndExecuteNonQueryAsync(IAsyncResult asyncResult)", Raw: null }, RaygunErrorStackTraceLineMessage { LineNumber: 15, ClassName: "System.Threading.Tasks.TaskFactory`1", FileName: null, MethodName: "FromAsyncCoreLogic(IAsyncResult iar, Func`2 endFunction, Action`1 endAction, Task`1 promise, Boolean requiresSynchronization)", Raw: null }, RaygunErrorStackTraceLineMessage { LineNumber: 12, ClassName: "System.Runtime.ExceptionServices.ExceptionDispatchInfo", FileName: null, MethodName: "Throw()", Raw: null }, RaygunErrorStackTraceLineMessage { LineNumber: 40, ClassName: "System.Runtime.CompilerServices.TaskAwaiter", FileName: null, MethodName: "HandleNonSuccessAndDebuggerNotification(Task task)", Raw: null }, RaygunErrorStackTraceLineMessage { LineNumber: 408, ClassName: "Dapper.SqlMapper+&lt;ExecuteImplAsync&gt;d__37", FileName: null, MethodName: "MoveNext()", Raw: null }, RaygunErrorStackTraceLineMessage { LineNumber: 12, ClassName: "System.Runtime.ExceptionServices.ExceptionDispatchInfo", FileName: null, MethodName: "Throw()", Raw: null }, RaygunErrorStackTraceLineMessage { LineNumber: 40, ClassName: "System.Runtime.CompilerServices.TaskAwaiter", FileName: null, MethodName: "HandleNonSuccessAndDebuggerNotification(Task task)", Raw: null }, RaygunErrorStackTraceLineMessage { LineNumber: 321, ClassName: "RedGate.SqlClone.Core.Sql.SqlDatabaseAttacher+&lt;Attach&gt;d__2", FileName: null, MethodName: "MoveNext()", Raw: null }, RaygunErrorStackTraceLineMessage { LineNumber: 12, ClassName: "System.Runtime.ExceptionServices.ExceptionDispatchInfo", FileName: null, MethodName: "Throw()", Raw: null }, RaygunErrorStackTraceLineMessage { LineNumber: 40, ClassName: "System.Runtime.CompilerServices.TaskAwaiter", FileName: null, MethodName: "HandleNonSuccessAndDebuggerNotification(Task task)", Raw: null }, RaygunErrorStackTraceLineMessage { LineNumber: 592, ClassName: "RedGate.SqlClone.Core.Sql.SqlCloneAttacher+&lt;AttachDb&gt;d__4", FileName: null, MethodName: "MoveNext()", Raw: null }, RaygunErrorStackTraceLineMessage { LineNumber: 12, ClassName: "System.Runtime.ExceptionServices.ExceptionDispatchInfo", FileName: null, MethodName: "Throw()", Raw: null }, RaygunErrorStackTraceLineMessage { LineNumber: 40, ClassName: "System.Runtime.CompilerServices.TaskAwaiter", FileName: null, MethodName: "HandleNonSuccessAndDebuggerNotification(Task task)", Raw: null }, RaygunErrorStackTraceLineMessage { LineNumber: 115, ClassName: "RedGate.SqlClone.Agent.CloneRestorer+&lt;AttachDb&gt;d__9", FileName: null, MethodName: "MoveNext()", Raw: null }] }
</Data>
  </EventData>
</Event>
Tagged:

Answers

  • Communication between the clone VHD and its parent image VHD is handled by the VHD driver and SMB over the underlying network, but the clone database appears to SQL Server as though it is on standard local disk. If the VHD driver isn't able to contact the image VHD to access data from it, it often responds with error 21 (the device is not ready). SQL Server responds to this error when accessing the disk by blocking access to the database, which can only be reliably resolved by restarting SQL Server.

    SQL Clone tries to reduce the risk of this happening by detaching the database when Windows informs the agent that the image is unreachable, so that it can reattach it once the connection is restored. However, Windows doesn't inform of this instantly, and if SQL Server tries to access the database first, you can get into the situation that you describe.

    To prevent this from happening, there needs to be a reliable connection between the image and clone. Using a SAN for images and co-locating images with machines that will receive clones on the same LAN is recommended.
  • bobkingbobking Posts: 4 Bronze 2
    This also occurs (quite painfully) if the drive hosting the delta files (the "actual" databases to SQL) fills due to explosive growth of a clone's individual deltas. You'll get identical error 21s from SQL server.

    Some areas that we've had to disable to make a smoother experience with fewer giant delta files:
    * Make sure no index maintenance runs on the clones.
    * Turn off ghost page cleanup (trace flag 661) if your clones grow quickly just after creation. At idle, with no connections, the clones should be very small, but if they grow right cloning, look in sp_who2 for "GHOST CLEANUP" running on your database (generally with a low SPID).
    * Don't add/remove fields to compressed indexes. This causes a reindex but due to the nature of SQL's compression, the bits on index are likely to change dramatically, shuffling around quite a bit of data.
  • rmunson8rmunson8 Posts: 2 New member
    Hello, I have recently experienced this as well. While it make sense that this is monitored and detected, is there a way to not have it delete the clone? In my case, per the event logs, the connection was down for only 2 minutes which caused this and thus much more damage done. Ideally we'd either get a notification or a way to recheck several times with various intervals between retries in case it's an intermittent issue. Any suggestions?

    Thanks.
  • jakobehnjakobehn SwedenPosts: 11 New member
    We also experience this last night. The server hosting the image share was patched and restarted, and after this multiple clones were deleted across several servers! We haven't found any logic as to which clones were deleted, it seems pretty random.

    This behavior makes SQL Clone unreliable. Is there any way we can disable cleanup/removal of clones, or at least delay it? It must be able to handle that the file share server is unavailable for a couple of minutes?
  • If there's planned maintenance on the server hosting the images share, you can put the cloned databases Offline in SQL Server. See https://www.red-gate.com/hub/product-learning/sql-clone/sql-clone-on-your-laptop?product=sql-clone. For unplanned outages, I don't know (yet) how to handle this.

Sign In or Register to comment.