How to cancel a hung deployment?

isme Posts: 119
edited December 19, 2013 12:41PM in Deployment Manager
Normally my deployment takes about five minutes to complete.

The latest one has been running for 40 minutes with no end in sight.

The log goes no further than starting to execute a custom pre-deployment script:
2013-12-18 17:48:16 +00:00 INFO   Looking for PowerShell scripts named PreDeploy.ps1
2013-12-18 17:48:16 +00:00 INFO   Calling PowerShell script: 'G:\Temp\p1piiqgl.csd\Packages\..\Applications\INT-CI\Rhubarb.Rhubarb.Rhubarb\1.0.8711.241\db\state\Replication\PreDeploy.ps1'

It turns out that the script contains a subtle infinite loop.

How do I cancel a runaway deployment?

My Deployment Manager version is v2.3.4.13. The host runs PowerShell v2.

EDIT: I removed the script I posted because a different script actually caused the problem. PEBKAC!
Iain Elder, Skyscanner

Comments

  • We abandoned the deployment by restarting the Windows host.

    Is there a less disruptive way to do this?

    Here's what we tried.

    I used this command to restart the RGDM service.
    Restart-Service -InputObject (Get-Service -ComputerName 'redgatedeploy' -Name 'Red Gate Deployment Manager') -Confirm
    

    Initially it looked like it worked.
    $ Get-Service -ComputerName 'redgatedeploy' -Name 'Red Gate Deployment Manager'
    
    Status   Name               DisplayName                           
    ------   ----               -----------                           
    Running  Red Gate Deploy... Red Gate Deployment Manager           
    
    

    But a few seconds later, the service stopped again.
    $ Get-Service -ComputerName 'redgatedeploy' -Name 'Red Gate Deployment Manager'
    
    Status   Name               DisplayName                           
    ------   ----               -----------                           
    Stopped  Red Gate Deploy... Red Gate Deployment Manager           
    
    

    At this point we figured the easiest thing to do was to restart the host.
    Restart-Computer -ComputerName 'redgatedeploy' -Force -Confirm
    

    After a couple of minutes, the host responded to Get-Service requests again.
    $ Get-Service -ComputerName 'redgatedeploy' -Name 'Red Gate Deployment Manager'
    
    Status   Name               DisplayName                           
    ------   ----               -----------                           
    Stopped  Red Gate Deploy... Red Gate Deployment Manager           
    
    

    We tried to start the service again.
    Start-Service -InputObject (Get-Service -ComputerName 'redgatedeploy' -Name 'Red Gate Deployment Manager') -Confirm
    

    This time it stayed up.

    Worryingly, there is no history of the bad deployment in the user interface.

    The deployment history for my project says that the most recent deployment was successful.
    Version          Status         Date                              Environment     Deployed by
    1.0.8711.241     Successful     18 December 2013 17:46 +00:00     INT-CI          svc_teamcity
    
    Iain Elder, Skyscanner
  • RGDM has logged these events in the Windows Application log since I attempted to restart the service.
    2013-12-18 18:47:14,783 [80] ERROR RedGate.Deploy.Server.Tasks.TaskRunner [(null)] - System.OperationCanceledException: The operation was canceled.

    2013-12-18 18:47:15,221 [51] ERROR RedGate.Deploy.Server.Tasks.TaskRunner [(null)] - System.OperationCanceledException: The operation was canceled.

    2013-12-18 18:47:21,471 [9] ERROR RedGate.Deploy.Shared.Startup.Host [(null)] - System.ServiceModel.AddressAlreadyInUseException: There is already a listener on IP endpoint 0.0.0.0:10302. Make sure that you are not trying to use this endpoint multiple times in your application and that there are no other applications listening on this endpoint. ---> System.Net.Sockets.SocketException: Only one usage of each socket address (protocol/network address/port) is normally permitted

    2013-12-18 18:48:16,956 [5] ERROR RedGate.Deploy.Shared.Startup.CommandProcessor [(null)] - Command RedGate.Deploy.Agent.Commands.SingleShotDeploymentCommand failed
    RedGate.Deploy.Shared.Startup.CommandException: Timed out executing Powershell script 'G:\Temp\p1piiqgl.csd\Packages\..\Applications\INT-CI\Rhubarb.Rhubarb.Rhubarb\1.0.8711.241\db\state\Replication\PreDeploy.ps1'. Powershell.exe process was forcibly terminated after 01:00:00

    2013-12-18 18:48:16,893 [5] ERROR RedGate.Deploy.Agent.Services.Jobs.JobRunner [(null)] - Timed out executing Powershell script 'G:\Temp\p1piiqgl.csd\Packages\..\Applications\INT-CI\Rhubarb.Rhubarb.Rhubarb\1.0.8711.241\db\state\Replication\PreDeploy.ps1'. Powershell.exe process was forcibly terminated after 01:00:00

    2013-12-18 18:50:05,645 [9] ERROR RedGate.Deploy.Shared.Startup.Host [(null)] - System.ServiceModel.AddressAlreadyInUseException: There is already a listener on IP endpoint 0.0.0.0:10302. Make sure that you are not trying to use this endpoint multiple times in your application and that there are no other applications listening on this endpoint. ---> System.Net.Sockets.SocketException: Only one usage of each socket address (protocol/network address/port) is normally permitted

    2013-12-18 18:51:33,036 [9] ERROR RedGate.Deploy.Shared.Startup.Host [(null)] - System.ServiceModel.AddressAlreadyInUseException: There is already a listener on IP endpoint 0.0.0.0:10302. Make sure that you are not trying to use this endpoint multiple times in your application and that there are no other applications listening on this endpoint. ---> System.Net.Sockets.SocketException: Only one usage of each socket address (protocol/network address/port) is normally permitted

    2013-12-18 19:08:59,075 [36] WARN RedGate.Deploy.Server.Tasks.TaskRunner [(null)] - Task deployments-5793 exited with error Deployment on the agent failed.

    2013-12-18 19:08:58,887 [36] ERROR RedGate.Deploy.Server.Tasks.TaskRunner [(null)] - System.AggregateException: One or more errors occurred. ---> RedGate.Deploy.Server.Tasks.ActivityFailedException: Deployment on the agent failed.

    I've omitted the stack trace.

    Ask me if you want a complete copy of the filtered log.

    To generate this, I connected Event Viewer on my workstation to the RGDM host...
    mmc eventvwr.msc /computer:redgatedeploy 
    

    ...and used this filter to show only the events from RGDM.
    <QueryList>
      <Query Id="0" Path="Application">
        <Select Path="Application">
          *[System[Provider[@Name='Red Gate Deployment Manager']]]
        </Select>
      </Query>
    </QueryList>
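
The same selection can probably be run from PowerShell rather than Event Viewer. This is a minimal sketch, not something tested against this host: it assumes Get-WinEvent is available on the workstation and that the RGDM host allows remote event log access. The function name Get-DeploymentManagerEvent and its parameters are hypothetical.

```powershell
# Hypothetical helper: read Red Gate Deployment Manager events from the
# Application log, using the same XPath selection as the filter above.
function Get-DeploymentManagerEvent {
    param(
        [string]$ComputerName = 'localhost',
        [int]$MaxEvents = 50
    )

    # Same provider filter as the <Select> element in the Event Viewer query.
    $xpath = "*[System[Provider[@Name='Red Gate Deployment Manager']]]"

    Get-WinEvent -ComputerName $ComputerName -LogName 'Application' -FilterXPath $xpath -MaxEvents $MaxEvents |
        Select-Object TimeCreated, LevelDisplayName, Message
}

# Example:
# Get-DeploymentManagerEvent -ComputerName 'redgatedeploy' | Format-List
```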
    


    I tried to restart the service twice before restarting the host, which could explain the repeated System.ServiceModel.AddressAlreadyInUseException entries.
    Iain Elder, Skyscanner
  • The default timeout period for PowerShell scripts is one hour. You can reduce that by setting the RedGatePowerShellTimeout variable. The value should be in the form hh:mm:ss. For example, for a five-minute timeout, set RedGatePowerShellTimeout to 00:05:00.
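
To check a value before typing it into the variable, the hh:mm:ss form can be produced from a .NET TimeSpan. This is only a sketch: the helper name Format-DeploymentTimeout is hypothetical, and it handles durations under 24 hours only, because this thread does not say whether longer values are accepted.

```powershell
# Hypothetical helper: format a duration as hh:mm:ss, the form described
# above for the RedGatePowerShellTimeout variable.
function Format-DeploymentTimeout {
    param([TimeSpan]$Duration)

    # Assumption: only non-negative durations shorter than one day are handled.
    if ($Duration -lt [TimeSpan]::Zero -or $Duration.TotalDays -ge 1) {
        throw 'Duration must be between 00:00:00 and 23:59:59.'
    }

    '{0:00}:{1:00}:{2:00}' -f $Duration.Hours, $Duration.Minutes, $Duration.Seconds
}

Format-DeploymentTimeout ([TimeSpan]::FromMinutes(5))   # 00:05:00
```

Going the other way, [TimeSpan]::Parse('00:05:00') reads such a string back into a TimeSpan, which is a quick way to catch typos.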

    By the way, the Red Gate Deployment Manager service (process name RedGate.Deploy.Server.exe) doesn't run the deployment directly; the Deployment Agent (RedGate.Deploy.Agent.exe) is responsible for actually executing the deployment. For standard deployments this is the Red Gate Deployment Agent service, normally running on a remote machine. For database deployments, the DM server spawns RedGate.Deploy.Agent.exe as a child process. If you need to terminate a running deployment, it's the agent process that you need to kill.
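
One way to kill the agent process remotely from PowerShell v2 is through WMI. The following is a sketch under assumptions rather than a Red Gate-supported procedure: the function name Stop-DeploymentAgent is hypothetical, the caller needs rights to terminate processes on the host, and it terminates every process with that image name, so an agent that runs as a Windows service may need to be started again afterwards.

```powershell
# Hypothetical helper (not part of Deployment Manager): terminate every
# RedGate.Deploy.Agent.exe process on a host via WMI.
function Stop-DeploymentAgent {
    param(
        [string]$ComputerName = 'localhost',
        [string]$ProcessName = 'RedGate.Deploy.Agent.exe'
    )

    # @() keeps the result an array, so the loop is skipped when nothing
    # matches (on PowerShell v2, foreach over $null would run once).
    $agents = @(Get-WmiObject -Class Win32_Process -ComputerName $ComputerName -Filter "Name = '$ProcessName'")

    foreach ($agent in $agents) {
        # Win32_Process.Terminate reports 0 in ReturnValue on success.
        $result = $agent.Terminate()
        New-Object PSObject -Property @{
            ProcessId   = $agent.ProcessId
            ReturnValue = $result.ReturnValue
        }
    }
}

# Example:
# Stop-DeploymentAgent -ComputerName 'redgatedeploy'
```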

    We don't currently have the ability to cancel deployments that are in progress; if you'd like us to implement that feature, please vote for it on UserVoice.
    Development Lead
    Redgate Software
  • Thanks, Mike. I've shared this thread with my team. It's useful knowledge and practical advice.

    The next time we encounter a runaway deployment, we'll try to kill just the agent processes rather than the server process.

    We may decide to reduce the default PowerShell time limit for most of our deployments to defend against being blocked by infinite loops.

    We expect some of our bulk data transfer operations to take more than one hour. In these situations it might make sense to increase the limit.

    Thanks for your help!
    Iain Elder, Skyscanner