Automation to start, stop, stop with hibernate, and reboot EC2 instances

Using Site24x7's IT automation framework, you can create an action profile to start, stop, stop with hibernate, or reboot EC2 instances. You can trigger the automation automatically by mapping it to a threshold or to an alert event type (Up, Down, or Trouble), or you can invoke the action manually from the Site24x7 console.

Required Permissions

Make sure the IAM role assumed by Site24x7, or the IAM user created for Site24x7, has the following write-level actions in its attached policy document. Without them, the actions cannot be performed.

  • "ec2:StartInstances"
  • "ec2:StopInstances"
  • "ec2:RebootInstances"
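
As an illustration, the permissions above can be assembled into a policy document as in the sketch below. The profile action labels (`start`, `stop`, `stop_hibernate`, `reboot`), the `build_policy` helper, and the `"Resource": "*"` scope are assumptions made for this example and are not part of Site24x7. Stop with hibernate is listed under `ec2:StopInstances` because hibernation is requested through the same EC2 StopInstances API.

```python
import json

# Hypothetical labels for the four profile actions, mapped to the IAM action
# each one needs. Stop with hibernate uses the same StopInstances API as stop.
REQUIRED_IAM_ACTIONS = {
    "start": "ec2:StartInstances",
    "stop": "ec2:StopInstances",
    "stop_hibernate": "ec2:StopInstances",
    "reboot": "ec2:RebootInstances",
}


def build_policy(profile_actions):
    """Return an IAM policy document allowing only the listed profile actions."""
    iam_actions = sorted({REQUIRED_IAM_ACTIONS[a] for a in profile_actions})
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": iam_actions,
                # Assumption: applies to all instances. Narrow this to
                # specific instance ARNs if your account requires it.
                "Resource": "*",
            }
        ],
    }


if __name__ == "__main__":
    # Print a policy covering all four profile actions.
    print(json.dumps(build_policy(REQUIRED_IAM_ACTIONS), indent=2))
```

If you only plan to use some of the actions, pass just those labels to get a narrower policy.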

Constraints

  • EC2 instances must be in a running state and must be monitored by Site24x7 for the action to execute successfully.
  • You can't perform actions on instances that are in a suspended state.

Create an action profile

  • Log in to the Site24x7 web console and select Admin > IT Automation Templates.
  • Click Add Automation and select Start/Stop/Stop-Hibernate/Reboot EC2 as the type from the drop-down.
  • Type a unique name in the Display Name field.
  • Click the drop-down and select the action to be performed.
  • Next, select the EC2 instance(s) on which you want the action to be performed. (If you choose the option $LOCALHOST, the operation is performed on all the EC2 instances to which the automation profile is mapped.)
  • Max Allowed Action Execution Time: The maximum number of seconds Site24x7 waits before the request times out. The default execution time is 15 seconds. You can define an execution time between 1 and 90 seconds.
  • Send the Automation Result via Email: Toggle to Yes to receive an email about the automation result. The email is sent to the User Alert Group configured in the Notification Profile and contains parameters including the automation name, type of automation, incident reason, destination hosts, and more.
  • Save the profile.
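
The fields and limits described above can be summarized in a small data model. This is only a sketch for reasoning about valid values before you fill in the form. The class name `Ec2ActionProfile`, its field names, and the action labels are hypothetical and do not correspond to a Site24x7 API. Only the 15-second default and the 1 to 90 second range come from the steps above.

```python
from dataclasses import dataclass

# Hypothetical labels for the four actions offered in the drop-down.
ALLOWED_ACTIONS = ("start", "stop", "stop_hibernate", "reboot")


@dataclass
class Ec2ActionProfile:
    display_name: str
    action: str
    instances: list                   # instance identifiers, or ["$LOCALHOST"]
    max_execution_seconds: int = 15   # documented default
    email_result: bool = False        # "Send the Automation Result via Email"

    def __post_init__(self):
        # Validate the values the same way the form fields are described.
        if not self.display_name.strip():
            raise ValueError("Display Name must not be empty")
        if self.action not in ALLOWED_ACTIONS:
            raise ValueError(f"unknown action: {self.action!r}")
        if not self.instances:
            raise ValueError("select at least one instance or $LOCALHOST")
        if not 1 <= self.max_execution_seconds <= 90:
            raise ValueError("execution time must be between 1 and 90 seconds")


if __name__ == "__main__":
    profile = Ec2ActionProfile("reboot-web", "reboot", ["$LOCALHOST"])
    print(profile)
```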

Simulate the Automation

Before mapping the action profile, you can test its functionality by invoking the action manually within the Site24x7 console or by using our REST APIs. Once you've created the profile, navigate back to the IT Automation summary page (Admin > IT Automation) and click the icon provided for the profile to execute a test run.

Map the Action Profile

To execute the automation, map the action profile to a desired alert event. You can map the profile either to a predefined monitor level event type (Up/Down/Trouble) or to a custom attribute level event type (for example, CPU usage > 90%).

Monitor level mapping

Navigate to the Edit Monitor page of the monitored EC2 instance (EC2 instance monitor page > Edit), and map the action profile to any of the following monitor status changes.

    • Execute on Down
    • Execute on Up
    • Execute on Trouble
    • Execute on any Status Change
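
The four options can be read as a simple rule over status transitions. The sketch below illustrates that rule under the assumption that an automation fires only when the status actually changes. The function name `should_execute` and the mapping labels are hypothetical, not Site24x7 identifiers.

```python
def should_execute(mapping, previous_status, current_status):
    """Decide whether a monitor level mapping fires for a status transition.

    mapping is one of "Down", "Up", "Trouble", or "Any" (hypothetical labels
    for the four "Execute on ..." options).
    """
    if previous_status == current_status:
        return False  # no status change, nothing to execute
    if mapping == "Any":
        return True   # execute on any status change
    return current_status == mapping


if __name__ == "__main__":
    # An instance going from Up to Down triggers an "Execute on Down" mapping.
    print(should_execute("Down", "Up", "Down"))
```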

Attribute level mapping

You can also associate the action profile with EC2 related metric data points like CPU usage and memory usage, or with load balancer related metrics such as latency, HTTP 4xx, and more. Navigate to the Edit threshold profile page of the monitored EC2 instance (go to the Edit Monitor page of the resource and click the Pencil icon adjacent to the Threshold and Availability field) and map the profile to any desired attribute by clicking the "Select Automation to Execute" field.
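
Conceptually, an attribute level mapping pairs a metric condition with an automation. The following sketch shows one way to evaluate such mappings against a set of collected values. The `AttributeMapping` class, the metric keys, and the automation names are made up for this example, and the evaluation order is an assumption rather than documented Site24x7 behavior.

```python
import operator
from dataclasses import dataclass

# Comparison operators supported by this sketch.
_COMPARATORS = {
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
}


@dataclass(frozen=True)
class AttributeMapping:
    attribute: str     # hypothetical metric key, e.g. "cpu_usage"
    comparator: str    # one of the keys in _COMPARATORS
    threshold: float
    automation: str    # display name of the mapped action profile


def automations_to_execute(mappings, metrics):
    """Return the automations whose threshold condition is met."""
    triggered = []
    for m in mappings:
        value = metrics.get(m.attribute)
        if value is None:
            continue  # no data point collected for this attribute
        if _COMPARATORS[m.comparator](value, m.threshold):
            triggered.append(m.automation)
    return triggered


if __name__ == "__main__":
    mappings = [
        AttributeMapping("cpu_usage", ">", 90, "reboot-web"),
        AttributeMapping("memory_usage", ">", 85, "reboot-on-memory"),
    ]
    print(automations_to_execute(mappings, {"cpu_usage": 95.0, "memory_usage": 40.0}))
```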

Use cases

  • Troubleshoot instances with failed status checks: You can create a mapping such that, whenever your monitored Amazon EC2 instance fails a system or an instance reachability check, an automated action to reboot the instance, or to stop and start it, is triggered.
  • Prevent out of memory failures: Map the EC2 reboot action profile to the memory utilization metric data point and set it up to trigger if memory usage starts to climb dangerously close to the threshold limit.
  • Reduce consumed instance hours: Map the EC2 stop action profile to metric data points like CPU usage and network utilization to identify underutilized instances and stop them.
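
For orientation, the four profile actions correspond to three EC2 API operations. The sketch below is not how Site24x7 implements them. It only shows the equivalent calls on a client object that exposes `start_instances`, `stop_instances`, and `reboot_instances`, as the boto3 EC2 client does. The `run_ec2_action` helper and the action labels are hypothetical, and the client is passed in so the example itself has no external dependencies.

```python
def run_ec2_action(ec2_client, action, instance_ids):
    """Call the EC2 operation that corresponds to a profile action."""
    ids = list(instance_ids)
    if action == "start":
        return ec2_client.start_instances(InstanceIds=ids)
    if action == "stop":
        return ec2_client.stop_instances(InstanceIds=ids)
    if action == "stop_hibernate":
        # Hibernation is a flag on the same StopInstances operation.
        return ec2_client.stop_instances(InstanceIds=ids, Hibernate=True)
    if action == "reboot":
        return ec2_client.reboot_instances(InstanceIds=ids)
    raise ValueError(f"unknown action: {action!r}")
```

With boto3 installed and credentials configured, `ec2_client` could be created with `boto3.client("ec2")`.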