Problem
It is common to find services deployed to Mesos that are no longer in use, or that were over-provisioned to support testing such as QA validation or performance testing. Leaving those services running results in unnecessary infrastructure costs.
Cleanup Process
Marathon offers a REST API for scaling (resizing) apps running on Mesos, which allows the cleanup process to be automated (see the Marathon REST API). The cleanup policy can be defined as an SLA per environment: in DEV, all instances are resized to 0 nightly; in TEST and STAGE, all instances are resized to 1.
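The per-environment SLA can be captured in a small lookup table. The sketch below is a hypothetical helper (SLA_INSTANCES and build_scale_request are illustrative names, not part of cleanup.py) that reuses the environment URLs used by cleanup.py and builds the PUT /v2/apps/{app_id} request that resizes one app, assuming the targets described above: 0 instances in DEV, 1 in TEST and STAGE.

```python
from typing import Dict, Tuple

# Marathon base URLs per environment, as configured in cleanup.py.
ENVIRONMENT_ADDRESS = {
    "dev": "http://dev.mesos:8080/v2",
    "test": "http://test.mesos:8080/v2",
    "stage": "http://stage.mesos:8080/v2",
}

# Hypothetical SLA table: the instance count each environment is resized to.
SLA_INSTANCES = {"dev": 0, "test": 1, "stage": 1}


def build_scale_request(env: str, app_id: str) -> Tuple[str, str, Dict[str, int]]:
    """Return (method, url, json_body) that resizes one app to its SLA target."""
    if env not in ENVIRONMENT_ADDRESS:
        raise ValueError(f"unknown environment: {env}")
    # Marathon app IDs are absolute ("/middleware/..."), so drop the leading "/".
    url = f"{ENVIRONMENT_ADDRESS[env]}/apps/{app_id.lstrip('/')}"
    return "PUT", url, {"instances": SLA_INSTANCES[env]}
```

Sending the request is then a single call such as requests.put(url, json=body); passing json= makes requests serialize the body and set the Content-Type: application/json header.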
Steps to implement the cleanup process with Python and Jenkins:
- install Python and pip (the version used in this guide is 3.7+):
  brew install python
  install the requests library: pip3 install requests
- create the cleanup.py script with the following content:
#!/usr/bin/env python3
"""Resize the Marathon apps whose ID contains a filter string."""
import sys

import requests

ENVIRONMENT_ADDRESS = {"dev": "http://dev.mesos:8080/v2",
                       "test": "http://test.mesos:8080/v2",
                       "stage": "http://stage.mesos:8080/v2"}


def app_url(app_id):
    # Marathon app IDs start with "/", so drop it to avoid "/apps//..." URLs.
    return address + "/apps/" + app_id.lstrip("/")


def delete_failed_tasks(app_id):
    """Kill every task of app_id that is in state TASK_FAILED."""
    app_tasks = requests.get(app_url(app_id) + "/tasks")
    if app_tasks.status_code != 200:
        print('error GET /v2/apps{}/tasks {}'.format(app_id, app_tasks.status_code))
        return
    for task in app_tasks.json()['tasks']:
        print('task={}'.format(task))
        print('state={}\n'.format(task['state']))
        if task['state'] == 'TASK_FAILED':
            # Cancel deployments stuck on this app before killing the task.
            delete_deployment(app_id)
            delete_task = requests.delete(
                app_url(app_id) + "/tasks/" + task['id'] + "?force=true")
            if delete_task.status_code != 200:
                print('error DELETE /v2/apps task {}'.format(delete_task.status_code))
            else:
                print('DELETE /v2/apps task success {}'.format(task['id']))


def delete_deployment(app_id):
    """Cancel every deployment that affects app_id."""
    deployments = requests.get(address + "/deployments")
    if deployments.status_code != 200:
        print('error GET /v2/deployments {}'.format(deployments.status_code))
        return
    for deployment in deployments.json():
        if app_id in deployment['affectedApps']:
            print('id={}'.format(deployment['id']))
            # force=true cancels without creating a rollback deployment (202 Accepted).
            delete_request = requests.delete(
                address + "/deployments/" + deployment['id'] + "?force=true")
            if delete_request.status_code != 202:
                print('error DELETE /v2/deployments {}'.format(delete_request.status_code))
            else:
                print('DELETE /v2/deployments success {}'.format(deployment['id']))


if len(sys.argv) not in (3, 4):
    sys.exit('usage: cleanup.py <dev|test|stage> <instances> [app id filter]')

env = sys.argv[1]
instances = int(sys.argv[2])
# Without a filter every app matches ("" is a substring of every ID).
path_to_app = sys.argv[3] if len(sys.argv) == 4 else ""
print('environment: ', env)

address = ENVIRONMENT_ADDRESS.get(env)
if address is None:
    sys.exit('invalid environment: {}'.format(env))
print('address: ', address)

resp = requests.get(address + "/apps")
if resp.status_code != 200:
    # This means something went wrong.
    sys.exit('error GET /v2/apps {}'.format(resp.status_code))

for service in resp.json()['apps']:
    if path_to_app in service['id']:
        print('current id={} instances={}\n'.format(service['id'], service['instances']))
        delete_failed_tasks(service['id'])
        response = requests.put(app_url(service['id']), json={"instances": instances})
        if response.status_code != 200:
            print('error PUT /v2/apps {}'.format(response.status_code))
        else:
            print('updated id={} set to instances={}\n'.format(service['id'], instances))
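NOTE: the script above defines two functions that remove the failed tasks and the deployments associated with an app before it is resized. Alternatively, append ?force=true to the PUT request that resizes the app, so that Marathon cancels any deployment in progress for it, but with the potential side effect of cancelling healthy deployments.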
- run the script with the following arguments: python3.7 cleanup.py [environment, e.g. dev|test|stage] [desired number of instances, e.g. 0] [filter: every app whose ID contains this string is resized, e.g. /middleware/employee/crew|crew|employee]
  - all services under the employee path: python3.7 cleanup.py dev 0 /middleware/employee
  - a single service: python3.7 cleanup.py dev 1 /middleware/employee/crew-app-landing-v1
- create a Jenkins project using the following Jenkinsfile:
pipeline {
    agent {
        docker { image 'python:3.7-slim' }
    }
    triggers {
        // Nightly, at a hashed minute between 00:00 and 00:59.
        cron('H 0 * * *')
    }
    stages {
        stage('Build') {
            steps {
                sh 'pip3 install requests'
                sh 'python3.7 cleanup.py dev 1 path/to/api/'
            }
        }
    }
}
NOTE: the command python3.7 cleanup.py dev 1 path/to/api/ resizes every app whose ID contains path/to/api/ to 1 instance.
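Because the filter is a plain substring test, a short value such as crew can select apps in unrelated paths. The sketch below is a hypothetical dry-run helper (plan_cleanup and matches are illustrative names, not part of cleanup.py) that works on JSON already fetched from GET /v2/apps, GET /v2/apps/{app_id}/tasks and GET /v2/deployments, and lists the requests a cleanup run would send without calling Marathon. The prefix_only option is an assumption added for comparison; cleanup.py itself only does substring matching, and the sample app IDs are made up.

```python
from typing import Dict, List


def matches(app_id: str, path_filter: str, prefix_only: bool = False) -> bool:
    """Substring test as in cleanup.py, or a stricter prefix test."""
    if prefix_only:
        # Marathon app IDs are absolute ("/a/b"), so compare without the leading "/".
        return app_id.lstrip("/").startswith(path_filter.lstrip("/"))
    return path_filter in app_id


def plan_cleanup(apps: List[dict], tasks_by_app: Dict[str, List[dict]],
                 deployments: List[dict], path_filter: str, instances: int,
                 prefix_only: bool = False) -> List[str]:
    """Return the Marathon requests a cleanup run would send, without sending them."""
    actions: List[str] = []
    for app in apps:
        app_id = app["id"]
        if not matches(app_id, path_filter, prefix_only):
            continue
        failed = [task["id"] for task in tasks_by_app.get(app_id, [])
                  if task.get("state") == "TASK_FAILED"]
        if failed:
            # Deployments touching the app are force-cancelled before failed tasks
            # are killed (listed once per app here).
            for deployment in deployments:
                if app_id in deployment.get("affectedApps", []):
                    actions.append(f"DELETE /v2/deployments/{deployment['id']}?force=true")
            for task_id in failed:
                actions.append(f"DELETE /v2/apps{app_id}/tasks/{task_id}?force=true")
        actions.append(f"PUT /v2/apps{app_id} instances={instances} (was {app['instances']})")
    return actions


if __name__ == "__main__":
    # Hypothetical sample data shaped like Marathon's JSON responses.
    apps = [
        {"id": "/middleware/employee/crew-app-landing-v1", "instances": 3},
        {"id": "/middleware/employee/crew-api", "instances": 2},
        {"id": "/frontend/crew-portal", "instances": 4},
    ]
    for action in plan_cleanup(apps, {}, [], "crew", 1):
        print(action)
```

With the substring test, the filter crew also picks up /frontend/crew-portal; prefix_only=True restricts the plan to IDs that start with the given path.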