# infrastructure-kif

**Repository Path**: mirrors_apache/infrastructure-kif

## Basic Information

- **Project Name**: infrastructure-kif
- **Description**: KIF - Kill It (with) Fire. Janitorial service for ASF Infra
- **Primary Language**: Unknown
- **License**: Apache-2.0
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2020-08-22
- **Last Updated**: 2025-10-19

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# KIF - Kill It (with) Fire

## A simple find-and-fix program with a yaml configuration

Kif is a simple monitoring program that detects programs running amok and tries to correct them. It can currently scan for:

- Memory usage (MB, GB or % of total mem available)
- No. of open file descriptors
- No. of TCP connections open
- No. of LAN TCP connections open
- Age of process
- State of process (running, waiting, zombie etc)

and act accordingly, either running a custom command (such as restarting a service) or killing it with any preferred signal.

It can also notify you of issues found and actions taken, either via email or hipchat.

See [kif.yaml](kif.yaml) for example configuration and features.

### Requirements

- python 3.6 or higher
- python-yaml
- python-psutil
- asfpy

### Installation and use

- Download Kif
- Make a kif.yaml configuration (see the [example](kif.yaml))
- Install the dependencies with: `pip3 install -r requirements.txt` (or use pipenv)
- Run as root (required to both read usage and restart services).
- Enjoy!

### Installing via pipservice

To install on an infra node, add the following yaml snippet to it:

~~~yaml
pipservice:
  kif:
    tag: master
~~~

### Rule syntax:

```yaml
rules:
    apache:
        description: 'sample apache process rule'
        # We can specify the exact cmdline and args to scan for:
        procid:
            - '/usr/sbin/apache2'
            - '-k'
            - 'start'
        # We'll use combine: true to combine the resource of multiple processes into one check.
        combine: true
        triggers:
            # Demand no more than 500 LAN connections
            maxlocalconns: 500
            # No more than 25,000 open connections in total
            maxconns: 25000
            # Require < 1GB memory used (could also be 10%, 512mb etc)
            maxmemory: 1gb
            # And finally, no more than 65,000 open file descriptors
            maxfds: 65000
        # If triggered, run this:
        runlist:
            - 'service apache2 restart'

    zombies:
        description: 'Any process caught in zombie mode'
        # use empty procid to catch all
        procid: ''
        triggers:
            # This can be any process state (zombie, sleeping, running, etc)
            state: 'zombie'
        # No runlist here, just kill it with signal 9
        kill: true
        killwith: 9

    puppet:
        description: 'kill -9 puppet agents that are hanging'
        procid: 'puppet agent'
        # Find all processes created more than 1 day ago.
        triggers:
            maxage: 1d
        # Ignore main process
        ignorepidfile: '/var/run/puppet/agent.pid'
        # Kill it with signal 9
        kill: true
        killwith: 9
```

### Restricting rules to certain machines

To have a specific rule run on certain nodes, please add the rule to kif.yaml, and make use of `host_must_match` or `host_must_not_match` definitions to narrow down where to run the rule-set, like so:

~~~yaml
    zombies_on_gitbox:
        description: 'Any gitweb process caught in zombie mode'
        host_must_match: gitbox.apache.org
        procid: '/usr/bin/git'
        triggers:
            # This can be any process state (zombie, sleeping, running, etc)
            # Or a git process > 30 minutes old.
            state: 'zombie'
            maxage: 30m
        kill: true
        killwith: 9

    httpd_but_not_tlpserver:
        description: 'httpd too many backend connections (pool filling up?)'
        host_must_not_match: 'tlp-.+'
        procid: '/usr/sbin/apache2'
        # Use combine: true to combine the resource of multiple processes into one check.
        combine: true
        triggers:
            maxlocalconns: 1000
        runlist:
            - 'service apache2 restart'
~~~

Both `host_must_match` and `host_must_not_match` are regular expressions and must match the full hostname.
Be sure to use double escaping for keywords, for instance `\\d` instead of `\d`, or the yaml will break.
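
To make the full-hostname requirement concrete, here is a minimal Python sketch of how such a host filter could be evaluated. It is not Kif's own code: the function name `rule_applies_to_host` is hypothetical, the use of `re.fullmatch` is an assumption drawn from the "must match the full hostname" wording, and the hostnames other than `gitbox.apache.org` are made up for illustration.

```python
import re


def rule_applies_to_host(rule: dict, hostname: str) -> bool:
    """Return True if the rule's host filters allow it to run on `hostname`.

    Assumption: both patterns are tested against the whole hostname with
    re.fullmatch. Kif's actual implementation may differ.
    """
    must = rule.get("host_must_match")
    must_not = rule.get("host_must_not_match")
    # A host_must_match pattern has to cover the entire hostname.
    if must is not None and not re.fullmatch(must, hostname):
        return False
    # A host_must_not_match pattern excludes hosts it fully matches.
    if must_not is not None and re.fullmatch(must_not, hostname):
        return False
    return True


gitbox_rule = {"host_must_match": "gitbox.apache.org"}
httpd_rule = {"host_must_not_match": "tlp-.+"}

print(rule_applies_to_host(gitbox_rule, "gitbox.apache.org"))   # True
print(rule_applies_to_host(gitbox_rule, "gitbox2.apache.org"))  # False
print(rule_applies_to_host(httpd_rule, "tlp-example"))          # False
print(rule_applies_to_host(httpd_rule, "www-tlp-1"))            # True
```

Under this assumption, a pattern that only matches part of a hostname (such as `tlp-.+` against `www-tlp-1`) does not count as a match, so the pattern has to be written to cover the whole name.
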
`host_must_match` and `host_must_not_match` can also be used together in one rule to include some nodes and rule out others.

### Command line arguments

- `--debug`: Run in debug mode - detect but don't try to fix issues.
- `--config $filename`: Path to the config file.
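
The two flags above can be modeled with Python's standard `argparse` module. This is an illustrative sketch only, not Kif's actual argument handling: the function name `parse_args`, the default config path `kif.yaml`, and the example path `/etc/kif/kif.yaml` are assumptions.

```python
import argparse


def parse_args(argv=None):
    """Parse the two documented flags (illustrative, not Kif's own parser)."""
    parser = argparse.ArgumentParser(description="Kif argument sketch")
    # --debug is a plain switch: detect issues, but do not act on them.
    parser.add_argument("--debug", action="store_true",
                        help="detect issues but do not try to fix them")
    # Assumption: 'kif.yaml' is used as the default purely for illustration.
    parser.add_argument("--config", metavar="FILENAME", default="kif.yaml",
                        help="path to config file")
    return parser.parse_args(argv)


# Hypothetical invocation: dry run against a config in a custom location.
args = parse_args(["--debug", "--config", "/etc/kif/kif.yaml"])
print(args.debug, args.config)  # True /etc/kif/kif.yaml
```

Running with `--debug` first is a reasonable way to check which rules would trigger on a node before letting Kif restart or kill anything.
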