Installation

You can use the prepared flant/addon-operator image to install addon-operator in a cluster. The image contains the addon-operator binary and several required tools: helm, kubectl, jq, and bash.

Installation consists of building an image that includes your module and hook files, applying the necessary RBAC rights, and deploying the image in the cluster.

Examples

To experiment with modules, hooks, and values we’ve prepared some examples.

Deckhouse Kubernetes Platform was the original reason for creating addon-operator, so its modules can be a valuable source of inspiration when implementing your own.

Sharing your examples of using addon-operator is much appreciated. Please use the relevant Discussions section for that.

Community

Please feel free to reach out to developers, maintainers, and users via GitHub Discussions with any questions regarding addon-operator.

You’re also welcome to follow @flant_com to stay informed about all our Open Source initiatives.

License

Apache License 2.0, see LICENSE.

Overview

Addon-operator combines Helm charts with hooks and values storage to transform charts into smart modules that configure themselves and respond to changes in the cluster. It is a sister project of shell-operator and is actively used in Deckhouse Kubernetes Platform to implement its modules.

Features

  • Discovery of values for Helm charts — parameters can be generated, calculated or retrieved from the cluster;
  • Continuous discovery — parameters can be changed in response to cluster events;
  • Controlled Helm execution — addon-operator monitors the Helm operation to ensure the Helm chart’s successful installation. Coming soon: use kubedog to track deploy status and more;
  • Custom extra actions before and after running Helm as well as any other events via the hooks paradigm. See related shell-operator capabilities.

Additionally, addon-operator provides:

  • ease of maintenance of Kubernetes clusters: use the tools that Ops are familiar with to build your modules and hooks such as Bash, kubectl, Python, etc;
  • the execution queue of modules and hooks that ensures the launch sequence and repeated execution in case of an error, which simplifies programming of modules and ensures predictable outcome of their operation;
  • the possibility of dynamic enabling/disabling of a module (depending on detected parameters);
  • the ability to tie conditions of module activation to the activation of other modules;
  • the unified ConfigMap for the configuration of all settings;
  • the ability to run Helm only if parameters have changed. In this case, helm history would output only releases with changes;
  • global hooks for figuring out parameters and performing actions that affect several dependent modules;
  • off-the-shelf metrics for monitoring via Prometheus.

Hooks and Helm values

Hooks are triggered by Kubernetes events and in response to other stimuli.

A hook is an executable file that, during execution, can make changes to Kubernetes and set Helm values (which are stored in addon-operator’s memory).

Hooks are part of a module, which also contains a Helm chart. If a hook changes values, addon-operator upgrades the release of the Helm chart.

Modules

There can be many modules.

In addition to modules, addon-operator supports global hooks and global values, which have their own values storage. Global hooks are triggered by events and, when active, they can:

  • Make changes to Kubernetes cluster;
  • Make changes to global values storage.

If the global hook changes values in the global storage, then addon-operator triggers an upgrade of releases of all Helm charts.

Running Addon-operator

Environment variables

GLOBAL_HOOKS_DIR — a directory with global hook files.

MODULES_DIR — paths separated by colon where modules are located.

UNNUMBERED_MODULE_ORDER — an integer number to use as the default order for modules without numbered prefix.
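As an illustration, the sketch below shows one way a colon-separated MODULES_DIR value and numbered directory prefixes (as in 001-nginx-ingress) could be interpreted. The function names and the fallback order of 1 are assumptions made for this example and are not taken from addon-operator itself.

```shell
#!/usr/bin/env bash

# Hypothetical fallback used only in this sketch.
UNNUMBERED_MODULE_ORDER="${UNNUMBERED_MODULE_ORDER:-1}"

# Print one module directory per line from a colon-separated list.
split_modules_dir() {
  printf '%s\n' "$1" | tr ':' '\n'
}

# Print "<order> <name>" for a directory name such as "001-nginx-ingress".
module_order_and_name() {
  prefix="${1%%-*}"   # text before the first dash, e.g. "001"
  name="${1#*-}"      # text after the first dash, e.g. "nginx-ingress"
  case "$prefix" in
    ''|*[!0-9]*|"$1")
      # No numeric prefix: use the default order and the full name.
      echo "${UNNUMBERED_MODULE_ORDER} $1"
      ;;
    *)
      # Strip leading zeros so that "001" becomes "1".
      order=$(printf '%s' "$prefix" | sed 's/^0*//')
      echo "${order:-0} ${name}"
      ;;
  esac
}

split_modules_dir "/modules:/extra-modules"
module_order_and_name "001-nginx-ingress"
module_order_and_name "some-module"
```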

ADDON_OPERATOR_NAMESPACE — a required parameter with namespace where Addon-operator is deployed.

ADDON_OPERATOR_CONFIG_MAP — a name of ConfigMap to store values. Default is addon-operator.

Namespace and config map name are used to watch for ConfigMap changes.

Example of container:

containers:
- image: addon-operator-image:latest
  env:
  - name: ADDON_OPERATOR_NAMESPACE
    valueFrom:
      fieldRef:
        fieldPath: metadata.namespace
  - name: ADDON_OPERATOR_CONFIG_MAP
    value: my-values   

With these variables, Addon-operator monitors the ConfigMap/my-values object.

ADDON_OPERATOR_LISTEN_ADDRESS — address for http server. Default is 0.0.0.0

ADDON_OPERATOR_LISTEN_PORT — port for http server. Default is 9650.

Addon-operator starts an HTTP server and listens on ADDRESS:PORT. The server provides a liveness probe and the /metrics endpoint.

  env:
  ...
  - name: ADDON_OPERATOR_LISTEN_ADDRESS
    valueFrom:
      fieldRef:
        fieldPath: status.podIP
  - name: ADDON_OPERATOR_LISTEN_PORT
    value: "9090"
  livenessProbe:
    httpGet:
      path: /healthz
      port: 9090      

ADDON_OPERATOR_PROMETHEUS_METRICS_PREFIX — a prefix for Prometheus metrics. Default is addon_operator_.

  env:
  - name: ADDON_OPERATOR_PROMETHEUS_METRICS_PREFIX
    value: dev_cluster_

curl localhost:9650/metrics

...
dev_cluster_live_ticks 32
...

ADDON_OPERATOR_CRD_EXTRA_LABELS – string with CRDs label pairs.
For example: heritage=my-app,scope=extra
Default is heritage=addon-operator.

ADDON_OPERATOR_CRD_FILTER_PREFIXES – String of filters for the CRD, separated by commas. Default is doc-,_.

Kubernetes client settings

KUBE_CONFIG — a path to a kubernetes client config (~/.kube/config)

KUBE_CONTEXT — a context name in a kubernetes client config (similar to a --context flag of a kubectl)

KUBE_CLIENT_QPS and KUBE_CLIENT_BURST — qps and burst parameters to rate-limit requests to Kubernetes API server. Default qps is 5 and burst is 10 as in a rest/config.go file.

Helm settings

Addon-operator expects the “helm” binary to be available in $PATH. It detects the Helm version at start by executing the “helm --help” command. If this is not suitable for some reason, you can use these settings:

HELM_BIN_PATH — a path to a Helm binary.

HELM3 — set to “yes” to disable auto-detection and explicitly enable compatibility with helm3.

HELM_IGNORE_RELEASE — a name of the release that should not be treated as the module’s release. Prevent self-destruction when addon-operator release is stored in the same namespace as releases for modules.

env:
- name: HELM_IGNORE_RELEASE
  value: {{ .Release.Name }}

HELM_MONITOR_KUBE_CLIENT_QPS — QPS for a rate limiter of a kubernetes client for Helm resources monitor.

HELM_MONITOR_KUBE_CLIENT_BURST — Burst for a rate limiter of a kubernetes client for Helm resources monitor.

Logging settings

LOG_TYPE — Logging formatter type: json, text or color.

LOG_LEVEL — Logging level: debug, info, error.

LOG_NO_TIME — ‘true’ value will disable timestamp logging. Useful when output is redirected to logging system that already adds timestamps. Default is ‘false’.

Debug

Several tools are available for the debugging of addon-operator and hooks:

  • You can get logs of an Addon-operator’s pod for analysis (by executing kubectl logs -f po/POD_NAME)
  • You can set the environment variable LOG_LEVEL=debug to include detailed debugging data into logs
  • Addon-operator inherits shell-operator’s debug CLI interface and a UNIX socket HTTP endpoint. A path to the endpoint can be configured with DEBUG_UNIX_SOCKET environment variable, the default path is “/var/run/addon-operator/debug.socket”.

Available debug commands:

addon-operator queue list [-o text|yaml|json]
    Dump tasks in all queues.

addon-operator global values [-o yaml|json]
    Dump current global values.

addon-operator global patches
    Dump current JSON patches for global values.

addon-operator global config [-o yaml|json]
    Dump global config values.

addon-operator module list [-o text|yaml|json]
    List available modules and their enabled status.

addon-operator module values [-o yaml|json] <module_name>
    Dump module values by name.

addon-operator module patches <module_name>
    Dump JSON patches for module values by name.

addon-operator module config [-o yaml|json] <module_name>
    Dump module config values by name.

addon-operator module resource-monitor [-o text|yaml|json]
    Dump resource monitors.

Lifecycle

Structure

Module files are located in the /modules directory. The directory can be set via $MODULES_DIR variable. Global hook files are located in the /global-hooks directory (you can set your own directory with the $GLOBAL_HOOKS_DIR variable).

Startup sequence

During startup, Addon-operator finds and initializes all global hooks. For more info, see HOOKS.

After the global hooks initialization, Addon-operator executes all global onStartup hooks.

Then, the global hooks with kubernetes binding are executed with a binding context of type Synchronization and Kubernetes monitors for global hooks are started.

Reload all modules

Next, the ‘reload all modules’ process is started. First, it finds all modules and their hooks.

Then, all global hooks with beforeAll binding are executed.

Next, the ‘module discovery’ process is started. It determines which modules are enabled by executing the ‘enabled’ script for modules that are enabled in values.yaml files and in ConfigMap/addon-operator.

Enabled modules are started.

During each module start-up, it executes all onStartup hooks and initializes the installation of a Helm chart. Prior to the installation of a Helm chart, the beforeHelm hook is executed. The afterHelm hook is executed after the installation.

When all modules are started, all global hooks with afterAll binding are executed.

Main loop

After the first run of ‘reload all modules’, the main loop starts. It reacts to schedule and Kubernetes events and to changes in values: it restarts a particular module if its values have changed and runs the ‘reload all modules’ process again if global values have changed.

Named queues

The Addon-operator supports named queues to execute hooks in parallel for schedule and kubernetes/Event bindings.

All other actions are handled in a single “main” queue:

  • global hooks:
    • onStartup
    • kubernetes/Synchronization
    • beforeAll
    • afterAll
  • module hooks:
    • onStartup
    • kubernetes/Synchronization
    • beforeHelm
    • execution of helm commands
    • afterHelm
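The routing rule above can be sketched as a small function. This is only an illustration: the function name queue_for and the queue name slow-queue are hypothetical, and the real operator takes the queue name from the binding configuration.

```shell
#!/usr/bin/env bash

# Print the queue in which a hook run would be placed.
# $1: binding type, $2: queue name from the binding configuration (may be empty).
queue_for() {
  case "$1" in
    schedule|kubernetes/Event)
      # Only these bindings may use a named queue; "main" is the fallback.
      echo "${2:-main}"
      ;;
    *)
      # onStartup, kubernetes/Synchronization, beforeAll, afterAll,
      # beforeHelm, afterHelm and helm commands always go to "main".
      echo "main"
      ;;
  esac
}

queue_for "beforeHelm" "slow-queue"
queue_for "schedule" "slow-queue"
queue_for "kubernetes/Event" ""
```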

This document mainly describes modules. To get more information on hooks, see HOOKS document. To get a full view of how hooks, modules, values, binding contexts, and queues are interlinked, see LIFECYCLE-STEPS document.

Module lifecycle

The onStartup hooks of an enabled module are executed at Addon-operator startup or later, when the module becomes enabled.

Next, the module’s chart is installed with helm upgrade --install. The beforeHelm hooks are executed before Helm is launched, and the afterHelm hooks are executed after it.

After the launch, the module starts responding to two types of events:
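A minimal sketch of a module hook with a beforeHelm binding is shown below. It assumes the shell-operator convention that a hook prints its binding configuration when called with --config, and it writes a JSON patch to the file named by $VALUES_JSON_PATCH_PATH. The simpleModule section and the replicas key are hypothetical names chosen for the example; see the HOOKS document for the authoritative format.

```shell
#!/usr/bin/env bash

run_hook() {
  if [ "${1:-}" = "--config" ]; then
    # Binding configuration: run before Helm with ORDER 10.
    cat <<EOF
configVersion: v1
beforeHelm: 10
EOF
    return 0
  fi

  # Hook body: set a value for the chart through a JSON patch.
  # Falls back to stdout when the hook is run outside of addon-operator.
  cat > "${VALUES_JSON_PATCH_PATH:-/dev/stdout}" <<EOF
[{"op": "add", "path": "/simpleModule/replicas", "value": 2}]
EOF
}

run_hook "$@"
```

Running the script with --config prints the binding configuration; running it without arguments produces the patch.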

  • schedule — events that are generated by the crontab scheduler built in the addon-operator;
  • kubernetes — events within the cluster that API server announces to the Addon-operator.

When the module is deactivated, the Addon-operator runs helm delete --purge and, after the release is deleted, executes the afterDeleteHelm hooks.

All necessary hooks will be restarted if there are errors during the module activation or deactivation. For example, if an error occurs in a hook with the afterHelm binding during the first module execution, then after a 5-second delay the onStartup and beforeHelm hooks are executed again, the Helm chart is installed, and then the afterHelm hooks are executed.

Modules discovery

The Addon-operator makes a list of all enabled modules for their execution and a list of disabled modules for the deletion of their Helm releases. This process is called ‘modules discovery’ and is started in the following cases:

  • during the start of addon-operator
  • when an event to restart all modules occurs (see VALUES).

Modules are disabled by default. The module can be enabled by a key with the module name suffixed by Enabled. This key should contain a boolean value and can be specified in these sources:

  • $MODULES_DIR/values.yaml
  • values.yaml files in modules directories
  • ConfigMap/addon-operator

Boolean values from values.yaml files and ConfigMap/addon-operator are combined and if the result is equal to false or is empty, then the module is disabled.

If the value is true, an additional check is performed – the enabled script is executed (see below). If the script is present in the module and it returns false, then the module is considered disabled. If the script is not present or returns true, then the module is enabled.

If an error occurs during the ‘modules discovery’ process, then the module discovery is restarted every 5 seconds until successful execution. In this case, the execution of hooks with schedule and kubernetes bindings will be blocked in the “main” queue.

As a result of a ‘module discovery’ process, the tasks for the execution of all enabled modules, deletion of all disabled modules, and execution of all global hooks with the afterAll binding are added to the queue.

Enabled script

A script or an executable file that returns the status of the module. The script has access to the module values in the $VALUES_PATH and $CONFIG_VALUES_PATH files (see VALUES for details). The variable $MODULE_ENABLED_RESULT contains the path to the file into which the script should write the module status: true or false.

Below is an example of the enabled script that disables the module when parameter param2 is set to “stopMePlease”.

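#!/usr/bin/env bash

# Read param2 from the module values.
param2=$(jq -r '.simpleModule.param2' "$VALUES_PATH")

# Report the module status: disabled when param2 is "stopMePlease".
if [[ "$param2" == "stopMePlease" ]] ; then
  echo "false" > "$MODULE_ENABLED_RESULT"
else
  echo "true" > "$MODULE_ENABLED_RESULT"
fi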

Examples

Keys in values.yaml files

A module named nginx-ingress may have an nginxIngressEnabled flag in two files:

$ cat modules/values.yaml

nginxIngressEnabled: true

$ cat modules/001-nginx-ingress/values.yaml

nginxIngressEnabled: false

Module nginx-ingress is enabled in modules/values.yaml but disabled in modules/001-nginx-ingress/values.yaml. The final result is that the module is disabled.

Also, note that the module’s directory name is kebab-cased but keys in values.yaml are camelCased (see VALUES).
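The naming rule can be illustrated with a short sketch that converts a kebab-cased module name into the camelCased key used in values.yaml files. The function names to_values_key and enabled_key are hypothetical helpers for this example, not part of addon-operator.

```shell
#!/usr/bin/env bash

# Convert a kebab-case module name to a camelCase values key.
to_values_key() {
  echo "$1" | awk -F- '{
    out = $1
    # Capitalize the first letter of every part after the first dash.
    for (i = 2; i <= NF; i++) {
      out = out toupper(substr($i, 1, 1)) substr($i, 2)
    }
    print out
  }'
}

# Build the name of the flag that enables the module.
enabled_key() {
  echo "$(to_values_key "$1")Enabled"
}

to_values_key "nginx-ingress"
enabled_key "nginx-ingress"
```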

values.yaml and ConfigMap

A module named ‘some-module’ has no someModuleEnabled flag in modules/001-some-module/values.yaml, but the flag is defined in the ConfigMap and the module has an enabled script:

$ cat modules/values.yaml

global:
  param1: 100
someModuleEnabled: false

$ cat modules/001-some-module/values.yaml

someModule:
  param1: "String"


$ kubectl -n addon-operator get cm/addon-operator -o yaml

data:
  global: |
    param1: 200
  someModule: |
    param1: "Long string"
    param2: "FOO"
  someModuleEnabled: "true"

$ cat modules/001-some-module/enabled

#!/bin/bash

echo false > "$MODULE_ENABLED_RESULT"

Module some-module is explicitly disabled in modules/values.yaml but enabled by the someModuleEnabled key in ConfigMap/addon-operator. Thus, the enabled script is executed, and it returns false. The final result is that the module is disabled.
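Both examples can be reproduced with the sketch below. It assumes that "combined" means a later source overrides an earlier one, in the order modules/values.yaml, the module's own values.yaml, ConfigMap; this ordering is inferred from the examples, and the function names are hypothetical.

```shell
#!/usr/bin/env bash

# Merge *Enabled flags. Arguments are passed lowest priority first;
# an empty string means that the source does not define the flag.
merged_enabled() {
  result="false"   # modules are disabled by default
  for flag in "$@"; do
    if [ -n "$flag" ]; then
      result="$flag"
    fi
  done
  echo "$result"
}

# $1: result of the enabled script ("" when the module has no script),
# remaining arguments: flags from the sources, lowest priority first.
module_is_enabled() {
  script_result="$1"
  shift
  if [ "$(merged_enabled "$@")" != "true" ]; then
    echo "false"
  elif [ -n "$script_result" ] && [ "$script_result" != "true" ]; then
    echo "false"
  else
    echo "true"
  fi
}

# nginx-ingress: true in modules/values.yaml, false in the module directory.
module_is_enabled "" "true" "false" ""
# some-module: false in modules/values.yaml, true in ConfigMap, script says false.
module_is_enabled "false" "false" "" "true"
```

Both calls print false, which matches the outcomes described in the examples.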

Task queues

Task queues are simple FIFO queues. The Addon-operator processes an event, creates a task and adds it to the particular named queue. Each named queue has a queue handler which runs the first task and proceeds to the next.

Each task is processed until successful completion. In case of an error, the task is returned to the start of the queue and executed with an exponentially growing delay (from 5s to 30s). When executing tasks for the kubernetes and schedule events, the queue handler ignores execution errors if the allowFailure: true flag is specified in the binding configuration.
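The retry delay can be pictured with the sketch below. The text only states that the delay grows exponentially from 5s to 30s; doubling on each failure is an assumption made for this example, and next_delay is a hypothetical helper.

```shell
#!/usr/bin/env bash

# Print the delay to use after another failure: double it, capped at 30 seconds.
next_delay() {
  next=$(( $1 * 2 ))
  if [ "$next" -gt 30 ]; then
    next=30
  fi
  echo "$next"
}

delay=5
for attempt in 1 2 3 4 5; do
  echo "attempt ${attempt} failed, retry after ${delay}s"
  delay=$(next_delay "$delay")
done
```

Under this assumption the delays are 5, 10, 20, 30 and 30 seconds.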

Queue monitoring

You can use Prometheus metrics to monitor the queue. For details, see METRICS.

Steps of addon-operator lifecycle

This document is intended to give a full view of how hooks, modules, values, binding contexts, and queues are interlinked within the Addon-operator’s lifecycle.

Startup steps:

1. execute global hooks with ‘onStartup’ binding ordered by the ORDER value (see onStartup)

  • input
    • binding context ($BINDING_CONTEXT_PATH temporary file)
      • [{"binding":"onStartup"}]
    • config ($CONFIG_VALUES_PATH temporary file)
      • ‘global’ section in ConfigMap
    • values ($VALUES_PATH temporary file)
      • ‘global values’ merged from:
        • ‘global’ section in modules/values.yaml
        • ‘global’ section in ConfigMap
        • patched with patches saved from previous global hooks
  • output
    • config patches ($CONFIG_VALUES_JSON_PATCH_PATH temporary file)
      • applied to ConfigMap just after the hook execution
    • values patches ($VALUES_JSON_PATCH_PATH temporary file)
      • saved in memory
  • events after execution
    • values changes do not trigger any event

2. execute global hooks with ‘kubernetes’ binding in alphabetical order (see kubernetes)

  • a hook is executed several times, once for each defined ‘kubernetes’ binding
  • input
    • binding context ($BINDING_CONTEXT_PATH temporary file)
      • "type": "Synchronization"
        • "objects" contains all existing objects
        • "snapshots" contains existing objects from previous bindings
    • config ($CONFIG_VALUES_PATH temporary file)
      • ‘global’ section in ConfigMap
    • values ($VALUES_PATH temporary file)
      • ‘global values’ merged from:
        • ‘global’ section in modules/values.yaml
        • ‘global’ section in ConfigMap
        • patched with patches saved from previous global hooks
  • output
    • config patches ($CONFIG_VALUES_JSON_PATCH_PATH temporary file)
      • applied to ConfigMap just after the hook execution
    • values patches ($VALUES_JSON_PATCH_PATH temporary file)
      • saved in memory
  • events after execution
    • values changes do not trigger an event

‘Reload all modules’ steps:

3. execute global hooks with ‘beforeAll’ binding ordered by ORDER value (see beforeAll)

  • input
    • binding context ($BINDING_CONTEXT_PATH temporary file)
      • “snapshots” contains existing objects from all ‘kubernetes’ bindings of this hook
    • config ($CONFIG_VALUES_PATH temporary file)
      • ‘global’ section in ConfigMap
    • values ($VALUES_PATH temporary file)
      • ‘global values’ merged from:
        • ‘global’ section in modules/values.yaml
        • ‘global’ section in ConfigMap
        • patched with patches saved from previous global hooks
  • output
    • config patches ($CONFIG_VALUES_JSON_PATCH_PATH temporary file)
      • applied to ConfigMap just after the hook execution
    • values patches ($VALUES_JSON_PATCH_PATH temporary file)
      • saved in memory
  • events after execution
    • values changes do not trigger an event

4. discover modules

  • get merged ‘enabled’ state from config for each module
    • false
    • ‘{moduleName}Enabled’ value from modules/values.yaml
    • ‘{moduleName}Enabled’ value from modules/{moduleName}/values.yaml
    • ‘{moduleName}Enabled’ value from ConfigMap
  • run ‘enabled’ script if merged ‘enabled’ state is true
    • input
      • config ($CONFIG_VALUES_PATH temporary file)
        • ‘global’ section in ConfigMap
        • ‘{moduleName}’ section in ConfigMap
      • values ($VALUES_PATH temporary file)
        • ‘global values’ merged from:
          • ‘global’ section in modules/values.yaml
          • ‘global’ section in ConfigMap
          • patched with patches saved from previous global hooks
          • extra field ‘global.enabledModules’ contains a list of previously enabled modules
        • ‘module values’ merged from:
          • ‘{moduleName}’ section in modules/values.yaml
          • ‘{moduleName}’ section in modules/{moduleName}/values.yaml
          • ‘{moduleName}’ section in ConfigMap
          • patched with patches saved from previous module hooks
    • output
      • ‘enabled’ state ($MODULE_ENABLED_RESULT temporary file)
        • if the script returns “false”, the module is disabled
  • no ‘enabled’ script
    • merged ‘enabled’ state is used
  • create 3 lists
    • modules to enable
    • modules to delete (disabled)
    • modules to purge (there is helm release, but no module directory)
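The three lists can be sketched as follows. The inputs are space-separated names: modules that have a directory, modules whose final ‘enabled’ state is true, and existing Helm releases. The function names and the sample module names are hypothetical.

```shell
#!/usr/bin/env bash

# Succeed when the space-separated list $1 contains the word $2.
contains() {
  case " $1 " in
    *" $2 "*) return 0 ;;
    *) return 1 ;;
  esac
}

# $1: modules with a directory, $2: enabled modules, $3: Helm releases.
classify_modules() {
  known="$1"
  enabled="$2"
  releases="$3"
  for m in $known; do
    if contains "$enabled" "$m"; then
      echo "enable $m"
    else
      echo "delete $m"
    fi
  done
  for r in $releases; do
    # A release without a module directory has to be purged.
    contains "$known" "$r" || echo "purge $r"
  done
}

classify_modules "nginx-ingress some-module" "nginx-ingress" "nginx-ingress old-module"
```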

5. ‘module run’ for each enabled module

  • on startup or if the module has just become enabled
    • execute module hooks with ‘onStartup’ binding ordered by the ORDER value (see onStartup)
      • input
        • binding context ($BINDING_CONTEXT_PATH temporary file)
          • {"binding":"onStartup"}
        • config ($CONFIG_VALUES_PATH temporary file)
          • ‘global’ section in ConfigMap
          • ‘{moduleName}’ section in ConfigMap
        • values ($VALUES_PATH temporary file)
          • ‘global values’ merged from:
            • ‘global’ section in modules/values.yaml
            • ‘global’ section in ConfigMap
            • patched with patches saved from previous global hooks
            • extra field ‘global.enabledModules’ contains a list of all enabled modules created by ‘discover modules’ step (4)
          • ‘module values’ merged from:
            • ‘{moduleName}’ section in modules/values.yaml
            • ‘{moduleName}’ section in modules/{moduleName}/values.yaml
            • ‘{moduleName}’ section in ConfigMap
            • patched with patches saved from previous module hooks
      • output
        • config patches ($CONFIG_VALUES_JSON_PATCH_PATH temporary file)
          • applied to ConfigMap just after hook run
        • values patches ($VALUES_JSON_PATCH_PATH temporary file)
          • saved in memory
      • events after execution
        • values changes do not trigger an event
    • execute module hooks with ‘kubernetes’ bindings in alphabetical order
      • a hook is executed several times, once for each defined ‘kubernetes’ binding
      • input
        • binding context ($BINDING_CONTEXT_PATH temporary file)
          • "type": "Synchronization"
            • "objects" contains all existing objects
            • "snapshots" contains existing objects from previous bindings
        • config ($CONFIG_VALUES_PATH temporary file)
          • ‘global’ section in ConfigMap
          • ‘{moduleName}’ section in ConfigMap
        • values ($VALUES_PATH temporary file)
          • ‘global values’ merged from:
            • ‘global’ section in modules/values.yaml
            • ‘global’ section in ConfigMap
            • patched with patches saved from previous global hooks
            • extra field ‘global.enabledModules’ contains a list of all enabled modules created by ‘discover modules’ step (4)
          • ‘module values’ merged from:
            • ‘{moduleName}’ section in modules/values.yaml
            • ‘{moduleName}’ section in modules/{moduleName}/values.yaml
            • ‘{moduleName}’ section in ConfigMap
            • patched with patches saved from previous module hooks
      • output
        • config patches ($CONFIG_VALUES_JSON_PATCH_PATH temporary file)
          • applied to ConfigMap just after the hook execution
        • values patches ($VALUES_JSON_PATCH_PATH temporary file)
          • saved in memory
      • events after execution
        • values changes do not trigger an event
  • execute module hooks with ‘beforeHelm’ binding ordered by the ORDER value (see beforeHelm)
    • input
      • binding context ($BINDING_CONTEXT_PATH temporary file)
        • {"binding":"beforeHelm"}
        • extra field "snapshots" contains existing objects from all ‘kubernetes’ bindings of this hook
      • config ($CONFIG_VALUES_PATH temporary file)
        • ‘global’ section in ConfigMap
        • ‘{moduleName}’ section in ConfigMap
      • values ($VALUES_PATH temporary file)
        • ‘global values’ merged from:
          • ‘global’ section in modules/values.yaml
          • ‘global’ section in ConfigMap
          • patched with patches saved from previous global hooks
          • extra field ‘global.enabledModules’ contains a list of all enabled modules created by ‘discover modules’ step (4)
        • ‘module values’ merged from:
          • ‘{moduleName}’ section in modules/values.yaml
          • ‘{moduleName}’ section in modules/{moduleName}/values.yaml
          • ‘{moduleName}’ section in ConfigMap
          • patched with patches saved from previous module hooks
    • output
      • config patches ($CONFIG_VALUES_JSON_PATCH_PATH temporary file)
        • applied to ConfigMap just after the hook execution
      • values patches ($VALUES_JSON_PATCH_PATH temporary file)
        • saved in memory
    • events after execution
      • values changes do not trigger an event
  • check if helm upgrade should run
    • get saved checksum from last release values
    • render templates
      • if checksum is changed → helm release should be upgraded
    • get helm resources defined in templates
      • if there are absent resources → helm release should be upgraded
  • run helm upgrade --install
    • values (unique file in a temporary directory)
      • ‘global values’ merged from:
        • ‘global’ section in modules/values.yaml
        • ‘global’ section in ConfigMap
        • patched with patches saved from previous global hooks
      • ‘module values’ merged from:
        • ‘{moduleName}’ section in modules/values.yaml
        • ‘{moduleName}’ section in modules/{moduleName}/values.yaml
        • ‘{moduleName}’ section in ConfigMap
        • patched with patches saved from previous module hooks
    • release name
      • module name in kebab-case
      • name from Chart.yaml is ignored
    • namespace
      • $ADDON_OPERATOR_NAMESPACE (see RUNNING)
  • execute module hooks with ‘afterHelm’ binding ordered by the ORDER value (see afterHelm)
    • input
      • binding context ($BINDING_CONTEXT_PATH temporary file)
        • {"binding":"afterHelm"}
        • extra field "snapshots" contains existing objects from all ‘kubernetes’ bindings of this hook
      • config ($CONFIG_VALUES_PATH temporary file)
        • ‘global’ section in ConfigMap
        • ‘{moduleName}’ section in ConfigMap
      • values ($VALUES_PATH temporary file)
        • ‘global values’ merged from:
          • ‘global’ section in modules/values.yaml
          • ‘global’ section in ConfigMap
          • patched with patches saved from previous global hooks
          • extra field ‘global.enabledModules’ contains a list of all enabled modules created by ‘discover modules’ step (4)
        • ‘module values’ merged from:
          • ‘{moduleName}’ section in modules/values.yaml
          • ‘{moduleName}’ section in modules/{moduleName}/values.yaml
          • ‘{moduleName}’ section in ConfigMap
          • patched with patches saved from previous module hooks
    • output
      • config patches ($CONFIG_VALUES_JSON_PATCH_PATH temporary file)
        • applied to ConfigMap just after the hook execution
      • values patches ($VALUES_JSON_PATCH_PATH temporary file)
        • saved in memory
    • events after execution
      • if module values are changed, restart ‘module run’
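The ‘check if helm upgrade should run’ step above can be sketched as a comparison of checksums plus a check for absent resources. In this sketch a plain file stands in for the rendered templates, cksum stands in for whatever checksum addon-operator actually uses, and the number of absent resources is passed in as an argument; all of these are assumptions made for illustration.

```shell
#!/usr/bin/env bash

# Print a checksum of the file that stands in for the rendered templates.
checksum_of() {
  cksum < "$1" | cut -d' ' -f1
}

# Succeed (exit code 0) when the release should be upgraded.
# $1: rendered file, $2: checksum saved from the last release,
# $3: number of resources defined in templates but absent in the cluster.
should_upgrade() {
  if [ "$(checksum_of "$1")" != "$2" ]; then
    return 0   # checksum is changed
  fi
  if [ "$3" -gt 0 ]; then
    return 0   # there are absent resources
  fi
  return 1     # nothing changed, helm upgrade can be skipped
}

rendered=$(mktemp)
echo "replicas: 1" > "$rendered"
saved=$(checksum_of "$rendered")
echo "replicas: 2" > "$rendered"
if should_upgrade "$rendered" "$saved" 0; then
  echo "helm release should be upgraded"
fi
rm -f "$rendered"
```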

6. ‘module delete’ for each disabled module

  • run helm delete --purge
  • execute module hooks with ‘afterDeleteHelm’ binding ordered by the ORDER value (see afterDeleteHelm)
    • input
      • binding context ($BINDING_CONTEXT_PATH temporary file)
        • {"binding":"afterDeleteHelm"}
        • extra field "snapshots" contains existing objects from all ‘kubernetes’ bindings of this hook
      • config ($CONFIG_VALUES_PATH temporary file)
        • ‘global’ section in ConfigMap
        • ‘{moduleName}’ section in ConfigMap
      • values ($VALUES_PATH temporary file)
        • ‘global values’ merged from:
          • ‘global’ section in modules/values.yaml
          • ‘global’ section in ConfigMap
          • patched with patches saved from previous global hooks
          • extra field ‘global.enabledModules’ contains a list of all enabled modules created by ‘discover modules’ step (4)
        • ‘module values’ merged from:
          • ‘{moduleName}’ section in modules/values.yaml
          • ‘{moduleName}’ section in modules/{moduleName}/values.yaml
          • ‘{moduleName}’ section in ConfigMap
          • patched with patches saved from previous module hooks
    • output
      • config patches ($CONFIG_VALUES_JSON_PATCH_PATH temporary file)
        • applied to ConfigMap just after the hook execution
      • values patches ($VALUES_JSON_PATCH_PATH temporary file)
        • saved in memory
    • events after execution
      • values changes do not trigger an event

7. ‘module purge’ for each non-existent module

  • run helm delete --purge

8. execute global hooks with ‘afterAll’ binding ordered by the ORDER value (see afterAll)

  • input
    • binding context ($BINDING_CONTEXT_PATH temporary file)
      • {"binding":"afterAll"}
      • extra field “snapshots” contains existing objects from all ‘kubernetes’ bindings of this hook
    • config ($CONFIG_VALUES_PATH temporary file)
      • ‘global’ section in ConfigMap
    • values ($VALUES_PATH temporary file)
      • ‘global values’ merged from:
        • ‘global’ section in modules/values.yaml
        • ‘global’ section in ConfigMap
        • patched with patches saved from previous global hooks
  • output
    • config patches ($CONFIG_VALUES_JSON_PATCH_PATH temporary file)
      • applied to ConfigMap just after the hook execution
    • values patches ($VALUES_JSON_PATCH_PATH temporary file)
      • saved in memory
  • events after execution
    • if values are changed, re-run ‘Reload all modules’ steps 3 to 8.

Reaction to events

9. ‘kubernetes’ event for global hook (see kubernetes)

  • hook execution is queued in “main” or in a named queue according to the binding configuration
  • queue handler runs a hook:
    • input
      • binding context ($BINDING_CONTEXT_PATH temporary file)
        • "type": "Event"
          • "object" contains a related object
          • "filterResult" contains a result of jqFilter
          • "snapshots" contains existing objects from other bindings
      • config ($CONFIG_VALUES_PATH temporary file)
        • ‘global’ section in ConfigMap
      • values ($VALUES_PATH temporary file)
        • ‘global values’ merged from:
          • ‘global’ section in modules/values.yaml
          • ‘global’ section in ConfigMap
          • patched with patches saved from previous global hooks
    • output
      • config patches ($CONFIG_VALUES_JSON_PATCH_PATH temporary file)
        • applied to ConfigMap just after the hook execution
      • values patches ($VALUES_JSON_PATCH_PATH temporary file)
        • saved in memory
    • events after execution
      • ‘global values changed’ if global section or *Enabled flags are changed

10. ‘kubernetes’ event for module hook (see kubernetes)

  • hook execution is queued in “main” or in a named queue according to the binding configuration
  • queue handler runs a hook:
    • input
      • binding context ($BINDING_CONTEXT_PATH temporary file)
        • "type": "Event"
          • "object" contains a related object
          • "filterResult" contains a result of jqFilter
          • "snapshots" contains existing objects from other bindings
      • config ($CONFIG_VALUES_PATH temporary file)
        • ‘global’ section in ConfigMap
        • ‘{moduleName}’ section in ConfigMap
      • values ($VALUES_PATH temporary file)
        • ‘global values’ merged from:
          • ‘global’ section in modules/values.yaml
          • ‘global’ section in ConfigMap
          • patched with patches saved from previous global hooks
          • extra field ‘global.enabledModules’ contains a list of all enabled modules created by ‘discover modules’ step (4)
        • ‘module values’ merged from:
          • ‘{moduleName}’ section in modules/values.yaml
          • ‘{moduleName}’ section in modules/{moduleName}/values.yaml
          • ‘{moduleName}’ section in ConfigMap
          • patched with patches saved from previous module hooks
    • output
      • config patches ($CONFIG_VALUES_JSON_PATCH_PATH temporary file)
        • applied to ConfigMap just after the hook execution
      • values patches ($VALUES_JSON_PATCH_PATH temporary file)
        • saved in memory
    • events after execution
      • ‘global values changed’ if global values are changed
      • ‘module values changed’ if module values are changed

11. ‘schedule’ event for global hook (see schedule)

  • hook execution is queued in “main” or in a named queue according to the binding configuration
  • queue handler runs a hook:
    • input
      • binding context ($BINDING_CONTEXT_PATH temporary file)
        • "snapshots" contains existing objects from other bindings
      • config ($CONFIG_VALUES_PATH temporary file)
        • ‘global’ section in ConfigMap
      • values ($VALUES_PATH temporary file)
        • ‘global values’ merged from:
          • ‘global’ section in modules/values.yaml
          • ‘global’ section in ConfigMap
          • patched with patches saved from previous global hooks
    • output
      • config patches ($CONFIG_VALUES_JSON_PATCH_PATH temporary file)
        • applied to ConfigMap just after the hook execution
      • values patches ($VALUES_JSON_PATCH_PATH temporary file)
        • saved in memory
    • events after execution
      • ‘global values changed’ if global section or *Enabled flags are changed

12. ‘schedule’ event for module hook (see schedule)

  • hook execution is queued in “main” or in a named queue according to the binding configuration
  • queue handler runs a hook:
    • input
      • binding context ($BINDING_CONTEXT_PATH temporary file)
        • "snapshots" contains existing objects from other bindings
      • config ($CONFIG_VALUES_PATH temporary file)
        • ‘global’ section in ConfigMap
        • ‘{moduleName}’ section in ConfigMap
      • values ($VALUES_PATH temporary file)
        • ‘global values’ merged from:
          • ‘global’ section in modules/values.yaml
          • ‘global’ section in ConfigMap
          • patched with patches saved from previous global hooks
          • extra field ‘global.enabledModules’ contains a list of all enabled modules created by ‘discover modules’ step (4)
        • ‘module values’ merged from:
          • ‘{moduleName}’ section in modules/values.yaml
          • ‘{moduleName}’ section in modules/{moduleName}/values.yaml
          • ‘{moduleName}’ section in ConfigMap
          • patched with patches saved from previous module hooks
    • output
      • config patches ($CONFIG_VALUES_JSON_PATCH_PATH temporary file)
        • applied to ConfigMap just after the hook execution
      • values patches ($VALUES_JSON_PATCH_PATH temporary file)
        • saved in memory
    • events after execution
      • ‘global values changed’ if global values are changed
      • ‘module values changed’ if module values are changed

13. ‘global values changed’ event (see global hook)

  • create ‘Reload all modules’ task in the “main” queue (steps 3 to 8)

14. ‘module values changed’ event (see module hook)

  • create ‘module run’ task in the “main” queue
    • step 5 without onStartup and kubernetes@Synchronization hooks

15. ‘helm resources absent’ event (see auto-healing)

  • create ‘module run’ task in the “main” queue
    • step 5 without onStartup and kubernetes@Synchronization hooks

16. ConfigMap is changed (see ConfigMap/addon-operator)

  • values in global section are changed
    • create ‘Reload all modules’ task in the “main” queue (steps 3 to 8)
  • *Enabled flags are changed
    • create ‘Reload all modules’ task in the “main” queue (steps 3 to 8)
  • values in modules sections are changed
    • create ‘module run’ task in the “main” queue
      • step 5 without onStartup and kubernetes@Synchronization hooks

Module structure

A module is a directory with files. Addon-operator searches for the modules directories in /modules or in the paths specified by the $MODULES_DIR variable. The module has the same name as the corresponding directory excluding the numeric prefix.

An example of the file structure of the module:

/modules/001-simple-module
├── crds
│   ├── doc-ru-projects.yaml
│   ├── doc-ru-projecttemplate.yaml
│   ├── projects.yaml
│   └── projecttemplate.yaml
├── hooks
│   ├── module-hook-1.sh
│   ├── ...
│   └── module-hook-N.sh
├── openapi
│   ├── config-values.yaml
│   └── values.yaml
├── templates
│   ├── config-maps.yaml
│   ├── ...
│   └── daemon-set.yaml
├── enabled
├── README.md
├── .helmignore
├── Chart.yaml
└── values.yaml
  • crds – a directory with CRD files.
  • hooks – a directory with hooks.
  • openapi – OpenAPI schemas for config values and for Helm values.
  • enabled – a script that reports whether the module is enabled. See the modules discovery process.
  • Chart.yaml, .helmignore, templates – Helm chart files.
  • README.md – an optional file with the module description.
  • values.yaml – default values for the chart in YAML format.

The name of this module is simple-module. values.yaml should contain a section simpleModule and a simpleModuleEnabled flag (see VALUES).
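
The naming rule above can be sketched in a few lines of Python. This is an illustration only: the function names are hypothetical, and the prefix pattern (digits followed by a hyphen) is an assumption inferred from the 001-simple-module example, not the operator's actual parsing code.

```python
import re


def module_name(dir_name: str) -> str:
    """Drop the numeric prefix: '001-simple-module' -> 'simple-module'."""
    # Assumption: the prefix is one or more digits followed by a hyphen.
    return re.sub(r"^\d+-", "", dir_name)


def values_key(name: str) -> str:
    """Convert a kebab-case module name to camelCase."""
    head, *rest = name.split("-")
    return head + "".join(part[:1].upper() + part[1:] for part in rest)


def enabled_flag(name: str) -> str:
    """Name of the boolean flag that enables or disables the module."""
    return values_key(name) + "Enabled"
```

For the directory above, module_name gives simple-module, values_key gives simpleModule, and enabled_flag gives simpleModuleEnabled.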

Notes on how Helm is used

values.yaml

Addon-operator does not use values.yaml as the only source of values for the chart. It generates a new file with a merged set of values, which also includes the values from this file (see VALUES).

Chart.yaml

We recommend defining the “version” field in your Chart.yaml as “0.0.1” and using VCS to control versions. We also recommend explicitly specifying the “name” field even though it is ignored: Addon-operator passes the module name to Helm as the release name.

Releases deduplication

A module’s execution might be triggered by an event that does not change the values used by Helm templates (see modules discovery). Re-running Helm will lead to an “empty” release. To avoid this, Addon-operator runs a helm template command and compares a checksum of output with a saved checksum and starts the installation of a Helm chart only if there are changes.
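
A minimal sketch of this check in Python, assuming the rendered output of helm template is available as a string. The SHA-256 hash and both function names are assumptions made for illustration; the text above does not say which checksum is used or where it is saved.

```python
import hashlib
from typing import Optional


def manifest_checksum(rendered: str) -> str:
    """Checksum of the rendered manifests (SHA-256 is an assumption)."""
    return hashlib.sha256(rendered.encode("utf-8")).hexdigest()


def needs_install(rendered: str, saved_checksum: Optional[str]) -> bool:
    """Install only if nothing was saved yet or the rendered output changed."""
    return saved_checksum != manifest_checksum(rendered)
```

With this shape, an event that leaves the rendered output byte-for-byte identical results in no new release.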

Release auto-healing

The Addon-operator monitors resources defined by a Helm chart and triggers an update if something is deleted. This is useful for resources that Helm can’t update without deletion. Note that resources deleted by hooks are ignored to prevent needless updates.

Hooks

A hook is an executable file that the Addon-operator executes when some event occurs. It can be a script or a compiled program written in any programming language.

The Addon-operator follows a simple convention: information is passed to hooks via files, and the results of a hook’s execution are also stored in files. Paths to these files are passed via environment variables. Output to stdout is written to the log, except for the configuration output (when the hook is run with the --config flag). This convention simplifies working with input data and reporting the results of the hook execution.

Global hooks

Global hooks are stored in the $GLOBAL_HOOKS_DIR/hooks directory. The Addon-operator recursively searches it for executable files (the lib subdirectory is ignored) and runs them with the --config flag. Each hook prints its event binding configuration in JSON or YAML format to stdout. If the execution fails, the Addon-operator terminates with exit code 1.

Bindings from shell-operator are available for global hooks: onStartup, schedule and kubernetes. The bindings to the events of the modules discovery process are also available: beforeAll and afterAll (see modules discovery).

During execution, a global hook receives global values. These values can be modified by the hook to share data with global hooks, module hooks, and Helm templates. If the hook changes global values, the ‘global values changed’ event is generated and all modules are reloaded. For details on values storage, see VALUES. See also an overview and a detailed description of ‘Reload all modules’ process.
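
The two invocation modes of a hook can be sketched as follows. This is a hedged illustration in Python, not a reference hook: run_hook and the CONFIG contents are hypothetical, and the sketch uses only the --config flag and the $BINDING_CONTEXT_PATH variable described in this document.

```python
import json

# Hypothetical binding configuration for a global hook.
CONFIG = {"configVersion": "v1", "beforeAll": 10}


def run_hook(argv, environ):
    """Return the text the hook would print to stdout for one invocation."""
    if "--config" in argv:
        # Configuration phase: report the event bindings as JSON.
        return json.dumps(CONFIG)
    # Event phase: the path to the binding context arrives via the environment.
    with open(environ["BINDING_CONTEXT_PATH"]) as f:
        contexts = json.load(f)
    return "triggered by: " + ", ".join(c["binding"] for c in contexts)
```

An executable hook file would call print(run_hook(sys.argv[1:], os.environ)) as its entry point.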

Module hooks

Module hooks are executable files stored in the hooks subdirectory of the module. During the ‘modules discovery’ process, if a module is enabled, the Addon-operator searches for executable files in its hooks directory and executes them with the --config flag. Each hook prints its event binding configuration in JSON or YAML format to stdout. The modules discovery process restarts if an error occurs.

Bindings from shell-operator are available for module hooks: schedule and kubernetes. The bindings of the module lifecycle are also available: onStartup, beforeHelm, afterHelm, afterDeleteHelm — see module lifecycle.

During execution, a module hook receives global values and module values. Module values can be modified by the hook to share data with other hooks of the same module. If the hook changes module values, the ‘module values changed’ event is generated and the module is reloaded. For details on values storage, see VALUES. See also the module lifecycle and the detailed description of a module run.

Bindings

Overview

Binding          Global?  Module?  Info
onStartup        yes      no       On Addon-operator startup
onStartup        no       yes      On Addon-operator startup or module enablement
beforeAll        yes      no       Before any modules are executed
afterAll         yes      no       After all modules are executed
beforeHelm       no       yes      Before executing helm install
afterHelm        no       yes      After executing helm install
afterDeleteHelm  no       yes      After executing helm delete
schedule         yes      yes      Run on schedule
kubernetes       yes      yes      Run on event from Kubernetes

onStartup

Example:

configVersion: v1
onStartup: ORDER

Parameters:

  • ORDER — an integer value that specifies an execution order. When added to the “main” queue, the hooks will be sorted by this value and then alphabetically by file name.
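
The ordering rule can be expressed as a two-part sort key. The sketch below is illustrative: the function name and the (file name, ORDER) input shape are assumptions, and “alphabetically” is approximated with Python's default string comparison.

```python
def execution_order(hooks):
    """Sort (file_name, order) pairs by ORDER, then by file name."""
    return [name for name, order in sorted(hooks, key=lambda h: (h[1], h[0]))]
```

For example, hooks registered as ("b.sh", 10), ("a.sh", 10) and ("z.sh", 5) would be queued as z.sh, a.sh, b.sh. The same rule applies to the ORDER parameter of the other bindings described below.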

beforeAll

Example:

configVersion: v1
beforeAll: ORDER

Parameters:

  • ORDER — an integer value that specifies an execution order. When added to the “main” queue, the hooks will be sorted by this value and then alphabetically by file name.

afterAll

Example:

configVersion: v1
afterAll: ORDER

Parameters:

  • ORDER — an integer value that specifies an execution order. When added to the “main” queue, the hooks will be sorted by this value and then alphabetically by file name.

beforeHelm

Example:

configVersion: v1
beforeHelm: ORDER

Parameters:

  • ORDER — an integer value that specifies an execution order. When added to the “main” queue, the hooks will be sorted by this value and then alphabetically by file name.

afterHelm

Example:

configVersion: v1
afterHelm: ORDER

Parameters:

  • ORDER — an integer value that specifies an execution order. When added to the “main” queue, the hooks will be sorted by this value and then alphabetically by file name.

afterDeleteHelm

Example:

configVersion: v1
afterDeleteHelm: ORDER

Parameters:

  • ORDER — an integer value that specifies an execution order. When added to the “main” queue, the hooks will be sorted by this value and then alphabetically by file name.

schedule

See the schedule binding from the Shell-operator.

kubernetes

See the kubernetes binding from the Shell-operator.

Note: Addon-operator requires a ServiceAccount with the appropriate RBAC permissions. See addon-operator-rbac.yaml files in examples.

Execution on event

When an event associated with a hook is triggered, the Addon-operator executes the hook without arguments and passes the global or module values from the values storage via temporary files. In response, a hook can return JSON patches to modify values. A detailed description of the values storage is available in the VALUES document.

Binding context

The binding context is a piece of information about the event which caused the hook execution.

The $BINDING_CONTEXT_PATH environment variable contains the path to a file with a JSON array of structures with the following fields:

  • binding is a string from the name parameter for schedule or kubernetes bindings. If the parameter is not set, or the hook is triggered by another binding type, the value is the binding type. For example, the binding context for the beforeAll binding type:
[{"binding":"beforeAll"}]

The binding context for schedule and kubernetes hooks contains additional fields, described in Shell-operator documentation.

beforeAll and afterAll global hooks and beforeHelm, afterHelm, and afterDeleteHelm module hooks are executed with the binding context that includes a snapshots field, which contains all Kubernetes objects that match hook’s kubernetes bindings configurations.

For example, a global hook with kubernetes and beforeAll bindings may have this configuration:

configVersion: v1
beforeAll: 10
kubernetes:
- name: monitor-pods
  apiVersion: v1
  kind: Pod
  jqFilter: ".metadata.labels"

This hook will be executed before updating the Helm release with this binding context:

[{"binding": "beforeAll",
"snapshots": {
  "monitor-pods": [
    {
      "object": {
        "kind": "Pod",
        "apiVersion": "v1",
        "metadata": {
          "name":"pod-1r62e3",
          "namespace":"default", ...},
        ...
      },
      "filterResult": {
        "label1": "label value",
        ...
      },
    },
    ...
    more pods
    ...
  ]
}
}]
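
A hook might read such a binding context as sketched below. This is a minimal Python illustration that relies only on the fields shown above (binding, snapshots, filterResult); the function name is hypothetical.

```python
import json


def snapshot_filter_results(context_path, binding_name):
    """Collect jqFilter results for one kubernetes binding from all contexts."""
    with open(context_path) as f:
        contexts = json.load(f)  # the file holds a JSON array of contexts
    results = []
    for ctx in contexts:
        # Contexts without snapshots, or without this binding, contribute nothing.
        for item in ctx.get("snapshots", {}).get(binding_name, []):
            results.append(item.get("filterResult"))
    return results
```

In a real hook the path would come from the $BINDING_CONTEXT_PATH environment variable, and binding_name would be "monitor-pods" for the configuration above.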

Synchronization for global hooks

Synchronization is the first run of global hooks with “kubernetes” bindings. As with the Shell-operator, it executes right after the successful completion of global “onStartup” hooks, but the following behavior is slightly different. By default, the Addon-operator executes “beforeAll” hooks after the completion of hooks with executeHookOnSynchronization: true. Set waitForSynchronization: false to execute these hooks in parallel with “beforeAll” hooks.

For example, a global hook with kubernetes and beforeAll bindings may have this configuration:

configVersion: v1
beforeAll: 10
kubernetes:
- name: monitor-pods
  apiVersion: v1
  kind: Pod
  jqFilter: ".metadata.labels"
- name: monitor-nodes
  apiVersion: v1
  kind: Node
  jqFilter: ".metadata.labels"
  queue: nodes-handling
  executeHookOnSynchronization: false
- name: monitor-cms
  apiVersion: v1
  kind: ConfigMap
  jqFilter: ".metadata.labels"
  queue: config-map-handling
  waitForSynchronization: false
- name: monitor-secrets
  apiVersion: v1
  kind: Secret
  jqFilter: ".metadata.labels"
  queue: secrets-handling
  executeHookOnSynchronization: false
  waitForSynchronization: false

This hook will be executed after “onStartup” as follows:

  • Run hook with binding context for the “monitor-pods” binding in the “main” queue.
  • Fill snapshot for the “monitor-nodes” binding, do not execute hook.
  • Run in parallel:
    • hook with the “beforeAll” binding context in the “main” queue
    • hook with the “monitor-cms” binding context in the “config-map-handling” queue
    • fill snapshot for the “monitor-secrets” binding.

Note: there is no guarantee that the “beforeAll” binding context contains snapshots with ConfigMaps and Secrets.
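
The effect of the two flags on the example above can be modeled as a small classification. This Python sketch is only a mental model, not the operator's scheduler; it assumes that both flags default to true, which is inferred from the behavior of the “monitor-pods” binding, and the function and key names are hypothetical.

```python
def synchronization_plan(bindings):
    """Group kubernetes bindings by how Synchronization treats them."""
    plan = {
        "run_before": [],         # hook runs, beforeAll waits for it
        "snapshot_before": [],    # snapshot only, beforeAll waits for it
        "run_parallel": [],       # hook runs in parallel with beforeAll
        "snapshot_parallel": [],  # snapshot only, in parallel with beforeAll
    }
    for b in bindings:
        # Assumption: both flags default to true when omitted.
        execute = b.get("executeHookOnSynchronization", True)
        wait = b.get("waitForSynchronization", True)
        key = ("run" if execute else "snapshot") + ("_before" if wait else "_parallel")
        plan[key].append(b["name"])
    return plan
```

Applied to the four bindings of the example, each binding lands in a different group, which matches the execution steps listed above.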

Synchronization for module hooks

Synchronization is the first run of module hooks with “kubernetes” bindings after module enablement. It executes right after the successful completion of the module’s “onStartup” hooks. By default, the Addon-operator executes “beforeHelm” hooks after the completion of hooks with executeHookOnSynchronization: true. Set waitForSynchronization: false to execute these hooks in parallel with “beforeHelm” hooks.

For example, a module hook with kubernetes and beforeHelm bindings may have this configuration:

configVersion: v1
beforeHelm: 10
kubernetes:
- name: monitor-pods
  apiVersion: v1
  kind: Pod
  jqFilter: ".metadata.labels"
- name: monitor-nodes
  apiVersion: v1
  kind: Node
  jqFilter: ".metadata.labels"
  queue: nodes-handling
  executeHookOnSynchronization: false
- name: monitor-cms
  apiVersion: v1
  kind: ConfigMap
  jqFilter: ".metadata.labels"
  queue: config-map-handling
  waitForSynchronization: false
- name: monitor-secrets
  apiVersion: v1
  kind: Secret
  jqFilter: ".metadata.labels"
  queue: secrets-handling
  executeHookOnSynchronization: false
  waitForSynchronization: false

This hook will be executed after “onStartup” as follows:

  • Run hook with binding context for the “monitor-pods” binding in the “main” queue.
  • Fill snapshot for the “monitor-nodes” binding, do not execute hook.
  • Run in parallel:
    • hook with the “beforeHelm” binding context in the “main” queue
    • hook with the “monitor-cms” binding context in the “config-map-handling” queue
    • fill snapshot for the “monitor-secrets” binding

Note: there is no guarantee that the “beforeHelm” binding context contains snapshots with ConfigMaps and Secrets.

Execution rate

Hook configuration has a settings section with parameters executionMinPeriod and executionBurst. These parameters are used to throttle hook executions and wait for more events in the queue. See section execution rate from the Shell-operator.

Values storage

The Addon-operator provides the storage for the values that will be passed to the Helm chart. You may find out more about the chart values concept in the Helm documentation: values files. Global and module hooks have access to the values in the storage and can change them.

The storage is a hash-like data structure. The global key contains all global values – they are passed to every hook and available to all Helm charts. Only global hooks may change global values.

The other keys must match the module’s name converted to camelCase. Each key stores the object with module values. These values are only available to hooks, enabled script of this module, and to its Helm chart. Only module hooks can change the values of the module.

Note: You cannot get the values of another module within the module hook. Shared values should be global values for now (#9).

A hook receives values via files on execution. These diagrams can help you understand the flow of values for a global hook and for a module hook:

Flow of values for module hook

Flow of values for global hook

The values can be represented as:

  • a structure (including empty structure)
  • a list (including empty list)

Structures and lists must be JSON-compatible since hooks receive values at runtime as JSON files (see using values in hook).

Note: each module has an additional key with Enabled suffix and a boolean value to enable or disable the module (e.g., ingressNginxEnabled: false). This key is handled by modules discovery process.

values.yaml

On start-up, the Addon-operator loads values into storage from values.yaml files:

  • $MODULES_DIR/values.yaml
  • values.yaml files in modules directories — only the values from key with camelCase name of the module

An example of global values in $MODULES_DIR/values.yaml:

global:
  param1: value1
  param2: value2
simpleModule:
  modParam1: value3

An example of module values in $MODULES_DIR/001-simple-module/values.yaml:

simpleModule:
  modParam1: value1
  modParam2: value2

ConfigMap/addon-operator

The ConfigMap/addon-operator contains a global key with global values, and one key per module with module values. The values are stored in these keys as YAML-encoded strings. Values in the ConfigMap/addon-operator override the values loaded from values.yaml files.

The Addon-operator monitors changes in the ConfigMap/addon-operator and starts the ‘reload all modules’ process in case of global values changes or ‘module run’ process if only the module section is changed. See LIFECYCLE.

An example of ConfigMap/addon-operator:

data:
  global: | # vertical bar is required here
    param1: newValue
    param3: value3
  simpleModule: | # module name should be in camelCase
    modParam2: newValue2
  anotherModule: "false" # a `false` value disables a module

Update values

Hooks can update values in the storage. To do that the hook returns a JSON Patch.

A hook can update values in the ConfigMap/addon-operator so that the updated values would be available after restarting the Addon-operator (long-term update). For example, you may store generated passwords or certificates.

Patch for a long-term update is returned via the $CONFIG_VALUES_JSON_PATCH_PATH file and after hook execution, the Addon-operator immediately applies this patch to the values in ConfigMap/addon-operator.

Another option is to store updated values for a period while the Addon-operator process is running. For example, you may store the results of the discovery of cluster resources or parameters.

Patch for temporary updates is returned via the $VALUES_JSON_PATCH_PATH file and remains in the Addon-operator volatile memory.

Merged values

When the hook or enabled script is about to be executed, or a Helm chart is to be installed, the Addon-operator generates a merged set of values. This merged set combines:

  • global values from values.yaml files and ConfigMap/addon-operator;
  • module values from the values.yaml files and ConfigMap/addon-operator;
  • patches for temporary updates.

The merged values are passed as a temporary JSON file to hooks or to the enabled script, and as a temporary values.yaml file to helm install.

Using values in the hook

When the hook is triggered by an event, the values are passed to it via JSON files. The hook can use environment variables to get paths of those files:

  • $CONFIG_VALUES_PATH — this file contains values from the ConfigMap/addon-operator.
  • $VALUES_PATH — this file contains merged values.

For global hooks, only global values are available.

For module hooks the global values and the module values are available. Also, the enabledModules field is added to the global values in the $VALUES_PATH file. It contains the list of all enabled modules in the order of execution (see module lifecycle).

To change the values, the hook must return JSON patches via the result files. The hook can use environment variables to get paths of those files:

  • $CONFIG_VALUES_JSON_PATCH_PATH — hook should write a patch for ConfigMap/addon-operator into this file.
  • $VALUES_JSON_PATCH_PATH — hook should write a patch for a temporary update of parameters into this file.
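
A hook that reads the merged values and returns a temporary update could look like the following Python sketch. The function name and its parameters are hypothetical; the sketch assumes that the target section already exists in the values and that key names need no JSON Pointer escaping.

```python
import json


def add_value_if_missing(environ, section, field, value):
    """Write an 'add' patch for /<section>/<field> unless the field exists."""
    with open(environ["VALUES_PATH"]) as f:
        values = json.load(f)
    if field in values.get(section, {}):
        return False  # nothing to change, so no patch is written
    patch = [{"op": "add", "path": f"/{section}/{field}", "value": value}]
    # The patch is kept in memory by the Addon-operator (temporary update).
    with open(environ["VALUES_JSON_PATCH_PATH"], "w") as f:
        json.dump(patch, f)
    return True
```

Writing the same patch to the file named by $CONFIG_VALUES_JSON_PATCH_PATH instead would make the update persistent in ConfigMap/addon-operator.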

Using the values in enabled scripts

The enabled script works with values in the read-only mode. It receives values in JSON files. The script can use environment variables to get paths of those files:

  • $CONFIG_VALUES_PATH — this file contains values from ConfigMap/addon-operator.
  • $VALUES_PATH — this file contains merged values.

The enabledModules field with the list of previously enabled modules is added to the global key in the $VALUES_PATH file.

Using values in Helm charts

The module’s Helm chart has access to the same merged values as in $VALUES_PATH, but without the enabledModules field.

The Helm template’s variable .Values allows you to use values in the templates:

{{ .Values.global.param1 }}

{{ .Values.moduleName.modParam2 }}

Example

Let’s assume the following values are defined:

$ cat modules/values.yaml

global:
  param1: 100
  param2: "Yes"

$ cat modules/001-some-module/values.yaml

someModule:
  param1: "String"

$ kubectl -n addon-operator get cm/addon-operator -o yaml

data:
  global: |
    param1: 200
  someModule: |
    param1: "Long string"
    param2: "FOO"

The Addon-operator generates the following files with values:

$ cat $CONFIG_VALUES_PATH

{"global":{
    "param1":200
}, "someModule":{
    "param1":"Long string",
    "param2": "FOO"
}}

$ cat $VALUES_PATH

{"global":{
    "param1":200,
    "param2": "Yes"
}, "someModule":{
    "param1":"Long string",
    "param2": "FOO"
}}

A hook adds a new value with the help of a JSON patch:

$ cat /modules/001-some-module/hooks/hook.sh

#!/usr/bin/env bash
...
cat > $CONFIG_VALUES_JSON_PATCH_PATH <<EOF
    [{"op":"add", "path":"/someModule/param3", "value":"newValue"}]
EOF
...

Now the ConfigMap/addon-operator has the following content:

data:
  global: |
    param1: 200
  someModule: |
    param1: "Long string"
    param2: "FOO"
    param3: "newValue"

Next time the hook is executed, the Addon-operator would generate the following files with values:

$ cat $CONFIG_VALUES_PATH

{"global":{
    "param1":200
},
"someModule":{
    "param1":"Long string",
    "param2": "FOO",
    "param3": "newValue"
}}

$ cat $VALUES_PATH

{"global":{
    "param1":200,
    "param2": "Yes"
}, "someModule":{
    "param1":"Long string",
    "param2": "FOO",
    "param3": "newValue"
}}

Helm chart template replicas: {{ .Values.global.param1 }} would generate the string replicas: 200. As you can see, the value “100” from the values.yaml is replaced by “200” from the ConfigMap/addon-operator.
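
The merge in this example can be reproduced with a short Python sketch. The recursive dictionary merge is an assumption: the example only demonstrates overriding at the level of individual keys inside a section, and the merge function below is hypothetical rather than the operator's own code.

```python
import copy


def merge(base, override):
    """Recursively merge dictionaries; values from `override` win."""
    result = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = merge(result[key], value)
        else:
            result[key] = copy.deepcopy(value)
    return result


# Inputs taken from the example above.
values_yaml = {"global": {"param1": 100, "param2": "Yes"}}
module_values_yaml = {"someModule": {"param1": "String"}}
config_map = {
    "global": {"param1": 200},
    "someModule": {"param1": "Long string", "param2": "FOO"},
}

# values.yaml files first, then the ConfigMap on top.
merged = merge(merge(values_yaml, module_values_yaml), config_map)
```

Here merged["global"]["param1"] is 200, taken from the ConfigMap, while merged["global"]["param2"] keeps the value from modules/values.yaml.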

Validation

The addon-operator supports OpenAPI schemas for config values and for effective values. These schemas should be stored in the $GLOBAL_HOOKS_DIR/openapi directory for global values and in the $MODULES_DIR/<module-name>/openapi directories for modules.

openapi/config-values.yaml is a schema for values merged from values.yaml, modules/values.yaml and the ConfigMap.

openapi/values.yaml is a schema for values merged from values.yaml, modules/values.yaml and the ConfigMap with applied values patches.

Validation occurs on startup, on ConfigMap changes, and after hook executions. If validation fails after a hook execution, the hook is restarted. If validation fails on startup, the addon-operator stops. If validation fails on ConfigMap changes, an error is logged and no new tasks are queued.

Note: Unlike the default behavior, the addon-operator sets additionalProperties: false if additionalProperties is not set.

Example

# /global/openapi/config-values.yaml

type: object
additionalProperties: false
required:
  - project
  - clusterName
minProperties: 2
properties:
  project:
    type: string
  clusterName:
    type: string
  clusterHostname:
    type: string
  discovery:
    type: object

This schema defines 2 required fields for ‘global’ values: project and clusterName. clusterHostname field is an optional string. discovery is an optional object with no restrictions on keys.

Consider this ConfigMap/addon-operator content:

metadata:
...
data:
  global: |
    project: myProject
  moduleOne: |
    param1: value1
...

This ConfigMap has invalid ‘global’ values, and the addon-operator stops with an error on startup.

Consider valid ConfigMap/addon-operator and this config patch from global hook:

[{"op":"add", "path":"/global/clusterHostname", "value":{}}]

This patch sets the clusterHostname field in the ‘global’ section to an object. It is not allowed because the schema defines clusterHostname as a string. This situation is handled like a hook execution error: the hook stays in the queue and restarts with exponential backoff (see LIFECYCLE).
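
Both failures can be reproduced with a deliberately tiny validator. This Python sketch is not the operator's validator and is not a full OpenAPI implementation: it covers only required, additionalProperties and the string and object types, and it omits minProperties from the schema above. All names are hypothetical.

```python
SCHEMA = {
    "type": "object",
    "additionalProperties": False,
    "required": ["project", "clusterName"],
    "properties": {
        "project": {"type": "string"},
        "clusterName": {"type": "string"},
        "clusterHostname": {"type": "string"},
        "discovery": {"type": "object"},
    },
}


def validate(values, schema):
    """Return a list of error strings; an empty list means valid values."""
    errors = []
    props = schema.get("properties", {})
    for name in schema.get("required", []):
        if name not in values:
            errors.append(f"{name} is required")
    # A missing additionalProperties is treated as false (see the note above).
    if not schema.get("additionalProperties", False):
        for name in values:
            if name not in props:
                errors.append(f"{name} is not allowed")
    types = {"string": str, "object": dict}
    for name, value in values.items():
        expected = props.get(name, {}).get("type")
        if expected in types and not isinstance(value, types[expected]):
            errors.append(f"{name} must be of type {expected}")
    return errors
```

With the ‘global’ section of the ConfigMap above, validate({"project": "myProject"}, SCHEMA) reports the missing clusterName, and a clusterHostname set to an object is reported as a type error.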

Extending

Values are config values with applied patches, so the schema in values.yaml would have to duplicate properties from the config-values.yaml schema. There is a technique with allOf to reduce duplicates, but it does not help when additionalProperties: false is set. To overcome this problem, the addon-operator implements a custom property x-extend for the values.yaml schema.

If the values.yaml schema contains an x-extend field, the addon-operator extends the following fields in the values.yaml schema with fields from the config-values.yaml schema:

  • definitions
  • required
  • properties
  • patternProperties
  • title
  • description

Also, “x-*” properties are copied from the config-values.yaml schema.

Example

Consider these OpenAPI schemas:

# /global/openapi/config-values.yaml

type: object
additionalProperties: false
required:
  - project
  - clusterName
properties:
  project:
    type: string
  clusterName:
    type: string
  clusterHostname:
    type: string

# /global/openapi/values.yaml

x-extend:
  schema: config-values.yaml
type: object
additionalProperties: false
required:
  - discovery
  - param1
properties:
  discovery:
    type: object
  param1:
    type: string

The addon-operator will validate values with this effective schema:

# effective schema for values

type: object
additionalProperties: false
required:
  - project
  - clusterName
  - discovery
  - param1
properties:
  project:
    type: string
  clusterName:
    type: string
  clusterHostname:
    type: string
  discovery:
    type: object
  param1:
    type: string
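
How the effective schema is derived can be sketched in Python. The sketch handles only required and properties out of the extended fields listed above, and it assumes that a property defined in both schemas takes its definition from values.yaml; the function name is hypothetical.

```python
def extend_schema(values_schema, config_schema):
    """Build the effective values schema when x-extend is present."""
    if "x-extend" not in values_schema:
        return dict(values_schema)
    effective = {k: v for k, v in values_schema.items() if k != "x-extend"}
    # Required fields of both schemas apply, config-values.yaml first.
    effective["required"] = (
        config_schema.get("required", []) + values_schema.get("required", [])
    )
    # Assumption: on a name clash, the definition from values.yaml wins.
    effective["properties"] = {
        **config_schema.get("properties", {}),
        **values_schema.get("properties", {}),
    }
    return effective
```

Feeding the two schemas above into extend_schema yields a dictionary equal to the effective schema shown.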

Defaults

The addon-operator respects the default key in schemas and applies defaults when merging values.

Example

Consider this schema for global values:

# /global/openapi/values.yaml

x-extend:
  schema: config-values.yaml
type: object
additionalProperties: false
required:
  - param1
properties:
  discovery:
    type: object
    default:
      {}
  param1:
    type: string

The addon-operator will add discovery with empty object to values if no discovery key is present in the ConfigMap, modules/values.yaml or in patches.
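
Applying defaults can be pictured with the following Python sketch. It handles top-level properties only, which is a simplification; whether and how nested defaults are applied is not covered here, and the function name is hypothetical.

```python
import copy


def apply_defaults(values, schema):
    """Fill in missing top-level keys from the schema's 'default' entries."""
    result = dict(values)
    for name, prop in schema.get("properties", {}).items():
        if name not in result and "default" in prop:
            # Copy the default so later changes do not leak into the schema.
            result[name] = copy.deepcopy(prop["default"])
    return result
```

With the schema above, values that contain only param1 gain an empty discovery object, while an existing discovery key is left untouched.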

Required fields

There is a problem with required fields defined in openapi/values.yaml: values for Helm can be constructed by multiple hooks. Different hooks return different portions of required fields and validation will fail on hook execution. To define a contract for Helm values in this situation, the addon-operator implements x-required-for-helm to define required values for Helm. Values are checked before helm execution with x-required-for-helm array merged with required.

Example

Suppose we have two hooks: one hook prepares a param1 value and the second hook prepares a param2 value. Helm requires both fields, but we can’t require both fields after each hook execution. x-required-for-helm to the rescue:

# /global/openapi/values.yaml

type: object
x-required-for-helm:
  - param1
  - param2
properties:
  param1:
    type: string
  param2:
    type: string

The addon-operator will validate values after each hook execution with this effective schema:

# effective schema for values

type: object
additionalProperties: false
properties:
  param1:
    type: string
  param2:
    type: string

The addon-operator will validate values before Helm execution with this effective schema:

# effective schema for values

type: object
additionalProperties: false
required:
  - param1
  - param2
properties:
  param1:
    type: string
  param2:
    type: string
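
The two effective schemas can be derived from the source schema as sketched below. This is a hedged Python illustration with a hypothetical function name: it assumes that x-required-for-helm entries are simply appended to required before Helm runs, and it sets additionalProperties to false when it is missing, as noted in the Validation section.

```python
def effective_schema(schema, before_helm):
    """Effective values schema after a hook run or before a Helm run."""
    effective = {k: v for k, v in schema.items() if k != "x-required-for-helm"}
    # A missing additionalProperties is treated as false.
    effective.setdefault("additionalProperties", False)
    if before_helm:
        required = list(schema.get("required", []))
        required += schema.get("x-required-for-helm", [])
        if required:
            effective["required"] = required
    return effective
```

Calling effective_schema(schema, False) gives the schema used after each hook execution, and effective_schema(schema, True) gives the stricter schema used before Helm execution.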

Addon-operator metrics

The Addon-operator implements Prometheus target at /metrics endpoint. The default port is 9650.

Metrics

  • addon_operator_binding_count{module="", hook=""} – a gauge with the bindings count for every hook. Global hooks have an empty “module” label.

  • addon_operator_config_values_errors_total{} — a counter of ConfigMap validation errors after kubectl edit. See validation.

  • addon_operator_global_hook_run_seconds{hook="", binding="", activation="", queue=""} — a histogram with hook execution times. “hook” label is a name of the hook, “binding” is a binding name from configuration, “queue” is a queue name where hook is queued and “activation” is an event that triggers hook execution.

  • addon_operator_global_hook_run_errors_total{hook="", binding="", activation="", queue=""} – this is the counter of hooks’ execution errors. It only tracks errors of hooks with the disabled allowFailure (i.e. respective key is omitted in the configuration or the allowFailure: false parameter is set). This metric has a “hook” label with the name of a failed hook.

  • addon_operator_global_hook_run_allowed_errors_total{hook="", binding="", activation="", queue=""} – this is the counter of hooks’ execution errors. It only tracks errors of hooks that are allowed to exit with an error (the parameter allowFailure: true is set in the configuration). The metric has a “hook” label with the name of a failed hook.

  • addon_operator_global_hook_run_success_total{hook="", binding="", activation="", queue=""} – a counter of successful hook executions. The metric has a “hook” label with the name of the hook that succeeded.

  • addon_operator_global_hook_run_sys_cpu_seconds{hook="", binding="", activation="", queue=""} — a histogram with global hook system cpu seconds.

  • addon_operator_global_hook_run_user_cpu_seconds{hook="", binding="", activation="", queue=""} — a histogram with global hook user cpu seconds.

  • addon_operator_global_hook_run_max_rss_bytes{hook="", binding="", activation="", queue=""} — a gauge with global hook max rss usage in bytes.

  • addon_operator_module_hook_run_seconds{module="", hook="", binding="", activation="", queue=""} — a histogram with module hook execution times. “module” label is a name of the module, “hook” label is a name of the hook, “binding” is a binding name from configuration, “queue” is a queue name where hook is queued and “activation” is an event that triggers hook execution.

  • addon_operator_module_hook_run_errors_total{module="", hook="", binding="", activation="", queue=""} – a counter of module hook execution errors. It only tracks errors of hooks with allowFailure disabled (i.e., the key is omitted in the configuration or allowFailure: false is set). The “hook” label holds the name of the failed hook.

  • addon_operator_module_hook_run_allowed_errors_total{module="", hook="", binding="", activation="", queue=""} – a counter of module hook execution errors. It only tracks errors of hooks that are allowed to exit with an error (allowFailure: true is set in the configuration). The “hook” label holds the name of the failed hook.

  • addon_operator_module_hook_run_success_total{module="", hook="", binding="", activation="", queue=""} – a counter of successful module hook executions. The “hook” label holds the name of the hook that succeeded.

  • addon_operator_module_hook_run_sys_cpu_seconds{module="", hook="", binding="", activation="", queue=""} — a histogram with module hook system cpu seconds.

  • addon_operator_module_hook_run_user_cpu_seconds{module="", hook="", binding="", activation="", queue=""} — a histogram with module hook user cpu seconds.

  • addon_operator_module_hook_run_max_rss_bytes{module="", hook="", binding="", activation="", queue=""} — a gauge with module hook max rss usage in bytes.

  • addon_operator_module_discover_errors_total – a counter of errors during the module discovery process. It increases in these cases:

    • an ‘enabled’ script exits with an error
    • a module hook returns an invalid configuration
    • a call to the Kubernetes API ends with an error (for example, retrieving Helm releases).

  • addon_operator_module_run_errors_total{module=x} – counter of errors on module start-up.

  • addon_operator_module_delete_errors_total{module=x} – counter of errors on module deletion.

  • addon_operator_module_run_seconds{module=""} — a histogram with module execution timings.

  • addon_operator_module_helm_seconds{module="", activation=""} — a histogram of module’s helm upgrade timings.

  • addon_operator_helm_operation_seconds{module="", activation="", operation=""} — a histogram of different helm operations timings.

  • addon_operator_convergence_seconds{activation=onStartup} – a counter of seconds spent executing “reload all modules” processes. The “activation=OnStartup” label value can be used to retrieve information about the first “reload all modules” run after the operator starts.

  • addon_operator_convergence_total{activation=onStartup} — a counter of “reload all modules” processes.

  • addon_operator_tasks_queue_length{queue=""} – a gauge showing the length of the working queue. This metric can be used to warn about stuck hooks. It has the “queue” label with the queue name.

  • addon_operator_task_wait_in_queue_seconds_total{module="", hook="", binding="", queue=""} – a counter of seconds that tasks have spent waiting in the queue.

  • addon_operator_live_ticks – a counter that increases every 10 seconds. This metric can be used for alerting about an unhealthy Addon-operator. It has no labels.

  • addon_operator_kube_jq_filter_duration_seconds{module="", hook="", binding="", queue="", kind=""} — a histogram with jq filter timings.

  • addon_operator_kube_event_duration_seconds{module="", hook="", binding="", queue="", kind=""} — a histogram with kube event handling timings.

  • addon_operator_kube_snapshot_objects{module="", hook="", binding="", queue=""} – a gauge with the number of cached objects (the snapshot) for a particular binding. The “module” label is empty for global hooks.

  • addon_operator_kube_snapshot_bytes{module="", hook="", binding="", queue=""} – a gauge with the size in bytes of cached objects for a particular binding. Each cached object contains a Kubernetes object and/or the result of jqFilter, depending on the binding configuration. The size is the sum of the length of the Kubernetes object in JSON format and the length of the jqFilter result in JSON format.

  • addon_operator_kubernetes_client_request_result_total — a counter of requests made by kubernetes/client-go library.

  • addon_operator_kubernetes_client_request_latency_seconds — a histogram with latency of requests made by kubernetes/client-go library.

  • addon_operator_tasks_queue_action_duration_seconds{queue_name="", queue_action=""} – a histogram with measurements of low-level queue operations. Use QUEUE_ACTIONS_METRICS="no" to disable this metric.
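The addon_operator_tasks_queue_length gauge is described above as a way to warn about stuck hooks. As a rough sketch of such a check, the helper below reads exposition text from stdin and prints every queue whose length exceeds a limit. The function name `queues_over` and the limit value are illustrative assumptions, not part of addon-operator.

```shell
#!/bin/sh
# queues_over LIMIT (hypothetical helper): read Prometheus text exposition
# from stdin and print "queue length" for every queue longer than LIMIT.
queues_over() {
  awk -v limit="$1" '
    /^addon_operator_tasks_queue_length[{]/ {
      # Locate the queue label; the sample value is the last field.
      if (match($0, /queue="[^"]*"/) && $NF + 0 > limit + 0) {
        # Skip the 7 characters of queue=" and drop the closing quote.
        print substr($0, RSTART + 7, RLENGTH - 8), $NF
      }
    }'
}

# Usage (assumes the operator is reachable on 127.0.0.1:9650):
#   curl -s http://127.0.0.1:9650/metrics | queues_over 100
```

A single high reading is not proof of a stuck hook. Comparing several scrapes over time gives a more reliable signal.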

Custom metrics

Hooks can export metrics by writing a set of operations in JSON format, one operation per line, to the file at $METRICS_PATH.

Operation to register a counter and increase its value:

{"name":"metric_name","action":"add","value":1,"labels":{"label1":"value1"}}

Operation to register a gauge and set its value:

{"name":"metric_name","action":"set","value":33,"labels":{"label1":"value1"}}

Operation to register a histogram and observe a duration:

{"name":"metric_name","action":"observe","value":42, "buckets": [1,2,5,10,20,50], "labels":{"label1":"value1"}}

Labels are not required, but Shell-operator adds a “hook” label with the path to the hook script relative to the hooks directory.
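Hooks that emit many operations may find it convenient to wrap the JSON formatting in a function. The following is a minimal sketch: `metric_op` is a hypothetical helper, it does no JSON escaping, and it expects the labels argument, when given, to already be a JSON object.

```shell
#!/bin/sh
# metric_op ACTION NAME VALUE [LABELS_JSON] (hypothetical helper):
# append one metric operation to the file named by $METRICS_PATH.
# No escaping is done, so NAME and LABELS_JSON must already be valid JSON text.
metric_op() {
  if [ -n "${4:-}" ]; then
    printf '{"name":"%s","action":"%s","value":%s,"labels":%s}\n' "$2" "$1" "$3" "$4"
  else
    # Labels are optional, so the key is left out when no labels are given.
    printf '{"name":"%s","action":"%s","value":%s}\n' "$2" "$1" "$3"
  fi >> "$METRICS_PATH"
}

# Usage inside a hook:
#   metric_op add hook_metric_count 1 '{"label1":"value1"}'
#   metric_op set hook_metrics_items 33
```

Histogram operations need the extra "buckets" field and are not covered by this helper.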

Several metrics can be exported at once. For example, this script will create 2 metrics:

echo '{"name":"hook_metric_count","action":"add","value":1,"labels":{"label1":"value1"}}' >> $METRICS_PATH
echo '{"name":"hook_metrics_items","action":"add","value":1,"labels":{"label1":"value1"}}' >> $METRICS_PATH

The metric name is used as-is, so several hooks can export the same metric name. It is the hook developer’s responsibility to keep label cardinality consistent.

There are fields “add” and “set” that can be used as shortcuts for action and value. This feature may be deprecated in future releases.

{"name":"metric_name","add":1,"labels":{"label1":"value1"}}

Note that there is no mechanism to expire this kind of metric other than restarting addon-operator. This is the default behavior of prometheus-client.

Grouped metrics

A common reason to expire a metric is that the object it describes has been removed. The object is then no longer in the snapshot, so the hook can’t identify the metric that should be expired.

To solve this, use the “group” field in metric operations. When Shell-operator receives operations with the “group” field, it expires previous metrics with the same group and applies new metric values. This grouping works across hooks and label values.

echo '{"group":"group1", "name":"hook_metric_count",  "action":"add", "value":1, "labels":{"label1":"value1"}}' >> $METRICS_PATH
echo '{"group":"group1", "name":"hook_metrics_items", "action":"add", "value":1, "labels":{"label1":"value1"}}' >> $METRICS_PATH

To expire all metrics in a group, use action “expire”:

{"group":"group_name_1", "action":"expire"}

WARNING: “observe” is currently an unsupported action for grouped metrics.
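A hook that reports per-object metrics can rewrite its whole group on every run and fall back to “expire” when nothing is left to report. The sketch below illustrates that pattern under stated assumptions: `emit_group` is a hypothetical helper, the "kind count" input format is invented for the example, and the fallback reflects the reading that a run which writes no grouped operations leaves earlier values in place.

```shell
#!/bin/sh
# emit_group GROUP NAME (hypothetical helper): read "kind count" pairs from
# stdin and write one grouped "set" operation per pair to $METRICS_PATH.
# If stdin has no pairs, write an explicit "expire" for the group instead.
emit_group() {
  group=$1
  name=$2
  seen=0
  while read -r kind count; do
    [ -n "$kind" ] || continue          # ignore blank lines
    printf '{"group":"%s","name":"%s","action":"set","value":%s,"labels":{"kind":"%s"}}\n' \
      "$group" "$name" "$count" "$kind" >> "$METRICS_PATH"
    seen=1
  done
  if [ "$seen" -eq 0 ]; then
    printf '{"group":"%s","action":"expire"}\n' "$group" >> "$METRICS_PATH"
  fi
}

# Usage inside a hook (the counts would normally come from the snapshot):
#   printf 'pod 3\nsecret 1\n' | emit_group hook1 hook_objects
```

The sketch uses only "set" and "expire" because “observe” is not supported for grouped metrics.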

Example

hook1.sh returns these metrics:

echo '{"group":"hook1", "name":"hook_metric", "action":"add", "value":1, "labels":{"kind":"pod"}}' >> $METRICS_PATH
echo '{"group":"hook1", "name":"hook_metric", "action":"add", "value":1, "labels":{"kind":"replicaset"}}' >> $METRICS_PATH
echo '{"group":"hook1", "name":"hook_metric", "action":"add", "value":1, "labels":{"kind":"deployment"}}' >> $METRICS_PATH
echo '{"group":"hook1", "name":"hook1_special_metric", "action":"set", "value":12, "labels":{"label1":"value1"}}' >> $METRICS_PATH
echo '{"group":"hook1", "name":"common_metric", "action":"set", "value":300, "labels":{"source":"source3"}}' >> $METRICS_PATH
echo '{"name":"common_metric", "action":"set", "value":100, "labels":{"source":"source1"}}' >> $METRICS_PATH

hook2.sh returns these metrics:

echo '{"group":"hook2", "name":"hook_metric","action":"add", "value":1, "labels":{"kind":"configmap"}}' >> $METRICS_PATH
echo '{"group":"hook2", "name":"hook_metric","action":"add", "value":1, "labels":{"kind":"secret"}}' >> $METRICS_PATH
echo '{"group":"hook2", "name":"hook2_special_metric", "action":"set", "value":42}' >> $METRICS_PATH
echo '{"name":"common_metric", "action":"set", "value":200, "labels":{"source":"source2"}}' >> $METRICS_PATH

Prometheus scrapes these metrics:

# HELP hook_metric hook_metric
# TYPE hook_metric counter
hook_metric{hook="hook1.sh", kind="pod"} 1 -------------------+---------- group:hook1
hook_metric{hook="hook1.sh", kind="replicaset"} 1 ------------+
hook_metric{hook="hook1.sh", kind="deployment"} 1 ------------+
hook_metric{hook="hook2.sh", kind="configmap"} 1  ------------|-------+-- group:hook2
hook_metric{hook="hook2.sh", kind="secret"} 1 ----------------|-------+
# HELP hook1_special_metric hook1_special_metric              |       |
# TYPE hook1_special_metric gauge                             |       |
hook1_special_metric{hook="hook1.sh", label1="value1"} 12 ----+       |
# HELP hook2_special_metric hook2_special_metric              |       |
# TYPE hook2_special_metric gauge                             |       |
hook2_special_metric{hook="hook2.sh"} 42 ---------------------|-------'
# HELP common_metric common_metric                            |
# TYPE common_metric gauge                                    |
common_metric{hook="hook1.sh", source="source3"} 300 ---------'
common_metric{hook="hook1.sh", source="source1"} 100 ---------------+---- no group
common_metric{hook="hook2.sh", source="source2"} 200 ---------------'

On the next execution, hook1.sh returns only one metric, so the values for hook_metric{kind="replicaset"}, hook_metric{kind="deployment"}, common_metric{source="source3"} and hook1_special_metric are expired:

echo '{"group":"hook1", "name":"hook_metric", "action":"add", "value":1, "labels":{"kind":"pod"}}' >> $METRICS_PATH

Addon-operator expires the previous values for group “hook1” and updates the value for hook_metric{hook="hook1.sh", kind="pod"}. The values for group “hook2” and for common_metric without a group are left intact. Now Prometheus scrapes these metrics:

# HELP hook_metric hook_metric
# TYPE hook_metric counter
hook_metric{hook="hook1.sh", kind="pod"} 2 --------------- group:hook1
hook_metric{hook="hook2.sh", kind="configmap"} 1 ----+---- group:hook2
hook_metric{hook="hook2.sh", kind="secret"} 1 -------+
# HELP hook2_special_metric hook2_special_metric     |
# TYPE hook2_special_metric gauge                    |
hook2_special_metric{hook="hook2.sh"} 42 ------------'
# HELP common_metric common_metric
# TYPE common_metric gauge
common_metric{hook="hook1.sh", source="source1"} 100 --+-- no group
common_metric{hook="hook2.sh", source="source2"} 200 --'

The next execution of hook2.sh expires all metrics in group “hook2”:

echo '{"group":"hook2", "action":"expire"}' >> $METRICS_PATH

Addon-operator expires the previous values for group “hook2” but leaves common_metric for “hook2.sh” as is. Now Prometheus scrapes these metrics:

# HELP hook_metric hook_metric
# TYPE hook_metric counter
hook_metric{hook="hook1.sh", kind="pod"} 2 --------------- group:hook1
# HELP common_metric common_metric
# TYPE common_metric gauge
common_metric{hook="hook1.sh", source="source1"} 100 --+-- no group
common_metric{hook="hook2.sh", source="source2"} 200 --'
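The expiration rules in this example can be replayed with a toy model. The sketch below is not the operator's implementation: `apply_batch` and the tab-separated state file are assumptions made for illustration. It handles only "add", "set", and "expire", and parses the operation lines with simple pattern matching rather than a JSON parser.

```shell
#!/bin/sh
# apply_batch HOOK STATE_FILE (toy model): apply the operations of one hook
# run, read from stdin, to STATE_FILE (which must already exist).
# Each state line is: group <TAB> "name hook labels" <TAB> value.
apply_batch() {
  awk -v hook="$1" '
    # Return the string value of "key":"..." found in line s, or "".
    function str(s, key,   re) {
      re = "\"" key "\":\"[^\"]*\""
      if (!match(s, re)) return ""
      return substr(s, RSTART + length(key) + 4, RLENGTH - length(key) - 5)
    }
    # Phase 1: load the series stored by earlier runs.
    phase == 1 { split($0, f, "\t"); grp[f[2]] = f[1]; val[f[2]] = f[3]; next }
    # Phase 2: apply the operations of the current run.
    {
      g = str($0, "group"); action = str($0, "action")
      if (g != "") touched[g] = 1          # this run mentions group g
      if (action == "expire") next
      labels = "{}"
      if (match($0, /"labels":[{][^}]*[}]/)) labels = substr($0, RSTART + 9, RLENGTH - 9)
      v = 0
      if (match($0, /"value":[0-9.]+/)) v = substr($0, RSTART + 8, RLENGTH - 8)
      key = str($0, "name") " " hook " " labels
      val[key] = (action == "add") ? val[key] + v : v
      grp[key] = g; fresh[key] = 1
    }
    END {
      for (key in val) {
        # Drop grouped series whose group was mentioned but not re-emitted.
        if (grp[key] != "" && (grp[key] in touched) && !(key in fresh)) continue
        print grp[key] "\t" key "\t" val[key]
      }
    }
  ' phase=1 "$2" phase=2 - | sort > "$2.new" && mv "$2.new" "$2"
}
```

Feeding the hook1.sh and hook2.sh batches from the example through `apply_batch` in the same order leaves the same series in the state file as the scrapes above show: ungrouped common_metric values survive every run, and hook_metric for kind "pod" ends with the value 2.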