Tempus is the time-based algorithm in Moogsoft AIOps which clusters alerts into Situations based on the similarity of their timestamps.
The underlying premise of Tempus is that when things go wrong, they go wrong together. For example, if a core element of your network infrastructure such as a switch fails and disconnects, it affects many other interconnected elements, which then send events at a similar time.
Tempus uses the Jaccard index to calculate the similarity of different alerts. It also uses community detection methods to identify which alerts with similar arrival patterns it should cluster into Situations.
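As a rough illustration of the similarity measure, the Jaccard index of two sets is the size of their intersection divided by the size of their union. The sketch below applies it to the sets of time buckets in which two alerts received events. The function name and the bucket-set representation are assumptions made for illustration and are not taken from the Tempus implementation.

```python
def jaccard_index(a: set, b: set) -> float:
    """Return |a & b| / |a | b|, or 0.0 when both sets are empty."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


# Hypothetical data: indices of the time buckets in which each alert saw events.
switch_alert = {3, 4, 10, 11}
router_alert = {3, 4, 10, 12}

# Three shared buckets out of five distinct buckets.
similarity = jaccard_index(switch_alert, router_alert)
print(similarity)
```

A value of 1.0 means the two alerts received events in exactly the same buckets, and 0.0 means they never coincided.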
As Tempus is time-based, you should not use it to detect events relating to the slow or gradual degradation of a service caused by, for example, disks filling up or increasing CPU usage.
One advantage of Tempus is that it uses only event timestamps for clustering, so no alert enrichment is required.
AIOps applies Tempus incrementally to alerts as it ingests them so that it can create Situations in real-time.
The diagrams below show how Tempus sorts and then groups alerts with similar timestamps into Situations:
Raw alerts from either the AlertBuilder or Alert Rules Engine arrive over a period of time. These are shown as gray dots in the diagram below:
Tempus identifies and sorts which alerts have similar arrival patterns:
Alerts with similar arrival patterns are clustered into Situations:
Tempus is configured and tuned using parameters in moog_farmd.conf. The Moolet parameters configure general information about each Sigaliser. The Output parameters control which Moolet's output Tempus processes. The Trigger and Sigalising parameters control the Sigaliser execution and duration.
The parameters that relate to the Tempus Moolet are as follows:
run_on_startup: Determines whether Tempus runs when Moogsoft AIOps starts. If enabled, Tempus captures all alerts from the moment the system starts, without you having to configure or start it manually.
persist_state: Enables Tempus to save its state for High Availability systems so if a failover occurs, the second moogfarmd can continue from the same point.
description: Describes the Situation produced by the Sigaliser.
A Tempus (a.k.a. Sigaliser V2) Situation
The default Tempus parameters are as follows:
classname are hardcoded and should not be changed.
These parameters control the output processed by the Sigaliser:
process_output_of: Defines the Moolet source of the alerts that Tempus processes. By default, the Sigaliser connects directly to the AlertBuilder. The Alert Rules Engine is used only if automations are desired prior to Situation resolution.
AlertBuilder, AlertRulesEngine, MaintenanceWindowManager, EmptyMoolet
entropy_threshold: Sets the minimum entropy value for an alert to be clustered into a Situation. Tempus does not include any alerts with an entropy value below the threshold in Situations. Set to a value between 0.0 and 1.0. The default of 0.0 means all alerts are processed.
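The effect of entropy_threshold can be pictured as a simple filter applied before clustering. The following is a minimal sketch under stated assumptions: the alert dictionaries, their "alert_id" and "entropy" keys, and the function name are hypothetical and only illustrate the rule described above.

```python
def filter_by_entropy(alerts, entropy_threshold=0.0):
    """Keep only alerts whose entropy is at or above the threshold."""
    if not 0.0 <= entropy_threshold <= 1.0:
        raise ValueError("entropy_threshold must be between 0.0 and 1.0")
    return [alert for alert in alerts if alert["entropy"] >= entropy_threshold]


# Hypothetical alerts with made-up entropy values.
alerts = [
    {"alert_id": 1, "entropy": 0.05},
    {"alert_id": 2, "entropy": 0.40},
    {"alert_id": 3, "entropy": 0.90},
]

kept_default = filter_by_entropy(alerts)        # threshold 0.0 keeps everything
kept_strict = filter_by_entropy(alerts, 0.25)   # drops the alert with entropy 0.05
print([a["alert_id"] for a in kept_strict])
```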
The default output parameters are as follows:
Trigger and Sigalising Window Parameters
The execution and duration of Tempus are controlled by the trigger, window and bucket parameters:
- The sig_interval trigger determines when Tempus starts to run.
- The window is the total span of time in seconds in which Alerts will be analyzed each time Tempus runs.
- Time buckets are small five-second subdivisions of the window in which the Alerts are captured.
sig_interval: Executes the Tempus algorithm after a defined number of seconds. In the example above, the Sigaliser will run every 120 seconds (two minutes).
window_size: Determines the length of time of the window in which Alerts are analysed and a Situation develops each time the Sigaliser is run. By default the Sigalising window is 1200 seconds (20 minutes).
bucket_size: Determines the time span of each bucket in which Alerts are captured in seconds. By default each bucket is five seconds long so there will be 240 buckets per window.
Moogsoft do not recommend that you change the bucket size. If you do want to change bucket_size, do so with caution, because Tempus is designed to use small bucket sizes.
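The relationship between window_size and bucket_size is simple arithmetic. The sketch below reproduces the 240-bucket figure from the defaults and shows one way an event timestamp could be mapped to a bucket. The helper names, the zero-based indexing, and the use of epoch seconds are assumptions for illustration and do not describe how Tempus stores buckets internally.

```python
WINDOW_SIZE = 1200  # seconds, the default window_size
BUCKET_SIZE = 5     # seconds, the default bucket_size


def buckets_per_window(window_size: int, bucket_size: int) -> int:
    """Number of whole buckets that fit in one window."""
    return window_size // bucket_size


def bucket_index(timestamp: int, window_start: int, bucket_size: int) -> int:
    """Zero-based index of the bucket that holds an event timestamp."""
    return (timestamp - window_start) // bucket_size


total_buckets = buckets_per_window(WINDOW_SIZE, BUCKET_SIZE)

# Hypothetical window start, in epoch seconds.
window_start = 1_700_000_000
first = bucket_index(window_start + 4, window_start, BUCKET_SIZE)
third = bucket_index(window_start + 12, window_start, BUCKET_SIZE)
print(total_buckets, first, third)
```

Two events that arrive four seconds apart can still fall on either side of a bucket boundary, which is the situation arrival_spread is intended to soften.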
arrival_spread: Sets the acceptable latency or arrival window for each Alert in seconds. This can be used to reduce the impact of multiple Alerts arriving over a short period of time and landing in separate buckets.
min_arrival_similarity: Determines how similar Alerts must be to be considered for clustering. This is a useful way to determine what proportion of the events two Alerts need to share to have a similar pattern of arrival. By default this is 0.6667, which means Tempus will disregard any Alerts with less than two-thirds similarity.
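To make the threshold concrete, the sketch below compares every pair of alerts by the Jaccard index of their bucket sets and keeps only the pairs at or above min_arrival_similarity. It is an illustrative sketch, not the Tempus algorithm: the data layout and the function names are hypothetical, and the community detection step that follows in Tempus is not modelled here.

```python
from itertools import combinations

MIN_ARRIVAL_SIMILARITY = 0.6667  # the default described above


def jaccard_index(a: set, b: set) -> float:
    """Return |a & b| / |a | b|, or 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similar_pairs(arrivals: dict, threshold: float) -> list:
    """Return the alert pairs whose arrival patterns meet the threshold."""
    pairs = []
    for left, right in combinations(sorted(arrivals), 2):
        if jaccard_index(arrivals[left], arrivals[right]) >= threshold:
            pairs.append((left, right))
    return pairs


# Hypothetical alerts mapped to the bucket indices in which they saw events.
arrivals = {
    "A": {1, 2, 3},
    "B": {1, 2, 3, 4},  # shares 3 of 4 distinct buckets with A (0.75)
    "C": {1, 2, 9},     # shares 2 of 4 distinct buckets with A (0.5)
}

candidate_pairs = similar_pairs(arrivals, MIN_ARRIVAL_SIMILARITY)
print(candidate_pairs)
```

Lowering the threshold admits looser matches such as A and C, and raising it towards 1.0 requires near-identical arrival patterns.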
The default trigger and sigalising window parameters are as follows:
Partitioning is set to 'null' by default. There are two methods to partition data into Situations. The first is 'partition_by' which splits the clusters according to the parameters specified. The second is 'pre_partition', which splits the incoming event stream before clustering.
Pre-partitioning is recommended as it does not interfere with the results of the clustering algorithms.
partition_by: After clustering has taken place and before you enter merging and resolution, you can split clusters into sub-clusters based on a component of the events. For example, you can use the manager parameter to ensure the Situations only contain events from the same manager. In general, and by default, you should comment out the
Partitioning by components is not recommended
pre_partition: An alternative way of partitioning is to use pre_partition, which allows you to specify a component field (from the list of specified components) around which the event stream will be partitioned before clustering occurs. The Alerts in the resulting Situations will each contain a single value for the component field chosen.
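The difference between the two partitioning methods is where the split happens. The sketch below contrasts them: pre-partitioning divides the incoming alerts into separate streams before any clustering, while partitioning by a component splits clusters that have already been formed. All names here are hypothetical, and the clustering step itself is left out.

```python
from collections import defaultdict


def group_by_field(alerts, field):
    """Group alerts into lists keyed by the value of one field."""
    groups = defaultdict(list)
    for alert in alerts:
        groups[alert[field]].append(alert)
    return dict(groups)


def pre_partition(alerts, field):
    """Split the incoming alerts into one stream per field value.

    Each stream would then be clustered on its own.
    """
    return group_by_field(alerts, field)


def partition_by(clusters, field):
    """Split each existing cluster into sub-clusters per field value."""
    sub_clusters = []
    for cluster in clusters:
        sub_clusters.extend(group_by_field(cluster, field).values())
    return sub_clusters


# Hypothetical alerts; "manager" is the component field used for the split.
alerts = [
    {"alert_id": 1, "manager": "SNMP"},
    {"alert_id": 2, "manager": "SYSLOG"},
    {"alert_id": 3, "manager": "SNMP"},
]

streams = pre_partition(alerts, "manager")
split = partition_by([alerts], "manager")  # treat all three alerts as one cluster
print(sorted(streams), len(split))
```

In the pre-partitioned case the clustering algorithm only ever sees alerts that share a manager, so its results are not cut apart afterwards.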
You can configure Tempus to only create Situations from alerts that meet a certain degree of constant significance based upon Poisson distribution calculations.
significance_test: Calculation that determines how significant a cluster of alerts or potential Situation must be for Tempus to detect it. The default, Poisson1, looks at the data of a single alert cluster to calculate how significant it is. The default is more likely to detect all significant alert clusters but with a higher risk of creating insignificant alert clusters. Use this option when your alerts originate from different networks or unrelated topologies.
Poisson2 is a more thorough test that looks at an alert cluster and all alerts outside the cluster with a similar event rate. It is more likely to exclude all insignificant alert clusters but with a high risk of excluding significant alert clusters. Use this option if you expect all of your alerts to come from the same connected network.
significance_threshold: Sets the maximum significance score in order for Tempus to create a Situation. The score is proportional to the probability that the alert cluster or potential Situation was coincidence. The lower the score, the more significant the cluster and the less likely it was a coincidence. The significance_threshold score ranges from 0 to 100.
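The idea of a Poisson-based significance score can be illustrated with a generic tail probability. The sketch below is not the Poisson1 or Poisson2 calculation, whose details are not given here. It assumes a hypothetical expected alert count for the window, computes the probability of seeing at least the observed count by chance, and scales that probability to a 0 to 100 score that is compared against the threshold.

```python
import math


def poisson_tail(k: int, lam: float) -> float:
    """P(X >= k) for X ~ Poisson(lam)."""
    if k <= 0:
        return 1.0
    term = math.exp(-lam)  # P(X = 0)
    cdf = 0.0
    for i in range(k):
        cdf += term            # add P(X = i)
        term *= lam / (i + 1)  # advance to P(X = i + 1)
    return max(0.0, 1.0 - cdf)


def is_significant(observed: int, expected: float, significance_threshold: float) -> bool:
    """Assumed scoring rule: score = 100 * P(X >= observed); lower is more significant."""
    score = 100.0 * poisson_tail(observed, expected)
    return score <= significance_threshold


# Hypothetical figures: 2 alerts expected in the window.
burst = is_significant(observed=10, expected=2.0, significance_threshold=1.0)
routine = is_significant(observed=2, expected=2.0, significance_threshold=1.0)
print(burst, routine)
```

Under this assumed rule, a burst of ten alerts where two are expected scores far below the threshold and would form a Situation, while two alerts where two are expected would not.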
Tempus appears in moog_farmd.conf as shown below: