
For Moogsoft AIOps 7.2 and later, see
For Moogsoft AIOps 7.1 and prior, see


High Availability (HA) deployments of Moogsoft AIOps comprise multiple instances of Moogsoft AIOps components, such as moog_farmd and the associated LAMs, to minimize downtime and data loss. Component redundancy protects against single points of failure. It also provides reliable mechanisms for failing over from one component to another without performance degradation or data loss.

See HA - Deployment Scenarios for deployment examples.

HA Components

Moogsoft AIOps is made up of the following processes and service components, which can be deployed in a distributed environment:

  • Event ingestion (LAMs)
  • Event processing (moog_farmd)
  • User interface (Nginx, servlets running in Tomcat and Elasticsearch)
  • RabbitMQ broker (MooMS messaging system)
  • Database (MySQL 5.7)

See HA - Setup for Dependencies for more information on how to set these up for distributed installations.

Introducing component redundancy (for example, two identically configured moog_farmd event processing components) makes an HA architecture possible. An HA architecture enables failover of Moogsoft AIOps components without loss of data or performance.

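Failover of Moogsoft AIOps components is triggered manually using the ha_cntl command line utility, which also displays the status of all components in the HA installation. moog_farmd can additionally be configured for automatic failover (see Configuring Automatic Failover for moog_farmd).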

Moogsoft AIOps HA Features at a Glance

  • LAMs, moog_farmd and Tomcat servlets can run in 'Active' mode (normal operation) or 'Passive' mode (not processing messages)
  • Instance/Process Group/Cluster naming convention for LAMs, moog_farmd and Tomcat servlets to build logical groupings for failover scenarios
  • ha_cntl utility to show running HA components and to trigger manual failovers (by setting Instances/Process Groups/Clusters to Active or Passive)
  • 'Leader' capability to allow only defined Instances to become Active when their parent Cluster/Process Group becomes Active
  • Moolet state sharing ability (persistence) for moog_farmd to facilitate data integrity during failover of moog_farmd
  • MySQL 'failover' connection definition to allow Moogsoft AIOps components to failover to backup MySQL servers if the primary connection goes down
  • Handling for the UI to continue normal operation in the event of a UI failover
  • Self Monitoring pages in UI and moog_monitor command line utility show HA information
  • Product installation using split RPMs (by functional component) for easier distributed deployment

Moogsoft AIOps HA architecture key concepts

Component: An instance of a Moogsoft AIOps LAM, servlet or moog_farmd. HA introduces redundancy at the component, Process Group or Cluster level.
Instance: A name for each Moogsoft AIOps component.
Process Group: A group of one or more Moogsoft AIOps components of the same type (such as a group of load-sharing socket LAMs). All moog_farmd components in a Process Group must have identical configuration.
Cluster: One or more Process Groups. A Cluster must contain at least one Process Group.
Zone: Any number of Clusters, Process Groups and Instances can be defined within a single MooMS Zone (RabbitMQ broker vhost). Failover actions for those Clusters, Process Groups and Instances are limited to that Zone.

Moogsoft AIOps Architecture

Instances are individual components that run on a single machine. Process Groups and Clusters, however, can span multiple machines. Their configuration allows the flexibility to define architectural groupings for failover actions as long as they are within the same MooMS Zone:

Active / Passive mode: Instances are configured to operate in Active or Passive mode. Operational Instances run in Active mode; backup (redundant) Instances run in Passive mode.

Leader: Defines which Instance in a Process Group becomes Active when the whole Process Group switches from Passive to Active. Normally, only one Instance per Process Group should be defined as the Leader. Leader definition availability is as follows:

Component                             Leader definition availability
moog_farmd                            Mandatory for a Process Group with more than one moog_farmd
Socket LAM, Logfile LAM, Trapd LAM    Optional
REST LAM, UI servlets                 Not applicable

Leadership status is a property of Process Groups. A Process Group has one of two leadership states, as shown in the output of the ha_cntl --view command (see below):

"only leader should be active": The default for components that support Leader definition (moog_farmd, Socket LAM, Logfile LAM and Trapd LAM).
"no leader - all can be active": The default for components that do not support Leader definition (REST LAM and the UI servlets). A Process Group also has this status if one or more of its Instances is configured with only_leader_active = false. The status is dynamic: terminating such an Instance changes the Process Group's status back to "only leader should be active".

All of the above are defined when starting each component. If any of the values are not explicitly defined as parameters at startup, values are taken from the component configuration file (or if not defined there, values in system.conf are used).
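The lookup order described above can be sketched in Python. This is a minimal illustration, not Moogsoft code: the function and argument names are hypothetical, and each configuration source is assumed to have already been loaded into a plain dict.

```python
from typing import Any, Mapping, Optional


def resolve_ha_setting(
    name: str,
    cli_args: Mapping[str, Any],
    component_conf: Mapping[str, Any],
    system_conf: Mapping[str, Any],
    default: Optional[Any] = None,
) -> Any:
    """Return an HA setting using the documented precedence:
    command line first, then the component configuration file, then system.conf."""
    for source in (cli_args, component_conf, system_conf):
        value = source.get(name)
        if value is not None:
            return value
    return default


# Hypothetical example: a socket LAM started with only --instance and --mode.
cli = {"instance": "SOCK1", "mode": "passive"}
lam_conf = {"group": "socket_lam", "default_leader": True}
system = {"cluster": "NY"}

cluster = resolve_ha_setting("cluster", cli, lam_conf, system)  # taken from system.conf
group = resolve_ha_setting("group", cli, lam_conf, system)      # taken from the LAM config
mode = resolve_ha_setting("mode", cli, lam_conf, system)        # taken from the command line
```

Passing --cluster SF on the command line would make the first lookup return "SF", because the command line is checked before either configuration file.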

Example component definition

$MOOGSOFT_HOME/bin/moog_farmd --cluster Surbiton --instance MASTER --leader yes --mode active

The above creates an Instance of moog_farmd named MASTER as a member of the Surbiton Cluster. It also defines it as the Leader Instance in its Process Group and configures it to operate in Active mode. No Process Group (--group) is defined, so the default name from the component configuration file, moog_farmd, is used.

Moogsoft AIOps HA Configuration

The information in the table describes how to configure Moogsoft AIOps components for a HA architecture.


Default Cluster

$MOOGSOFT_HOME/config/system.conf, ha section, cluster property:
 "cluster": "NY"
The default name of the Cluster. Component configuration files and the command line can override it.


LAM configuration file, ha section:
  cluster: "NY",
  group: "socket_lam",
  default_leader: true,
  only_leader_active: true,
  accept_conn_when_passive: true
cluster: The name of the Cluster. This supersedes anything set in system.conf (and can itself be overridden on the command line).
group: The name of the Process Group. Defaults to the LAM process name (for example socket_lam) if no value is specified.
default_leader: A Boolean indicating whether the LAM is the Leader within its Process Group (see above). Defaults to true if not specified.
only_leader_active: A Boolean that changes the Process Group from a Leader-only group to a Process Group in which more than one process can be Active. Defaults to true, except for the REST LAM, which does not support it and always treats it as false.
accept_conn_when_passive: A Boolean controlling LAM behavior in Passive mode. If true (or not set), the LAM accepts incoming connections but discards any events received. If false, the LAM does not accept incoming connections and closes the socket (socket and Trapd LAMs), so that a load balancer can detect it as unavailable and route traffic elsewhere.


moog_farmd.conf, ha section:
  cluster: "NY",
  group: "moog_farmd",
  default_leader: true,
cluster: The name of the Cluster. This supersedes anything set in system.conf (and can itself be overridden on the command line).
group: The name of the Process Group. Defaults to moog_farmd.
default_leader: A Boolean indicating whether this moog_farmd is the Leader within its Process Group (see above). Defaults to true if no value is specified.

Command line overrides

Add the following options to the command line when starting an Instance of a component to override its configuration:

Setting                                        Command line option
Cluster                                        --cluster SF
Process Group                                  --group cool_group
Instance                                       --instance instance_3 (for example)
Passive mode                                   --mode passive
Instance is not the Process Group Leader       --leader no
(where only_leader_active is set)

--leader no overrides default_leader in the configuration files. For example:


$MOOGSOFT_HOME/bin/moog_farmd --instance TEST_INSTANCE --group TEST_GROUP --cluster TEST_CLUSTER --mode passive
$MOOGSOFT_HOME/bin/socket_lam --instance SOCK1 --group SOCKGROUP --cluster CLUSTER1 --leader no --mode passive



UI servlets configuration file, ha section:

ha :
instance: "servlets",
group: "UI",
start_as_passive: false
  • All servlets defined in this file act as one HA "instance", so they all fail over together
  • If cluster is not specified, the name of the Cluster is taken from system.conf
  • If group is not specified, the name defaults to "servlets"
  • If start_as_passive is not specified, it defaults to false, so the servlets are Active on startup
  • Servlets do not support any Leader settings
  • Restart the apache-tomcat service to apply configuration changes made to the servlets

Active and Passive mode behavior

In a High Availability deployment, Moogsoft AIOps components may operate in either Active or Passive mode. In Active mode, their behavior is unchanged from non-HA Moogsoft AIOps installations, carrying out data ingestion, processing, presentation, etc. In Passive mode, these activities do not occur - the component is effectively on standby, waiting for an instruction to start the processing activities defined by its component type and configuration setup. Failover is the process of converting one or more processes from Active to Passive mode while converting other processes from Passive to Active mode.

The Active/Passive state of the HA components in a Cluster can be viewed in the Moogsoft AIOps UI using Self Monitoring, or with the ha_cntl utility (see below). In the UI, Passive processes are indicated by an icon.
Further details of how components behave in Passive mode and how, where relevant, the Passive mode may be identified from the command line are given below.


When the UI is in Passive mode, the moogsvr servlet rejects all requests with HTTP status 503 (Service Unavailable) and the moogpoller servlet does not accept incoming WebSocket upgrade requests. When switching from Active to Passive, the moogsvr servlet starts rejecting requests and the moogpoller servlet disconnects any existing WebSocket sessions.

A load balancer can therefore determine whether a UI is running in Active or Passive mode by sending a GET request to https://<server>:<port>/moogsvr/hastatus. A 204 response indicates that the UI is Active; a 503 response indicates Passive mode.

The following example curl command can be sent from the command line to check servlet status:

curl -k https://moogbox2/moogsvr/hastatus -v

The output is < HTTP/1.1 204 No Content if the servlet is in Active mode, or < HTTP/1.1 503 Service Unavailable if the servlet is in Passive mode.
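The same check can be scripted, for example from a monitoring job. The sketch below uses only the Python standard library; the function names are hypothetical, and it assumes hastatus is served on the default HTTPS port unless you include a port in the server argument (for example "moogbox2:8443"). Like curl -k, it skips certificate verification, so use it only where that is acceptable.

```python
import ssl
import urllib.error
import urllib.request


def ui_mode_from_status(status: int) -> str:
    """Map a /moogsvr/hastatus response code to the UI's HA mode."""
    if status == 204:
        return "active"
    if status == 503:
        return "passive"
    return "unknown"


def check_ui_mode(server: str, timeout: float = 5.0) -> str:
    """GET https://<server>/moogsvr/hastatus and report "active", "passive" or "unknown"."""
    url = f"https://{server}/moogsvr/hastatus"
    # Equivalent of curl -k: do not verify the certificate or host name.
    context = ssl.create_default_context()
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    try:
        with urllib.request.urlopen(url, timeout=timeout, context=context) as response:
            return ui_mode_from_status(response.status)
    except urllib.error.HTTPError as err:
        # urlopen raises HTTPError for non-2xx codes such as 503.
        return ui_mode_from_status(err.code)
```

Calling check_ui_mode("moogbox2") corresponds to the curl command above. The same pattern applies to the REST LAM, which answers POST requests with 503 in Passive mode.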


A moog_farmd process running in Passive mode will not process events or detect Situations. When it fails over to Active mode, it will be able to carry on using the state from the previously Active Instance if this has been persisted (see below).

When the moog_farmd state is being persisted, only one moog_farmd process is allowed to run in Active mode at any given time within a single moog_farmd Process Group. If more than one moog_farmd process is started in Active mode, all but the first to become Active are automatically converted to Passive mode within a few seconds. The same applies to new moog_farmd processes started in Active mode when an Active moog_farmd is already running. This prevents a condition known as 'split brain', where two Active processes both believe that they are responsible for executing the same functionality.

All Instances of moog_farmd within the same Process Group must have identical configuration.


LAMs operating in Passive mode do not send Events to the MooMS bus. The REST LAM in Passive mode will reject POST requests with an HTTP status of 503 (server unavailable).

Example curl command to check rest_lam status:

curl -X POST http://moogbox2:9876 -v

The output is < HTTP/1.1 503 Service Unavailable if the rest_lam is in Passive mode. If the rest_lam is in Active mode, then the response code is dependent on the format of data sent to it as per normal rest_lam behavior.

Configuring persistence of state in moog_farmd 

The state of moog_farmd can be persisted to ensure that context is not lost when failover occurs from one Instance of moog_farmd to another. This means that information held in memory about the Situations created by the Sigalisers and the current state of the Sigalisers themselves will not be lost. The new Instance of moog_farmd will continue to process events and detect the same Situations as would have been detected if there had been no failover.

The state of the in-memory database (and the Constants module) is always persisted if persistence is turned on. The state of each of the following Sigalisers is persisted only if its persist_state configuration parameter in moog_farmd.conf is set to true:

    • Classic Sigaliser
    • Speedbird
    • Nexus
    • Cookbook
    • Template Matcher

The state of the Alert Rules Engine Moolet can also be persisted using the persist_state configuration parameter. Similarly, setting persist_state for the AlertBuilder (or any other Moolet) ensures that tasks queued for that Moolet (in this case, Events that have not yet been processed) are persisted to Hazelcast while queued, and are processed by another Instance of moog_farmd after failover.

When failover occurs, Events and other information may be queued in Moolets, waiting to be processed. The persist_state flag also ensures that these tasks are processed by the newly Active Instance of moog_farmd after failover. The flag can be used for any Moolet that has a queue of tasks awaiting processing, which in practice is every Moolet other than the Scheduler.

To take advantage of this feature, and to ensure that the newly Active moog_farmd Instance takes over from where the previous one left off, the message_persistence property in the MooMS section of the system.conf file must be set to true.

Choice of persistence mechanism and configuration

Persistence may be carried out using a Hazelcast in-memory Cluster. The persistence mechanism is configured in system.conf in the persistence section, for example:

  # Persistence configuration parameters.
    "persistence" :
    {
            # Set persist_state to true to turn persistence on. If set, state
            # will be persisted in a Hazelcast cluster.
            "persist_state" : true,

            # Configuration for the Hazelcast cluster.
            "hazelcast" :
            {
                    # The port to connect to on each specified host.
                    "network_port"      : 5701,

                    # If set to true Hazelcast will increment the port number to
                    # an available one if the configured port is unavailable.
                    "auto_increment"    : true,

                    # A list of hosts to allow to participate in the cluster.
                    "hosts" : ["localhost"],

                    # Additional config to allow cluster info to be viewed via
                    # Hazelcast's Management Center UI, if running.
                    "man_center"    :
                    {
                            "enabled"   : false,
                            "host"      : "localhost",
                            "port"      : 8091
                    }
            }
    }

As mentioned above, also ensure that the message_persistence property in the MooMS section of the system.conf file is set to true:

	"zone": "MOOG",
	"brokers": [
		{
			"host": "localhost",
			"port": 5672
		}
	],
	"username": "moogsoft",
	"password": "m00gs0ft",
	"message_persistence": true,
	"max_retries": 100,
	"retry_interval": 200,
	"cache_on_failure": false,
	"cache_ttl": 900
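Because state persistence relies on both settings above, a pre-start check can catch a half-configured system. This is a minimal sketch, assuming system.conf has already been parsed into a Python dict (comments stripped) and assuming the MooMS section is stored under the key "mooms"; the function name is hypothetical, and per-Moolet persist_state flags in moog_farmd.conf are not checked.

```python
from typing import Any, Dict, List


def persistence_problems(system_conf: Dict[str, Any]) -> List[str]:
    """List the system.conf persistence prerequisites that are not met.

    Assumption: the MooMS section uses the key "mooms"; adjust to your file.
    """
    problems: List[str] = []
    if system_conf.get("persistence", {}).get("persist_state") is not True:
        problems.append('"persistence" section: "persist_state" must be true')
    if system_conf.get("mooms", {}).get("message_persistence") is not True:
        problems.append('MooMS section: "message_persistence" must be true')
    return problems
```

An empty list means both prerequisites described in this section are in place.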

Clearing Persistence Data on Start-up

If persistence is configured, once all moog_farmd Instances have been stopped, the in-memory persistence data is lost.

moog_farmd also has a command line option, --clear_state, which, when specified at startup, clears any current persistence data for the Process Group that the moog_farmd Instance belongs to. This gives that Instance a clean start (it has no memory of previously created Situations), but it also affects any other running moog_farmd Instances in that Process Group.

This option does not remove moog_farmd persistence data from other Process Groups.

[root@moogbox2 regression-tests]# moog_farmd --help
-------- Copyright MoogSoft 2012-2015 --------

  Executing:  moog_farmd

------------ All Rights Reserved -------------
usage: moog_farmd [ --config=<path to config file> ] [ --loglevel
                  (INFO|WARN|ALL) ] [--clear_state] [ --instance <name> [
                  --cluster <name> --group <name> [ --mode
                  <passive|active> ] [ --leader <yes|no> ] ] ] [ --version

MoogSoft moog_farmd: Container for our herd of moolets
    --clear_state      Clears any persisted state information associated
                       with this process group on startup.
    --cluster <arg>    Name of HA cluster (to overwrite the config file)
    --config           Specify a full path to the configuration file of
                       this farmd
    --group <arg>      Name of HA group (to overwrite the config file)
    --instance <arg>   Give this farmd herd a name for use with farmd
    --leader <arg>     Is this instance an HA leader within its group
                       (yes, no)
    --loglevel <arg>   Specify (INFO|WARN|ALL) to choose the amount of
                       debug output - warning ALL is very verbose!
    --mode <arg>       Start the process in passive or active mode
                       (default will be active)
    --version          Return current version of the Moog software

Configuring Automatic Failover for moog_farmd

When moog_farmd is deployed in an active/passive HA configuration, it can fail over automatically. A passive moog_farmd automatically takes over processing from an active moog_farmd in the same HA Process Group if it detects that the active moog_farmd has become inactive and is no longer reporting its status.

This feature is controlled by three configuration properties:

    • automatic_failover
    • keepalive_interval
    • margin

These properties are in the "failover" block in $MOOGSOFT_HOME/config/system.conf:

"failover" :
    "persist_state" : false,
    # Configuration for the Hazelcast cluster.
    "hazelcast" :
    # Failover configuration below currently applies only to moog_farmd.
    # Interval (in seconds) at which processes report their
    # active/passive status and check statuses of other processes.
    "keepalive_interval" : 5,
    # At next keepalive_interval, processes will allow <margin> seconds
    # before treating active processes who have not reported their
    # status as being dead.
    "margin" : 3,
    # Number of seconds to wait for previously active process to
    # become passive during manual failover. After this time has
    # expired the new instance will become active and force the
    # process to become passive.
    "failover_timeout" : 10,
    # Allow a passive process to automatically become active if
    # no other active processes are detected in the same process group
    "automatic_failover" : false,
    # Process will stop indicating that it is active if it fails
    # to send <value> consecutive heartbeats.
    "heartbeat_failover_after": 2
Property                   Values / default    State      Description
automatic_failover         true|false                     Enables or disables the feature
keepalive_interval         seconds, 5          inactive   Defines how often a moog_farmd process reports its active/passive status to the database and checks the status of other reporting moog_farmd processes
margin                     seconds, 3          inactive   Defines how long a passive moog_farmd waits, after detecting that a formerly active moog_farmd in its Process Group is no longer reporting its status, before it becomes active and takes over processing
failover_timeout           seconds, 10         active     Defines how long to wait for the previously active process to become passive during a manual failover; after this time the new Instance becomes active and forces that process to become passive
heartbeat_failover_after   number, 2           active     The process stops indicating that it is active if it fails to send this number of consecutive heartbeats

Example Automatic Failover Tuning

Assuming a highly simplified multi-host HA setup such as:

+---------------------+             +---------------------+
|server1|             |             |server2|             |
+-------+             |             +-------+             |
| moog_farmd (active) |             | moog_farmd (passive)|
+----------+----------+             +-----------+---------+
           |                                    |
           |                                    |
           |                                    |
           |         +---------------+          |
           |         |server3|       |          |
           |         +-------+       |          |
           +---------+      DB       +----------+

The moog_farmds on server1 and server2 are in the same process group but in different clusters. All other config is identical.

  • With automatic_failover: false, set in system.conf, on both server1 and server2, then if the active moog_farmd process on server1 is killed, becomes unresponsive, loses contact with the DB or drops off the network, then moog_farmd on server2 will remain passive and not take over processing unless a manual failover is triggered using ha_cntl
  • With automatic_failover: true set in system.conf on both server1 and server2, and with the default keepalive_interval and margin settings, if the active moog_farmd process on server1 is killed*, becomes unresponsive, loses contact with the DB or drops off the network, then moog_farmd on server2 will automatically become active and take over processing 3 to 8 seconds later (depending on when the next keepalive_interval occurs). If the moog_farmd on server1 is then restarted, resumes processing or rejoins the network, it will establish that another active moog_farmd is already running in its process group (the instance now active on server2) and will become passive to prevent split-brain processing

Thus, the keepalive_interval and margin properties can be used to tune the sensitivity of automatic failover. In the above example (and with default settings) automatic failover happens promptly. Users may wish to increase or decrease the interval at which moog_farmd reports its status and also allow more time before a passive moog_farmd tries to take over processing (possibly useful if the active moog_farmd suffered a short interruption but has quickly resumed). Setting (for example) automatic_failover : true, keepalive_interval : 3 and margin : 10 would mean for the above system:

  • If the active moog_farmd process on server1 is killed*, becomes unresponsive, loses contact with the DB or drops off the network, then moog_farmd on server2 will automatically become active and take over processing 10 to 13 seconds later (depending on when the next keepalive_interval occurs). If the moog_farmd on server1 resumes processing or rejoins the network within 10 seconds of the passive moog_farmd on server2 detecting it as down, it continues as the active instance and the moog_farmd on server2 remains passive. Conversely, if the moog_farmd on server1 had been restarted instead, it would not continue as the active process, and the passive moog_farmd on server2 would become active and take over

* See the note below on failover behavior when a process is killed or shut down cleanly.
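The timings in both examples follow one rule: the passive moog_farmd notices the missing status at its next keepalive, which is between 0 and keepalive_interval seconds away, and then allows margin more seconds before taking over. The sketch below encodes that reading of the examples. It is a model of the documented timings, not Moogsoft code; the function name is hypothetical, and it ignores heartbeat_failover_after and network delays. The clean_shutdown case follows the note on clean shutdown under Important Notes and Limitations.

```python
from typing import Tuple


def takeover_window(keepalive_interval: float, margin: float,
                    clean_shutdown: bool = False) -> Tuple[float, float]:
    """Earliest and latest delay (seconds) before a passive moog_farmd takes over.

    The failure is noticed at the passive process's next keepalive
    (0 to keepalive_interval seconds away), after which it allows `margin`
    more seconds. After a clean shutdown the margin is skipped.
    """
    if clean_shutdown:
        return (0.0, float(keepalive_interval))
    return (float(margin), float(margin + keepalive_interval))


print(takeover_window(5, 3))   # default settings: (3.0, 8.0)
print(takeover_window(3, 10))  # tuned example above: (10.0, 13.0)
```

Raising margin makes failover more tolerant of short interruptions; lowering keepalive_interval narrows the window at the cost of more frequent status reports.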

Important Notes and Limitations:

  • Automatic failover applies to moog_farmd only; automatic failover of LAMs and UI servlets is not part of this implementation
  • Identical configuration is needed on all servers running as part of the HA setup (as for other HA configuration)
  • Time sensitive: all servers must be time synchronized. A change to the system time on the DB server in a running HA setup could trigger automatic failover between moog_farmd instances
  • Requires communication with the DB. If the DB or DB server becomes unresponsive to all moog_farmd instances, the feature will not work as expected
  • If, in an automatic failover setup, an active moog_farmd instance is shut down cleanly (that is, using a normal kill, service stop or Ctrl-C), a passive moog_farmd takes over processing at its next keepalive_interval without waiting the additional <margin> seconds

A Note on Process Startup

If automatic_failover is enabled and a moog_farmd instance is started in passive mode while no other active moog_farmd is running in its process group, it switches to active. Factor this in when starting up a system: it is easiest to start the active moog_farmd instances first.

A Note on Split-Brain Handling

The HA implementation has built-in handling to prevent split-brain processing, that is, two active moog_farmd instances in the same process group running at the same time and potentially processing the same data twice. At its simplest, it prevents a second moog_farmd from being started in active mode if another active instance is already running in the same process group (regardless of cluster or instance name). The second moog_farmd starts up but immediately switches to passive mode.
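The startup and split-brain rules above can be illustrated with a toy model of a single Process Group. This is a hypothetical sketch for reasoning about startup order, not how moog_farmd implements the check: real processes report their status through the database, and a demoted process switches to passive after a few seconds rather than instantly.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class ProcessGroupModel:
    """Toy model of the active/passive rules for one moog_farmd Process Group."""
    automatic_failover: bool = False
    modes: Dict[str, str] = field(default_factory=dict)  # instance -> "active" | "passive"

    def active_instance(self) -> Optional[str]:
        return next((n for n, m in self.modes.items() if m == "active"), None)

    def start(self, instance: str, requested_mode: str) -> str:
        """Start an instance and return the mode it actually ends up in."""
        has_active = self.active_instance() is not None
        if requested_mode == "active" and has_active:
            mode = "passive"  # split-brain protection: a second active is demoted
        elif requested_mode == "passive" and not has_active and self.automatic_failover:
            mode = "active"   # automatic failover: nobody is active, so take over
        else:
            mode = requested_mode
        self.modes[instance] = mode
        return mode

    def stop(self, instance: str) -> None:
        """Stop an instance; with automatic failover, one passive peer takes over."""
        was_active = self.modes.pop(instance, None) == "active"
        if was_active and self.automatic_failover:
            for name, mode in self.modes.items():
                if mode == "passive":
                    self.modes[name] = "active"
                    break
```

With automatic_failover enabled, starting FARM_B in passive mode before FARM_A makes FARM_B active and FARM_A passive, which is why the note on process startup recommends starting the active instances first.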

Controlling Moogsoft AIOps HA (ha_cntl)

Moogsoft AIOps includes a High Availability Control utility to control the HA architecture.
Use the ha_cntl utility to:

  • failover (change status of) Instances, Process Groups or Clusters
  • view the current status of all Instances, Process Groups and Clusters

Help for the ha_cntl utility is available with the -h (--help) option.

The UI does not continue to function correctly after a failover if only one of the Tomcat servlets is failed over using activate or deactivate commands at the servlet Process Group level. Currently, the UI must be failed over at the Cluster level to ensure continued smooth operation.

ha_cntl utility commands are as follows: 

-a,--activate <arg>      Specify cluster[.group[.instance_name]] to activate all Process Groups within a Cluster, a specific Process Group within a Cluster, or a single Instance
-d,--deactivate <arg>    Specify cluster[.group[.instance_name]] to deactivate all Process Groups within a Cluster, a specific Process Group within a Cluster, or a single Instance
-h,--help                Print help text that describes the ha_cntl commands
-l,--loglevel <arg>      Specify (INFO|WARN|ALL) to choose the amount of debug output
-t,--time_out <arg>      Specify the time (in seconds) to wait for the last answer. Defaults to 2 seconds if not set
-v,--view                View the current status of all Instances, Process Groups and Clusters
-y,--assumeyes           Answer yes to all prompts. Useful for automation
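When scripting failovers, it can help to validate the cluster[.group[.instance_name]] argument before calling ha_cntl. This is a minimal sketch: the helper names are hypothetical, it assumes Cluster, Process Group and Instance names contain no dots, and it assumes ha_cntl is on the PATH (as in the ha_cntl -v example below). It only uses the -a, -d and -y options listed above.

```python
from typing import List, NamedTuple, Optional


class HaTarget(NamedTuple):
    cluster: str
    group: Optional[str] = None
    instance: Optional[str] = None


def parse_ha_target(spec: str) -> HaTarget:
    """Split a cluster[.group[.instance_name]] argument into its parts."""
    parts = spec.split(".")
    if len(parts) > 3 or any(not part for part in parts):
        raise ValueError(f"expected cluster[.group[.instance_name]], got {spec!r}")
    return HaTarget(*parts)


def ha_cntl_command(action: str, spec: str, assume_yes: bool = False) -> List[str]:
    """Build an ha_cntl argument list suitable for subprocess.run()."""
    flags = {"activate": "-a", "deactivate": "-d"}
    if action not in flags:
        raise ValueError(f"unknown action {action!r}")
    parse_ha_target(spec)  # reject malformed targets before calling ha_cntl
    command = ["ha_cntl", flags[action], spec]
    if assume_yes:
        command.append("-y")
    return command
```

For example, subprocess.run(ha_cntl_command("deactivate", "RICHMOND.trapd_lam"), check=True) runs the same command as the trapd_lam example below.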


Example commands:

$MOOGSOFT_HOME/bin/ha_cntl -a SURBITON.socket_lam.SOCK1
    Activates the socket_lam Instance SOCK1 within the Process Group socket_lam in the Cluster SURBITON. If the socket_lam Process Group is configured as "only leader should be active" (see above), all other socket LAMs in the Process Group are deactivated.
$MOOGSOFT_HOME/bin/ha_cntl -a KINGSTON
    Activates all Process Groups in the KINGSTON Cluster and deactivates all other Clusters.
$MOOGSOFT_HOME/bin/ha_cntl -a KINGSTON.rest_lam -y
    Activates the rest_lam Process Group in the KINGSTON Cluster and deactivates all other rest_lam Process Groups in all other Clusters. The -y option suppresses the 'are you sure?' prompt.
$MOOGSOFT_HOME/bin/ha_cntl -d RICHMOND
    Deactivates all Process Groups in the RICHMOND Cluster.
$MOOGSOFT_HOME/bin/ha_cntl -d RICHMOND.trapd_lam
    Deactivates the trapd_lam Process Group in the RICHMOND Cluster.
$MOOGSOFT_HOME/bin/ha_cntl -a KINGSTON.UI -y
    Activates the UI group in the KINGSTON Cluster and deactivates the UI group in all other Clusters, triggering a UI failover (of all servlets) to the KINGSTON Cluster.
$MOOGSOFT_HOME/bin/ha_cntl -v
    Prints detailed status about all Clusters, Groups and Instances that it can discover:

[root@moogbox2 ~]# ha_cntl -v
Getting system status
Cluster: [KINGSTON] passive
        Process Group: [UI] Passive (no leader - all can be active)
            Instance: [servlets] Passive
                Component: moogpoller - not running
                Component: moogsvr - not running
                Component: toolrunner - not running
        Process Group: [moog_farmd] Passive (only leader should be active)
            Instance: FARM Passive Leader
                Moolet: AlertBuilder - not running (will run on activation)
                Moolet: AlertRulesEngine - not running (will run on activation)
                Moolet: Cookbook - not running (will run on activation)
                Moolet: Nexus - not running
                Moolet: Sigaliser - not running
                Moolet: Speedbird - not running (will run on activation)
                Moolet: TemplateMatcher - not running
        Process Group: [rest_lam] Passive (no leader - all can be active)
            Instance: REST2 Passive
        Process Group: [socket_lam] Passive (only leader should be active)
            Instance: SOCK2 Passive Leader
Cluster: [SURBITON] active
        Process Group: [UI] Active (no leader - all can be active)
            Instance: [servlets] Active
                Component: moogpoller - running
                Component: moogsvr - running
                Component: toolrunner - running      
        Process Group: [moog_farmd] Active (only leader should be active)
            Instance: FARM Active Leader
                Moolet: AlertBuilder - running
                Moolet: AlertRulesEngine - running
                Moolet: Cookbook - running
                Moolet: Default Cookbook - running
                Moolet: Nexus - not running
                Moolet: Sigaliser - not running
                Moolet: Speedbird - running
                Moolet: TemplateMatcher - not running
        Process Group: [rest_lam] Active (no leader - all can be active)
            Instance: REST1 Active
        Process Group: [socket_lam] Active (only leader should be active)
            Instance: SOCK1 Active Leader
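For automation, the Process Group lines of this output can be turned into a lookup table of state and leadership status. This is a minimal sketch, assuming the output format shown above (it may differ between releases); the function and pattern names are hypothetical.

```python
import re
from typing import Dict, Optional, Tuple

CLUSTER_LINE = re.compile(r"^Cluster: \[(?P<name>[^\]]+)\] (?P<state>\w+)")
GROUP_LINE = re.compile(
    r"^\s+Process Group: \[(?P<name>[^\]]+)\] (?P<state>\w+)"
    r"(?: \((?P<leadership>[^)]*)\))?"
)


def parse_ha_view(text: str) -> Dict[Tuple[str, str], Tuple[str, Optional[str]]]:
    """Map (cluster, process group) to (state, leadership status) from ha_cntl -v output."""
    groups: Dict[Tuple[str, str], Tuple[str, Optional[str]]] = {}
    cluster: Optional[str] = None
    for line in text.splitlines():
        cluster_match = CLUSTER_LINE.match(line)
        if cluster_match:
            cluster = cluster_match.group("name")
            continue
        group_match = GROUP_LINE.match(line)
        if group_match and cluster is not None:
            key = (cluster, group_match.group("name"))
            groups[key] = (group_match.group("state").lower(),
                           group_match.group("leadership"))
    return groups
```

Instance and Moolet lines are ignored; only the Cluster headers and Process Group lines are parsed.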

farmd_cntl changes for HA

The farmd_cntl utility has two changes for HA:

To send farmd_cntl commands to a specific moog_farmd Instance within an HA environment, the <Cluster>.<Process Group>.<Instance> notation should be used for the --instance option. For example:

farmd_cntl --instance SURBITON.moog_farmd.FARM --moolet AlertBuilder --start

farmd_cntl now also gives more feedback on the results of the operation(s) requested:

[root@moogbox2 ~]# farmd_cntl --instance CLUSTER1.GROUP1.FARM1 --all-moolets --stop
Response from: CLUSTER1.GROUP1.FARM1
Status: Action(s) completed successfully.
    Moolet AlertBuilder Stopped.
    Moolet Default Cookbook Stopped.
    Moolet Sigaliser Stopped.
    Moolet SituationMgr Stopped.
    Moolet Cookbook Stopped.

[root@moogbox2 ~]# farmd_cntl --instance CLUSTER1.GROUP1.FARM1 --all-moolets --stop
Response from: CLUSTER1.GROUP1.FARM1
Status: Action(s) completed with failures.
No Moolets to stop...

[root@moogbox2 ~]# farmd_cntl --instance CLUSTER1.GROUP1.FARM1 --all-moolets --start
Response from: CLUSTER1.GROUP1.FARM1
Status: Action(s) completed with failures.
    Moolet AlertBuilder Started.
    Moolet Speedbird Started.
    Moolet Default Cookbook Started.
    Moolet TokenCounter could NOT be started.
    Moolet Sigaliser Started.
    Moolet Nexus Started.
    Moolet TemplateMatcher Started.
    Moolet AlertRulesEngine Started.
    Moolet SituationMgr Started.
    Moolet Cookbook Started.
    Moolet Notifier Started.

[root@moogbox2 ~]# farmd_cntl --instance CLUSTER1.GROUP1.FARM1 --moolet Sigaliser --restart
Response from: CLUSTER1.GROUP1.FARM1
Status: Action(s) completed successfully.
    Moolet Sigaliser Stopped.
    Moolet Sigaliser Started.

[root@moogbox2 ~]# farmd_cntl --instance CLUSTER1.GROUP1.FARM1 --moolet Sigaliser --moolet Speedbird --restart --reconfig
Response from: CLUSTER1.GROUP1.FARM1
Status: Action(s) completed successfully.
    Moolet Sigaliser Stopped.
    Moolet Sigaliser Configuration Reloaded.
    Moolet Sigaliser Started.
    Moolet Speedbird Stopped.
    Moolet Speedbird Configuration Reloaded.
    Moolet Speedbird Started.

MySQL failover for Moogsoft AIOps components

Moogsoft AIOps lets you define a list of failover MySQL servers. If the primary connection (defined by the host and port in the mysql section of system.conf) goes down, Moogsoft AIOps components that have a MySQL connection (moog_farmd, Tomcat, rest_lam) automatically connect to the next available MySQL server in the failover_connections list.

This is defined in the failover_connections section of the $MOOGSOFT_HOME/config/system.conf file, as follows:

"mysql" :
            "host"            : "localhost",
            "database"        : "moogdb",
            "username"        : "ermintrude",
            "password"        : "m00",
            "port"            : 3306
            # New deadlock retry configuration - default values are as below if
            # the config remains commented out.
            # "maxRetries"      : 5,
            # "retryWait"       : 10
            # To use Multi-Host Connections for failover support use:
            #  "failover_connections" :
            #    [
            #      {
            #          "host"  : "",
            #          "port"  : 3306
            #      },
            #      {
            #          "host"  : "",
            #          "port"  : 3306
            #      },
            #      {
            #          "host"  : "",
            #          "port"  : 3306
            #      }
            #    ]

This is useful when the system is used with a replicated/clustered MySQL environment.


For the following mysql section in system.conf:

 "mysql" :
            "host"            : "moogbox1",
            "database"        : "moogdb",
            "username"        : "ermintrude",
            "password"        : "m00",
            "port"            : 3306,
            "failover_connections" :
                [
                    {
                        "host"  : "moogbox2",
                        "port"  : 3306
                    },
                    {
                        "host"  : "moogbox3",
                        "port"  : 3306
                    }
                ]

On startup, the Moogsoft AIOps components that make a MySQL connection will connect to the MySQL server on moogbox1.

If the MySQL server on moogbox1 goes down, the Moogsoft AIOps components automatically fail over their MySQL connection to moogbox2. If moogbox2 is not available, or subsequently goes down, the connection fails over to moogbox3. While the failover is occurring, some temporary MySQL connection errors or warnings may appear in the Moogsoft AIOps component log output.

If the primary server, or another failover connection higher up the list, becomes available again, the connection does not automatically fail back to it until the Moogsoft AIOps component is restarted or makes a new connection.
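The failover order and the absence of automatic failback can be modelled as follows. This is a toy sketch to illustrate the behavior described above, not the driver's implementation: the class name is hypothetical, the reachability check is injected as a function, and it assumes a new connection always tries the list from the top (primary first).

```python
from typing import Callable, List, Optional, Sequence, Tuple

Server = Tuple[str, int]  # (host, port)


class FailoverConnectionModel:
    """Toy model of MySQL failover_connections behavior (not the real driver)."""

    def __init__(self, primary: Server, failover_connections: Sequence[Server],
                 is_reachable: Callable[[Server], bool]) -> None:
        self.servers: List[Server] = [primary, *failover_connections]
        self.is_reachable = is_reachable
        self.current: Optional[Server] = None

    def new_connection(self) -> Server:
        """Assumption: a new connection tries the list in order, primary first."""
        for server in self.servers:
            if self.is_reachable(server):
                self.current = server
                return server
        raise ConnectionError("no MySQL server in the list is reachable")

    def ensure_connected(self) -> Server:
        """Keep using the current server while it is up (no automatic failback)."""
        if self.current is not None and self.is_reachable(self.current):
            return self.current
        return self.new_connection()
```

Calling new_connection() explicitly corresponds to a component restart or a new connection, which is the only point at which the model returns to the primary server.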
