Demystifying Metric Intelligence: How ServiceNow Detects Anomalies

rtbryan
May 30

Recap & Context

This post continues my Metric Intelligence blog series by diving into how data flows through the ServiceNow platform. If you haven’t read the first entry where I introduce the concept and its purpose, I highly recommend starting there.

Unfortunately, there is no dedicated NowLearning course explaining the inner workings of Metric Intelligence, which inspired me to reverse-engineer the data flow within the tool. This article follows the data as it passes through the Metric Intelligence application, highlighting the capabilities and tables it touches throughout its lifecycle.

Figure 1 – High-level architecture of the Metric Intelligence solution, including key components such as the Agent Client Collector, MID Server, and Machine Learning Algorithms

Metric Collection & Clotho Database

The life of a metric begins when it is received from the Agent Client Collector (ACC) into the platform and stored in the “Clotho Database”, a database dedicated to receiving and processing metrics. The Clotho database is created when the Metric Intelligence plugin is activated and is not accessible to end users. The Agent Client Collector is ServiceNow’s agent capability, which can be installed on servers and end-user compute machines so that it can collect information for applications such as Discovery, Event Management and Metric Intelligence.

As shown in Figure 1, the Agent Client Collector (ACC) on each monitored device collects metrics and, crucially, there is one agent per ServiceNow instance on each device. Because end users have no access to the Clotho database, you will need help from ServiceNow if there are any issues with it (see my next blog for more on this). In the current release of ServiceNow, the ACC must connect to a MID Server to collect metrics, but from speaking to people at Knowledge 25, there are plans to remove the MID Server dependency.

Data Processing & Machine Learning

What happens next depends on how you have configured the platform. Assuming you have configured it to process the metrics into full IT Alerts, the platform will use a combination of supervised machine learning algorithms, such as KNN, Naïve Bayes, SVM, or Decision Tree, to develop statistical models that capture seasonality and trend.

Figure 2 – Process flow of the Metrics once they have been received into the platform

These models automatically develop threshold bounds, which indicate when a metric is moving outside its normal behaviour, and generate Anomaly Alerts [em_alert_anomaly] when required. Think of Anomaly Alerts as something of interest but perhaps not requiring immediate human attention.
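To make the idea of threshold bounds concrete, here is a minimal sketch in Python. It is not ServiceNow's actual algorithm, which is not exposed to end users. It simply assumes a metric that repeats with a fixed season length and builds a band of mean ± k standard deviations for each seasonal slot. The function names, the `season_length` parameter and the multiplier `k` are my own illustrative choices.

```python
from statistics import mean, stdev


def threshold_bounds(history, season_length, k=3.0):
    """Return one (lower, upper) pair per seasonal slot.

    Assumes `history` is ordered oldest to newest and contains at least
    one full season. Illustrative only, not the platform's real model.
    """
    bounds = []
    for slot in range(season_length):
        # Every sample that falls on the same position within the season.
        samples = history[slot::season_length]
        mu = mean(samples)
        # stdev() needs at least two samples; fall back to a zero-width band.
        sigma = stdev(samples) if len(samples) > 1 else 0.0
        bounds.append((mu - k * sigma, mu + k * sigma))
    return bounds


def is_anomalous(value, slot, bounds):
    """True when the value falls outside the band for its seasonal slot."""
    lower, upper = bounds[slot]
    return value < lower or value > upper


# Hypothetical CPU readings alternating between a quiet and a busy period.
cpu_history = [10, 50, 12, 52, 11, 51, 9, 49]
cpu_bounds = threshold_bounds(cpu_history, season_length=2)
```

The point of the seasonal split is that the same reading can be normal in one slot and anomalous in another: a value of 50 is expected during the busy period but well outside the band for the quiet one.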

Alert Promotion via APE

Following the generation of Anomaly Alerts, the Advanced Promotion Engine (APE) assesses them against pre-defined criteria to determine whether an anomaly requires attention and, if so, generates an IT Alert [em_alert]. The APE [mi_advanced_promotion_def] allows you to specify the type of CIs, the severity, the number of alerts, and the time window required before an IT Alert is generated.
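The promotion criteria described above (CI type, severity, number of alerts and time window) can be sketched as a simple rule check. This is my own simplified model in Python, not the platform's implementation: the class names, field names and the convention that a lower severity number means a more severe alert are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class AnomalyAlert:
    ci_class: str      # hypothetical field: the CI type the alert relates to
    severity: int      # assumption: lower number = more severe
    created: datetime


@dataclass
class PromotionRule:
    ci_class: str          # CI type the rule applies to
    max_severity: int      # only count alerts at least this severe
    min_alerts: int        # how many matching alerts are needed
    window: timedelta      # how far back to look


def should_promote(alerts, rule, now):
    """True when enough matching anomaly alerts fall inside the time window."""
    window_start = now - rule.window
    matching = [
        a for a in alerts
        if a.ci_class == rule.ci_class
        and a.severity <= rule.max_severity
        and window_start <= a.created <= now
    ]
    return len(matching) >= rule.min_alerts


# Example: three anomalies on Linux servers within the last 15 minutes.
now = datetime(2025, 5, 30, 12, 0)
alerts = [
    AnomalyAlert("cmdb_ci_linux_server", 2, now - timedelta(minutes=m))
    for m in (10, 5, 2)
]
rule = PromotionRule("cmdb_ci_linux_server", max_severity=3,
                     min_alerts=3, window=timedelta(minutes=15))
promote = should_promote(alerts, rule, now)
```

Shrinking the window to five minutes would leave only two matching alerts in this example, so the same rule would no longer promote. That is the trade-off you tune when deciding how noisy the IT Alert queue should be.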

IT Alert Handling & Automation

Once the IT Alert has been generated by the APE, the process follows the usual Event Management alert steps. First, any related or associated alerts are grouped together using various factors documented here. Then the NOC team can acknowledge, assign or promote the Alert.

Alternatively, an Incident is automatically created and associated with the Alert based on predefined criteria configured in the Alert Management Rules. Because these Alert Management Rules leverage Flow Designer, you can also automate the resolution of an Alert without anyone getting involved, for example by restarting a service or providing extra memory for a Virtual Machine.
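Conceptually, an Alert Management Rule pairs a condition with an action. The Python sketch below models that pairing so the decision logic is easier to picture. It does not use any ServiceNow API: the `ITAlert` and `AlertRule` classes, the metric names and the first-match-wins ordering are all hypothetical simplifications.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ITAlert:
    ci_name: str
    metric: str
    severity: int  # assumption: lower number = more severe


@dataclass
class AlertRule:
    name: str
    matches: Callable[[ITAlert], bool]   # the rule's filter condition
    action: Callable[[ITAlert], str]     # stand-in for the flow it would launch


def handle_alert(alert, rules):
    """Apply the first rule whose condition matches; otherwise leave it to the NOC."""
    for rule in rules:
        if rule.matches(alert):
            return rule.action(alert)
    return "left for the NOC team"


rules = [
    # Automated remediation: no human involved.
    AlertRule("Restart service",
              lambda a: a.metric == "service_down",
              lambda a: f"restart service on {a.ci_name}"),
    # Fallback for severe alerts: raise an Incident.
    AlertRule("Create incident",
              lambda a: a.severity <= 2,
              lambda a: f"create incident for {a.ci_name}"),
]

outcome = handle_alert(ITAlert("web01", "service_down", 1), rules)
```

Ordering matters in this sketch: the remediation rule sits ahead of the incident rule, so a failed service is restarted rather than ticketed, even when the alert is also severe enough to match the second rule.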

Conclusion

Metric Intelligence in ServiceNow is a powerful yet often under-explored capability that enables proactive detection and handling of performance issues across your infrastructure. In this article, we unpacked how metrics flow from collection via the Agent Client Collector (ACC), through the Clotho Database, and into a series of machine learning-driven models that determine what’s normal and what’s not. We also looked at how anomalies are evaluated by the Advanced Promotion Engine (APE) and ultimately escalated to actionable IT Alerts. Understanding this behind-the-scenes process is essential for maximizing the value of Metric Intelligence, especially since no official NowLearning course currently covers these internals. With the data flow and decision logic demystified, you’re better equipped to configure, troubleshoot, and extend the platform for your organisation’s needs.

In the next post, I’ll be diving into some Lessons Learnt, including what happens when things go wrong and how to work with ServiceNow support to resolve backend issues.