Lessons Learned Implementing ServiceNow Metric Intelligence (AIOps)
- rtbryan
- Oct 15
- 4 min read
If you’ve ever tried implementing Metric Intelligence (MI) in ServiceNow, you’ll know it’s one of those features that sounds simple on paper — until you start configuring it.
When I first worked on an AIOps project that involved Metric Intelligence, I assumed most of the work would just be enabling the plugin and pointing it at some metrics. Turns out, there’s a lot more to it. The documentation helps, but there are a few quirks and “aha” moments that you only discover through trial and error.
This post is a summary of those lessons - the stuff I wish someone had told me before I started configuring thresholds, anomaly detection, and alert rules.
1. Metric Maps: The First Gotcha
When you start playing with threshold bounds, the first thing you’ll probably do is pick a metric and test it. If that metric isn’t listed in the sa_metric_map table, nothing will show up - even if you know you’re collecting the data.
This tripped me up early on.
When you’re using the Agent Client Collector (ACC), ServiceNow automatically creates the CI-to-Metric map behind the scenes when it receives metrics. So, in most cases, it “just works.”
But if you’re dealing with custom metrics, you’ll need to manually extend the metric map. There’s a ServiceNow Knowledge Article that walks through the process, but the short version is: make sure your metric appears in sa_metric_map, or you’ll be troubleshooting a problem that isn’t really a problem.
Here’s a quick shortcut to the table (you can paste this into your browser after your instance name):
now/nav/ui/classic/params/target/sa_metric_map_list.do?sysparm_clear_stack=true
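If you’d rather check from a script than the UI, here’s a rough background-script sketch (Scripts - Background) that lists what’s currently in the map. I’ve deliberately not hard-coded any column names, since those can vary by release - just eyeball the output for your metric and CI:

// Background script sketch: list what's currently in the CI-to-metric map.
// Run this before spending time on thresholds that will never appear
// for an unmapped metric.
var map = new GlideRecord('sa_metric_map');
map.setLimit(50); // keep the output manageable on busy instances
map.query();
gs.info('sa_metric_map rows returned (max 50): ' + map.getRowCount());
while (map.next()) {
    // getDisplayValue() prints the record's display field; swap in the
    // specific metric/CI columns on your instance once you know their names.
    gs.info(map.getDisplayValue());
}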
2. Detection Levels: What They Don’t Tell You
One thing the docs do mention, but not very clearly, is how the different detection levels work.
There are five, and they build on each other:
Metrics Only – just collects data.
Bounds – creates statistical models and shows threshold bounds in Insights Explorer.
Anomaly Scores – adds anomaly scoring on top of bounds.
Anomaly Alerts – generates alerts when something looks off.
IT Alerts – creates full alerts in the em_alert table.
Here’s the important part: each level includes the ones before it. So, if you pick “Anomaly Scores”, you’ll automatically get metrics and bounds. If you pick “IT Alerts”, you’ll get everything — metrics, bounds, anomalies, and alerts.
I learned that the hard way after wondering why no alerts were showing up.
Quick tip: If you actually want alerts to appear in the em_alert table (the em_alert_list.do view), make sure the detection level is set to IT Alerts.
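And if you want to confirm alerts are actually landing without refreshing the list view all day, a quick background-script query against em_alert does the trick. This is just a sketch - the 24-hour window and the fields I print are arbitrary choices:

// Background script sketch: confirm MI alerts are reaching em_alert.
// An empty result with a detection level below "IT Alerts" is expected,
// not a bug - raise the level first.
var alertGr = new GlideRecord('em_alert');
alertGr.addQuery('sys_created_on', '>=', gs.hoursAgo(24)); // last 24 hours
alertGr.orderByDesc('sys_created_on');
alertGr.setLimit(20);
alertGr.query();
gs.info('Recent alerts found: ' + alertGr.getRowCount());
while (alertGr.next()) {
    gs.info(alertGr.getValue('number') + ' | severity ' + alertGr.getValue('severity') +
            ' | ' + alertGr.getValue('description'));
}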
3. Avoiding Threshold Conflicts
At some point, you might want to override the automatically generated thresholds and create your own. Totally fine — just be careful not to have multiple threshold configurations for the same metric.
If you do, MI gets confused about which one to use, and you’ll see errors like this in the logs:
ERROR *** TimeSeriesModelLearner : Exception in model learner:
java.lang.NullPointerException: Cannot invoke "java.util.Map.containsKey(Object)"
because "classParams" is null
After chasing that error for a while, I learned the simple rule: one metric = one threshold configuration. It’ll save you hours of debugging.
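If you’d rather hunt for duplicates up front instead of waiting for that error, a GlideAggregate sketch like the one below works. Fair warning: the table and column names here are placeholders, not the real threshold-configuration schema - point them at whatever table your instance stores manual threshold configurations in:

// Hypothetical sketch: flag metrics with more than one threshold configuration.
// THRESHOLD_TABLE and METRIC_FIELD are placeholders, not the real schema -
// substitute the threshold configuration table/column on your instance.
var THRESHOLD_TABLE = 'your_threshold_config_table'; // placeholder
var METRIC_FIELD = 'metric_name';                    // placeholder
var agg = new GlideAggregate(THRESHOLD_TABLE);
agg.addAggregate('COUNT');
agg.groupBy(METRIC_FIELD);
agg.query();
while (agg.next()) {
    var count = parseInt(agg.getAggregate('COUNT'), 10);
    if (count > 1) {
        gs.warn('Duplicate threshold configs for metric: ' +
                agg.getValue(METRIC_FIELD) + ' (' + count + ')');
    }
}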
4. Metric Rules: The Modern (and Sometimes Painful) Way
The Metric Rules interface is another way to configure anomaly detection. It’s part of both Metric Intelligence and DEX, and I like to think of it as the modern workspace version of MI.
That said, my first experience with it wasn’t smooth. When we activated the plugin and opened the workspace, it was completely blank. Repairing the plugin fixed that.
Then, even after setting some extreme thresholds, we noticed no anomaly alerts were being generated. After a bit of back and forth, it turned out to be a database-level sync issue with Clotho, which only ServiceNow Support could fix.
So, if you ever find yourself scratching your head because “everything looks right but nothing’s firing,” it might not be you - it might be the platform. Raise a HI case.
5. Expect Some Lag (and Don’t Panic)
Here’s something most people don’t tell you upfront: Metric Intelligence lags when it’s learning thresholds.
Even though metrics come in near real-time, MI needs historical data to calculate those statistical bounds. So, during early testing, you might find that thresholds don’t appear immediately, or they seem off.
That’s normal. It gets better once the system has enough data.
If you need to refresh things faster during testing, run the “Operational Intelligence - Metric Learner” job manually - it forces a recalculation of bounds and speeds up feedback loops.
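If you’re not sure where that job lives or when it’s next due to run, this background-script sketch looks it up in sys_trigger. The name match is a best guess - releases label the job slightly differently, so adjust it if nothing comes back:

// Background script sketch: find the Metric Learner scheduled job and its next run.
// The name match is a guess - tweak it if your release labels the job differently.
var job = new GlideRecord('sys_trigger');
job.addQuery('name', 'CONTAINS', 'Metric Learner');
job.query();
if (!job.hasNext()) {
    gs.info('No scheduled job matching "Metric Learner" found - check the name.');
}
while (job.next()) {
    gs.info(job.getValue('name') + ' | next run: ' + job.getValue('next_action'));
}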
6. My Go-To Troubleshooting Checklist
Here’s the rough order I follow when something doesn’t look right:
Check the metric map – make sure your metric is actually mapped in sa_metric_map.
Confirm detection level – “IT Alerts” if you want actual alerts.
Look for threshold conflicts – only one per metric.
Check the logs – the model learner errors are your first clue.
Repair plugins – especially if the Metric Rules workspace is acting up.
Run the Metric Learner job – refresh your bounds manually.
Ask for help early – if it smells like a platform sync issue, open a HI case.
Final Thoughts
Metric Intelligence is one of those features that starts off a little mysterious but gets powerful once you understand how it thinks.
The biggest takeaway for me? Be patient with it. MI takes time to learn from your data, and sometimes the weird behavior you see isn’t misconfiguration — it’s just the system catching up.
If you’re about to start your own implementation, hopefully these lessons save you a few late nights of “why isn’t this working?” troubleshooting. And if you’ve already been through it, I’d love to hear your war stories — drop them in the comments or message me.
Because honestly, we’ve all been there.