Machine learning is headline news. Whether it’s Google AI discovering a new planet or studies claiming that machine learning will replace millions of white-collar workers, artificial intelligence is creating the prospect of a very different future. There’s the inevitable hype, but one thing is certain – for the first time, technology is starting to compete with the human mind, in the same way it challenged the dominance of human labor two centuries ago during the Industrial Revolution.
As someone deeply involved in the IT industry, I see the promise – and risks – of machine learning every day. Event management is a perfect case in point. Some event management vendors tout machine learning as a panacea, a complete solution for managing the availability and performance of mission-critical business services and underlying IT infrastructure. That’s dangerous. Machine learning complements existing approaches that use configured logic, but it doesn’t replace them.
The Case for Configured Logic
Let’s take event rules as an example. These capture what we already know about an IT environment, deduplicating, filtering and normalizing data from multiple monitoring tools. This reduces noise and creates a consistent event data set for further analysis. Without event rules, all we have is chaos. It’s just like human language – we can express incredibly complex and nuanced thoughts, but first we need an agreed grammar and vocabulary. That’s what event rules give us.
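To make this concrete, here is a minimal sketch of what normalizing, filtering and deduplicating events can look like. It is an illustration only, not any product's implementation: the `Event` fields, the `SEVERITY_MAP` table, the `tool_a`/`tool_b` source names and the sample records are all assumptions.

```python
from dataclasses import dataclass

# Hypothetical mapping from tool-specific severity labels to one shared scale.
SEVERITY_MAP = {
    "crit": "critical", "critical": "critical",
    "error": "major",
    "warn": "minor", "warning": "minor",
    "info": "info",
}

@dataclass(frozen=True)
class Event:
    source: str
    node: str
    metric: str
    severity: str

def normalize(raw: dict) -> Event:
    """Map one tool-specific record onto the shared event shape."""
    return Event(
        source=raw["source"],
        node=raw["node"].strip().lower(),
        metric=raw["metric"].strip().lower(),
        # Unknown labels fall back to "info" in this sketch.
        severity=SEVERITY_MAP.get(raw["severity"].strip().lower(), "info"),
    )

def apply_rules(raw_events: list) -> list:
    """Normalize each record, drop informational noise, then deduplicate."""
    seen = set()
    result = []
    for raw in raw_events:
        event = normalize(raw)
        if event.severity == "info":      # filter rule: ignore informational events
            continue
        key = (event.node, event.metric)  # dedup rule: same node and metric
        if key in seen:
            continue
        seen.add(key)
        result.append(event)
    return result

# Made-up records from two hypothetical monitoring tools.
raw_events = [
    {"source": "tool_a", "node": "DB01 ", "metric": "CPU", "severity": "CRIT"},
    {"source": "tool_b", "node": "db01", "metric": "cpu", "severity": "critical"},
    {"source": "tool_a", "node": "web01", "metric": "heartbeat", "severity": "info"},
    {"source": "tool_b", "node": "web02", "metric": "disk", "severity": "warn"},
]
clean_events = apply_rules(raw_events)
```

Four raw records from two tools collapse into two consistent events. Every step is an explicit rule that encodes something the team already knows, so nothing has to be learned from scratch.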
Admittedly, event rules have a mixed reputation. However, that’s because legacy event management tools used scripting to define event rules. Scripting is bad. It adds layers of manual work and can’t deal with the complexity of today’s highly connected digital world. Some vendors go so far as to declare that “rules are evil.” We think that’s a bit over the top, because modern event management platforms now take a more enlightened approach, including codeless rules engines that let you create rules using drop-down menus and drag-and-drop interfaces. This dramatically simplifies rule creation and makes the rules easy to support – no more hard-to-maintain code.
So, here’s the question. If you’re a large enterprise struggling with centralized visibility and event noise, why on earth would you throw away proven event rules and pin your hopes on machine learning? With a codeless rules engine and strong bi-directional integrations with your monitoring systems, you can squelch event noise with a few hours’ work. Doing this well and quickly simply doesn’t require machine learning. And, in this case, machine learning creates risk because it throws away knowledge that already exists. Why have a machine try to learn how to normalize, deduplicate, and filter events from scratch when you already know how to do this? “AIOps”, “ML”, “ITOps” and “AI” are sexy buzzwords, but you shouldn’t buy based on initial impressions alone.
Where Does Machine Learning Fit?
Don’t get me wrong. Machine learning can be incredibly useful, particularly for predictive analytics. For instance, it can spot recurring event patterns, and then use these patterns to predict future service issues. And, by learning from ITSM data, it can also predict the business impact of these issues. That means you can prioritize and proactively address these issues before they affect your customers. This is true value-add – humans just don’t have the time or ability to sift through huge volumes of data to identify these subtle leading indicators.
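As a rough illustration of the idea, the sketch below scores each event type by how often it was followed by a service incident within a time window. Simple frequency counting stands in here for a real learned model, and the function name, the window size, the `min_support` threshold and the sample timestamps are all assumptions made for the example.

```python
from collections import Counter

def leading_indicators(events, incidents, window, min_support=2):
    """Score each event type by the fraction of its occurrences that were
    followed by an incident within `window` time units.

    events:    list of (timestamp, event_type) tuples
    incidents: list of incident timestamps (for example, from ITSM records)
    """
    seen = Counter()
    followed = Counter()
    for ts, event_type in events:
        seen[event_type] += 1
        if any(0 < inc_ts - ts <= window for inc_ts in incidents):
            followed[event_type] += 1
    # Ignore event types seen too rarely to say anything useful about them.
    return {t: followed[t] / seen[t] for t in seen if seen[t] >= min_support}

# Made-up history: timestamps are in arbitrary units.
events = [
    (0, "disk_latency"), (60, "login_ok"),
    (100, "disk_latency"), (160, "login_ok"),
    (200, "disk_latency"), (400, "login_ok"),
]
incidents = [30, 130, 230]

scores = leading_indicators(events, incidents, window=60)
```

In this toy history, `disk_latency` precedes every incident while `login_ok` never does, so only the former looks like a leading indicator. A real system would need far more data and some guard against coincidental correlations, but the score stays easy to explain to the people who have to act on it.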
Avoid Black Box Machine Learning
On the other hand, you don’t actually need machine learning for other event management processes, such as topological correlation or even root cause analysis. And, automation is best handled using well-defined workflows, rather than machine learning. In fact, using “black box” machine learning technology across the board is counter-productive. It’s all about trust. IT staff won’t blindly accept recommendations if they don’t understand the underlying reason. And, they certainly won’t turn over control to machine intelligence. To have value, machine learning can’t be a black box – it needs to be transparent and extensible, so that IT staff can understand what it’s doing and modify its behavior when needed.
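Topological correlation is a good example of configured logic doing the job on its own. The sketch below assumes a hypothetical dependency map (`DEPENDS_ON`) with no cycles, and groups alerting components under the furthest upstream component that is also alerting. The component names are invented for illustration.

```python
# Hypothetical dependency map: each component points to what it depends on.
# Assumed to be acyclic; a cycle would make the walk below loop forever.
DEPENDS_ON = {
    "checkout-service": "app-server-1",
    "app-server-1": "db-cluster",
    "reporting-service": "db-cluster",
    "db-cluster": None,
}

def probable_root(component, alerting):
    """Walk up the dependency chain and return the furthest upstream
    component that is also alerting (or the component itself)."""
    root = component
    current = DEPENDS_ON.get(component)
    while current is not None:
        if current in alerting:
            root = current
        current = DEPENDS_ON.get(current)
    return root

def correlate(alerting):
    """Group alerting components under their probable root cause."""
    groups = {}
    for component in sorted(alerting):
        groups.setdefault(probable_root(component, alerting), []).append(component)
    return groups

alerting = {"checkout-service", "reporting-service", "db-cluster"}
groups = correlate(alerting)
```

Three alerts collapse into one group rooted at `db-cluster`, and anyone on the IT staff can trace exactly why by reading the dependency map.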
Unsupervised Learning Is Crucial
Finally, let’s talk about supervised and unsupervised learning. Some industry commentators are pushing supervised learning for IT infrastructure management. For example, a SiliconANGLE article says that “Organizations should also bring more data scientists into their organizations to build and supervise machine learning models that are trained and tuned to their specific requirements.” Here’s why that’s wrong:
- You don’t need to tell machine learning what to look for. Unsupervised techniques can surface recurring patterns and anomalies in event data without labeled training examples. If a vendor tells you that their event management tool needs supervised learning, then look for another vendor.
- If you need a data scientist to build or run your event management process, then you have a “black box” as far as your IT staff are concerned. See my previous comments on trust, transparency and extensibility.
- There aren’t enough data scientists to go around. One study found that there are between 11,000 and 19,400 data scientists in the world. And, as of April 2017, there were 13,700 unfilled data scientist jobs in the US alone. Even if you manage to find a data scientist, you’ll pay top dollar and struggle to keep them. Are you willing to take that risk with your event management platform?
The Bottom Line
Machine learning complements configured logic, but it doesn’t replace it. Throwing away proven event management processes for black box machine learning is a mistake. Choose an event management vendor that understands this and applies machine learning in the right way. Avoid platforms that rely exclusively on machine learning, and particularly those that need supervised learning. Instead, look for machine learning that is unsupervised, transparent and extensible – an event management platform that your IT staff understand and trust.
Want to learn how Evanios connects monitoring, event management, incident and ITSM processes with automated workflows and machine learning? Check out our Evanios Capabilities brochure (also known as “The Top 16 Things We Do Really Well”).