This company employs approximately 7,000 people in North America and the U.K. and was included on Forbes’ list of Most Trustworthy Companies four consecutive times (2012-15). Looking to create a single pane of glass for all enterprise monitoring, and needing their manager of managers to filter, correlate, and auto-resolve events and incidents, they approached Evanios for an initial conversation.
A Senior Engineer leads the Enterprise Systems Monitoring (ESM) team, which maintains SolarWinds along with vCenter, Azure, OMS, SAP, and other tools. When he was given the additional responsibility of building out SCOM, he quickly realized that managing and maintaining multiple separate primary rule sets would not scale. “We were managing different actions on distinct event types while trying to coordinate disparate monitoring thresholds with limited resources,” he explains. Facing a complicated problem without additional staff, the engineer took a step back to consider what he ultimately needed to accomplish.
“Our primary responsibility is automation of surveillance services, which includes monitoring service health and notifying support teams or customers when service health degrades or resource consumption breaches an agreed-upon threshold,” he summarized. “We weren’t set up to effectively deliver our required level of service while maintaining high availability for unplanned emergencies.” Looking further ahead, he knew that integration and maintenance were likely to become even more complicated as the business began importing events from additional sources, including Cisco, APC, and Riverbed. He needed a way to collect events from any source, standardize their format, and filter, correlate, and auto-resolve events and incidents (including resolving tickets). Enhanced, prioritized notification was an important consideration, too.
First and foremost, the Senior Engineer needed to gain efficiency from his team’s existing tools. “We were resource-constrained,” he explained, “so buying another platform, especially one that could require another FTE, would be heading in the wrong direction.”
Since this company had chosen ServiceNow as their ITSM platform, the engineer began looking for the tightest possible event management integration. While evaluating Evanios, a custom ServiceNow application, he realized “The configuration engine is exactly what we needed: it allows us to centrally aggregate events from every source managed by ESM and other teams. All we need is the event data and Evanios takes care of the rest, including advanced correlation to configuration items in the CMDB, notifications, and auto-resolution that’s enabled through Evanios’ integration with ServiceNow’s Incident, Problem, and Change Management modules.”
The evaluation team, which included the lead engineer, the ITSM manager, the Director of Infrastructure Security Architecture, and the VP of IT, found that Evanios supported far more functionality out of the box than competing platforms. “We would have had to do a lot more development work to get anywhere near the level of functionality Evanios offered us out of the box,” the engineer said. “Plus, the implementation time and learning curve were exceptionally short: we were able to go from development to production in about 30 days.”
The completed integrations led to an immediate and dramatic reduction in the daily incident volume IT teams had to manage. Overall, Evanios is doing exactly what the company had hoped: standardizing the process and reducing the need for manual operator intervention.
“The required interaction of our resources to resolve tickets is greatly reduced,” the process champion stated. “When an event is used to raise an incident and the incident can be auto-resolved, teams don’t have to commit resource cycles to manage all of the tickets. As an example, one member of our network engineering team was resolving 200-300 incidents per month before Evanios – that’s now down to 10-15 per month.”
The company also discovered longer-term benefits it hadn’t initially expected. “We had a foundation,” he explained. “Lots of CIs were populated through SCCM and there was a pretty well substantiated CMDB. However, we may have gone too wide at first. It didn’t take long to realize that SCCM and SolarWinds could be conflicting sources of CI data, plus the individuals and teams who would be responsible for managing and maintaining the CMDB weren’t fully engaged. It was like shining a light on an area where we needed improvement; developing a single source of truth required further pruning.”
However, the lead engineer also cautions that “perfect” would have been the enemy of progress. “We were trying to rapidly build out additional ITSM services,” he explained. “We had ServiceNow Incident, CMDB, and Problem modules, and were working to launch Change Management, too. It was a lot of work to take on, so we had to evaluate event management closely, with a critical eye toward the minutes, hours, or days we would need to get it into flight.” Should the organization first fix underlying inefficiencies in its processes (optimizing the event management layer so that efficiency would bubble up through fewer incidents and shorter resolution times), or prioritize deploying additional ITSM services?
Ultimately, tackling event management first produced immediate benefits and freed up the resource cycles the team needed for higher-level, more visible work. “We got Evanios into production and it only took a few days before we could see the impact and ripple effects of resource savings. I believe this will accelerate acceptance and maturity of change management as a process,” he explained, “allowing us more focused time on configuration management and business service mapping within the CMDB. This has had a compounding effect on maturing event management and configuration management processes together.”
The team has also begun rolling out Evanios predictive analytics. “I see this being one of the first dashboards our infrastructure or applications team will look at to get a sense of where things are going,” the lead engineer says. “The early warning indicators and root cause analysis improve our ability to take proactive action before service availability thresholds have been reached. This was exactly our goal during the initial evaluation process.”
After “cleaning up” events and incidents and setting the stage for further integrations with SAP, vCenter, and other tools, this medical supplies conglomerate is now focusing on suppressing events during approved changes. As teams mature their use of change management (and link CIs to change tasks), ESM can suppress event notifications or incidents for CIs under an approved change. Overall, this is driving greater, measurable collaboration and two-way benefits between the change management process lead and the event management team.