Logan Kragt - Software Developer

DevOps Monitoring with Icinga

2019

Implemented proactive infrastructure monitoring and alerting with Icinga, scripting, and dashboards.

Tech stack & architecture

Icinga

Python

Linux

MySQL

Business Problem

Existing distribution centers (DCs) lacked proactive monitoring, so outages came as surprises and infrastructure issues took a long time to detect. The team needed customizable checks that matched warehouse workloads rather than generic ping/CPU alerts.

Non-Technical Solution Summary

I implemented Icinga-based monitoring with tailored checks and dashboards so ops teams could see problems before they affected throughput. Custom Python scripts captured domain-specific metrics, and alerts routed to on-call responders with actionable context.

Technical Architecture and Implementation

Platform

Icinga for core monitoring, scheduling, and alert routing.
Linux hosts with MySQL backing store for state and history.

Custom Checks

Python scripts to monitor warehouse-specific signals such as queue depths, message lag, and PLC heartbeat proxies.
Thresholds tuned to each DC’s workload to reduce noise and highlight true issues.
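
To illustrate what a custom check of this kind can look like, here is a minimal sketch of an Icinga-compatible plugin for a queue-depth metric. It is not the project's actual code. It follows the standard plugin convention (exit code 0 for OK, 1 for WARNING, 2 for CRITICAL, 3 for UNKNOWN, plus one status line with performance data after a pipe). The data source (a file holding a single integer), the option names, and the function names are all assumptions made for the example.

```python
"""Sketch of an Icinga check plugin for a queue-depth metric (illustrative only)."""
import argparse

# Exit codes defined by the Nagios/Icinga plugin convention.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
STATE_NAMES = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def evaluate(depth, warn, crit):
    """Map a queue depth to a plugin state using upper thresholds."""
    if depth >= crit:
        return CRITICAL
    if depth >= warn:
        return WARNING
    return OK


def format_output(state, depth, warn, crit):
    """Build the status line; text after the pipe is performance data
    in the form label=value;warn;crit;min."""
    return (
        f"QUEUE {STATE_NAMES[state]} - depth is {depth} "
        f"| depth={depth};{warn};{crit};0"
    )


def read_queue_depth(path):
    """Hypothetical data source: a file containing a single integer."""
    with open(path, encoding="utf-8") as handle:
        return int(handle.read().strip())


def main(argv=None):
    """Parse thresholds, read the metric, print the status line, return the exit code."""
    parser = argparse.ArgumentParser(description="Check queue depth (sketch).")
    parser.add_argument("--source", required=True, help="file holding the depth")
    parser.add_argument("--warning", type=int, required=True)
    parser.add_argument("--critical", type=int, required=True)
    args = parser.parse_args(argv)

    try:
        depth = read_queue_depth(args.source)
    except (OSError, ValueError) as exc:
        # A check that cannot measure should report UNKNOWN, not OK.
        print(f"QUEUE UNKNOWN - could not read depth: {exc}")
        return UNKNOWN

    state = evaluate(depth, args.warning, args.critical)
    print(format_output(state, depth, args.warning, args.critical))
    return state
```

Saved as a script, the file would end with `sys.exit(main())` (after `import sys`) so that Icinga receives the exit code. Passing the thresholds as arguments is what allows each DC to carry its own values in the service definition while sharing one script.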

Dashboards and Alerts

Visual dashboards for NOC and operations.
On-call notifications with runbooks linked to each alert type.
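
As a rough sketch of how a runbook link might be attached to each alert type, the snippet below builds a notification body from an alert record. The `Alert` fields, the `RUNBOOKS` mapping, and the `example.com` URLs are placeholders invented for illustration; the source does not describe how the real notifications were assembled.

```python
from dataclasses import dataclass

# Hypothetical mapping from alert type to runbook URL (placeholder addresses).
RUNBOOKS = {
    "queue_depth": "https://wiki.example.com/runbooks/queue-depth",
    "plc_heartbeat": "https://wiki.example.com/runbooks/plc-heartbeat",
}
DEFAULT_RUNBOOK = "https://wiki.example.com/runbooks/general"


@dataclass(frozen=True)
class Alert:
    host: str        # host the service runs on
    service: str     # service name as shown in the monitoring UI
    alert_type: str  # key used to look up the runbook
    state: str       # e.g. "WARNING" or "CRITICAL"
    output: str      # status line produced by the check


def build_notification(alert):
    """Return a notification body that ends with the matching runbook link."""
    runbook = RUNBOOKS.get(alert.alert_type, DEFAULT_RUNBOOK)
    return "\n".join([
        f"[{alert.state}] {alert.service} on {alert.host}",
        f"Details: {alert.output}",
        f"Runbook: {runbook}",
    ])
```

Falling back to a general runbook keeps every notification actionable even when a new alert type has not been documented yet.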

Key Features

Icinga-based monitoring with MySQL-backed history.
Python custom checks for domain-specific metrics.
Actionable alerting with runbook links and tuned thresholds.
Dashboards for quick situational awareness across DCs.

Results and Impact

Operations teams detected issues faster, reducing downtime and unplanned stoppages. Noise levels dropped thanks to tuned thresholds, and on-call responders had clear guidance to remediate problems quickly.

Software Developer | Azure Certified | MBA
