Edge-AI Telemetry NOC

A DevSecOps deep-dive into running local LLM inference on legacy hardware. It covers air-gapped deployment, hardware bottlenecks, and "Happy Accident" security features.

Docker | Python/Systemd | Ollama (1B) | C++ / ESP32

// 01. Legacy Hardware Challenge

The premise was simple: can I run a fully functional generative AI agent locally? I used an old Dell OptiPlex 3020 as my host node. It is legacy hardware running DDR3 memory, not exactly an Nvidia H100 cluster.

The Happy Accident: The DDR3 WAF

Memory bandwidth was the primary bottleneck. To keep the system from crashing, I hardcoded a strict context-window limit of 1024 tokens in the options of the Ollama API payload: "num_ctx": 1024.
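As a rough illustration of where that limit sits in a request, here is a minimal sketch using only the Python standard library. The endpoint is Ollama's default local address. The model tag `llama3.2:1b` and the helper names `build_payload` and `generate` are assumptions made for this example, not details from the original project.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint
MODEL_NAME = "llama3.2:1b"  # assumption: any ~1B model tag already pulled into Ollama


def build_payload(prompt: str, num_ctx: int = 1024) -> dict:
    """Build a non-streaming generate request with a capped context window."""
    return {
        "model": MODEL_NAME,
        "prompt": prompt,
        "stream": False,
        # Runtime parameters such as num_ctx go inside the "options" object.
        "options": {"num_ctx": num_ctx},
    }


def generate(prompt: str, timeout: float = 120.0) -> dict:
    """POST the payload to the local Ollama server and return the parsed JSON reply."""
    body = json.dumps(build_payload(prompt)).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `generate("Summarize the last hour of telemetry.")` needs a running Ollama instance on the host. `build_payload` can be inspected on its own to confirm the cap is in place before anything is sent.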
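
"By capping the context window at 1024 tokens, I accidentally built a crude size limit that works a bit like a Web Application Firewall (WAF) rule: large prompt-injection payloads get truncated before the model can read them in full. It is not un-hackable, since a short injection still fits inside the window."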

// 02. Telemetry & Systemd Recovery

I wrote a custom Python telemetry broker to log hardware cost per token generated. To ensure high availability, I wrapped this in a native systemd service unit. systemd supervises the process directly and restarts it about 3 seconds after a failure, rather than waiting for the next polling interval as a crontab-based check would.
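One way the per-token cost could be derived is sketched below. It assumes the broker reads the `eval_count` and `eval_duration` fields (the latter in nanoseconds) that Ollama includes in a non-streaming response. The `TokenCost` type, the `cost_per_token` function, and the 60 W average power draw are hypothetical placeholders, since the original does not say how cost was measured.

```python
from dataclasses import dataclass

NS_PER_SECOND = 1_000_000_000  # Ollama reports durations in nanoseconds


@dataclass
class TokenCost:
    tokens: int
    seconds: float
    tokens_per_second: float
    joules_per_token: float


def cost_per_token(response: dict, avg_watts: float = 60.0) -> TokenCost:
    """Estimate generation speed and energy per token from one Ollama response.

    avg_watts is an assumed average power draw for the host, not a measured value.
    """
    tokens = int(response.get("eval_count", 0))
    seconds = int(response.get("eval_duration", 0)) / NS_PER_SECOND
    if tokens <= 0 or seconds <= 0:
        # Nothing was generated, or timing data is missing: report zeros.
        return TokenCost(tokens, seconds, 0.0, 0.0)
    return TokenCost(
        tokens=tokens,
        seconds=seconds,
        tokens_per_second=tokens / seconds,
        joules_per_token=avg_watts * seconds / tokens,
    )
```

For the 3-second recovery, the systemd directives that would typically provide this behavior are `Restart=always` (or `Restart=on-failure`) together with `RestartSec=3` in the unit's `[Service]` section. Whether the original unit uses exactly these values is an assumption.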

// 03. Edge UI (CYD Dashboard)

I built a physical Network Operations Center (NOC) monitor using an ESP32 Cheap Yellow Display (CYD).