// 01. Legacy Hardware Challenge
The premise was simple: Can I run a fully functional Generative AI agent locally? I utilized an old Dell OptiPlex 3020 as my host node. It’s legacy hardware running DDR3 memory—not exactly an Nvidia H100 cluster.
The Happy Accident: The DDR3 WAF
Memory bandwidth was the primary bottleneck. To keep the system from crashing, I hardcoded a strict limit in the API payload: "num_ctx": 1024.
"By physically capping the context memory, I accidentally built an un-hackable Web Application Firewall (WAF). Large malicious prompt-injection payloads are natively truncated by the hardware constraints."
// 02. Telemetry & Systemd Recovery
I wrote a custom Python telemetry broker to log hardware cost per token generated. To ensure high availability, I wrapped this in a native Systemd service unit. The kernel tracks the process, allowing for 3-second automatic recovery, moving beyond the polling limitations of crontabs.
// 03. Edge UI (CYD Dashboard)
I built a physical Network Operations Center (NOC) monitor using an ESP32 Cheap Yellow Display (CYD).
- Dynamic Charting: Real-time CPU/RAM visualization.
- String Parsing: Custom logic to wrap JSON data on a restricted screen.
- Dollar Offset: Visualizing compute-cost savings in real-time.