Role overview
KissMyButton · Greece (Remote/Hybrid)
About Us
KissMyButton is a dedicated team of professional software developers. We are passionate about our work and aim to extend our clients' potential through high-impact technical excellence.
The Role
We are building a high-performance, secure, and eventually sovereign RAG engine. We are looking for a Senior Go Engineer who is a "security-first" architect. You will start by building the hardened back-end services for our AI Assistant and lead the transition to on-prem, high-throughput inference using cutting-edge GPU orchestration.
What you'll work on
- Go Orchestration: Develop lightning-fast, type-safe APIs and middleware in Go (Golang) to handle streaming LLM data and complex RAG logic.
- AI Security & Guardrails: Design and implement the "Shield": protect the system from prompt injection and data exfiltration, and enforce strict PII/PHI filtering.
- Inference Engineering: Deploy and tune vLLM and llm-d on Kubernetes to maximize GPU throughput and minimize "Time to First Token" (TTFT).
- System Observability: Build the backend infrastructure for RAG evaluation (integration with tools like LangSmith or Arize) to track the "faithfulness" of our AI.
- Performance: Optimize Go routines and memory management to support high-concurrency enterprise traffic.
What we're looking for
- Go Mastery: 3+ years of professional experience building resilient distributed systems, durable execution, and high-performance backends in Go.
- Architecture: Solid understanding of system design, concurrency, and data consistency.
- Infrastructure DNA: Strong experience with Kubernetes (K8s) and Docker. You should be comfortable managing GPU-enabled nodes.
- Security-First Mindset: Deep understanding of modern AuthN/AuthZ (OIDC, OAuth2) and API hardening.
- AI Enthusiast: You are an active user of AI agentic tools (Claude, Cursor, Copilot) and have a strong curiosity about how LLMs work under the hood (Weights, Quantization, Context Windows).
- Strong communication and collaboration skills.
- Experience with vLLM, llm-d, TGI, or NVIDIA Triton Inference Server.
- Familiarity with Python (for data/AI scripting) and the Lang* ecosystem.
- Knowledge of Vector Databases (Qdrant, Weaviate, or Milvus).
- Experience with Open Policy Agent (OPA) for fine-grained access control.