KissMyButton
AI

Senior Go AI Infrastructure Engineer

KissMyButton · Remote Work, GR

Role overview

KissMyButton · Greece (Remote/Hybrid)

About Us

KissMyButton is a dedicated team of professional software developers. We are passionate about our work and aim to extend the potential of our clients through high-impact technical excellence.

The Role

We are building a high-performance, secure, and eventually sovereign RAG engine. We are looking for a Senior Go Engineer who is a "security-first" architect. You will start by building the hardened back-end services for our AI Assistant and lead the transition to on-prem, high-throughput inference using cutting-edge GPU orchestration.

What you'll work on

  • Go Orchestration: Develop lightning-fast, type-safe APIs and middleware in Go (Golang) to handle streaming LLM data and complex RAG logic.
  • AI Security & Guardrails: Design and implement the "Shield", which protects the system from prompt injection and data exfiltration and enforces strict PII/PHI filtering.
  • Inference Engineering: Deploy and tune vLLM and llm-d on Kubernetes to maximize GPU throughput and minimize "Time to First Token" (TTFT).
  • System Observability: Build the backend infrastructure for RAG evaluation (integration with tools like LangSmith or Arize) to track the "faithfulness" of our AI.
  • Performance: Optimize goroutines and memory management to support high-concurrency enterprise traffic.

What we're looking for

  • Go Mastery: 3+ years of professional experience building resilient distributed systems, durable execution, and high-performance backends in Go.
  • Architecture: Solid understanding of system design, concurrency, and data consistency.
  • Infrastructure DNA: Strong experience with Kubernetes (K8s) and Docker. You should be comfortable managing GPU-enabled nodes.
  • Security-First Mindset: Deep understanding of modern AuthN/AuthZ (OIDC, OAuth2) and API hardening.
  • AI Enthusiast: You are an active user of agentic AI tools (Claude, Cursor, Copilot) and have a strong curiosity about how LLMs work under the hood (weights, quantization, context windows).
  • Strong communication and collaboration skills.
  • Experience with vLLM, llm-d, TGI, or NVIDIA Triton Inference Server.
  • Familiarity with Python (for data/AI scripting) and the Lang* ecosystem.
  • Knowledge of Vector Databases (Qdrant, Weaviate, or Milvus).
  • Experience with Open Policy Agent (OPA) for fine-grained access control.

Tags & focus areas

Full-time, Remote, AI