VAST Data

DevOps Engineer

Our Iceland team is developing a next-generation cloud resource management platform that provides a unified API surface for managing infrastructure across multiple cloud providers. We're building systems that abstract complexity while maintaining the power and flexibility that enterprise customers demand.

Join us and discover just how VAST the possibilities are.

We're looking for a DevOps Engineer to join our engineering team in Iceland and help shape how our platform is deployed, operated, and observed in real-world environments.

You'll work at the intersection of infrastructure, reliability, and developer enablement—designing and operating Kubernetes-based environments, building deployment and release workflows, and establishing best practices for staging and production as the platform evolves.

This is a highly hands-on role with significant autonomy. You'll collaborate closely with backend engineers, influence architectural and operational decisions, and play a key role in defining how we run and scale production systems. You'll also be encouraged to leverage modern tooling—including AI-assisted workflows—to debug, maintain, and continuously improve the reliability of our environments.

Main Tasks and Responsibilities

- Design, deploy, and evolve Kubernetes-based environments across development, staging, and production

- Own and improve the release workflow, including CI/CD pipelines and Git-based deployment practices

- Build and maintain containerized workloads, Docker images, and Kubernetes runtimes

- Operate and improve our GitOps setup using Argo CD

- Define and implement monitoring, alerting, and observability best practices across services

- Develop and maintain Prometheus metrics, dashboards, and alerting rules

- Work with distributed tracing and service mesh technologies (e.g. Linkerd) to improve reliability and visibility

- Collaborate closely with backend engineers to ensure services are production-ready from day one

- Identify gaps in our infrastructure and proactively propose improvements to reliability, scalability, and developer experience

- Automate repetitive operational tasks and environment setup using infrastructure-as-code and scripting

- Help define what "good" production and staging environments should look like as the platform matures

- Leverage AI tools to assist with debugging, incident analysis, root-cause investigation, and operational decision-making in staging and production environments

Education and Qualification Requirements

Required:

- Strong hands-on experience with Kubernetes in real-world environments

- Solid understanding of Docker, container images, and containerized application workflows

- Experience deploying and operating applications in Kubernetes clusters

- Familiarity with CI/CD pipelines and release workflows

- Experience working with Git-based workflows and infrastructure repositories

- Ability to reason about system reliability, failure modes, and operational best practices

- Comfortable taking ownership and driving initiatives independently

- Good communication skills in English

Preferred:

- Experience with Helm and/or Kustomize

- Experience using Argo CD or other GitOps tools

- Familiarity with Prometheus, Grafana, and Alertmanager

- Experience with distributed tracing and observability tooling

- Familiarity with service meshes (Linkerd, Istio, or similar)

- Experience operating and scaling production-grade environments

- Background in automating infrastructure and environment provisioning

- Experience working in early-stage or evolving platform environments

- Strong proficiency with AI-assisted tools for managing, debugging, and maintaining staging and production systems

- Experience using AI for log analysis, alert triage, configuration validation, and incident investigation

- Ability to apply AI tools pragmatically to accelerate operational work while maintaining system understanding and accountability

- Curiosity and openness toward integrating AI into day-to-day SRE and DevOps workflows

- Experience using AI-assisted development tools (GitHub Copilot, Claude Code, Cursor, or similar)

Advertisement published: 1 February 2026
Application deadline: No deadline
Language skills: English (required, intermediate)
Location: Bræðraborgarstígur 16, 101 Reykjavík