Dacodes: Senior GCP DevOps with Specialization in MLOps & GenAI
remote-job.net Job Summary:
💶 Salary: Not specified
⏰ Weekly Working Hours: Full-time
🔍 Recommended Experience: Senior
🎓 Recommended Education: Not specified
🏭 Industry: Artificial Intelligence
📋 Main Responsibilities: Building and operating scalable GCP infrastructure with Terraform and CI/CD. Managing Kubernetes/GKE clusters and deploying ML/LLM services. Setting up MLOps workflows and observability, and optimizing cost and performance.
✅ Key Requirements: ≥4 years of hands-on experience with GCP in production environments. ≥3 years of experience with Terraform, Kubernetes/GKE, and CI/CD pipelines. Experience deploying ML models and basic knowledge of LLM/GenAI integrations.

About the Company
Headquarters: Mexico.
We are a technology-driven company that collaborates with global brands and disruptive startups to bring AI-powered solutions into production. The team is multicultural, and the company offers remote work for candidates in LATAM. The company supports professional growth (access to courses and certifications), organizes meetups and virtual events, and is certified as a Great Place to Work.

Responsibilities
- Designing, automating, and operating infrastructure in Google Cloud (IAM, VPCs, Cloud Run, Compute Engine, Pub/Sub, Cloud SQL).
- Implementing Infrastructure as Code with Terraform (modules, remote state, multi-environment workspaces) and building and maintaining GitLab CI/CD pipelines.
- Managing Kubernetes/GKE clusters (including GPU node pools, autoscaling, security, and networking) and deploying AI/ML inference services on GKE or Cloud Run.
- Integrating and operating MLOps platforms (e.g., Vertex AI, MLflow), deploying models (online/batch), and managing experiment tracking and the model registry.
- Developing and operating GenAI/LLM-based workflows (RAG, embeddings, multi-agent pipelines) and optimizing their performance and costs.
- Setting up observability, monitoring, and alerting (Grafana, Datadog, Looker Studio), and tracking LLM token consumption, GPU/CPU utilization, and GCP costs.

Requirements
- At least 4 years of production experience with GCP (IAM, networking, compute services, Pub/Sub, etc.).
- At least 3 years of experience with Terraform (advanced usage) and Kubernetes/GKE (experience with GPU workloads preferred).
- At least 3 years of experience building CI/CD pipelines (GitLab), along with solid knowledge of Docker, cloud security, networking, and observability.
- Experience deploying ML models (online/batch endpoints) and collaborating with data/AI teams, plus basic knowledge of GenAI/LLM APIs (OpenAI, Gemini, etc.).
- Nice to have: GCP certifications (Cloud Architect, Data/ML Engineer), experience with Dataflow/BigQuery, and familiarity with LLM application frameworks (LangChain, LlamaIndex).

Benefits
- 100% remote work for LATAM (hybrid or on-site arrangements possible if needed).
- Flexible working hours, Monday to Friday, arranged per team/project.
- Day off on your birthday.
- Major medical insurance and life insurance (both apply to Mexico only).
- Multicultural teams, access to training and certifications, meetups, and guest lectures.
- Virtual team-building events, English lessons, and internal growth opportunities.
- Work with global brands and innovative startups at a Great Place to Work-certified company.