• 5+ years of successful experience in a similar DX / DevOps / SRE role.
• Proficiency in software development (Python, Go...) and programming best practices.
• Exposure to site reliability engineering (root cause analysis, in-production troubleshooting, on-call rotations...).
• Exposure to infrastructure management (CI/CD, containerization, orchestration, infra-as-code, monitoring, logging, alerting, observability...).
• Technical product mindset (e.g. understanding how to debug poor adoption).
• Excellent problem-solving and communication skills (ability to contextualize, gauge risks and get buy-in for high-stakes, impactful solutions).
• Ownership, high agency and a constant drive to learn and improve things for others.
• Autonomous, self-driven and able to work well in a fast-paced startup environment.
• Low ego and a team-spirit mindset.
Your application will be all the more interesting if you also have:
• First-hand Bazel (or equivalent) experience.
• Strong knowledge of Python's ecosystem.
• Familiarity with GPU-based workloads and ecosystems.
• Experience with fully remote environments (you're comfortable with having some of your users on the other side of the globe).