Overview

Xometry is looking for an experienced Site Reliability Engineer who is excited about containers and container orchestration with Kubernetes, understands microservices, and has a passion for infrastructure as code. This person also has a passion for building tooling that makes it easier for others to build, deploy and scale their software in a cloud environment.

What You’ll Do
Automate all the things
Build new tools and platforms when you see repeatable patterns across team workflows
Coach Software Engineering and Data Science teams on best practices and architectural decisions
Own the security operations that protect our customer data while maintaining development velocity
Obsess over feedback loops: build, measure, and improve
Resolve reliability issues and identify strategies to prevent repeat incidents
Enable the software engineering community to build faster with less friction
Participate in on-call support rotations

What We’re Looking For
5+ years of experience as an SRE or DevOps engineer in an eCommerce, API-based, or B2C platform company. Said differently: this isn’t your first SRE rodeo
Architectural experience designing highly available and secure internet-facing web-based services
Strong experience with AWS (preferred), Azure, or Google Cloud infrastructure
Strong container management expertise with Docker, Kubernetes, Helm, service mesh, and microservices
Versed in automating infrastructure (Terraform preferred, though similar experience with Ansible, CloudFormation, etc. considered)
Experienced with CI/CD pipeline creation and operations
Knowledgeable in full system monitoring, metrics, KPIs, and reporting (Datadog preferred, though not necessary)
Strong experience with API fundamentals (REST and GraphQL in particular)
Experience developing software tools to support operations and development (language agnostic)
A master of root cause analysis, especially of complex distributed systems
Capable of writing documentation on complex topics for easy digestion
Able to clearly present technical project plans, issues, system status, policies and procedures, etc. to all levels of management
Excellent understanding of Internet technologies and protocols (TCP/IP, DNS, HTTP, SSL/TLS, etc.)