Overview

ABOUT CORDIAL

When we founded Cordial in 2014, we were determined to change the way brands communicate with their customers. We build technology to help brands send better messages that are personal, relevant, and emotionally intelligent across any channel. Leading brands such as Revolve, Eddie Bauer, 1-800 Contacts, and TOMS use our cross-channel messaging platform to increase customer engagement, drive transformational speed-to-market, and future-proof their marketing technology.

We chose the name Cordial to symbolize how we empower our clients to communicate with their customers, as well as how we do business: with transparency, collaboration, and trust. We’re building a passionate team of individuals willing to learn, grow, and be thoughtfully challenged every day to continuously improve our product, company, and culture.

CORDIAL VALUES

  • Communicate better than the rest
  • Tenacious about the client and the problems we solve for them
  • We’re owners and we act like it
  • Always #becordial

POSITION SUMMARY

We are looking for a motivated and talented Site Reliability Engineer to help us monitor, develop, and scale the Cordial platform. Our goal is to give our clients a delightful experience in their day-to-day interactions with the platform and to build trust that expected jobs and background processes will run without issue. You will work with our DevOps and Product teams to ensure that bugs are squashed, performance is optimized, and blind spots are revealed through comprehensive monitoring.

YOU WILL

  • Use your knowledge of web, application, network, server, storage, and security technologies to administer, monitor, and troubleshoot application and network components in our cloud-based environment.
  • Actively contribute to Infrastructure Design and Implementation discussions.
  • Provide production support for the Product Development teams.
  • Participate in an on-call rotation.
  • Work with the team to develop and deploy monitoring and alerting architecture, and implement monitoring/logging solutions.
  • Troubleshoot complex issues promptly to maintain the performance and stability of our production application environment.
  • Help define SLOs, and document and monitor SLAs.

ABOUT YOU

  • 3+ years of UNIX/Linux systems and network administration (DNS, IPsec, VPN, load balancing, process tracing).
  • Experience with AWS (we use EC2, EKS).
  • Experience with monitoring, logging and alerting tools.
  • Previous experience in an SRE and/or DevOps role.
  • Development experience in PHP.
  • Experience with Docker/containers & Kubernetes.
  • Comfortable working in a globally distributed team across time zones.
  • Strong teamwork and communication skills.
  • A genuine desire to learn new technologies and grow.

BONUS

  • Experience with MongoDB.
  • Experience deploying and/or maintaining Kubernetes/EKS clusters.
  • Experience with Prometheus/Grafana/Datadog.
  • Experience implementing SLOs, reliability targets, and error budgets.