Overview

Art of Problem Solving (AoPS) develops educational opportunities for many of the most eager students in the world. Since 2003, we have trained tens of thousands of the country’s top students, including nearly all the members of the US International Math Olympiad team, through our online school, learning centers, textbooks, and online learning systems. Over the years, our international online community of advanced problem-solvers has grown to over 800,000 members. While our primary focus has been math for most of our history, we have started expanding into new subjects, such as language arts, science, and computer science.

We are seeking an experienced Sr. Site Reliability Engineer (SRE) with a vision for building scalable, secure infrastructure and increasing the impact of our growing engineering efforts, so that we can deliver reliable, optimized applications that educate and inspire the next generation of builders. This individual will help bootstrap a new SRE team that will strategize, design, implement, monitor, and troubleshoot our web infrastructure. They will lead and mentor other SREs and collaborate with the engineering team to manage and improve Linux-based web servers and to automate infrastructure, including CI/CD pipelines, configuration management, and application monitoring.

Potential Projects:

  • Implementing new monitoring and scaling solutions for mission-critical services.
  • Creating and improving CI/CD pipelines and Ansible plays for our online classrooms, multiplayer math game, grading tools, and more.
  • Establishing cloud infrastructure and tools for new microservices, such as email delivery, math parsing, LaTeX rendering, and homework grading.

Responsibilities & Duties:

  • Work closely with engineering leadership to strategize and advocate for the short- and long-term needs of our systems, and lead the design, implementation, and maintenance of web infrastructure and pipelines.
  • Provide hands-on technical expertise, using strong coding skills and SRE best practices to improve the security, reliability, and monitoring of web applications.
  • Automate and document administration and configuration processes for web servers and databases using Ansible.
  • Use technical knowledge to prevent and mitigate service failures, and respond quickly when they occur.
  • Motivate, give technical direction to, and foster the growth of team members.

Ideal candidates will have…

  • Experience creating and maintaining Linux-based / LAMP-stack systems.
  • Experience with Apache and/or Nginx.
  • Experience with JavaScript, TypeScript, or PHP.
  • Experience planning, designing, implementing, securing, and monitoring scalable infrastructure for web applications.
  • Familiarity with creating Ansible plays and CI/CD pipelines for web applications.

Schedule:

Many AoPS classes take place on weekends and weekday evenings. In the rare event of an unexpected site outage or service interruption, this full-time position may require occasional night or weekend work.

Background Check:

Please note that employment is contingent on the successful completion of a background check.

Perks and Benefits:

This full-time position will be based at our headquarters in San Diego, CA. Some benefits of the position include:

  • Casual work environment
  • Hybrid work week with flexible schedule
  • Medical, dental, and vision benefits
  • 401(k) plan with company match
  • Year-end bonus based on individual and company performance
  • Starting bonus
  • Relocation bonus (if currently located outside of San Diego)

We’re an equal opportunity employer. All applicants will be considered for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, veteran status, or disability status.