DataRobot is looking for an experienced software engineer to join our Data Management team working on Big Data problems. As a Data Management team member, you will work on building a data processing framework that allows the DataRobot application to scale to new heights. The ideal candidate has experience in distributed computing and storage architectures and is able to think at scale.

Main Requirements:

  • Bachelor’s degree in Computer Science, Engineering, or related field
  • 5+ years of experience building large-scale, highly available distributed computing systems
  • 1+ years of experience in Scala or Java
  • 2+ years of experience working in the Hadoop stack (HDFS, YARN, ZooKeeper, etc.)
  • 3+ years of experience in Python
  • Fundamental knowledge of data structures, algorithms, and complexity analysis
  • Ability to expertly design and produce high-quality, high-performance code ready to ship
  • Hands-on experience with Big Data technologies (e.g. Hadoop MapReduce, Spark, Hive, Vertica, Netezza, Greenplum, Aster)
  • Familiarity with the internals of distributed data processing engines such as Spark
  • Ability to evaluate and optimize performance and scalability in the context of big data processing and storage
  • Experience working in GNU/Linux environments
  • Understanding of software design principles and best practices (test-driven development, source control management)
  • Open-minded, curious, and thorough
  • Proficient in English

Desired Skills:

  • PhD or Master’s degree in Computer Science, Engineering, or related field
  • Past experience designing and building scalable ETL systems/data pipelines
  • General Data Science knowledge (basic AI + Machine learning concepts)
  • Familiarity with the following tools: Git, Docker, Jenkins
  • Experience in performance optimization and implementing high-performance code
  • Experience developing fault-tolerant systems
  • Knowledge of cloud infrastructure (e.g. EC2, S3)