Build solutions to problems that interrupt availability, performance, and stability in our systems, services, and products at scale.
Perform a wide variety of technical and administrative duties in overall systems design, development, and delivery.
Work in conjunction with IT, engineering, and business groups to understand functionality, scalability, performance, security, and integration requirements.
Develop and maintain an in-depth understanding of the application, systems, database architecture, and the general application functionality used to maintain data integrity.
Manage the setup and configuration of SaaS infrastructure in an agile way by storing infrastructure as code and using automated configuration management tools, with the goal of being able to re-provision any environment at any point in time.
Develop and implement instrumentation for monitoring the health and availability of services including fault detection, alerting, triage, and recovery (automated and manual).
Be accountable for proper backup and disaster recovery procedures.
Develop, improve, and thoroughly document operational practices and procedures.
Drive operational cost reductions through service optimizations and demand-based auto-scaling.
Identify automation opportunities to improve DevOps operations.
Review architecture and offer recommendations for improvements.
Troubleshoot systems and solve problems across multiple environments (dev/test/prod).
Embrace continuous integration and continuous delivery (CI/CD) processes.
Collaborate with development and infrastructure teams to ensure new environments meet requirements.
Identify and document IT best practices that will improve the systems deployment function.
Provide status updates on assigned project tasks, and give feedback to peers and the appropriate managers.
Requirements
4+ years of experience in DevOps, or in provisioning environments, deploying applications, and maintaining infrastructure.
Strong experience building and maintaining production systems on AWS using EC2, RDS, S3, ELB, CloudFormation, etc., and familiarity with the AWS APIs.
You should be equally comfortable in a traditional datacenter setting.
Proficiency in high-level scripting languages (Python and/or Ruby) as well as shell scripting (e.g., Bash).
Deep experience administering Linux (CentOS, RHEL, Ubuntu) systems.
Thorough understanding of configuration management concepts. Puppet experience is highly desired, both master-based and headless. Bonus points for experience with Chef, Ansible, or Salt.
Experience with monitoring, metrics, and visualization tools for network, server, and application status (Zenoss, Sensu, Nagios, Graphite, collectd, Ganglia, etc.).
Experience with hardware and software firewalls, IPS, WAF, and additional security layers (LDAP, SSO, two-factor authentication).
Experience with continuous integration, testing, and deployment.
Experience with RDBMS (PostgreSQL and MySQL). Bonus points for NoSQL (Cassandra, DynamoDB, Couchbase, MongoDB).
A desire to automate yourself out of a job. We will always have new challenges and problems to solve.
Understanding of ITIL terminology, such as the distinction between an incident and a problem.
Experience with configuration management and continuous integration/continuous delivery (CI/CD) processes.