Senior SRE

Remote, US Only

No. of openings

Job Description

We are seeking a site reliability engineer. This individual will work with customers and colleagues in automation and systems architecture to design solutions and improve reliability.
They must be able and willing to document their processes using agile methodologies. In addition, this individual should be curious about new technologies and learn at a fast pace.

Monitor all customer-facing applications and infrastructure to ensure they are working optimally.
Must be a fast-learning individual who is resourceful and has an inquisitive mindset.
Resolve support escalation cases by troubleshooting issues and finding opportunities to improve our application’s architecture and system infrastructure.
Must be willing to obtain an industry certification (e.g., AWS Solutions Architect, Red Hat RHCE, HashiCorp Terraform Associate).
Must quantify failures and availability in a prescriptive manner using SLIs, SLOs, and SLAs.
Must work efficiently and have an agile mindset.
Must be willing to embrace risk, accept failures, and perform post-mortem analysis of those failures.
Must be self-driven and motivated to expand knowledge quickly.
Familiarity with implementing gradual changes, phased deployments (such as canary deployments), and intermediate changes.

Minimum of 2 years of SRE experience.
College-level associate degree or higher preferred, or equivalent related work experience.
Working knowledge of scripting languages and infrastructure-as-code tools such as Terraform, Ansible, Python, and CloudFormation.
Must have a good understanding of VMware hyper-converged infrastructure and architecture, and of SaaS, PaaS, and FaaS solutions.
Must have working knowledge of Kubernetes, system monitoring, OS-level patching, and overall system and application support.