This job has expired

Site Reliability Engineer Manager

Employer
Kelly Science, Engineering, Technology & Telecom
Location
Eden Prairie
Salary
Competitive

Job Title: Site Reliability Engineer Manager

Location: Remote

Type: Contract to hire

Pay Rate: Open for discussion


Note: This company doesn't provide sponsorship, so if you will require sponsorship now or in the future please do not apply.

Only US citizens without dual citizenship can be submitted; no exceptions.

Responsibilities:

  • Manages and leads a team of site reliability engineers as direct reports.
  • Facilitates day-to-day managerial duties such as creation, prioritization, and delegation of work, along with personnel administration (reviews, hiring, performance, etc.).
  • Serve as the face and lead of site reliability engineering and operations across all facets of the business units.
  • Handles budgeting and resource allocations to stay aligned with the allotted organizational/business financials.
  • Provide vision and leadership for SRE operations within a digital landscape.
  • Responsible for understanding the requirements (services, features, and timing) of the customers of the cloud platform services in order to support them effectively at operational scale. Develops strategic plans, roadmaps, and business cases in collaboration with technology leaders and architects.
  • Engage teams in highly collaborative activities to drive alignment and partnership. Use effective negotiation and influence skills to ensure priorities are met.
  • Set and communicate team priorities that support the broader organization's goals. Align strategy, processes, and decision-making across teams.
  • Function as an engineering manager working in an agile environment, which includes but is not limited to story writing workshops, backlog refinement, planning, and standups, all maintained through Jira.
  • As a manager, provides technical guidance to the team and mentorship to less experienced team members.
  • Engage with key stakeholders, internal and external, to help foster and strengthen working relationships.
  • Provides analytical, logical, and rational thinking abilities to build enterprise level, scalable, highly available, and performant systems.
  • Provide and operate SRE functions (as needed) within a Kubernetes / EKS environment in AWS GovCloud.
  • Serve as an SRE (as needed) with an emphasis on Operations to reactively respond to, triage, and remediate reported issues, categorized by severity.
  • Serve as an SRE (as needed) to proactively establish the means (through tooling) to effectively monitor, analyze, report on, and observe the health and upkeep of the systems and/or environments.
  • Establish key practices to ensure that availability, stability, scalability, performance, monitoring, and incident response are handled appropriately through automation.
  • Participate in the on-call rotation to field and support issues as they arise.
  • As a senior engineer, provides technical guidance and mentorship to less experienced team members.
  • Collaborate with specific SMEs from various teams to investigate, troubleshoot, and resolve issues.
  • Implement automation to mitigate risks and faults based on reactive and proactive measures.
  • Construct and maintain incident response playbooks with documented corrective actions.
  • Adhere to an established and well defined escalation process to handle reported incidents.
  • Function as an engineering team lead/manager in an agile environment, which includes but is not limited to story writing workshops, backlog refinement, planning, and standups, all maintained through Jira.
  • Participate in the thorough investigation and breakdown of technical issues, and support troubleshooting, identifying, and addressing root causes.
  • Establishes proactive solutions to prevent faults within the system and underlying infrastructure.
  • Build automation practices across applicable aspects that improve the overall efficiency and scalability of our applications and infrastructure.
  • Documents on a consistent basis for knowledge sharing and redundancy as a part of the definition of done.
  • Demonstrate proficiency and ability in creating reusable tools through scripting or development languages such as Python, PowerShell, Perl, Java, Bash, shell, or other languages.
  • Automates pipelines used for SRE functions in a continuous delivery and deployment (CI/CD) model.
  • Analyze all platform-level changes and monitor for resulting issues to effectively formulate technical solutions.
  • Work with cross-functional internal teams in North America and Europe.

Qualifications:

  • Bachelor's in computer science or a related field or equivalent work experience.
  • 5+ years of experience in a leadership and/or software engineering managerial role with direct reports.
  • Possesses a deep knowledge of AWS (or Cloud) foundation principles and design, cloud security and compliance, cloud networking, and pipelines.
  • Extensive experience with Agile development methodologies, Automation, SRE, and/or DevOps principles.
  • Experience managing large scale environments.
  • Communicates with honesty and kindness and creates the space for others to do the same.
  • Leads with courage, knowing the possibility of greatness is bigger than the fear of failure.
  • Fosters connection by putting people first and building trusting relationships.
  • Hands-on working knowledge of or familiarity with Observability Services such as ELK stack, CloudWatch, Jaeger, Kiali, Grafana, Prometheus, New Relic, Datadog, and Netdata.
  • Experience with being on call and working with incident response tools such as: PagerDuty, VictorOps.
  • 5+ years of experience working with Cloud providers: AWS, Microsoft, Google.
  • 5+ years of experience working with Deployment Automation such as: Ansible, Helm, Chef, Puppet, Vagrant.
  • 5+ years of experience working with infrastructure as code (IaC) such as: Terraform, CDK, CloudFormation.
  • 5+ years of experience working with source control tools such as BitBucket, Git, SVN.
  • 5+ years of experience working with CI/CD tools such as Jenkins, Bamboo, TeamCity, GitLab.
  • 5+ years of experience working as an SRE or DevOps Engineer.
  • Hands-on working knowledge of or familiarity with service mesh architectures (specifically Istio) is a plus.

If this position interests you, please email me back at (with your most up-to-date resume in Word format) and advise the best time and number at which you can be reached.
