Site Reliability Engineer

for Product Madness in Gdansk or remotely

Permanent position, Other

Site Reliability Engineers at Product Madness live and breathe production. They look for opportunities to improve reliability through observability and by engaging with the technical groups. We are looking for an SRE to join our Infrastructure Group. The ideal candidate will help create the next generation of our cloud infrastructure, Observability as a Service, CI/CD pipelines, and our SRE/DevOps practices and culture inside Product Madness, using cutting-edge automation and modern technologies.
The successful applicant for this position will be a highly motivated team player with relevant experience.
SREs lead incident response and take ownership of post-mortems, and they are responsible for keeping all user-facing services and other production systems running smoothly. SREs apply sound engineering principles, operational discipline, and mature automation to the platform.
The key to an SRE's success is engagement across teams, contributing to the development and operation of games and services that meet reliability targets.
Product Madness is growing fast, which means that as an SRE you will have to balance delivery speed with production stability while focusing on crucial reliability metrics and processes.
The SRE will become a primary point of responsibility for the overall health, performance, and capacity of customer-facing services in cloud infrastructure, maintain back-end services in a cloud environment, and assist in the roll-out and deployment of new product features. The SRE will also play a key role in improving our engineering best practices for observability.

Responsibilities
● Debug production issues across services and levels of the stack.
● Plan and support the growth of Product Madness Infrastructure.
● Help maintain Product Madness services in the cloud environment and related tools, such as GCP, New Relic, Grafana and Slack.
● Build deep knowledge of our complex applications.
● Assist in the roll-out and deployment of new product features and installations to new cloud infrastructure, supporting our rapid iteration and constant growth.
● Develop tools to improve our ability to rapidly recover and effectively monitor custom
applications in a large-scale UNIX environment.
● Function well in a fast-paced, rapidly-changing environment.
● Design, manage and monitor Product Madness’s auto-scaling mechanism to help us serve millions of customers worldwide in a modern and scalable way.
● Design, build and maintain multiple environments on GCP using an infrastructure-as-code approach.
● Monitor the containerised environments/services and manage them with leading orchestration frameworks (Kubernetes).
● Pay attention to both infrastructure and security aspects.
● Maintain, optimise and automate processes in large-scale production environments.
● Support other engineers and developers by providing the necessary training, advice and mentorship.
● Support production environments - troubleshooting and root cause analysis.
● Participate in a 24x7 rotation for second and third-tier escalations
● Be on an on-call rotation to respond to incidents that impact availability, and provide
support for Cloud Operation Engineers
● Interface and work closely with various R&D groups (Architects, Principal Engineers, Developers and Product Managers).

Qualifications & Experience
● Proven experience as a DevOps/SRE/Infra Engineer
● Proven experience with troubleshooting in Unix/Linux
● Expertise in Java, Python, Ruby or Bash, or experience with another programming language
● Experience and knowledge of CI/CD design and practice
● Public Cloud, preferably GCP but AWS and Azure are good too!
● Experience with Cloud Architecture Design principles and Cloud Architect certification
● Experience creating infrastructure-as-code solutions using tools such as Terraform, Azure ARM templates or CloudFormation - a must
● Experience with CI/CD tools and methodologies such as Jenkins, ArgoCD, CircleCI, GitHub Actions, etc. - a must
● Hands-on implementation of Continuous Integration and Continuous Delivery in complex
environments.
● Proven experience working in a production environment - a must
● Solid experience implementing production-grade Kubernetes Clusters with containerised
environments and microservices (Docker, Kubernetes)
● Experience working with configuration management tools (Chef, Puppet, Ansible) - an advantage
● Solid understanding of networking technologies, with a focus on cloud networking
● Experience with service mesh solutions such as Anthos, Istio or Consul - an advantage
● Experience with monitoring and log analysis tools such as ELK, Prometheus, Grafana, New Relic, Splunk, etc. - an advantage
● Experience in configuration and maintenance of applications such as web servers, load
balancers, relational databases, storage systems and messaging systems

About you
● Good communication skills in English
● Naturally curious, a committed lifelong learner with the boldness to pursue your aspirations
● You have great interpersonal skills and are able to communicate effectively with
your team members and other teams across the business
● You have an eye for detail and can apply logical thinking when managing tasks
● Good organisational and prioritisation skills
● You are flexible, team-oriented and willing to work in a very fast-paced environment

● Allergic to manual, repetitive tasks, with a desire to eliminate toil
● Obsessed with system performance, valuing simplicity over complexity
● Passionate about customer user journeys

Benefits:
Medical insurance: Prestige VIP Plus package, 100% covered by Product Madness
Sport and wellbeing package: 270 PLN per month. Includes a MultiSport Plus card, 100% covered by Product Madness, or a refund for your chosen sport and wellbeing services, such as massage, psychotherapy and physiotherapy
1,000 PLN bonus, once per year, for taking a 10-day block of annual leave
Free additional holidays during the year
Shorter working hours during summer time
Free holiday during your birthday week

Publication date: 2023-05-02
