You will be responsible for provisioning and managing cloud infrastructure on the Azure public cloud to support organizational needs. The Site Reliability Engineer (SRE) is responsible for ensuring the reliability, availability, and performance of cloud-based infrastructure and applications deployed on Microsoft Azure. This role involves automating operations, monitoring system health, optimizing performance, and troubleshooting complex issues to maintain a highly available and secure cloud environment. The SRE will work closely with development, security, and IT operations teams to enhance cloud solutions, implement best practices, and support scalable and resilient systems.
- Deploy and manage Azure cloud services, including Virtual Machines, Storage, Redis, Azure SQL Database, virtual networks, and Azure Kubernetes Service (AKS) clusters.
- Automate provisioning, configuration, and deployments using PowerShell, Bash, and Ansible.
- Deliver and deploy Azure infrastructure using Infrastructure as Code (IaC), specifically Azure Bicep.
- Review, configure, and implement monitoring to give Level 1 support teams clear visibility into system health and behavior.
- Implement and troubleshoot CI/CD pipelines for application deployments in Azure DevOps, TeamCity, and Octopus Deploy.
- Maintain system reliability using Azure Monitor, Application Insights, Log Analytics, Prometheus/Grafana, Splunk, Opsgenie, and Slack.
- Optimize performance and cost efficiency of Azure resources.
- Train junior team members to deliver best-of-breed solutions on the Azure public cloud.
- Review, manage, and troubleshoot Azure Kubernetes Service (AKS) clusters.
- Review and manage cloud and on-premises servers, including AKS, covering OS and RabbitMQ (RMQ) upgrades, security patches, and application service support.
- Respond to system alerts, failures, and security incidents; perform root cause analysis (RCA) and implement preventive measures.
- Provide Level 2 on-call support according to a pre-approved schedule (including weekends).
- Review the network and security design for all infrastructure and applications hosted in Azure.
- Continuously promote better ways to deliver infrastructure solutions on the Azure cloud.
- Propose the adoption of new approaches, patterns, techniques, and ideas based on industry standards and trends.
- Work closely with software development and network teams to enhance platform reliability and identify better approaches.
- Administer and optimize Linux-based systems used for application hosting, ensuring stability, security, and performance in production and non-production environments.
- Troubleshoot issues in Linux operating systems, services, and middleware components to support application availability.
Requirements
- At least 3 years of proven experience in delivering infrastructure solutions on Azure cloud.
- 5+ years of hands-on experience with infrastructure design and deployment utilizing PaaS, SaaS and IaaS cloud offerings.
- At least 2 years of experience with Windows Server
- Experience with either Azure ARM templates or Azure Bicep
- At least 3 years of experience in Linux administration and managing Linux-based operating systems and applications
- At least 2 years of hands-on experience designing, building, and deploying containerized runtime environments based on Azure Kubernetes Service (AKS)
- 1+ years of proven experience administering RabbitMQ clusters and Nginx
- Proven experience with scripting languages such as PowerShell, Python, JavaScript, and Bash
- Experience using Splunk, Grafana, or Opsgenie is an asset
Advantageous skills: