About the role
- Maintain the stability, scalability, and performance of an Azure-hosted AI platform.
- Lead proactive monitoring, fault detection, and incident response to minimize downtime.
- Establish and manage disaster recovery and business continuity frameworks.
- Oversee cybersecurity operations, including risk assessments, vulnerability remediation, and security audits.
- Collaborate with engineering and AI/ML teams to improve automation, observability, and operational processes.
Skills and requirements
- Bachelor’s degree in Computer Science, Engineering, or a related discipline.
- At least 6 years of hands-on experience in cloud infrastructure management.
- Strong proficiency in Azure operations, including monitoring and diagnostic tools (e.g., Azure Monitor, Log Analytics, Application Insights).
- Proven experience in SRE methodologies, incident management, and recovery planning.
- Skilled in cloud security operations such as IAM, SIEM/SOAR, vulnerability management, and endpoint protection.
- Proficient in Infrastructure-as-Code (Terraform, Bicep, ARM) and scripting languages (PowerShell, Python).
- Understanding of AI/ML workloads and related infrastructure components (e.g., AKS, GPU environments, data pipelines).
- Awareness of industry security frameworks such as ISO 27001, CIS, or NIST.
To apply online, please use the 'apply' function. Alternatively, you may contact Stella at 96554170 (EA: 94C3609 / R1875382).
...