SysOps Engineer
The SysOps Engineer is responsible for the configuration, reliability and efficiency of systems. He/She optimises the capacity and performance of infrastructure, using coding and scripting to automate the resolution of recurring issues, eliminate repetitive tasks and enable scalable, distributed systems. He/She also supports system installation and upgrades, continuously monitors infrastructure and ensures security and compliance when leveraging cloud platforms. He/She is highly proficient in scripting and programming languages, is familiar with cloud platforms and the scaling and management of infrastructure, and works well with a variety of internal and external stakeholders. He/She is able to work on an on-call and shift basis, prioritise effectively and operate under pressure. The SysOps Engineer enjoys hands-on problem-solving and is driven to investigate challenging, complex problems. He/She is resourceful and self-directed, works independently with minimal guidance, and is an analytical thinker who demonstrates strong interpersonal skills in cross-team collaboration.
Critical Work Functions and Key Tasks
• Develop processes and standards for system or application reliability in areas of availability, performance, latency, capacity, emergency response, capacity planning, change management, security and monitoring
• Translate business needs into cloud architectural requirements
• Design scalable, robust systems using cloud architecture
• Create procedures and documentation for site reliability and incident management
• Build and run large-scale, massively distributed and fault-tolerant systems
• Perform provisioning of cloud resources
• Configure infrastructure environment for software development and prototyping
• Conduct pre-deployment testing of systems to ensure reliability
• Implement operational cost control mechanisms for cloud infrastructure
• Identify and resolve deployment issues
• Oversee configuration of operational systems to ensure alignment with technical and security requirements
• Measure and monitor overall performance, system health, system availability and latency
• Provide proactive updates or alerts on infrastructure availability to relevant stakeholders
• Address gaps in performance or availability based on identified metrics
• Carry out testing and release procedures to ensure rigour of infrastructure and services
• Resolve service operation issues and prevent recurrence using automation
• Perform regular tuning of infrastructure and services
• Conduct capacity planning and performance analysis for cloud infrastructure and systems
• Identify opportunities to enhance operational workflows, systems and processes through automated deployment
• Develop tools and scripts to automate deployments and optimise performance
• Create an operating environment for monitoring, alerting, self-healing and automated recovery
• Devise strategies and a roadmap for scaling infrastructure operations
• Design and write code for scalable systems
• Scale systems through automation to manage recurring tasks
• Propose enhancements to infrastructure architecture