Overview
Develop and configure software systems, either end-to-end or for a specific stage of the product lifecycle. Apply knowledge of technologies, applications, methodologies, processes, and tools to support a client, project, or entity.
Responsibilities:
Design, provision, configure, and maintain secure, scalable, and highly available GCP & AWS cloud infrastructure defined as code
Work collaboratively with software engineering and DevOps teams to define infrastructure and deployment requirements
Ensure systems are correctly configured and remain compliant using configuration management tools
Contribute to the maintenance and improvement of operational tools for deployment, monitoring, and analysis of AWS infrastructure and systems
Develop standard operating procedures, runbooks, service levels, and project status reports, and provide ongoing metrics and reporting
Monitor client cloud infrastructure, and identify and report network metrics
Identify incidents, then analyze and investigate them to determine their severity and the required response
Ensure that incidents are correctly reported and documented in accordance with client/government policy and procedures
Provide technical escalation coverage during incidents: establish the extent of the incident and its business impact, advise on how best to contain it, and recommend systems hardening and mitigation measures to prevent a recurrence
Troubleshoot, identify, and resolve technical issues related to identity and access management (IAM) and encryption keys
Coach other project members on Cloud Operations best practices, and adhere to the SLAs/KPIs defined in the client contract.
Must have good knowledge of REST APIs, networking, and software applications to troubleshoot and maintain applications.
Must have good troubleshooting and problem-solving skills to address application (API) and system issues.
Must have good knowledge of standard HTTP status codes and error messages to address application issues.
Diagnose and resolve application (API) issues and challenges.
Monitor systems for potential issues and address them proactively before they become major problems.
Must have good knowledge of application and system log analysis.
Collaborate with SRE professionals and developers to address & solve technical problems.
Maintain records of application (API) bugs and glitches, and the steps taken to resolve them.
Review application release changes and upgrade application services.
A minimum of 5 years of experience is required.
Job Requirements
Please refer to the job description.