Twiga Senior Site Reliability Engineer Jobs in Kenya




About Twiga

Twiga is a B2B e-commerce company that builds fair and reliable markets for agricultural producers, food manufacturers and retailers based on transparency and efficiency. Our Mission is to build a closed ecosystem for African retail, anchored on affordable access to food and groceries across urban cities. Our Ambition is to leverage technology, the ubiquity of mobile phones, modern distribution and logistics to modernize African retail.

Senior Site Reliability Engineer Vacancy

The role holder will be responsible for leading the end-to-end design, development and deployment of engineering solutions to run scalable, distributed and fault-tolerant software systems for Twiga Foods. The role holder will lead the implementation of automated solutions to ensure uptime, reliability and improvement of Twiga Foods' systems in line with set service level objectives.

He/she will be required to provide leadership in determining software engineering needs from product/engineering requirements and collaborating across the organisation to clarify requirements and expected outcomes.

They are also accountable for work assigned, ensuring that it is broken down into a plan with estimates, priorities and deliverables, ensuring adherence to the plan, and communicating when any adjustments to scope are needed to meet deadlines.

Additionally, he/she will contribute to the wellbeing of the Twiga technology ecosystem by tracking production systems’ capacity and performance, fixing issues and taking on-call responsibilities.

Key Responsibilities

Site Reliability

  • Collaborate with other cross-functional teams to design, develop, and deliver required software

  • Develop, manage and support SRE tools and applications.

  • Lead/own and drive the development/implementation of SRE tools within the Product/Technical Requirements Document.

  • Develop or review technical specification documents within the SRE team and wider engineering team.

  • Lead the deployment, training, and rollout of major/minor SRE tools across various engineering/tech teams.

  • Deliver feature work consistently and on time whilst still tackling tech debt. Ensure that code meets agreed accuracy, testability, efficiency and style guidelines, and that software systems meet agreed SLOs for performance and reliability.

  • Produce a work breakdown structure with estimates, deadlines, and deliverables. Own features from technical specification through implementation to deployment into production.

  • Engage in improving the software development lifecycle, providing feedback on requirements, architecture, designs, and solutions.

  • Build resilience into systems so underlying failures are handled gracefully and do not impact end users.

  • Develop automated predictive analysis of future capacity needs, and proactively work on efficiency and capacity planning to set clear requirements and reduce system resource usage.

  • Manage individual priorities, deadlines, and deliverables.

  • Defend and challenge technical decisions made through solution design and code review feedback

  • Finalise and own technical documentation for the developed features

On-Call Technical Support

  • Monitor application availability and performance, take steps to improve overall application performance and stability, and follow through with implementation

  • Participate in on-call technical support rotation, respond to all incidents, and lead minor/major incidents in collaboration with relevant engineering/product stakeholders.

  • Triage system issues and debug, track and resolve them through end-to-end incident response and management, analysing the sources and offering corrective measures.

  • Drive efficiencies through systems improvement and root cause analysis, resulting in improved service delivery, maturity, and scalability.

  • Analyze logs and telemetry data by writing monitoring and automation code.

  • Identify and automate repetitive, manual, and non-tactical work that impacts software development and deployment.

Innovation

  • Investigate site reliability technologies and their applicability to the Twiga ecosystem.

  • Identify significant projects that result in substantial improvements in reliability, cost savings and/or revenue.

  • Provide reports on findings, with recommendations and a viable plan of action.

  • Lead design reviews with peers and stakeholders to decide amongst available technologies

  • Evaluate and review existing systems, SRE processes, and tools.

  • Develop and lead implementation of a viable technical specification document in collaboration with members of SRE or engineering team.

  • Contribute to the definition of SLOs for services/applications.

In-Team Collaboration

  • Work with peers to build a stronger engineering team

  • Lead process improvements that boost productivity and quality of Twiga engineering

  • Regularly contribute improvements to existing documentation and codebase as per agreed standards.

  • Review code developed by others and provide feedback to ensure adherence to Twiga Engineering best practices.

  • Contribute regular knowledge shares through a variety of mediums including lunch and learn sessions.

  • Provide mentorship for SRE engineers and interns in the section.

  • Mentor/Coach/Train engineers on system design, reliability, monitoring, and availability concepts to help improve the overall system quality.

  • Develop and maintain relationships with various engineering teams and their members.

  • Acquire and maintain an understanding of multiple engineering teams' processes and tools.

  • Influence the engineering roadmap and work with engineering and/or product counterparts to improve the resiliency and reliability of Twiga systems.

  • Build deep domain knowledge and share that knowledge through recorded demos, technical presentations, discussions, and Incident Reviews.

Self-management

  • Model Twiga’s culture and way of working.

  • Deliver the performance objectives set for the team. Hold monthly 1-on-1 performance reviews with line manager, and institute corrective action where performance falls below expectation.

  • Proactively manage own learning and development

  • Adhere to the annual leave plan agreed with the line manager

  • Adhere to people management policies

Compliance

  • Comply with all organization policies, procedures, and statutory guidelines. Minimize and mitigate risks to the organization and enforce zero-tolerance to non-compliance.

  • Close gaps/lapses identified as an outcome of audits; risk and/or any other compliance review; investigations; or other assessment mechanisms and take corrective/preventive actions within the agreed timelines.

Minimum Qualifications & Requirements

  • Degree in Engineering, Computer Science, Information Technology or a related discipline, or demonstrated equivalent skill/competence.

  • Minimum of 5 years of relevant experience in:

  • Observability and monitoring of infrastructure, applications, services, and networks

  • Troubleshooting issues across the entire stack (hardware, software, network etc.)

  • Writing infrastructure as code and automation scripts

  • Building and maintaining CI/CD pipelines

  • Building, running, and optimising containers with Docker or containerd

  • Setting up, running, and managing Virtual machines, Kubernetes clusters, Databases and Virtual Private Networks

  • Operating highly available and reliable infrastructure

  • At least 3 years' experience working with relational databases (Postgres, MySQL or Microsoft SQL Server), non-relational databases, and in-memory data stores

  • At least 2 years' experience creating/managing SLIs/SLOs/Error Budgets.

  • Strong technical understanding of Android, front-end and backend development

  • Experience in designing, implementing and securing distributed systems

  • Strong experience with analysing logs, metrics and traces.

  • Creating system reports and system alerts.

  • The use, maintenance and configuration of monitoring, observability and telemetry metrics and logging infrastructure (Prometheus, Grafana, ELK, or Sentry)

  • Understanding of Agile/Scrum development principles

  • Understanding of ITIL incident and problem management practices

  • Can work accurately and quickly, to ensure key project milestones are achieved within set timelines, even when working under pressure.

  • Always have a positive attitude and approach to the role and team.

How to Apply

For more information and job application details, see: Twiga Senior Site Reliability Engineer Jobs in Kenya
