Data Reliability Engineers are responsible for ensuring the reliability, performance, and scalability of our data platforms and pipelines. You will collaborate with data engineering, product, and business teams to design, build, and maintain data systems that drive key business decisions. Operating within an Agile environment, you’ll implement best practices from DevOps and SRE disciplines, leveraging automation to reduce operational overhead and improve system reliability.
Data Platform Reliability: Maintain the health, performance, and uptime of data infrastructure (including data lakes, warehouses, and pipelines) through proactive monitoring and automated systems.
Monitor and Optimize Data Pipelines: Build observability into data pipelines to identify and resolve bottlenecks or failure points in real time. Use A/B testing and canary releases to validate pipeline changes safely in production.
Automate Incident Response: Develop automated incident response systems and playbooks that detect anomalies, resolve common issues, and reduce Mean Time to Recovery (MTTR).
Scalable Data Infrastructure: Partner with data engineers to ensure systems scale efficiently, managing storage and compute resources within cloud environments (AWS, Azure, GCP).
DataOps Implementation: Work within an Agile framework to apply DataOps methodologies, driving continuous integration, delivery, and deployment for data systems.
SLA/SLI/SLO Management: Establish and maintain Service Level Objectives (SLOs) and Service Level Indicators (SLIs) for data platforms. Monitor Service Level Agreements (SLAs) and ensure data systems meet defined expectations for uptime and performance.
Reliability Automation: Build and manage tools for automated testing, health checks, and disaster recovery processes to ensure reliable data ingestion, transformation, and storage.
Collaboration: Work closely with data engineers, data analysts, and business teams to ensure data systems meet organizational needs while maintaining reliability and agility.
Cost and Performance Optimization: Identify opportunities to optimize costs for storage and compute resources while maintaining or improving performance.
Incident and Problem Management: Lead incident resolution for data outages or pipeline failures, conduct post-mortems, and implement preventative measures to reduce future incidents.
Continuous Improvement: Identify and implement ongoing improvements for data reliability, observability, and operational excellence within the platform.
Experience: 5+ years in data engineering, data operations, or SRE roles, with a focus on data infrastructure and platform management.
Agile & SRE Methodologies: Strong understanding of Agile frameworks, SRE principles, and DevOps practices with a passion for reducing manual operations through automation.
Cloud Experience: Hands-on experience with cloud platforms (AWS, Azure, GCP) and data services (Redshift, BigQuery, Synapse).
Automation & Infrastructure as Code (IaC): Expertise in automation and orchestration tools such as Terraform, Ansible, Kubernetes, and Helm to manage infrastructure at scale.
Observability Tools: Experience applying monitoring and logging systems (Datadog, New Relic, Prometheus) to data platforms.
Database and Storage Systems: Experience managing both SQL and NoSQL databases.
Incident Response: Proven track record of managing high-severity incidents and conducting post-mortems.
Vista is a leading global investment firm that exclusively invests in enterprise software, data and technology-enabled organizations across private equity, permanent capital, credit and public equity strategies, bringing an approach that prioritizes creating enduring market value for the benefit of its global ecosystem of investors, companies, customers and employees. Vista’s investments are anchored by a sizable long-term capital base, experience in structuring technology-oriented transactions and proven, flexible management techniques that drive sustainable growth. Vista believes the transformative power of technology is the key to an even better future – a healthier planet, a smarter economy, a diverse and inclusive community and a broader path to prosperity. Further information is available at vistaequitypartners.com. Follow Vista on LinkedIn, @Vista Equity Partners, and on X, @Vista_Equity.