Senior Site Reliability Engineer (SRE) at Articul8 AI, Dublin, CA

  • Articul8 AI
  • Dublin, CA

Job Description

About Us

Articul8 AI is at the forefront of Generative AI innovation, delivering cutting-edge SaaS products that transform how businesses operate. Our platform empowers organizations to leverage the power of artificial intelligence in a reliable, scalable, and secure environment.

Position Overview

We are seeking an experienced Site Reliability Engineer (SRE) to join our team and help ensure the reliability, performance, and scalability of our GenAI SaaS platform. As an SRE, you will bridge the gap between development and operations, implementing automation and best practices to maintain our service reliability objectives while supporting rapid innovation.

Key Responsibilities

  • Architect and maintain scalable, highly available infrastructure for our GenAI platform.
  • Design and implement robust monitoring, alerting, and observability solutions to proactively ensure system health and performance.
  • Automate deployment, scaling, and management of our cloud-native infrastructure, reducing toil and improving efficiency.
  • Define, measure, and improve Service Level Objectives (SLOs) and Service Level Indicators (SLIs) to deliver outstanding service quality.
  • Participate in on-call rotations and provide rapid response to production incidents, minimizing downtime and user impact.
  • Collaborate closely with development teams to build reliable, scalable, and efficient systems for complex AI workloads.
  • Lead incident response efforts, conduct thorough post-mortems, and champion continuous improvement initiatives.
  • Optimize infrastructure for performance, scalability, and cost-effectiveness—especially for high-demand AI workloads.
  • Implement and enforce security best practices across all systems and environments.
  • Create and maintain comprehensive documentation, including runbooks and knowledge base articles, to foster a culture of shared knowledge.

Qualifications

Required

  • Bachelor's degree in Computer Science, Engineering, or related field, or equivalent practical experience
  • 8+ years of experience in DevOps, SRE, or similar roles
  • Strong experience with cloud platforms (AWS, GCP, or Azure)
  • Proficiency in at least one programming/scripting language (Python, Go, Bash, etc.)
  • Hands-on experience with infrastructure as code tools (Terraform, CloudFormation, etc.)
  • Solid background in containerization technologies (Docker, Kubernetes)
  • Proven experience with monitoring and observability tools (Prometheus, Grafana, ELK stack, etc.)
  • Strong understanding of CI/CD pipelines and automation
  • Exceptional troubleshooting and problem-solving skills, including the ability to diagnose issues in complex systems

Preferred

  • Experience supporting AI/ML systems in production
  • Knowledge of GPU infrastructure management and optimization
  • Familiarity with distributed systems and high-performance computing
  • Experience with database systems (SQL and NoSQL)
  • Certifications in cloud platforms (AWS, GCP, Azure)
  • Experience with chaos engineering and resilience testing
  • Knowledge of security best practices and compliance requirements

Ready to shape the future of resilient software systems? Apply now and help drive the reliability of tomorrow’s AI at Articul8 AI!
