CAREERS

Platform / Infrastructure Engineer

LoopSmart is a technology research and development company. We build advanced software, AI systems, and infrastructure that power our partners' products. We focus on tangible technology that solves complex problems at scale.

We are seeking a Platform / Infrastructure Engineer to build the foundations that make our systems predictable and secure. In this role, you will treat infrastructure as a product, designing the CI/CD pipelines, observability stacks, and deployment environments that enable our researchers and engineers to ship faster and more safely. You will value boring reliability and secure defaults over bleeding-edge complexity.

You will be responsible for the full lifecycle of our platform, from defining the infrastructure-as-code modules that standardize our deployments to debugging complex production incidents. You will work to reduce toil and improve the developer experience, ensuring that the "right way" to deploy is also the easiest way.

Key Job Responsibilities

Build and maintain CI/CD pipelines with safe rollout patterns and clear visibility into failures.
Implement infrastructure-as-code (Terraform/CDK) and standardize environments (dev/stage/prod) to reduce drift.
Improve observability: logs, metrics, traces, alerting, and runbooks.
Design platform guardrails: least-privilege IAM, secrets handling, network boundaries, and secure defaults.
Drive reliability work with measurable SLOs and post-incident learning.
Collaborate with development teams to optimize resource usage and cost.

A day in the life

Your day might start by investigating an alert about increased latency in our inference cluster. You identify a noisy-neighbor issue and adjust the resource limits in our container orchestration configuration. Later, you work on a Terraform module that standardizes how we deploy vector databases, ensuring that backups and encryption are enabled by default. In the afternoon, you help a research engineer debug a failed deployment pipeline, teaching them how to interpret the build logs. You wrap up by writing a post-mortem for a minor outage, identifying the root cause and proposing a fix to prevent recurrence.

About the team

You will join a small, high-density team of researchers and engineers who value shipping over hype. We operate like a lab: we form hypotheses, run experiments, and document results. We are not a feature factory; we are an asset factory. We value clear writing, intellectual honesty, and the discipline to finish what we start. We work asynchronously and respect deep work time.

Basic Qualifications

3+ years of experience operating production systems and improving reliability.
Comfort with AWS primitives and infrastructure-as-code concepts (Terraform, CDK).
Strong debugging instincts across the application and infrastructure layers.
Security-minded engineering: you prefer secure defaults and explicit access boundaries.
Bachelor's degree in Computer Science or equivalent practical experience.

Preferred Qualifications

Experience with Kubernetes or serverless patterns.
Experience designing on-call and incident response processes that are humane and effective.
Experience with compliance-minded logging and auditability.
Familiarity with MLOps or data engineering pipelines.