Job Description
DiploAI, an NYC-based startup at the forefront of regulatory intelligence, is looking for a Senior Data Engineer to help shape the foundation of our data platform from the ground up.
You’ll be a key player in building, scaling, and maintaining our data systems—working directly with founders and executives, owning critical systems, and influencing both technical direction and team culture. This is a rare opportunity to join early, move fast, and leave a lasting imprint on a category-defining product.
We operate hybrid in NYC. If you're not in the city, we expect relocation within the next few months.
🔍 In this role, you'll:
- Own and scale our core web-scraping infrastructure for regulatory data
- Translate high-level business needs into scalable, testable data systems
- Collaborate with product and ML teams to enable rapid experimentation
- Drive technology decisions and select the right stack and tools for long-term growth
- Balance speed vs. scalability and clearly articulate trade-offs
- Set standards and mentor others as one of the first hires on the data team
✅ Ideal candidates will have:
- 5+ years of experience as a Data Engineer, ideally in startup environments
- Expertise in large-scale scraping pipelines, data normalization, and scalable database design
- Strong grasp of monitoring, failure-tolerant systems, and production-ready pipelines
- Bachelor’s or Master’s degree in Computer Science, Engineering, or related field
- Excellent communication skills and a low-ego, high-collaboration mindset
- U.S. work authorization (we do not support visa transfers)
🌟 Nice-to-haves:
- Experience with AWS, Kubernetes, Terraform or similar infrastructure tooling
- Interest in Generative AI / LLMs, and excitement to collaborate closely with ML engineers
- Passion for building from zero and contributing beyond your job description