Job Description

Founding Research Scientist (Long-Horizon RL) at Lanturn

Location: San Francisco (preferred) / Remote (US)

Compensation: $300K base + 0.5–1% equity

Type: Full-time · Founding Team

At Lanturn, we are building the next generation of reinforcement learning systems for real-world agents. Our focus is on enabling AI systems to learn from behavioral data and long-horizon workflows, through:

High-fidelity RL environments
Synthetic data generation
Closed-loop training systems

We are looking for a Founding RL Researcher to push the frontier of:

Long-horizon RL
Environment design
Post-training for agents

About us:

Lanturn is building the end-to-end behavioral learning stack for AI systems. We believe current approaches to RL and post-training are limited by short-horizon optimization, weak or proxy reward signals, and a lack of grounded environments. Our approach is to build closed-loop RL systems in which environments, data, training, and evaluation are tightly integrated and grounded in real-world behavioral data.

The role:

As a Founding RL Researcher, you will lead efforts to develop novel reinforcement learning algorithms and environments for training autonomous agents. You will work across:

Algorithm design
Environment modelling
Training systems
Evaluation frameworks

This role sits at the intersection of:

Frontier-lab-style RL research (environments + algorithms)
Modern LLM post-training (RLHF, preference optimisation)

Key responsibilities:

Design and implement RL systems for long-horizon tasks (10–100+ steps)
Develop and extend modern post-training methods: PPO, DPO, ORPO, GRPO / GRPO++, and ranking-based optimization methods
Build RL environments grounded in real-world workflows
Work on meta-RL and adaptive learning systems: generalization across tasks and rapid adaptation to new environments
Design reward systems for: behavioural correctness, efficiency, and robustness
Develop evaluation frameworks aligned with real-world outcomes
Collaborate with engineering teams to scale training systems

Ideal candidate:

You are a researcher with strong theoretical grounding and real-world system intuition, capable of working on open-ended problems in RL. You thrive in environments where:

Problems are not well-defined
Systems must be built from first principles
Research directly translates into deployed systems

Minimum qualifications:

Experience at a top-tier AI lab or company: OpenAI, DeepMind, Anthropic, FAIR, or equivalent
Strong background in reinforcement learning and post-training systems
Experience training large-scale models (LLMs or similar)
Strong programming skills (Python, PyTorch/JAX)

Preferred qualifications:

Experience with long-horizon RL or sequential decision-making systems
Experience designing or working with RL environments
Familiarity with: Preference optimization (DPO, ORPO), RLHF pipelines, and automated RL env generation
Experience with meta-RL / adaptive learning systems
Strong publication record in top-tier ML conferences

Core technical skills:

Deep understanding of: Policy gradient methods (PPO and beyond), KL-regularized optimization, and credit assignment in long-horizon settings
Experience with: Cascading RL pipelines (SFT → RL → evaluation), distributed training systems, and stability and scaling challenges
Strong intuition for: Exploration vs exploitation, reward shaping vs reward learning, and trajectory-level optimization

What makes this role unique?

Focus on long-horizon behavioral learning, not short-form RLHF
Treats environment design and generation as a first-class problem
Opportunity to define GRPO++-style next-generation algorithms and publish at NeurIPS

Why join Lanturn?

Founding ownership (0.5–1% equity)
Work on unsolved problems in RL and agent systems
High autonomy and research freedom
Direct impact on how real-world AI systems are trained
Work directly with second-time founders who have worked with various big tech companies and enterprises

If you’ve worked on RL at a top lab or have production RL experience, and want to push beyond current paradigms into real-world, long-horizon intelligence, this is your opportunity.

Job Tags

Full time, Remote work


