Open to ML Research Roles · 2026

Prabhjyot Singh

ML Researcher & Engineer at the University of Waterloo.
Ethical AI · Reinforcement Learning · Multi-Agent Alignment

2 Papers (one accepted to AIES 2026)

4.0 Grad GPA

6 Co-op Terms

About

About Me

I'm a Master's student at the University of Waterloo researching ethical AI and reinforcement learning under the UWECEML Lab, with two papers (one accepted to AIES 2026). My work builds evaluation frameworks that surface misaligned agent behavior and training methods that respect human values — from single-agent ethical constraints to multi-agent enforcement.

Across 6 co-op work terms, I've applied ML and software engineering in industry — from building PyTorch computer vision models at Kindred AI to designing LLM-powered agentic systems at BrainRidge Consulting. I enjoy sitting at the intersection of research and engineering.

Outside of research, I'm drawn to the philosophical questions behind alignment: what does it mean for an AI to behave "ethically," and how do we measure that rigorously?

AI / ML

JAX · PyTorch · TorchRL · Reinforcement Learning · TensorFlow · Keras · Scikit-Learn · Hugging Face · CUDA

Languages

Python · C / C++ · TypeScript · JavaScript · SQL · R

Infrastructure

Docker · AWS · Terraform · Compute Canada · Redis · NestJS · GitHub Actions · CI/CD

Robotics & Embedded

FANUC / ABB · KAREL · RTOS · Webots · Vivado

Research

Ethical AI & Reinforcement Learning

Building reinforcement learning agents that are both capable and aligned with human values — from single-agent ethical constraints to multi-agent enforcement. Graduate Research Assistant on the Moral AI Systems team at the UWECEML Lab, University of Waterloo.

Ethical Constraint Learning

Enforcing strict, non-linear ethical constraints in RL through a pipeline of Heuristic Oracles, Inverse Preference Learning (IPL), and Expected Scalarized Returns (ESR) over multi-objective rewards.
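
As a rough illustration of why ESR suits non-linear constraints, the toy sketch below compares ESR, which applies a utility to each episode's return vector and then averages, with the alternative of averaging returns first and applying the utility afterwards (often called SER). It is not the lab's pipeline: the `utility` function, its threshold and penalty, and the episode data are all hypothetical.

```python
from statistics import mean


def utility(ret, max_violations=1.0, penalty=-100.0):
    """Hypothetical non-linear utility over a (task_return, violations) pair.

    The task return counts only if the violation budget is respected;
    otherwise the episode is scored with a flat penalty.
    """
    task, violations = ret
    return task if violations <= max_violations else penalty


def esr(returns, u=utility):
    """Expected Scalarized Return: apply u per episode, then average."""
    return mean(u(r) for r in returns)


def ser(returns, u=utility):
    """Scalarized Expected Return: average per objective, then apply u."""
    avg = tuple(mean(axis) for axis in zip(*returns))
    return u(avg)


# Hypothetical data: three clean episodes and one with four violations.
episodes = [(10.0, 0.0), (10.0, 0.0), (10.0, 0.0), (10.0, 4.0)]

print(esr(episodes))  # -17.5: the single bad episode is penalized
print(ser(episodes))  # 10.0: averaging first hides the bad episode
```

With a linear utility the two quantities coincide; they diverge only when the utility is non-linear, which is the setting described above.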

Multi-Agent Alignment

Studying when cooperation stays stable under self-interest. Dual-Enforcement RL pairs intrinsic constraints with social enforcement to close free-rider vulnerabilities in multi-agent systems.

Scalable RL Infrastructure

Engineering JAX-native, Compute Canada-compatible training and evaluation pipelines on the high-dimensional Craftax environment — built for reproducible, large-scale agent-behavior benchmarking.

Publications

UWECEML Lab · Moral AI Systems Team · University of Waterloo

Accepted · AIES 2026

The Forager's Dilemma: Peer Enforcement Alone Is Not Enough for Stable Multi-Agent Alignment

AAAI/ACM Conference on AI, Ethics, and Society (AIES) · 2026

Introduced Dual-Enforcement RL, combining intrinsic constraints with social enforcement to solve free-rider vulnerabilities in multi-agent AI systems.
Trained independent PPO agents in a custom gridworld, reducing deceptive signaling to 1% and improving sustainable resource gathering to 90% under adversarial stress tests.

Multi-Agent RL · PPO · AI Alignment · Mechanism Design · Python

In Preparation · 2026

Methods for Training and Evaluating Ethical Reinforcement Learning Behaviour

Moral AI Systems Team · UWECEML Lab · Oct 2025 – Present

Engineered a highly scalable, JAX-native Multi-Objective RL framework within the high-dimensional Craftax environment, using deep autoencoders for state compression.
Architected a novel AI alignment pipeline synthesizing Heuristic Oracles, Inverse Preference Learning (IPL), and Expected Scalarized Returns (ESR) to mathematically enforce strict, non-linear ethical constraints.
Built a Compute Canada-compatible framework for reproducible experiments — full configuration management, seeding, logging, and batch runs.

Reinforcement Learning · JAX · Craftax · Ethical AI · Value Alignment · Compute Canada

Experience

Work History

6 co-op terms and industry experience across ML, robotics, and software engineering.

Graduate Research Assistant

UWECEML Lab, University of Waterloo · Waterloo, Canada

Oct 2025 – Present · Research

Author on two papers on ethical reinforcement learning — one accepted to AIES 2026 (AAAI/ACM Conference on AI, Ethics & Society), one in preparation.
Engineered a JAX-native Multi-Objective RL framework on the high-dimensional Craftax environment, with an alignment pipeline (Heuristic Oracles, IPL, ESR) enforcing non-linear ethical constraints.
Introduced Dual-Enforcement RL for multi-agent settings — cut deceptive signaling to 1% and raised sustainable resource gathering to 90% under adversarial stress tests.
Built a Compute Canada-compatible framework for scalable, reproducible experiments (configs, seeding, logging, batch runs).

Software Engineer

BrainRidge Consulting · Toronto, Canada

May 2025 – Sep 2025 · Industry

Designed and developed LLM-powered agents using Claude Sonnet 4, implementing advanced prompt engineering and validation loops for reliable structured outputs.
Built and deployed scalable NestJS microservices enabling secure GitHub and Jira REST API integration for automated issue and repository management.
Architected a role-based authentication system with Auth0, Redis, and JWT for consistent RBAC across distributed services.

Robotics Software Developer

Lincoln Electric Automation · Waterloo, Canada

Sep 2024 – Dec 2024 · Industry

Developed and optimized embedded firmware for FANUC and ABB robotic systems in TypeScript, C++, and KAREL.
Improved trajectory planning and motion control algorithms, reducing erratic robotic movement by 30%.
Led a codebase refactoring initiative, reducing file count by 20% and improving overall architecture clarity.

System Analyst

XTL Transport Inc · Toronto, Canada

Jan 2024 – Apr 2024 · Industry

Deployed an ITSM solution reducing ticket turnaround time by 45% and standardizing support workflows.
Rolled out networked digital signage across five warehouses, linking plug-and-play devices into a centralized dashboard.

Robotics Test Engineer

Kindred AI · Toronto, Canada

Jan 2023 – Apr 2023 · Industry

Designed and implemented a PyTorch supervised learning model to adjust image brightness/contrast, reducing segmentation error from 30% to 10%.
Expanded automated end-to-end test coverage to 95% using Python and Cucumber in a virtual simulation environment.
Uncovered a 30% error rate in low-light scenarios, driving firmware calibration improvements for enhanced sensor reliability.

QA Engineer

View The Space Inc · Remote

May 2022 – Aug 2022 · Industry

Implemented automated regression testing with Cypress, reducing manual QA effort by 15%.
Collaborated with product and engineering teams to improve documentation and accelerate feature releases by 20%.

QA Engineer

Plooto Inc · Toronto, Canada

Sep 2021 – Dec 2021 · Industry

Built automated test suites in JavaScript with Ghost Inspector to validate OCR workflows.
Optimized HubSpot web pages (HTML/CSS/JS), boosting Lighthouse performance scores by 30 points.

Fullstack Engineer

FleetOperate · Remote

Jan 2021 – Apr 2021 · Industry

Developed AWS-integrated REST APIs and built responsive frontends with Angular, TypeScript, and CSS.
Configured Ubuntu and AWS Linux servers for scalable, production-ready deployments.

Projects

Selected Work

Projects spanning ML systems, edge computing, human-robot interaction, and reinforcement learning.

Aether-Edge

Decentralized, edge-native predictive HVAC control with zero Age of Information — cut comfort violations by 76% and peak temperature overshoot by 90% versus centralized baselines.

Edge Computing · Python · Sensors · Simulation

NAO Robot Teacher Gender Study

Webots simulation investigating gender bias in human-robot interaction within educational environments using a humanoid NAO robot.

HRI · Robotics · Bias Research · Python

Aegis Lights

Self-adaptive urban traffic control system achieving a 45–49% reduction in average trip time across all traffic scenarios.

RL · Python · Flask · Simulation

Canary

IoT personal air quality monitor with a custom PCB, embedded BLE firmware, a 3D-printed enclosure, and an Android companion app. Built as a fourth-year capstone project.

IoT · Embedded C · PCB Design · BLE

MTG Archetype Predictor

ML classifier that predicts Magic: The Gathering commander deck archetypes by integrating Scryfall and EDHRec community data.

ML · Classification · Python · Data Science

Education

Academic Background

University of Waterloo, home to one of Canada's top engineering programs.

Master of Engineering

Electrical & Computer Engineering — AI Specialization

University of Waterloo

Sep 2025 – Present

GPA 84.67

Coursework

Algorithm Design & Analysis · Intro to Artificial Intelligence · Intelligent Sensors & Networks · Self-Adaptive Software Systems · Distributed & Network-Centric Computing · Social Robotics

Bachelor of Applied Science

Computer Engineering — AI Option, Honours, Co-op

University of Waterloo

Sep 2020 – Apr 2025

Graduated

6 Co-op Terms

Awards

University of Waterloo President's Scholarship — 2021

Selected Coursework

Reinforcement Learning · Introduction to Machine Learning · Autonomous Vehicles · Engineering Design Project (97) · Compilers (87) · Computer Architecture (84) · Embedded Software · Linear Algebra · Probability & Statistics

Get In Touch

I'm currently seeking ML research internships and full-time roles for 2026. If you're working on interesting problems in AI alignment, RL, or autonomous systems, I'd love to connect.

prabhjyot045@gmail.com

singh-prabhjyot

GitHub

Prabhjyot045

PrabhjyotSingh
