Aryansh Shrivastava

I am an undergraduate student pursuing simultaneous BS degrees in Electrical Engineering & Computer Sciences (EECS) and Business Administration, with a minor in Data Science, at UC Berkeley in the Berkeley Artificial Intelligence Research (BAIR) Lab, where I am grateful to be advised by Professor Sergey Levine.

Leveraging my experience as a USA Computing Olympiad (USACO) Platinum contestant and a top contestant in regional through international science fairs for seven years, I am also a problem writer for official USACO contests to help select the USA team for the International Olympiad in Informatics (IOI), and a judge at the Alameda County Science & Engineering Fair (ACSEF) to help select projects to advance to the California State Science & Engineering Fair (CSEF) and the International Science & Engineering Fair (ISEF).

Email  /  CV  /  Scholar  /  Twitter  /  LinkedIn  /  Github

profile photo

Research

deception Deception in Dialogue: Evaluating and Mitigating Deceptive Behavior in Large Language Models.
Marwa Abdulhai, Ryan Cheng, Aryansh Shrivastava, Natasha Jaques, Yarin Gal, Sergey Levine.
In Review
Paper / Code / Website

Large Language Models (LLMs) interact with hundreds of millions of people worldwide, powering applications such as customer support, education and healthcare. However, their ability to produce deceptive outputs, whether intentionally or inadvertently, poses significant safety concerns. In this paper, we systematically investigate the extent to which LLMs engage in deception within dialogue. We benchmark 8 state-of-the-art models on 4 dialogue tasks, showing that LLMs naturally exhibit deceptive behavior in approximately 26% of dialogue turns, even when prompted with seemingly benign objectives Unexpectedly, models trained with RLHF, the predominant approach for ensuring the safety of widely-deployed LLMs, still exhibit deception at a rate of 43% on average. Given that deception in dialogue is a behavior that develops over an interaction history, its effective evaluation and mitigation necessitates moving beyond single-utterance analyses. We introduce a multi-turn reinforcement learning methodology to fine-tune LLMs to reduce deceptive behaviors, leading to a 77.6% reduction compared to other instruction-tuned models.

This website is currently a work in progress! In the meantime, please feel free to check out my CV. ~ 🧸


Template adapted from Jon Barron.