Sergey Levine

University of California, Berkeley

H-index: 156

North America-United States

Professor Information

University

University of California, Berkeley

Position

Google

Citations(all)

121612

Citations(since 2020)

111488

Cited By

42036

hIndex(all)

156

hIndex(since 2020)

149

i10Index(all)

429

i10Index(since 2020)

427

University Profile Page

University of California, Berkeley

Research & Interests List

Machine Learning

Robotics

Reinforcement Learning

Top articles of Sergey Levine

ArCHer: Training Language Model Agents via Hierarchical Multi-Turn RL

A broad use case of large language models (LLMs) is in goal-directed decision-making tasks (or "agent" tasks), where an LLM needs to not just generate completions for a given prompt, but rather make intelligent decisions over a multi-turn interaction to accomplish a task (e.g., when interacting with the web, using tools, or providing customer support). Reinforcement learning (RL) provides a general paradigm to address such agent tasks, but current RL methods for LLMs largely focus on optimizing single-turn rewards. By construction, most single-turn RL methods cannot endow LLMs with the ability to intelligently seek information over multiple turns, perform credit assignment, or reason about their past actions -- all of which are critical in agent tasks. This raises the question: how can we design effective and efficient multi-turn RL algorithms for LLMs? In this paper, we develop a framework for building multi-turn RL algorithms for fine-tuning LLMs, that preserves the flexibility of existing single-turn RL methods for LLMs (e.g., proximal policy optimization), while accommodating multiple turns, long horizons, and delayed rewards effectively. To do this, our framework adopts a hierarchical RL approach and runs two RL algorithms in parallel: a high-level off-policy value-based RL algorithm to aggregate reward over utterances, and a low-level RL algorithm that utilizes this high-level value function to train a token policy within each utterance or turn. Our hierarchical framework, Actor-Critic Framework with a Hierarchical Structure (ArCHer), can also give rise to other RL methods. Empirically, we find that ArCHer significantly improves efficiency and …

Authors

Yifei Zhou,Andrea Zanette,Jiayi Pan,Sergey Levine,Aviral Kumar

Journal

arXiv preprint arXiv:2402.19446

Published Date

2024/2/29
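
The two-level update described in the ArCHer abstract above can be illustrated with a short sketch: an utterance-level critic is trained with an off-policy TD update across turns, and a token-level policy is trained within each turn using that critic as its learning signal. This is a minimal illustrative sketch, not the paper's implementation; the module names, toy dimensions, and the crude advantage estimate are assumptions.

```python
# Illustrative ArCHer-style two-level update (assumed names and shapes, not the paper's code).
import torch
import torch.nn as nn

EMB = 32      # toy size of an utterance/observation encoding
VOCAB = 100   # toy token vocabulary size

utterance_critic = nn.Linear(EMB, 1)   # high-level value function over whole turns
token_policy = nn.Linear(EMB, VOCAB)   # low-level per-token policy head
critic_opt = torch.optim.Adam(utterance_critic.parameters(), lr=1e-3)
policy_opt = torch.optim.Adam(token_policy.parameters(), lr=1e-3)

def high_level_update(obs, reward, next_obs, done, gamma=0.99):
    """Off-policy TD update at the utterance (turn) level."""
    with torch.no_grad():
        target = reward + gamma * (1 - done) * utterance_critic(next_obs).squeeze(-1)
    loss = (utterance_critic(obs).squeeze(-1) - target).pow(2).mean()
    critic_opt.zero_grad(); loss.backward(); critic_opt.step()

def low_level_update(obs, token_states, tokens):
    """Token-level policy gradient inside one turn, using the turn-level value as its signal."""
    with torch.no_grad():
        advantage = utterance_critic(obs).squeeze(-1)   # crude stand-in for a proper advantage
    logits = token_policy(token_states)                 # [T, VOCAB]
    logp = torch.log_softmax(logits, dim=-1).gather(-1, tokens.unsqueeze(-1)).squeeze(-1)
    loss = -(advantage * logp.sum()).mean()
    policy_opt.zero_grad(); loss.backward(); policy_opt.step()

# toy usage on random data: one critic update over a batch of turns, one policy update on one turn
obs, next_obs = torch.randn(4, EMB), torch.randn(4, EMB)
high_level_update(obs, torch.rand(4), next_obs, torch.zeros(4))
low_level_update(obs[:1], torch.randn(6, EMB), torch.randint(0, VOCAB, (6,)))
```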

Unfamiliar Finetuning Examples Control How Language Models Hallucinate

Large language models (LLMs) have a tendency to generate plausible-sounding yet factually incorrect responses, especially when queried on unfamiliar concepts. In this work, we explore the underlying mechanisms that govern how finetuned LLMs hallucinate. Our investigation reveals an interesting pattern: as inputs become more unfamiliar, LLM outputs tend to default towards a "hedged" prediction, whose form is determined by how the unfamiliar examples in the finetuning data are supervised. Thus, by strategically modifying these examples' supervision, we can control LLM predictions for unfamiliar inputs (e.g., teach them to say "I don't know"). Based on these principles, we develop an RL approach that more reliably mitigates hallucinations for long-form generation tasks, by tackling the challenges presented by reward model hallucinations. We validate our findings with a series of controlled experiments in multiple-choice QA on MMLU, as well as long-form biography and book/movie plot generation tasks.

Authors

Katie Kang,Eric Wallace,Claire Tomlin,Aviral Kumar,Sergey Levine

Journal

arXiv preprint arXiv:2403.05612

Published Date

2024/3/8
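
The abstract above attributes hallucination on unfamiliar inputs to how the unfamiliar finetuning examples are supervised, and notes that rewriting that supervision (e.g., toward "I don't know") changes the default prediction. A minimal sketch of that data-rewriting idea follows; the familiarity_score proxy, the threshold, and the data format are illustrative assumptions rather than the paper's procedure.

```python
# Illustrative supervision-rewriting sketch (familiarity_score, threshold, and the data
# format are assumptions, not the paper's procedure).

HEDGED_TARGET = "I don't know."

def familiarity_score(example, pretrained_logprob):
    # Proxy: how likely the pretrained model already finds the gold answer.
    return pretrained_logprob(example["prompt"], example["answer"])

def rewrite_unfamiliar(dataset, pretrained_logprob, threshold=-5.0):
    """Return a finetuning set where unfamiliar examples supervise a hedged reply."""
    rewritten = []
    for ex in dataset:
        target = ex["answer"]
        if familiarity_score(ex, pretrained_logprob) < threshold:
            target = HEDGED_TARGET   # teach the model to abstain on unfamiliar concepts
        rewritten.append({"prompt": ex["prompt"], "answer": target})
    return rewritten

# toy usage with a fake scoring function
data = [
    {"prompt": "What is the capital of France?", "answer": "Paris"},
    {"prompt": "Where was the fictional Duke Hargrove born?", "answer": "Castle Hargrove"},
]
fake_scores = {data[0]["prompt"]: -0.5, data[1]["prompt"]: -9.0}
score = lambda prompt, answer: fake_scores[prompt]
print(rewrite_unfamiliar(data, score))
```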

Learning Visuotactile Skills with Two Multifingered Hands

Aiming to replicate human-like dexterity, perceptual experiences, and motion patterns, we explore learning from human demonstrations using a bimanual system with multifingered hands and visuotactile data. Two significant challenges exist: the lack of an affordable and accessible teleoperation system suitable for a dual-arm setup with multifingered hands, and the scarcity of multifingered hand hardware equipped with touch sensing. To tackle the first challenge, we develop HATO, a low-cost hands-arms teleoperation system that leverages off-the-shelf electronics, complemented with a software suite that enables efficient data collection; the comprehensive software suite also supports multimodal data processing, scalable policy learning, and smooth policy deployment. To tackle the latter challenge, we introduce a novel hardware adaptation by repurposing two prosthetic hands equipped with touch sensors for research. Using visuotactile data collected from our system, we learn skills to complete long-horizon, high-precision tasks which are difficult to achieve without multifingered dexterity and touch feedback. Furthermore, we empirically investigate the effects of dataset size, sensing modality, and visual input preprocessing on policy learning. Our results mark a promising step forward in bimanual multifingered manipulation from visuotactile data. Videos, code, and datasets can be found at https://toruowo.github.io/hato/ .

Authors

Toru Lin,Yu Zhang,Qiyang Li,Haozhi Qi,Brent Yi,Sergey Levine,Jitendra Malik

Journal

arXiv preprint arXiv:2404.16823

Published Date

2024/4/25

Multi-stage cable routing through hierarchical imitation learning

We study the problem of learning to perform multistage robotic manipulation tasks, with applications to cable routing, where the robot must route a cable through a series of clips. This setting presents challenges representative of complex multistage robotic manipulation scenarios: handling deformable objects, closing the loop on visual perception, and handling extended behaviors consisting of multiple steps that must be executed successfully to complete the entire task. In such settings, learning individual primitives for each stage that succeed with a high enough rate to perform a complete temporally extended task is impractical: if each stage must be completed successfully and has a nonnegligible probability of failure, the likelihood of successful completion of the entire task becomes negligible. Therefore, successful controllers for such multistage tasks must be able to recover from failure and compensate for …

Authors

Jianlan Luo,Charles Xu,Xinyang Geng,Gilbert Feng,Kuan Fang,Liam Tan,Stefan Schaal,Sergey Levine

Journal

IEEE Transactions on Robotics

Published Date

2024/1/11
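
The abstract above argues that chaining stage-wise primitives without recovery is impractical: with, say, 90% success per stage, ten independent stages complete only about 0.9^10 ≈ 35% of the time, so the high-level controller must be able to retry or re-sequence primitives. A toy sketch of that hierarchical structure follows; the primitive, the success check, and the retry budget are stand-ins, not the learned components from the paper.

```python
# Toy hierarchical controller: per-stage primitives plus retries (all names are stand-ins).
import random

def route_into_clip(clip_id):
    """Stand-in for a learned visuomotor primitive; returns whether the clip was routed."""
    return random.random() > 0.3   # pretend each attempt succeeds 70% of the time

def high_level_controller(clips, max_attempts=5):
    """Sequence the per-clip primitives, re-running a stage whenever it fails."""
    for clip_id in clips:
        for _ in range(max_attempts):
            if route_into_clip(clip_id):
                break          # stage done, move to the next clip
        else:
            return False       # exhausted retries on this stage
    return True

print(high_level_controller(clips=[0, 1, 2]))
```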

Reinforcement learning for versatile, dynamic, and robust bipedal locomotion control

This paper presents a comprehensive study on using deep reinforcement learning (RL) to create dynamic locomotion controllers for bipedal robots. Going beyond focusing on a single locomotion skill, we develop a general control solution that can be used for a range of dynamic bipedal skills, from periodic walking and running to aperiodic jumping and standing. Our RL-based controller incorporates a novel dual-history architecture, utilizing both a long-term and short-term input/output (I/O) history of the robot. This control architecture, when trained through the proposed end-to-end RL approach, consistently outperforms other methods across a diverse range of skills in both simulation and the real world. The study also delves into the adaptivity and robustness introduced by the proposed RL system in developing locomotion controllers. We demonstrate that the proposed architecture can adapt to both time-invariant dynamics shifts and time-variant changes, such as contact events, by effectively using the robot's I/O history. Additionally, we identify task randomization as another key source of robustness, fostering better task generalization and compliance to disturbances. The resulting control policies can be successfully deployed on Cassie, a torque-controlled human-sized bipedal robot. This work pushes the limits of agility for bipedal robots through extensive real-world experiments. We demonstrate a diverse range of locomotion skills, including: robust standing, versatile walking, fast running with a demonstration of a 400-meter dash, and a diverse set of jumping skills, such as standing long jumps and high jumps.

Authors

Zhongyu Li,Xue Bin Peng,Pieter Abbeel,Sergey Levine,Glen Berseth,Koushil Sreenath

Journal

arXiv preprint arXiv:2401.16889

Published Date

2024/1/30
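
The dual-history architecture mentioned above conditions the policy on both a long input/output history and a short recent history. The sketch below shows one plausible way to wire that up in PyTorch; the encoder choices, layer sizes, and history lengths are assumptions for illustration, not the controller used in the paper.

```python
# Illustrative dual-history policy (encoder choices, sizes, and history lengths are assumptions).
import torch
import torch.nn as nn

OBS, ACT = 40, 10          # toy observation / action dimensions
LONG_T, SHORT_T = 100, 5   # long and short history lengths

class DualHistoryPolicy(nn.Module):
    def __init__(self):
        super().__init__()
        io = OBS + ACT  # each history step stores an observation-action (I/O) pair
        self.long_enc = nn.Sequential(          # compress the long history with 1D convolutions
            nn.Conv1d(io, 32, kernel_size=5, stride=4), nn.ReLU(),
            nn.Conv1d(32, 32, kernel_size=5, stride=4), nn.ReLU(),
            nn.Flatten(),
        )
        self.short_enc = nn.Sequential(nn.Flatten(), nn.Linear(io * SHORT_T, 64), nn.ReLU())
        long_feat = self.long_enc(torch.zeros(1, io, LONG_T)).shape[-1]
        self.head = nn.Sequential(nn.Linear(long_feat + 64, 128), nn.ReLU(), nn.Linear(128, ACT))

    def forward(self, long_hist, short_hist):
        # long_hist: [B, OBS+ACT, LONG_T], short_hist: [B, SHORT_T, OBS+ACT]
        z = torch.cat([self.long_enc(long_hist), self.short_enc(short_hist)], dim=-1)
        return self.head(z)

policy = DualHistoryPolicy()
action = policy(torch.randn(2, OBS + ACT, LONG_T), torch.randn(2, SHORT_T, OBS + ACT))
print(action.shape)  # torch.Size([2, 10])
```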

Grounded Decoding: Guiding Text Generation with Grounded Models for Embodied Agents

Recent progress in large language models (LLMs) has demonstrated the ability to learn and leverage Internet-scale knowledge through pre-training with autoregressive models. Unfortunately, applying such models to settings with embodied agents, such as robots, is challenging due to their lack of experience with the physical world, inability to parse non-language observations, and ignorance of rewards or safety constraints that robots may require. On the other hand, language-conditioned robotic policies that learn from interaction data can provide the necessary grounding that allows the agent to be correctly situated in the real world, but such policies are limited by the lack of high-level semantic understanding due to the limited breadth of the interaction data available for training them. Thus, if we want to make use of the semantic knowledge in a language model while still situating it in an embodied setting, we must construct an action sequence that is both likely according to the language model and also realizable according to grounded models of the environment. We frame this as a problem similar to probabilistic filtering: decode a sequence that both has high probability under the language model and high probability under a set of grounded model objectives. We demonstrate this guided decoding strategy is able to solve complex, long-horizon embodiment tasks in a robotic setting by leveraging the knowledge of both models. The project's website can be found at grounded-decoding.github.io.

Authors

Wenlong Huang,Fei Xia,Dhruv Shah,Danny Driess,Andy Zeng,Yao Lu,Pete Florence,Igor Mordatch,Sergey Levine,Karol Hausman,Brian Ichter

Journal

arXiv preprint arXiv:2303.00855

Published Date

2023/3/1
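
The decoding rule described above scores each candidate token under both the language model and a set of grounded models, then commits to tokens that are jointly likely. A minimal greedy version of that idea is sketched below; the toy language model, the affordance function, and the token vocabulary are made-up stand-ins.

```python
# Greedy decoding under a combined score log p_LM + sum_i log p_grounded_i
# (the toy language model and affordance function below are made up for illustration).
import math

def grounded_decode(lm_logprobs, grounded_logprobs, prompt, max_len=10):
    tokens = []
    for _ in range(max_len):
        lm = lm_logprobs(prompt, tokens)   # dict: candidate token -> log prob under the LM
        scores = {
            tok: lp + sum(g(prompt, tokens, tok) for g in grounded_logprobs)
            for tok, lp in lm.items()
        }
        best = max(scores, key=scores.get)  # greedy choice under the joint score
        tokens.append(best)
        if best == "<eos>":
            break
    return tokens

# toy usage: the LM prefers "sponge", but an affordance model reports the sponge is
# not reachable, steering the decode toward "apple".
lm = lambda p, t: (
    {"apple": math.log(0.3), "sponge": math.log(0.6), "<eos>": math.log(0.1)}
    if not t else
    {"apple": math.log(0.05), "sponge": math.log(0.05), "<eos>": math.log(0.9)}
)
affordance = lambda p, t, tok: 0.0 if tok != "sponge" else math.log(1e-3)
print(grounded_decode(lm, [affordance], "pick up the"))  # ['apple', '<eos>']
```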

Cal-QL: Calibrated offline RL pre-training for efficient online fine-tuning

A compelling use case of offline reinforcement learning (RL) is to obtain a policy initialization from existing datasets followed by fast online fine-tuning with limited interaction. However, existing offline RL methods tend to behave poorly during fine-tuning. In this paper, we devise an approach for learning an effective initialization from offline data that also enables fast online fine-tuning capabilities. Our approach, calibrated Q-learning (Cal-QL), accomplishes this by learning a conservative value function initialization that underestimates the value of the learned policy from offline data, while also being calibrated, in the sense that the learned Q-values are at a reasonable scale. We refer to this property as calibration, and define it formally as providing a lower bound on the true value function of the learned policy and an upper bound on the value of some other (suboptimal) reference policy, which may simply be the behavior policy. We show that offline RL algorithms that learn such calibrated value functions lead to effective online fine-tuning, enabling us to take the benefits of offline initializations in online fine-tuning. In practice, Cal-QL can be implemented on top of conservative Q-learning (CQL) for offline RL within a one-line code change. Empirically, Cal-QL outperforms state-of-the-art methods on 9/11 fine-tuning benchmark tasks that we study in this paper. Code and video are available at https://nakamotoo.github.io/Cal-QL

Authors

Mitsuhiko Nakamoto,Yuexiang Zhai,Anikait Singh,Max Sobol Mark,Yi Ma,Chelsea Finn,Aviral Kumar,Sergey Levine

Journal

Neural Information Processing Systems (NeurIPS)

Published Date

2023/3/9
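
The abstract above describes calibration as keeping conservative Q-values from dropping below a reference policy's value, implementable as a one-line change on top of a CQL-style regularizer. The sketch below shows one way such a clipped penalty could look; the tensor shapes, the reference-value estimate, and the surrounding loss structure are illustrative assumptions rather than the released Cal-QL code.

```python
# Illustrative CQL-style penalty with a Cal-QL-style calibration clip (shapes and the
# reference-value estimate are assumptions, not the released implementation).
import torch

def conservative_penalty(q_ood, q_data, reference_value, calibrate=True):
    """Push down Q on sampled actions; with calibrate=True, never below the reference value."""
    if calibrate:
        # the "one-line change": lower-bound out-of-distribution Q-values by a
        # Monte Carlo estimate of the reference (e.g., behavior) policy's return
        q_ood = torch.maximum(q_ood, reference_value)
    return (torch.logsumexp(q_ood, dim=-1) - q_data).mean()

# toy usage
q_ood = torch.randn(8, 10)      # Q-values on sampled (possibly OOD) actions: [batch, n_actions]
q_data = torch.randn(8)         # Q-values on the dataset actions
mc_return = torch.zeros(8, 1)   # reference-policy return estimate, broadcast over actions
print(conservative_penalty(q_ood, q_data, mc_return))
```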

Pushing the Limits of Cross-Embodiment Learning for Manipulation and Navigation

Recent years in robotics and imitation learning have shown remarkable progress in training large-scale foundation models by leveraging data across a multitude of embodiments. The success of such policies might lead us to wonder: just how diverse can the robots in the training set be while still facilitating positive transfer? In this work, we study this question in the context of heterogeneous embodiments, examining how even seemingly very different domains, such as robotic navigation and manipulation, can provide benefits when included in the training data for the same model. We train a single goal-conditioned policy that is capable of controlling robotic arms, quadcopters, quadrupeds, and mobile bases. We then investigate the extent to which transfer can occur across navigation and manipulation on these embodiments by framing them as a single goal-reaching task. We find that co-training with navigation data can enhance robustness and performance in goal-conditioned manipulation with a wrist-mounted camera. We then deploy our policy, trained only on navigation and static manipulation data, on a mobile manipulator, showing that it can control a novel embodiment in a zero-shot manner. These results provide evidence that large-scale robotic policies can benefit from data collected across various embodiments. Further information and robot videos can be found on our project website http://extreme-cross-embodiment.github.io.

Authors

Jonathan Yang,Catherine Glossop,Arjun Bhorkar,Dhruv Shah,Quan Vuong,Chelsea Finn,Dorsa Sadigh,Sergey Levine

Journal

arXiv preprint arXiv:2402.19432

Published Date

2024/2/29
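
The unified framing described above casts both navigation and manipulation as goal reaching, so one goal-conditioned policy can consume data from every embodiment. A minimal sketch of that interface follows; the network, image sizes, and the shared 7-dimensional action vector are assumptions, not the architecture from the paper.

```python
# Illustrative goal-conditioned policy shared across embodiments (network, image size,
# and the common 7-D action vector are assumptions).
import torch
import torch.nn as nn

class GoalConditionedPolicy(nn.Module):
    def __init__(self, action_dim=7):
        super().__init__()
        self.encoder = nn.Sequential(             # shared encoder over stacked obs + goal images
            nn.Conv2d(6, 16, 5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Linear(32, action_dim)     # one action space reused by every robot

    def forward(self, obs_img, goal_img):
        return self.head(self.encoder(torch.cat([obs_img, goal_img], dim=1)))

policy = GoalConditionedPolicy()
# the same network is queried for a navigation robot and a manipulator; only the
# downstream interpretation of the shared action vector differs per embodiment
nav_action = policy(torch.randn(1, 3, 64, 64), torch.randn(1, 3, 64, 64))
arm_action = policy(torch.randn(1, 3, 64, 64), torch.randn(1, 3, 64, 64))
print(nav_action.shape, arm_action.shape)
```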

Professor FAQs

What is Sergey Levine's h-index at University of California, Berkeley?

Sergey Levine's h-index is 156 in total and 149 based on citations since 2020.

What are Sergey Levine's research interests?

The research interests of Sergey Levine are: Machine Learning, Robotics, Reinforcement Learning

What is Sergey Levine's total number of citations?

Sergey Levine has 121,612 citations in total.

Who are the co-authors of Sergey Levine?

The co-authors of Sergey Levine include Trevor Darrell, Pieter Abbeel, Chelsea Finn, Abhishek Gupta, Aviral Kumar, and Xue Bin Peng.

Co-Authors

Trevor Darrell (H-index: 161), University of California, Berkeley

Pieter Abbeel (H-index: 154), University of California, Berkeley

Chelsea Finn (H-index: 78), Stanford University

Abhishek Gupta (H-index: 32), University of California, Berkeley

Aviral Kumar (H-index: 28), University of California, Berkeley

Xue Bin Peng (H-index: 25), University of California, Berkeley
