Pieter Abbeel

University of California, Berkeley

H-index: 154

North America-United States

Professor Information

University

University of California, Berkeley

Position

Covariant.AI

Citations(all)

135633

Citations(since 2020)

114549

Cited By

58511

hIndex(all)

154

hIndex(since 2020)

134

i10Index(all)

379

i10Index(since 2020)

365

University Profile Page

University of California, Berkeley

Research & Interests List

Robotics

Machine Learning

AI

Top articles of Pieter Abbeel

Language quantized autoencoders: Towards unsupervised text-image alignment

Recent progress in scaling up large language models has shown impressive capabilities in performing few-shot learning across a wide range of natural language tasks. However, a key limitation is that these language models fundamentally lack grounding in visual perception, a crucial attribute needed to extend to real-world tasks such as visual question answering and robotics. While prior works have largely connected image to text through pretraining or fine-tuning, learning such alignments is generally costly due to a combination of curating massive datasets and large computational burdens. To resolve these limitations, we propose a simple yet effective approach called Language-Quantized AutoEncoder (LQAE), a modification of VQ-VAE that learns to align text-image data in an unsupervised manner by leveraging pretrained language model denoisers (e.g., BERT). Our main idea is to encode images as sequences of text tokens by directly quantizing image embeddings using a pretrained language codebook. We then feed a masked version of the quantized embeddings into BERT to reconstruct the original input. By doing so, LQAE learns to represent similar images with similar clusters of text tokens, thereby aligning these two modalities without the use of aligned text-image pairs. We show LQAE learns text-aligned image tokens that enable few-shot multi-modal learning with large language models, outperforming baseline methods in tasks such as image classification and VQA while requiring as few as 1-10 image-text pairs.

Authors

Hao Liu,Wilson Yan,Pieter Abbeel

Journal

Advances in Neural Information Processing Systems

Published Date

2024/2/13
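
The quantize-then-denoise recipe in the LQAE abstract above lends itself to a short sketch. The snippet below is a minimal, illustrative PyTorch version, not the authors' code: the image encoder/decoder interfaces, the use of a frozen Hugging Face BERT whose input-embedding matrix serves as the codebook, the 0.5 mask ratio, and the [MASK] token id are all assumptions.

```python
# Minimal sketch of the LQAE idea described in the abstract above (illustrative only).
# Assumptions: an image encoder producing patch embeddings, a decoder back to pixels,
# and a frozen pretrained BERT whose input-embedding matrix is used as the codebook.
import torch
import torch.nn as nn

class LQAESketch(nn.Module):
    def __init__(self, image_encoder, image_decoder, bert, mask_ratio=0.5):
        super().__init__()
        self.image_encoder = image_encoder      # images -> (B, N, D) patch embeddings
        self.image_decoder = image_decoder      # (B, N, D) -> reconstructed images
        self.bert = bert                        # frozen, pretrained masked language model
        self.codebook = bert.get_input_embeddings().weight  # (V, D) word embeddings
        self.mask_ratio = mask_ratio

    def quantize(self, z):
        # Snap each patch embedding to its nearest word embedding (straight-through).
        dists = torch.cdist(z, self.codebook.unsqueeze(0).expand(z.size(0), -1, -1))
        token_ids = dists.argmin(dim=-1)                     # (B, N) pseudo-text tokens
        z_q = self.codebook[token_ids]
        z_q = z + (z_q - z).detach()                         # straight-through estimator
        return z_q, token_ids

    def forward(self, images):
        z = self.image_encoder(images)
        z_q, token_ids = self.quantize(z)
        # Mask a fraction of the pseudo-text tokens and let BERT fill them in.
        masked_ids = token_ids.clone()
        mask = torch.rand_like(token_ids, dtype=torch.float) < self.mask_ratio
        masked_ids[mask] = 103                               # [MASK] id for bert-base-uncased (assumed)
        bert_out = self.bert(input_ids=masked_ids).last_hidden_state
        recon = self.image_decoder(bert_out)
        return recon, token_ids
```

Training would add a reconstruction loss between `recon` and the input images; the straight-through trick in `quantize` lets gradients reach the image encoder despite the hard nearest-neighbor lookup.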

Blockwise Parallel Transformers for Large Context Models

Transformers have emerged as the cornerstone of state-of-the-art natural language processing models, showcasing exceptional performance across a wide range of AI applications. However, the memory demands posed by the self-attention mechanism and the large feedforward network in Transformers limit their ability to handle long sequences, thereby creating challenges for tasks involving multiple long sequences or long-term dependencies. We present a distinct approach, Blockwise Parallel Transformer (BPT), that leverages blockwise computation of self-attention and feedforward network fusion to minimize memory costs. By processing longer input sequences while maintaining memory efficiency, BPT enables training sequences 32 times longer than vanilla Transformers and up to 4 times longer than previous memory-efficient methods. Extensive experiments on language modeling and reinforcement learning tasks demonstrate the effectiveness of BPT in reducing memory requirements and improving performance.

Authors

Hao Liu,Pieter Abbeel

Journal

Advances in Neural Information Processing Systems

Published Date

2024/2/13
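
The core memory trick described in the BPT abstract, blockwise self-attention with the feedforward applied per query block, can be illustrated in a few lines. This is a simplified single-head sketch using a running log-sum-exp softmax, not the authors' implementation; `ffn` stands for any callable mapping (block, d) to (block, d), and residual scaling/normalization details are omitted.

```python
# Illustrative blockwise attention with a per-block fused feedforward (single head).
import torch

def blockwise_attention_ffn(q, k, v, ffn, block_size):
    """q, k, v: (T, d) tensors for one head; ffn: callable (n, d) -> (n, d)."""
    T, d = q.shape
    scale = d ** -0.5
    outputs = []
    for qs in range(0, T, block_size):
        q_blk = q[qs:qs + block_size] * scale
        # Running (online) softmax accumulators: never materialize the full T x T matrix.
        acc = torch.zeros(q_blk.size(0), d)
        row_max = torch.full((q_blk.size(0), 1), float("-inf"))
        denom = torch.zeros(q_blk.size(0), 1)
        for ks in range(0, T, block_size):
            scores = q_blk @ k[ks:ks + block_size].T           # (Bq, Bk) block of logits
            blk_max = scores.max(dim=-1, keepdim=True).values
            new_max = torch.maximum(row_max, blk_max)
            correction = torch.exp(row_max - new_max)          # rescale earlier blocks
            p = torch.exp(scores - new_max)
            denom = denom * correction + p.sum(dim=-1, keepdim=True)
            acc = acc * correction + p @ v[ks:ks + block_size]
            row_max = new_max
        attn_out = acc / denom
        # Feedforward fused with the same query block, so it never sees the full sequence.
        outputs.append(attn_out + ffn(attn_out))
    return torch.cat(outputs, dim=0)
```

Because only one block_size-by-block_size score matrix exists at any time, peak memory grows with the block size rather than with the full sequence length.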

Twisting lids off with two hands

Manipulating objects with two multi-fingered hands has been a long-standing challenge in robotics, attributed to the contact-rich nature of many manipulation tasks and the complexity inherent in coordinating a high-dimensional bimanual system. In this work, we consider the problem of twisting lids of various bottle-like objects with two hands, and demonstrate that policies trained in simulation using deep reinforcement learning can be effectively transferred to the real world. With novel engineering insights into physical modeling, real-time perception, and reward design, the policy demonstrates generalization capabilities across a diverse set of unseen objects, showcasing dynamic and dexterous behaviors. Our findings serve as compelling evidence that deep reinforcement learning combined with sim-to-real transfer remains a promising approach for addressing manipulation problems of unprecedented complexity.

Authors

Toru Lin,Zhao-Heng Yin,Haozhi Qi,Pieter Abbeel,Jitendra Malik

Journal

arXiv preprint arXiv:2403.02338

Published Date

2024/3/4

Learning universal policies via text-guided video generation

A goal of artificial intelligence is to construct an agent that can solve a wide variety of tasks. Recent progress in text-guided image synthesis has yielded models with an impressive ability to generate complex novel images, exhibiting combinatorial generalization across domains. Motivated by this success, we investigate whether such tools can be used to construct more general-purpose agents. Specifically, we cast the sequential decision making problem as a text-conditioned video generation problem, where, given a text-encoded specification of a desired goal, a planner synthesizes a set of future frames depicting its planned actions, after which control actions are extracted from the generated video. By leveraging text as the underlying goal specification, we are able to naturally and combinatorially generalize to novel goals. The proposed policy-as-video formulation can further represent environments with different state and action spaces in a unified space of images, which, for example, enables learning and generalization across a variety of robot manipulation tasks. Finally, by leveraging pretrained language embeddings and widely available videos from the internet, the approach enables knowledge transfer through predicting highly realistic video plans for real robots.

Authors

Yilun Du,Mengjiao Yang,Bo Dai,Hanjun Dai,Ofir Nachum,Josh Tenenbaum,Dale Schuurmans,Pieter Abbeel

Journal

NeurIPS

Published Date

2023/1/31
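
The plan-then-act loop implied by the abstract above can be summarized in a short Python sketch. Everything here is a hypothetical stand-in: `video_model.generate` for the text-conditioned video generator and `inverse_dynamics` for the module that recovers actions from consecutive frames; the real system's interfaces may differ.

```python
# High-level sketch of the policy-as-video control loop (illustrative only).
def act_with_video_plan(env, video_model, inverse_dynamics, goal_text, horizon=16):
    obs = env.reset()
    done = False
    while not done:
        # Plan: synthesize a short video of the desired behavior from the text goal.
        frames = video_model.generate(text=goal_text, first_frame=obs, num_frames=horizon)
        # Act: recover low-level actions from consecutive generated frames.
        for prev, nxt in zip([obs] + frames[:-1], frames):
            action = inverse_dynamics(prev, nxt)
            obs, reward, done, info = env.step(action)
            if done:
                break
    return obs
```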

Accelerating reinforcement learning with value-conditional state entropy exploration

A promising technique for exploration is to maximize the entropy of the visited state distribution, i.e., state entropy, by encouraging uniform coverage of the visited state space. While it has been effective for an unsupervised setup, it tends to struggle in a supervised setup with a task reward, where an agent prefers to visit high-value states to exploit the task reward. Such a preference can cause an imbalance between the distributions of high-value states and low-value states, which biases exploration towards low-value state regions as a result of the state entropy increasing when the distribution becomes more uniform. This issue is exacerbated when high-value states are narrowly distributed within the state space, making it difficult for the agent to complete the tasks. In this paper, we present a novel exploration technique that maximizes the value-conditional state entropy, which separately estimates the state entropies that are conditioned on the value estimates of each state, then maximizes their average. By only considering the visited states with similar value estimates for computing the intrinsic bonus, our method prevents the distribution of low-value states from affecting exploration around high-value states, and vice versa. We demonstrate that the proposed alternative to the state entropy baseline significantly accelerates various reinforcement learning algorithms across a variety of tasks within MiniGrid, DeepMind Control Suite, and Meta-World benchmarks. Source code is available at https://sites.google.com/view/rl-vcse.

Authors

Dongyoung Kim,Jinwoo Shin,Pieter Abbeel,Younggyo Seo

Journal

Advances in Neural Information Processing Systems

Published Date

2024/2/13
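
One way to picture the value-conditional bonus described above is a k-NN state-entropy estimate restricted to states with similar value estimates. The sketch below uses a crude quantile binning of value estimates; the bin count, k, and the log-distance bonus form are illustrative choices, not the paper's exact estimator.

```python
# Illustrative value-conditional k-NN state-entropy bonus (simplified, discretized variant).
import numpy as np

def value_conditional_entropy_bonus(states, values, num_bins=5, k=5):
    """states: (N, d) visited-state features; values: (N,) value estimates."""
    bonus = np.zeros(len(states))
    edges = np.quantile(values, np.linspace(0.0, 1.0, num_bins + 1))
    bin_ids = np.clip(np.searchsorted(edges, values, side="right") - 1, 0, num_bins - 1)
    for b in range(num_bins):
        idx = np.where(bin_ids == b)[0]
        if len(idx) <= k:
            continue
        pts = states[idx]
        # Pairwise distances restricted to states with similar value estimates.
        dists = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
        knn = np.sort(dists, axis=-1)[:, k]        # distance to the k-th nearest neighbor
        # k-NN entropy estimators reward being far from one's neighbors within the bin.
        bonus[idx] = np.log(knn + 1e-8)
    return bonus
```

Restricting the neighbor search to a value bin is what keeps densely visited low-value regions from dominating the bonus around rarer high-value states.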

Unsupervised Zero-Shot Reinforcement Learning via Functional Reward Encodings

Can we pre-train a generalist agent from a large amount of unlabeled offline trajectories such that it can be immediately adapted to any new downstream tasks in a zero-shot manner? In this work, we present a functional reward encoding (FRE) as a general, scalable solution to this zero-shot RL problem. Our main idea is to learn functional representations of arbitrary tasks by encoding their state-reward samples using a transformer-based variational auto-encoder. This functional encoding not only enables the pre-training of an agent from a wide diversity of general unsupervised reward functions, but also provides a way to solve any new downstream tasks in a zero-shot manner, given a small number of reward-annotated samples. We empirically show that FRE agents trained on diverse random unsupervised reward functions can generalize to solve novel tasks in a range of simulated robotic benchmarks, often outperforming previous zero-shot RL and offline RL methods. Code for this project is provided at: https://github.com/kvfrans/fre

Authors

Kevin Frans,Seohong Park,Pieter Abbeel,Sergey Levine

Journal

arXiv preprint arXiv:2402.17135

Published Date

2024/2/27
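
A rough picture of the functional reward encoding described above: a set-style encoder maps a batch of (state, reward) samples to a latent task vector, and a decoder predicts rewards for new states from that latent. The module below is an illustrative sketch only; the layer sizes, mean-pooled transformer encoder, and Gaussian reparameterization are assumptions rather than the paper's exact architecture.

```python
# Illustrative sketch of a functional reward encoding (FRE-style) VAE.
import torch
import torch.nn as nn

class FRESketch(nn.Module):
    def __init__(self, state_dim, latent_dim=64, hidden=256):
        super().__init__()
        enc_layer = nn.TransformerEncoderLayer(d_model=hidden, nhead=4, batch_first=True)
        self.embed = nn.Linear(state_dim + 1, hidden)         # embed (state, reward) pairs
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=2)
        self.to_mu = nn.Linear(hidden, latent_dim)
        self.to_logvar = nn.Linear(hidden, latent_dim)
        self.decoder = nn.Sequential(                         # (state, z) -> predicted reward
            nn.Linear(state_dim + latent_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, states, rewards, query_states):
        # Encode a set of state-reward samples into a latent task vector z.
        x = self.embed(torch.cat([states, rewards.unsqueeze(-1)], dim=-1))
        h = self.encoder(x).mean(dim=1)                       # pool over the sample set
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
        # Decode: predict rewards of query states under the encoded task.
        z_rep = z.unsqueeze(1).expand(-1, query_states.size(1), -1)
        pred = self.decoder(torch.cat([query_states, z_rep], dim=-1)).squeeze(-1)
        return pred, mu, logvar
```

A downstream policy would then be conditioned on `z`, so a handful of reward-annotated samples at test time is enough to specify a new task.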

Functional Graphical Models: Structure Enables Offline Data-Driven Optimization

While machine learning models are typically trained to solve prediction problems, we might often want to use them for optimization problems. For example, given a dataset of proteins and their corresponding fluorescence levels, we might want to optimize for a new protein with the highest possible fluorescence. This kind of data-driven optimization (DDO) presents a range of challenges beyond those in standard prediction problems, since we need models that successfully predict the performance of new designs that are better than the best designs seen in the training set. It is not clear theoretically when existing approaches can even perform better than the naive approach that simply selects the best design in the dataset. In this paper, we study how structure can enable sample-efficient data-driven optimization. To formalize the notion of structure, we introduce functional graphical models (FGMs) and show theoretically how they can provide for principled data-driven optimization by decomposing the original high-dimensional optimization problem into smaller sub-problems. This allows us to derive much more practical regret bounds for DDO, and the result implies that DDO with FGMs can achieve nearly optimal designs in situations where naive approaches fail due to insufficient coverage of the offline data. We further present a data-driven optimization algorithm that infers the FGM structure itself, either over the original input variables or a latent variable representation of the inputs.

Authors

Jakub Grudzien Kuba,Masatoshi Uehara,Pieter Abbeel,Sergey Levine

Journal

arXiv preprint arXiv:2401.05442

Published Date

2024/1/8
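
To make the decomposition idea concrete, here is a toy sketch: if the objective is assumed to decompose as a sum of terms, each depending only on a small, disjoint group of coordinates, then each group can be optimized separately using only the offline data that covers that group. This is purely illustrative; the paper's FGM structure learning, latent-variable variant, and regret analysis are not reproduced here.

```python
# Toy sketch of optimizing a design group-by-group under an assumed additive decomposition.
import numpy as np

def optimize_by_groups(dataset_x, term_models, groups):
    """dataset_x: (N, D) offline candidate designs.
    term_models: term_models[i](x_sub) -> predicted contribution of group i.
    groups: disjoint index tuples; objective assumed to be sum_i f_i(x[groups[i]])."""
    design = np.zeros(dataset_x.shape[1])
    for model, group in zip(term_models, groups):
        # Each sub-problem only needs data coverage over its own small group of
        # coordinates, which is far easier to satisfy than coverage of full designs.
        candidates = np.unique(dataset_x[:, group], axis=0)
        scores = [model(c) for c in candidates]
        design[list(group)] = candidates[int(np.argmax(scores))]
    return design
```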

Reinforcement learning for versatile, dynamic, and robust bipedal locomotion control

This paper presents a comprehensive study on using deep reinforcement learning (RL) to create dynamic locomotion controllers for bipedal robots. Going beyond focusing on a single locomotion skill, we develop a general control solution that can be used for a range of dynamic bipedal skills, from periodic walking and running to aperiodic jumping and standing. Our RL-based controller incorporates a novel dual-history architecture, utilizing both a long-term and short-term input/output (I/O) history of the robot. This control architecture, when trained through the proposed end-to-end RL approach, consistently outperforms other methods across a diverse range of skills in both simulation and the real world. The study also delves into the adaptivity and robustness introduced by the proposed RL system in developing locomotion controllers. We demonstrate that the proposed architecture can adapt to both time-invariant dynamics shifts and time-variant changes, such as contact events, by effectively using the robot's I/O history. Additionally, we identify task randomization as another key source of robustness, fostering better task generalization and compliance to disturbances. The resulting control policies can be successfully deployed on Cassie, a torque-controlled human-sized bipedal robot. This work pushes the limits of agility for bipedal robots through extensive real-world experiments. We demonstrate a diverse range of locomotion skills, including robust standing, versatile walking, fast running with a demonstration of a 400-meter dash, and a diverse set of jumping skills, such as standing long jumps and high jumps.

Authors

Zhongyu Li,Xue Bin Peng,Pieter Abbeel,Sergey Levine,Glen Berseth,Koushil Sreenath

Journal

arXiv preprint arXiv:2401.16889

Published Date

2024/1/30
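
The dual-history idea above, feeding the policy both a long window and a short window of the robot's input/output (I/O) history, can be sketched as a small network. The encoder type, window lengths, and layer sizes below are assumptions for illustration, not the architecture reported in the paper.

```python
# Illustrative dual-history policy: a long I/O history is compressed by a conv encoder,
# a short I/O history is fed in directly, and an MLP maps both to an action.
import torch
import torch.nn as nn

class DualHistoryPolicy(nn.Module):
    def __init__(self, io_dim, action_dim, long_len=100, short_len=4, hidden=256):
        super().__init__()
        # Long-term history: compress a long window of observations/actions.
        self.long_encoder = nn.Sequential(
            nn.Conv1d(io_dim, 32, kernel_size=6, stride=3), nn.ReLU(),
            nn.Conv1d(32, 32, kernel_size=4, stride=2), nn.ReLU(),
            nn.Flatten())
        with torch.no_grad():
            long_feat = self.long_encoder(torch.zeros(1, io_dim, long_len)).shape[-1]
        # Short-term history is small enough to concatenate directly.
        self.mlp = nn.Sequential(
            nn.Linear(long_feat + short_len * io_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim))

    def forward(self, long_history, short_history):
        # long_history: (B, io_dim, long_len); short_history: (B, short_len, io_dim)
        z_long = self.long_encoder(long_history)
        z_short = short_history.flatten(start_dim=1)
        return self.mlp(torch.cat([z_long, z_short], dim=-1))
```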

Professor FAQs

What is Pieter Abbeel's h-index at University of California, Berkeley?

Pieter Abbeel's h-index is 154 in total and 134 when counting only citations since 2020.

What are Pieter Abbeel's research interests?

Pieter Abbeel's research interests are Robotics, Machine Learning, and AI.

What is Pieter Abbeel's total number of citations?

Pieter Abbeel has 135,633 citations in total.

What are the co-authors of Pieter Abbeel?

The co-authors of Pieter Abbeel include Trevor Darrell, Sergey Levine, Ken Goldberg, Stuart Russell, Alexandre Bayen, and Igor Mordatch.

Co-Authors

Trevor Darrell
University of California, Berkeley
H-index: 161

Sergey Levine
University of California, Berkeley
H-index: 156

Ken Goldberg
University of California, Berkeley
H-index: 92

Stuart Russell
University of California, Berkeley
H-index: 90

Alexandre Bayen
University of California, Berkeley
H-index: 67

Igor Mordatch
University of California, Berkeley
H-index: 48
