Trevor Darrell
University of California, Berkeley
H-index: 161
North America-United States
Description
Trevor Darrell is a distinguished researcher at the University of California, Berkeley, with an exceptional h-index of 161 overall and a recent h-index of 116 (since 2020). He specializes in Computer Vision, Artificial Intelligence, AI, Machine Learning, and Deep Learning.
His recent articles reflect a diverse array of research interests and contributions to the field:
Real-world humanoid locomotion with reinforcement learning
Diffusion hyperfeatures: Searching through time and space for semantic correspondence
Humanoid Locomotion as Next Token Prediction
PromptonomyViT: Multi-task prompt learning improves video transformers using synthetic scene data
InstanceDiffusion: Instance-level Control for Image Generation
EgoPet: Egomotion and Interaction Data from an Animal's Perspective
Neural Network Diffusion
Shape-guided diffusion with inside-outside attention
Professor Information
| University | University of California, Berkeley |
| --- | --- |
| Position | Professor of Computer Science |
| Citations (all) | 242,160 |
| Citations (since 2020) | 158,435 |
| Cited By | 144,112 |
| h-index (all) | 161 |
| h-index (since 2020) | 116 |
| i10-index (all) | 468 |
| i10-index (since 2020) | 342 |
| University Profile Page | University of California, Berkeley |
Research & Interests List
Computer Vision
Artificial Intelligence
AI
Machine Learning
Deep Learning
Top articles of Trevor Darrell
Real-world humanoid locomotion with reinforcement learning
Humanoid robots that can autonomously operate in diverse environments have the potential to help address labor shortages in factories, assist the elderly at home, and colonize new planets. Although classical controllers for humanoid robots have shown impressive results in a number of settings, they are challenging to generalize and adapt to new environments. Here, we present a fully learning-based approach for real-world humanoid locomotion. Our controller is a causal transformer that takes the history of proprioceptive observations and actions as input and predicts the next action. We hypothesized that the observation-action history contains useful information about the world that a powerful transformer model can use to adapt its behavior in context, without updating its weights. We trained our model with large-scale model-free reinforcement learning on an ensemble of randomized environments in simulation …
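The controller described above hinges on causal masking: each timestep of the observation-action history may attend only to itself and earlier steps, which is what lets the model adapt "in context" without weight updates. A minimal NumPy sketch of one causally masked self-attention layer (names, seeds, and dimensions are illustrative, not the paper's implementation):

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def causal_attention(history):
    """One causally masked self-attention layer over a (T, d) history of
    concatenated observation-action vectors. Each timestep attends only
    to itself and earlier timesteps."""
    T, d = history.shape
    rng = np.random.default_rng(0)  # fixed weights for the sketch
    Wq, Wk, Wv = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
    q, k, v = history @ Wq, history @ Wk, history @ Wv
    scores = q @ k.T / np.sqrt(d)
    mask = np.triu(np.ones((T, T), dtype=bool), k=1)  # strictly-future positions
    scores[mask] = -np.inf                            # forbid attending ahead
    return softmax(scores) @ v

# toy history of 8 concatenated (observation, action) vectors
hist = np.random.default_rng(1).standard_normal((8, 16))
out = causal_attention(hist)
```

Because of the mask, perturbing the most recent step of the history changes only the latest output, never earlier ones.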
Authors
Ilija Radosavovic,Tete Xiao,Bike Zhang,Trevor Darrell,Jitendra Malik,Koushil Sreenath
Journal
Science Robotics
Published Date
2024/4/17
Diffusion hyperfeatures: Searching through time and space for semantic correspondence
Diffusion models have been shown to be capable of generating high-quality images, suggesting that they could contain meaningful internal representations. Unfortunately, the feature maps that encode a diffusion model's internal information are spread not only over layers of the network, but also over diffusion timesteps, making it challenging to extract useful descriptors. We propose Diffusion Hyperfeatures, a framework for consolidating multi-scale and multi-timestep feature maps into per-pixel feature descriptors that can be used for downstream tasks. These descriptors can be extracted for both synthetic and real images using the generation and inversion processes. We evaluate the utility of our Diffusion Hyperfeatures on the task of semantic keypoint correspondence: our method achieves superior performance on the SPair-71k real image benchmark. We also demonstrate that our method is flexible and transferable: our feature aggregation network trained on the inversion features of real image pairs can be used on the generation features of synthetic image pairs with unseen objects and compositions. Our code is available at https://diffusion-hyperfeatures.github.io.
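The core consolidation step can be pictured as a learned weighted sum over (timestep, layer) feature maps. The sketch below stands in for the paper's aggregation network with scalar mixing weights, and assumes all maps are already resized to a common (H, W, C); everything here is illustrative, not the released code:

```python
import numpy as np

def aggregate_hyperfeatures(feature_maps, weights):
    """Consolidate multi-layer, multi-timestep feature maps into one
    per-pixel descriptor map via a normalized weighted sum.

    feature_maps: dict mapping (timestep, layer) -> (H, W, C) array.
    weights: dict with the same keys, one scalar mixing weight each
    (a stand-in for the learned aggregation network)."""
    keys = sorted(feature_maps)
    w = np.array([weights[k] for k in keys])
    w = np.exp(w) / np.exp(w).sum()                       # softmax-normalize
    stacked = np.stack([feature_maps[k] for k in keys])   # (N, H, W, C)
    return np.tensordot(w, stacked, axes=1)               # (H, W, C)

rng = np.random.default_rng(0)
maps = {(t, l): rng.standard_normal((4, 4, 8)) for t in (0, 10) for l in (1, 2)}
ws = {k: rng.standard_normal() for k in maps}
desc = aggregate_hyperfeatures(maps, ws)
```

The per-pixel descriptors in `desc` are what downstream tasks like keypoint correspondence would then compare.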
Authors
Grace Luo,Lisa Dunlap,Dong Huk Park,Aleksander Holynski,Trevor Darrell
Journal
Advances in Neural Information Processing Systems
Published Date
2024/2/13
Humanoid Locomotion as Next Token Prediction
We cast real-world humanoid control as a next token prediction problem, akin to predicting the next word in language. Our model is a causal transformer trained via autoregressive prediction of sensorimotor trajectories. To account for the multi-modal nature of the data, we perform prediction in a modality-aligned way, and for each input token predict the next token from the same modality. This general formulation enables us to leverage data with missing modalities, like video trajectories without actions. We train our model on a collection of simulated trajectories coming from prior neural network policies, model-based controllers, motion capture data, and YouTube videos of humans. We show that our model enables a full-sized humanoid to walk in San Francisco zero-shot. Our model can transfer to the real world even when trained on only 27 hours of walking data, and can generalize to commands not seen during training like walking backward. These findings suggest a promising path toward learning challenging real-world control tasks by generative modeling of sensorimotor trajectories.
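The modality-aligned formulation above means each token's training target is the next token of the same modality, and positions whose target is missing (e.g., actions absent from video-only trajectories) are simply dropped from the loss. A toy sketch of that target-selection rule (pure illustration, not the paper's code):

```python
OBS, ACT = "obs", "act"  # modality tags for an interleaved trajectory

def modality_aligned_targets(modalities, present):
    """For each position, the target is the index of the NEXT token of
    the SAME modality; -1 marks positions with no such token or whose
    target token is missing (so they are masked out of the loss)."""
    T = len(modalities)
    targets = [-1] * T
    for i in range(T):
        for j in range(i + 1, T):
            if modalities[j] == modalities[i]:
                targets[i] = j if present[j] else -1
                break
    return targets

mods = [OBS, ACT, OBS, ACT, OBS]
tg = modality_aligned_targets(mods, [True] * 5)   # [2, 3, 4, -1, -1]
```

A video trajectory without actions just marks its action slots as absent: those prediction targets become -1 and contribute nothing to training.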
Authors
Ilija Radosavovic,Bike Zhang,Baifeng Shi,Jathushan Rajasegaran,Sarthak Kamat,Trevor Darrell,Koushil Sreenath,Jitendra Malik
Journal
arXiv preprint arXiv:2402.19469
Published Date
2024/2/29
PromptonomyViT: Multi-task prompt learning improves video transformers using synthetic scene data
Action recognition models have achieved impressive results by incorporating scene-level annotations, such as objects, their relations, 3D structure, and more. However, obtaining annotations of scene structure for videos requires a significant amount of effort to gather and annotate, making these methods expensive to train. In contrast, synthetic datasets generated by graphics engines provide powerful alternatives for generating scene-level annotations across multiple tasks. In this work, we propose an approach to leverage synthetic scene data for improving video understanding. We present a multi-task prompt learning approach for video transformers, where a shared video transformer backbone is enhanced by a small set of specialized parameters for each task. Specifically, we add a set of "task prompts", each corresponding to a different task, and let each prompt predict task-related annotations. This design allows the model to capture information shared among synthetic scene tasks as well as information shared between synthetic scene tasks and a real video downstream task throughout the entire network. We refer to this approach as "Promptonomy", since the prompts model task-related structure. We propose the PromptonomyViT model (PViT), a video transformer that incorporates various types of scene-level information from synthetic data using the "Promptonomy" approach. PViT shows strong performance improvements on multiple video understanding tasks and datasets. Project page: https://ofir1080.github.io/PromptonomyViT/
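Mechanically, task prompts are a few extra learnable tokens prepended to the video token sequence; after the shared backbone runs, each task's prediction is read off its own prompt position. A hypothetical NumPy sketch of that wiring (the function names and the toy linear backbone are illustrative assumptions, not the paper's API):

```python
import numpy as np

def run_with_task_prompts(video_tokens, task_prompts, backbone):
    """Prepend one learnable prompt token per synthetic-scene task to the
    video tokens, run the shared backbone over the joint sequence, and
    split the outputs back into per-task and per-patch streams."""
    K = task_prompts.shape[0]
    seq = np.concatenate([task_prompts, video_tokens], axis=0)
    out = backbone(seq)              # shared backbone sees prompts + video
    return out[:K], out[K:]          # task-prompt outputs, video outputs

rng = np.random.default_rng(0)
d = 8
prompts = rng.standard_normal((3, d))       # one prompt per synthetic task
tokens = rng.standard_normal((10, d))       # patch tokens of one video
W = rng.standard_normal((d, d)) / np.sqrt(d)
task_out, video_out = run_with_task_prompts(tokens, prompts, lambda x: x @ W)
```

Each row of `task_out` would feed a small task-specific head predicting that task's scene-level annotations, while `video_out` serves the downstream video task.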
Authors
Roei Herzig,Ofir Abramovich,Elad Ben-Avraham,Assaf Arbelle,Leonid Karlinsky,Ariel Shamir,Trevor Darrell,Amir Globerson
Journal
Winter Conference on Applications of Computer Vision (WACV), 2024
Published Date
2022/12/8
InstanceDiffusion: Instance-level Control for Image Generation
Text-to-image diffusion models produce high quality images but do not offer control over individual instances in the image. We introduce InstanceDiffusion that adds precise instance-level control to text-to-image diffusion models. InstanceDiffusion supports free-form language conditions per instance and allows flexible ways to specify instance locations such as simple single points, scribbles, bounding boxes or intricate instance segmentation masks, and combinations thereof. We propose three major changes to text-to-image models that enable precise instance-level control. Our UniFusion block enables instance-level conditions for text-to-image models, the ScaleU block improves image fidelity, and our Multi-instance Sampler improves generations for multiple instances. InstanceDiffusion significantly surpasses specialized state-of-the-art models for each location condition. Notably, on the COCO dataset, we outperform previous state-of-the-art by 20.4% AP for box inputs, and 25.4% IoU for mask inputs.
Authors
Xudong Wang,Trevor Darrell,Sai Saketh Rambhatla,Rohit Girdhar,Ishan Misra
Journal
arXiv preprint arXiv:2402.03290
Published Date
2024/2/5
EgoPet: Egomotion and Interaction Data from an Animal's Perspective
Animals perceive the world to plan their actions and interact with other agents to accomplish complex tasks, demonstrating capabilities that are still unmatched by AI systems. To advance our understanding and reduce the gap between the capabilities of animals and AI systems, we introduce a dataset of pet egomotion imagery with diverse examples of simultaneous egomotion and multi-agent interaction. Current video datasets separately contain egomotion and interaction examples, but rarely both at the same time. In addition, EgoPet offers a radically distinct perspective from existing egocentric datasets of humans or vehicles. We define two in-domain benchmark tasks that capture animal behavior, and a third benchmark to assess the utility of EgoPet as a pretraining resource to robotic quadruped locomotion, showing that models trained from EgoPet outperform those trained from prior datasets.
Authors
Amir Bar,Arya Bakhtiar,Danny Tran,Antonio Loquercio,Jathushan Rajasegaran,Yann LeCun,Amir Globerson,Trevor Darrell
Journal
arXiv preprint arXiv:2404.09991
Published Date
2024/4/15
Neural Network Diffusion
Diffusion models have achieved remarkable success in image and video generation. In this work, we demonstrate that diffusion models can also *generate high-performing neural network parameters*. Our approach is simple, utilizing an autoencoder and a standard latent diffusion model. The autoencoder extracts latent representations of a subset of the trained network parameters. A diffusion model is then trained to synthesize these latent parameter representations from random noise. It then generates new representations that are passed through the autoencoder's decoder, whose outputs are ready to use as new subsets of network parameters. Across various architectures and datasets, our diffusion process consistently generates models of comparable or improved performance over trained networks, with minimal additional cost. Notably, we empirically find that the generated models perform differently from the trained networks. Our results encourage more exploration of the versatile use of diffusion models.
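Before any autoencoder or diffusion model can touch network parameters, the chosen parameter subset has to be turned into a flat vector and, after decoding, split back into ready-to-use tensors. A small sketch of that round trip (helper names are illustrative, not from the paper's code):

```python
import numpy as np

def flatten_params(param_subset):
    """Flatten a list of parameter tensors into one vector, the form an
    autoencoder would compress before latent diffusion; also return the
    shapes needed to undo the flattening."""
    flat = np.concatenate([p.ravel() for p in param_subset])
    shapes = [p.shape for p in param_subset]
    return flat, shapes

def unflatten_params(flat, shapes):
    """Invert flatten_params: split a decoded vector back into parameter
    tensors that can be loaded into a network."""
    out, i = [], 0
    for s in shapes:
        n = int(np.prod(s))
        out.append(flat[i:i + n].reshape(s))
        i += n
    return out

rng = np.random.default_rng(0)
subset = [rng.standard_normal((3, 4)), rng.standard_normal((5,))]
flat, shapes = flatten_params(subset)
restored = unflatten_params(flat, shapes)
```

In the paper's pipeline, the diffusion model's decoded output takes the place of `flat` here, and the restored tensors are plugged straight back into the network.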
Authors
Kai Wang,Zhaopan Xu,Yukun Zhou,Zelin Zang,Trevor Darrell,Zhuang Liu,Yang You
Journal
arXiv preprint arXiv:2402.13144
Published Date
2024/2/20
Shape-guided diffusion with inside-outside attention
We introduce precise object silhouette as a new form of user control in text-to-image diffusion models, which we dub Shape-Guided Diffusion. Our training-free method uses an Inside-Outside Attention mechanism during the inversion and generation process to apply a shape constraint to the cross- and self-attention maps. Our mechanism designates which spatial region is the object (inside) vs. the background (outside), then associates edits to the correct region. We demonstrate the efficacy of our method on the shape-guided editing task, where the model must replace an object according to a text prompt and object mask. We curate a new ShapePrompts benchmark derived from MS-COCO and achieve SOTA results in shape faithfulness without a degradation in text alignment or image realism according to both automatic metrics and annotator ratings. Our data and code will be made available at https://shape-guided-diffusion.github.io.
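The inside/outside constraint can be pictured as zeroing attention mass on the wrong side of the silhouette and renormalizing: object tokens may only attend inside the mask, background tokens only outside. A minimal NumPy sketch of that masking step (illustrative only; the paper applies this across the diffusion model's actual cross- and self-attention maps):

```python
import numpy as np

def inside_outside_attention(attn, object_mask, token_is_object):
    """Constrain an attention map with an object silhouette.

    attn: (T, P) attention from T text tokens to P image patches.
    object_mask: (P,) bool, True for patches inside the silhouette.
    token_is_object: (T,) bool, True for tokens describing the object.
    Object tokens keep only inside patches, background tokens only
    outside patches; each row is then renormalized."""
    out = attn.copy()
    for t in range(attn.shape[0]):
        allowed = object_mask if token_is_object[t] else ~object_mask
        out[t, ~allowed] = 0.0
        s = out[t].sum()
        if s > 0:
            out[t] /= s
    return out

rng = np.random.default_rng(0)
attn = rng.random((2, 6))
attn /= attn.sum(axis=1, keepdims=True)
obj = np.array([True, True, False, False, False, False])
tok = np.array([True, False])    # token 0 names the object, token 1 the background
masked = inside_outside_attention(attn, obj, tok)
```

After masking, edits driven by the object token land only inside the silhouette, which is what keeps the generated shape faithful to the mask.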
Authors
Dong Huk Park,Grace Luo,Clayton Toste,Samaneh Azadi,Xihui Liu,Maka Karalashvili,Anna Rohrbach,Trevor Darrell
Published Date
2024
Professor FAQs
What is Trevor Darrell's h-index at University of California, Berkeley?
The h-index of Trevor Darrell has been 116 since 2020 and 161 in total.
What are Trevor Darrell's top articles?
The top articles of Trevor Darrell at University of California, Berkeley include:
Real-world humanoid locomotion with reinforcement learning
Diffusion hyperfeatures: Searching through time and space for semantic correspondence
Humanoid Locomotion as Next Token Prediction
PromptonomyViT: Multi-task prompt learning improves video transformers using synthetic scene data
InstanceDiffusion: Instance-level Control for Image Generation
EgoPet: Egomotion and Interaction Data from an Animal's Perspective
Neural Network Diffusion
Shape-guided diffusion with inside-outside attention
...
What are Trevor Darrell's research interests?
The research interests of Trevor Darrell are: Computer Vision, Artificial Intelligence, AI, Machine Learning, and Deep Learning.
What is Trevor Darrell's total number of citations?
Trevor Darrell has 242,160 citations in total.
Who are the co-authors of Trevor Darrell?
The co-authors of Trevor Darrell include Pieter Abbeel, Alex 'Sandy' Pentland, Alexei A. Efros, Louis-Philippe Morency, Kate Saenko, and Judy Hoffman.