Exploring Top-Down Visual Attention for Transportation Behavior Analysis

Term Start: June 1, 2024

Term End: May 31, 2025

Budget: $245,046

Keywords: Artificial Intelligence, Computer Vision, Machine Learning, Safety, Travel Behavior

Thrust Area(s): Data Modeling and Analytic Tools, Equity and Understanding User Needs

University Lead: City College of New York

Researcher(s): Zhigang Zhu

This project stands at the intersection of cognitive psychology, artificial intelligence (AI) and computer vision, and transportation safety and efficiency. By focusing on the nuanced ways in which humans allocate their visual attention, and on how this can inform the development of AI and machine learning (ML) for self-driving cars, transportation safety automation, and transportation planning and scheduling in general, the project aims to contribute significantly to the field, ensuring safer, more intuitive driving experiences and smoother trips for the traveling public.

By performing human behavior analysis with visual attention, we aim to develop best practices for the safe and efficient interaction of automated roadway vehicles with existing vehicles, roadside hardware, pedestrians, cyclists, and motorcyclists. The advantages of a top-down attention approach include prioritizing relevance, improving accuracy, enhancing machine learning efficiency, adapting models to scenarios, and enabling better human interaction. By exploring top-down visual attention, we aim to build machine learning models that achieve the following coherently connected objectives; the first two form the base phase of this proposal, and the last two are planned for a second phase as a follow-on effort:

(1) Develop human behavior analysis machine learning architectures that allow autonomous driving and other transportation systems to anticipate the attention and reaction patterns of both human drivers and pedestrians, thereby preventing accidents. These include analyzing the interactions between a driver and their vehicle, between drivers and pedestrians, and between humans and existing vehicles and roadside hardware. The ML architectures explored will be convolutional neural networks (CNNs) for image encoding, to improve accuracy and reduce computation; graph convolutional networks (GCNs) for relation reasoning, focusing on human interactions and actions; and transformers for self-attention and feedback, as sketched below.
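As an illustration of how such components could fit together, the following minimal PyTorch sketch combines a CNN image encoder, a simple graph-convolution step over detected agents, and a transformer self-attention block. It is only a conceptual example: the module names, dimensions, and interaction-graph format are hypothetical placeholders, not the project's actual implementation.

```python
import torch
import torch.nn as nn


class BehaviorAnalysisNet(nn.Module):
    """Hypothetical sketch: CNN encoder -> GCN-style relation reasoning
    over detected agents -> transformer self-attention for prediction."""

    def __init__(self, feat_dim=128, num_classes=4):
        super().__init__()
        # CNN image encoder: compact per-agent feature to keep computation low
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, feat_dim),
        )
        # Simple graph convolution: adjacency-weighted aggregation of agent features
        self.gcn_weight = nn.Linear(feat_dim, feat_dim)
        # Transformer encoder provides self-attention across interacting agents
        self.transformer = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=feat_dim, nhead=4, batch_first=True),
            num_layers=2,
        )
        self.head = nn.Linear(feat_dim, num_classes)

    def forward(self, crops, adjacency):
        # crops:     (B, N, 3, H, W) image crops of N interacting agents
        # adjacency: (B, N, N) interaction graph (e.g. driver-pedestrian links)
        B, N = crops.shape[:2]
        x = self.encoder(crops.flatten(0, 1)).view(B, N, -1)    # per-agent features
        x = torch.relu(adjacency @ self.gcn_weight(x))           # relation reasoning
        x = self.transformer(x)                                  # self-attention / feedback
        return self.head(x.mean(dim=1))                          # behavior class logits


if __name__ == "__main__":
    model = BehaviorAnalysisNet()
    crops = torch.randn(2, 8, 3, 64, 64)        # two scenes, eight agents each
    adj = torch.eye(8).expand(2, 8, 8)          # placeholder interaction graph
    print(model(crops, adj).shape)              # torch.Size([2, 4])
```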

(2) Investigate the potential of using visual attention models to improve autonomous and/or automated vehicle navigation and decision-making in complex environments. The visual attention mechanisms will be driven by both data and knowledge, including dynamic transportation information, roadside hardware information, and location-based information (maps, events, tasks). As a starting point, we will leverage state-of-the-art (SOTA) models such as the Analysis-by-Synthesis Vision Transformer (AbSViT) to encode feature selection, higher-level feedback, and top-down input on top of the typical bottom-up process in deep models; a simplified sketch of this kind of top-down modulation follows.
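The sketch below shows, in highly simplified form, how a top-down task or context signal can re-weight bottom-up visual tokens before a standard transformer block. It is an assumption-laden stand-in for the idea of top-down feedback, not the published AbSViT architecture; the names (TopDownGate, task_ctx) and tensor shapes are hypothetical.

```python
import torch
import torch.nn as nn


class TopDownGate(nn.Module):
    """Illustrative top-down modulation: a task/context embedding re-weights
    bottom-up visual tokens before they enter a transformer encoder block.
    (Simplified stand-in for AbSViT-style feedback, not the published model.)"""

    def __init__(self, dim=128):
        super().__init__()
        self.query = nn.Linear(dim, dim)    # projects the top-down signal
        self.block = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)

    def forward(self, tokens, task_ctx):
        # tokens:   (B, T, dim) bottom-up patch features from a ViT-style backbone
        # task_ctx: (B, dim)    top-down prior (map, event, or task embedding)
        q = self.query(task_ctx).unsqueeze(1)                     # (B, 1, dim)
        relevance = torch.softmax((tokens * q).sum(-1), dim=-1)   # per-token relevance
        tokens = tokens * (1.0 + relevance.unsqueeze(-1))         # top-down feature selection
        return self.block(tokens)                                 # refined representation


if __name__ == "__main__":
    gate = TopDownGate()
    patches = torch.randn(2, 196, 128)       # 14x14 patch tokens from an image
    context = torch.randn(2, 128)            # e.g. a "find pedestrians" task embedding
    print(gate(patches, context).shape)      # torch.Size([2, 196, 128])
```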

(3) Develop multimodal human-machine interface dashboards for self-driving cars and vehicle safety automation systems, making them more intuitive for human users. These include audio, visual, and haptic features as well as accessibility functions that the team has studied for assisting the navigation of people who are blind or have low vision. Supported by the AI/ML-based architectures and attention models, these dashboard interfaces will also allow developers, engineers, and users to access intelligent transportation systems for interaction, interpretation, and diagnosis.

(4) Furthermore, collaborative opportunities may arise with existing projects, especially in applying the findings to enhance travel pattern analysis and other safety features involving self-driving and/or existing vehicles and pedestrians. Collaboration could involve sharing data, methodologies, and insights to refine the perception and decision-making capabilities of autonomous driving technologies.
