How Effective Are Attitudinal Variables at Improving Travel Behavior Models? Evaluation Using an Overlapping Sample From an Attitude-Rich Survey and the 2017 National Household Travel Survey

June 1, 2024

May 31, 2025




Attitude Imputation, Machine Learning, Survey Datasets

Data Modeling and Analytic Tools, Understanding User Needs

Georgia Institute of Technology


Patricia Mokhtarian

A line of research has recently been launched on attitude imputation using machine learning (ML) functions trained on variables common to two survey datasets (Mokhtarian, 2024). This work has found that using a handful of attitudinal marker variables (i.e., the one or two attitudinal items most strongly associated with each attitude) as the common variables for imputation (Shaw, 2021; Soria and Mokhtarian, 2024) far outperforms other approaches, such as using socio-demographic and land-use variables (Malokin et al., 2019) or targeted marketing variables (Shaw, 2021). The basic idea is to use one survey dataset (the “donor sample”) to train an ML function that predicts attitudinal factor scores from marker variables, and then apply that function to another dataset (the “recipient sample”) that contains the same marker variables, thereby imputing attitude scores into it. This allows attitudinal information to be attached to respondents in the recipient sample without measuring the full set of attitudinal variables used to reveal the attitudinal factor structure in the donor sample.

In this context, “attitudinal items” refers to statements to which respondents react on a 5-point Likert-type scale (“strongly disagree” to “strongly agree”). A few dozen statements are strategically designed and deployed to measure various attitudinal constructs (i.e., factors), which exploratory factor analysis (EFA) applied to the donor sample identifies empirically from the correlation patterns of the responses. The score of each donor-sample respondent on each attitudinal construct is obtained from the EFA solution. Marker variables are the one or two items most strongly associated with each construct and thus play a key role in predicting the associated factor scores. Respondents in the recipient sample obtain “predicted” scores on the attitudinal factors from an ML function trained to predict those scores using only one or two marker variables per construct.
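The donor-to-recipient workflow described here can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the data are simulated rather than drawn from any survey, the factor scores are generated directly instead of coming from an EFA solution, the `likert` item model is hypothetical, and a one-variable least-squares line stands in for whatever ML function a study would actually train.

```python
import random

def fit_line(x, y):
    """Ordinary least squares for y = a + b*x (a simple stand-in for the ML function)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def likert(latent, rng):
    """Hypothetical item model: latent attitude plus noise, mapped to a 1-5 response."""
    raw = 3 + latent + rng.gauss(0, 0.5)
    return min(5, max(1, round(raw)))

rng = random.Random(0)

# Donor sample: factor scores are known (simulated here, not taken from a real EFA).
donor_scores = [rng.gauss(0, 1) for _ in range(500)]
donor_marker = [likert(score, rng) for score in donor_scores]

# Train the imputation function on the donor sample.
intercept, slope = fit_line(donor_marker, donor_scores)

# Recipient sample: only the marker item was measured.
recipient_marker = [1, 3, 5]
imputed = [intercept + slope * m for m in recipient_marker]
print(imputed)
```

With a single marker item the imputed score is just a monotone function of the Likert response; using one or two markers per construct across several constructs would give the trained function more information to work with.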

The proposed study uses responses to the attitude-rich 2017 Emerging Technologies survey, funded by the Georgia Department of Transportation (GDOT), as the donor sample, and the Georgia add-on responses to the 2017 National Household Travel Survey (NHTS) as the recipient sample. It also takes advantage of a distinctive feature not found in previous research along these lines: N ≈ 1,500 people responded to both surveys. Because those individuals completed the donor survey, we have “observed” attitudinal factor scores for them. We will remove those cases from the donor sample (leaving N ≈ 1,800) when training the ML function and treat them as the recipient sample when predicting attitudes, so that they offer an out-of-sample comparison of observed and predicted attitudes. We will then examine how much the predicted attitudes contribute to models of various travel behavior variables from the NHTS, in comparison to the contributions of the “observed” attitude scores or of the marker variables themselves. Positive results would offer important evidence of the value of inserting attitudinal marker variables into a government-administered survey (i.e., a future “recipient sample”) to enhance the performance of travel behavior models based on that sample.
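The out-of-sample check can be sketched in the same spirit. This is an illustration under stated assumptions, not the study's procedure: the respondents are simulated, the sample sizes are the approximate figures quoted in the text, the `simulate` helper is hypothetical, a least-squares line again stands in for the ML function, and the Pearson correlation is only one of several reasonable ways to compare observed and predicted scores.

```python
import random

def fit_line(x, y):
    """Ordinary least squares for y = a + b*x (stand-in for the ML function)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / (sxx * syy) ** 0.5

rng = random.Random(1)
N_DONOR_ONLY, N_OVERLAP = 1800, 1500  # approximate sizes quoted in the text

def simulate(n):
    """Hypothetical respondents: a latent factor score and one 1-5 marker response."""
    scores = [rng.gauss(0, 1) for _ in range(n)]
    markers = [min(5, max(1, round(3 + s + rng.gauss(0, 0.5)))) for s in scores]
    return markers, scores

# Train only on donor cases that are NOT in the overlap.
train_markers, train_scores = simulate(N_DONOR_ONLY)
intercept, slope = fit_line(train_markers, train_scores)

# Overlap cases act as the recipient sample; their observed scores are held back.
overlap_markers, observed = simulate(N_OVERLAP)
predicted = [intercept + slope * m for m in overlap_markers]

r = pearson(observed, predicted)
print(f"out-of-sample correlation between observed and predicted scores: {r:.3f}")
```

Because the overlap cases never enter training, the agreement between `observed` and `predicted` is an honest estimate of how the imputation would perform on a genuinely new recipient sample.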

