Data-driven predictions of summertime visits to lakes across 17 US states


Using a dataset of more than 51,000 US lakes, we estimated the relationship between summertime lake visits and lake water clarity, landscape features, and other amenities, with visits approximated by counts of geo-located photographs. Given the size and complexity of our dataset, we combined machine learning techniques, imputation techniques, and a Poisson count model to estimate these relationships. First, we found that every additional meter of average summertime Secchi depth was associated with at least 7% more summertime lake visits, all else equal. Second, we found that lake amenities, such as beaches, boat launches, and public toilets, were more powerful predictors of visits than water clarity. Third, we found that visits to a lake were strongly influenced by the lake’s accessibility, its distance to nearby lakes, and the amenities those nearby lakes offered. Our research highlights the need for (1) a better understanding of how representative social media data are of actual recreational behavior, (2) the development of best practices to account for nonrandom patterns in missing natural feature data, and (3) a better understanding of the potential endogeneity in the lake visit–water quality relationship.
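
To illustrate how a percentage effect of this kind relates to a Poisson coefficient, the following is a minimal sketch that assumes the usual log link, under which a coefficient beta on Secchi depth implies that expected visits are multiplied by exp(beta) for each additional meter. The function names are hypothetical and the 7% figure is simply the lower bound reported above; this is not the fitted model from the study.

```python
import math

def poisson_coef_from_pct(pct_change):
    """Coefficient beta such that exp(beta) - 1 equals pct_change (log link assumed)."""
    return math.log1p(pct_change)

def visit_multiplier(beta, delta_secchi_m):
    """Multiplicative change in expected visits for a change in Secchi depth (meters)."""
    return math.exp(beta * delta_secchi_m)

# Illustrative only: 0.07 is the lower-bound effect per meter quoted in the abstract.
beta = poisson_coef_from_pct(0.07)
print(round(beta, 4))                         # coefficient on Secchi depth
print(round(visit_multiplier(beta, 2.0), 4))  # multiplier for two extra meters
```

Under this assumption, effects compound multiplicatively rather than add: two additional meters of clarity correspond to 1.07 squared, not 1.14, times the expected visits.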