publications | Shubham Singh

2025

Under Review

To Bid or Not to Bid: Using Auctions to Understand User Valuation of Digital Accounts

Shubham Singh, Jackie Hu, Cormac Herley, Elissa Redmiles, Siddharth Suri, and Oshrat Ayalon

2025

Under Review

Most digital service providers offer free accounts to users and then generate revenue by monetizing the data provided by them. However, the way users derive value from their accounts and the factors that affect their valuation need further scrutiny. In this study, we used a novel auction-based methodology to understand users’ valuation of their digital accounts and how this relates to their security decision-making. We conducted a behavioral economics study with n=66 participants at two university campuses to assess their reasons for being willing to provide researchers with their digital accounts’ credentials in exchange for money (hence compromising security and privacy protection) and what contextual factors governed their decision-making. We found that the main factors influencing participants’ valuation of their accounts were their beliefs about data and privacy, envisioned threats, and the account properties and utility. Finally, we discuss the context-dependent nature of these factors and their implications for future research.
Under Review

Scoring is Not Enough: Addressing Gaps in Fairness-Utility Tradeoff for Ranking

Shubham Singh, Ian A. Kash, and Mesrob I. Ohannessian

2025

Under Review

Scoring functions are used to represent the relevance of individual documents. In modern information retrieval or recommendation systems, they are often learned from data and play a pivotal role in ranking sets of documents or items in a way that maximizes utility to a query or user. With the recent interest in algorithmic fairness, the success of scoring has naturally led to methods that learn scores that simultaneously trade off fairness and utility. In this work, we show that in stark contrast with utility-centric objectives, scoring is sub-optimal in achieving all utility-fairness trade-offs. We establish this with a series of counter-examples with a generic fairness formulation. We show that the issue persists whether we have a deterministic scoring function or a randomized one, or whether we measure fairness at the scope of a single query or across multiple queries. On the positive side, we empirically demonstrate that semi-greedy post-processing has the potential to achieve much better trade-offs, often approaching the ideal of exhaustive post-processing in a computationally tractable way.
Preprint

Effect of Resources on the Efficiency-Fairness Tradeoff for Allocation Problems

Shubham Singh, Chris Kanich, and Ian A. Kash

2025

Under Review

Efficiency and fairness are conflicting objectives for resource allocation problems. Although this conflict is well studied, its ubiquity in numerous applications keeps presenting researchers with new challenges. In this work, we study the relationship of resources with efficiency and fairness and how it affects their tradeoff. We motivate the effect of resources on group utilities and the evaluation measures through the framework of homogeneous functions. Then, we explore the tradeoff together with three decision choices: allocating resources based on needs vs. outcomes, measuring social welfare as maximizing vs. minimizing objectives, and expressing fairness in terms of absolute vs. relative measures. Each choice presents us with a rich, complex setting and different desirable solutions. We illustrate our findings through stylized examples and then a more realistic example based on restaurant inspections.
Oxford Handbook
Evaluating the Social Impact of Generative AI Systems

Irene Solaiman, Zeerak Talat, William Agnew, Lama Ahmad, Dylan K. Baker, Su Lin Blodgett, Canyu Chen, Hal Daumé III, Jesse Dodge, Isabella Duan, Ellie Evans, Felix Friedrich, Avijit Ghosh, Usman Gohar, Sara Hooker, Yacine Jernite, Pratyusha Ria Kalluri, Alina Leidinger, Alberto Lusoli, Michelle Lin, Xiuzhu Lin, Sasha Luccioni, Jennifer Mickel, Margaret Mitchell, Jessica Newman, Anaelia Ovalle, Marie-Therese Png, Shubham Singh, Andrew Strait, Lukas Struppek, Arjun Subramonian, and Apostol Vassilev

In The Oxford Handbook of the Foundations and Regulation of Generative AI, 2025

Generative artificial intelligence (AI) systems across modalities, ranging from text, code, image, audio, and video, have broad social impacts, but there is little agreement on which impacts to evaluate or how to evaluate them. In this chapter, we present a guide for evaluating base generative AI systems (i.e. systems without predetermined applications or deployment contexts). We propose a framework of two overarching categories: what can be evaluated in a system independent of context and what requires societal context. For the former, we define seven areas of interest: stereotypes and representational harms; cultural values and sensitive content; disparate performance; privacy and data protection; financial costs; environmental costs; and data and content moderation labor costs. For the latter, we present five areas: trustworthiness and autonomy; inequality, marginalization, and violence; concentration of authority; labor and creativity; and ecosystem and environment. For each, we present methods for evaluations and the limitations presented by such methods.
@incollection{solaiman2024evaluatingsocialimpactgenerative,
  author = {Solaiman, Irene and Talat, Zeerak and Agnew, William and Ahmad, Lama and Baker, Dylan K. and Blodgett, Su Lin and Chen, Canyu and Daumé, Hal, III and Dodge, Jesse and Duan, Isabella and Evans, Ellie and Friedrich, Felix and Ghosh, Avijit and Gohar, Usman and Hooker, Sara and Jernite, Yacine and Kalluri, Pratyusha Ria and Leidinger, Alina and Lusoli, Alberto and Lin, Michelle and Lin, Xiuzhu and Luccioni, Sasha and Mickel, Jennifer and Mitchell, Margaret and Newman, Jessica and Ovalle, Anaelia and Png, Marie-Therese and Singh, Shubham and Strait, Andrew and Struppek, Lukas and Subramonian, Arjun and Vassilev, Apostol},
  year = {2025},
  isbn = {9780198940272},
  title = {Evaluating the Social Impact of Generative AI Systems},
  booktitle = {The Oxford Handbook of the Foundations and Regulation of Generative AI},
  publisher = {Oxford University Press},
}

2023

GenLaw
AI and the EU Digital Markets Act: Addressing the Risks of Bigness in Generative AI

Ayse Gizem Yasar, Andrew Chong, Evan Dong, Thomas Krendl Gilbert, Sarah Hladikova, Roland Maio, Carlos Mougan, Xudong Shen, Shubham Singh, Ana-Andreea Stoica, Savannah Thais, and Miri Zilka

In Workshop on Generative AI + Law 2023 (GenLaw ’23), Co-located with the 40th International Conference on Machine Learning (ICML 2023), Jul 23, 2023, Honolulu, HI, USA

As AI technology advances rapidly, concerns over the risks of bigness in digital markets are also growing. The EU’s Digital Markets Act (DMA) aims to address these risks. Still, the current framework may not adequately cover generative AI systems that could become gateways for AI-based services. This paper argues for integrating certain AI software as “core platform services” and classifying certain developers as gatekeepers under the DMA. We also propose an assessment of gatekeeper obligations to ensure they cover generative AI services. As the EU considers generative AI-specific rules and possible DMA amendments, this paper provides insights towards diversity and openness in generative AI services.
@inproceedings{yasar-genlaw2023-dma,
  title = {{AI and the EU Digital Markets Act: Addressing the Risks of Bigness in Generative AI}},
  author = {Yasar, Ayse Gizem and Chong, Andrew and Dong, Evan and Gilbert, Thomas Krendl and Hladikova, Sarah and Maio, Roland and Mougan, Carlos and Shen, Xudong and Singh, Shubham and Stoica, Ana-Andreea and Thais, Savannah and Zilka, Miri},
  year = {2023},
  month = jul,
  booktitle = {Workshop on Generative AI + Law 2023 (GenLaw '23), Co-located with the 40th International Conference on Machine Learning (ICML 2023), Jul 23, 2023, Honolulu, HI, USA},
}
ConPro
Tracking, But Make It Offline: The Privacy Implications of Scanning QR Codes Found in the World

Rayaan Siddiqi*, Shubham Singh*, Lenore Zuck, and Chris Kanich

In 7th Workshop on Technology and Consumer Protection (ConPro ’23), Co-located with the 44th IEEE Symposium on Security and Privacy, May 25, 2023, San Francisco, CA, USA

QR Codes have become a pervasive mechanism for encoding machine-readable digital data in the offline world. As the Internet age has taught us, mechanisms that become pervasive very often engender privacy concerns regarding their use. As such, here we conduct an investigation of the privacy implications of the QR Code ecosystem as it exists today. We find that there are several shortener services with substantial popularity, and investigate the extent to which these shortener services conduct various types of tracking of individuals who interact with the created QR Codes. Additionally, we collect 948 QR Codes posted within the world, and evaluate them for various types of tracking as well. Overall, we find no evidence that QR Codes are a substantial or unique privacy threat when compared to other link sharing mechanisms available online. Even so, the theoretical potential for surreptitious tracking exists, and more in-depth study of the QR Code ecosystem will allow for deeper investigation of the relationship between online and offline tracking.
@inproceedings{conpro-2023-qrcode,
  title = {Tracking, But Make It Offline: The Privacy Implications of Scanning QR Codes Found in the World},
  author = {Siddiqi, Rayaan and Singh, Shubham and Zuck, Lenore and Kanich, Chris},
  year = {2023},
  month = may,
  booktitle = {7th Workshop on Technology and Consumer Protection (ConPro '23), Co-located with the 44th IEEE Symposium on Security and Privacy, May 25, 2023, San Francisco, CA, USA},
}

2022

EAAMO
Fair Decision-Making for Food Inspections

Shubham Singh, Bhuvni Shah, Chris Kanich, and Ian A. Kash

In Equity and Access in Algorithms, Mechanisms, and Optimization (EAAMO), Arlington, VA, USA, Oct 2022

We revisit the application of predictive models by the Chicago Department of Public Health to schedule restaurant inspections and prioritize the detection of critical food code violations. We perform the first analysis of the model’s fairness to the population served by the restaurants in terms of average time to find a critical violation. We find that the model treats inspections unequally based on the sanitarian who conducted the inspection and that, in turn, there are geographic disparities in the benefits of the model. We examine four alternate methods of model training and two alternative ways of scheduling using the model and find that the latter generate more desirable results. The challenges from this application point to important directions for future work around fairness with collective entities rather than individuals, the use of critical violations as a proxy, and the disconnect between fair classification and fairness in dynamic scheduling systems.
@inproceedings{singh-fairfood-eaamo2022,
  title = {Fair Decision-Making for Food Inspections},
  author = {Singh, Shubham and Shah, Bhuvni and Kanich, Chris and Kash, Ian A.},
  year = {2022},
  month = oct,
  booktitle = {Equity and Access in Algorithms, Mechanisms, and Optimization (EAAMO)},
  location = {Arlington, VA, USA},
  publisher = {Association for Computing Machinery},
  address = {New York, NY, USA},
  series = {EAAMO '22},
  doi = {10.1145/3551624.3555289},
  isbn = {9781450394772},
  url = {https://doi.acm.org?doi=3551624.3555289},
  articleno = {5},
  numpages = {11},
  keywords = {food inspections, fairness, scheduling},
}
RDMDE
Open Problems in (Un)fairness of the Retail Food Safety Inspection Process

Tanya Berger-Wolf, Allison Howell, Chris Kanich, Ian A. Kash, Moniba Keymanesh, Barbara Kowalcyk, Gina Nicholson Kramer, Andrew Perrault, and Shubham Singh

In Responsible Decision Making in Dynamic Environments Workshop, Held in Conjunction with ICML 2022, July 23, 2022, Baltimore, MD, USA

The inspection of retail food establishments is an essential public health intervention. We discuss existing work on roles AI techniques can play in food inspections and resulting fairness and interpretability challenges. We also examine open problems stemming from the complex and dynamic nature of the inspections.
@inproceedings{rdmde-2022-openproblems,
  title = {Open Problems in (Un)fairness of the Retail Food Safety Inspection Process},
  author = {Berger-Wolf, Tanya and Howell, Allison and Kanich, Chris and Kash, Ian A. and Keymanesh, Moniba and Kowalcyk, Barbara and Kramer, Gina Nicholson and Perrault, Andrew and Singh, Shubham},
  year = {2022},
  month = jul,
  booktitle = {Responsible Decision Making in Dynamic Environments Workshop, Held in Conjunction with {ICML} 2022, July 23, 2022, Baltimore MD, USA},
}

2021

USENIX
Helping Users Automatically Find and Manage Sensitive, Expendable Files in Cloud Storage

Mohammad Taha Khan, Christopher Tran, Shubham Singh, Dimitri Vasilkov, Chris Kanich, Blase Ur, and Elena Zheleva

In 30th USENIX Security Symposium (USENIX Security 21), Aug 2021

With the ubiquity of data breaches, forgotten-about files stored in the cloud create latent privacy risks. We take a holistic approach to help users identify sensitive, unwanted files in cloud storage. We first conducted 17 qualitative interviews to characterize factors that make humans perceive a file as sensitive, useful, and worthy of either protection or deletion. Building on our findings, we conducted a primarily quantitative online study. We showed 108 long-term users of Google Drive or Dropbox a selection of files from their accounts. They labeled and explained these files’ sensitivity, usefulness, and desired management (whether they wanted to keep, delete, or protect them). For each file, we collected many metadata and content features, building a training dataset of 3,525 labeled files. We then built Aletheia, which predicts a file’s perceived sensitivity and usefulness, as well as its desired management. Aletheia improves over state-of-the-art baselines by 26% to 159%, predicting users’ desired file-management decisions with 79% accuracy. Notably, predicting subjective perceptions of usefulness and sensitivity led to a 10% absolute accuracy improvement in predicting desired file-management decisions. Aletheia’s performance validates a human-centric approach to feature selection when using inference techniques on subjective security-related tasks. It also improves upon the state of the art in minimizing the attack surface of cloud accounts.
@inproceedings{taha_alethia_2021,
  author = {Khan, Mohammad Taha and Tran, Christopher and Singh, Shubham and Vasilkov, Dimitri and Kanich, Chris and Ur, Blase and Zheleva, Elena},
  title = {Helping Users Automatically Find and Manage Sensitive, Expendable Files in Cloud Storage},
  booktitle = {30th USENIX Security Symposium (USENIX Security 21)},
  year = {2021},
  isbn = {978-1-939133-24-3},
  pages = {1145--1162},
  url = {https://www.usenix.org/conference/usenixsecurity21/presentation/khan-mohammad},
  publisher = {{USENIX} Association},
  month = aug,
}

2020

NetSci-X
NeXLink: Node Embedding Framework for Cross-Network Linkages Across Social Networks

Rishabh Kaushal, Shubham Singh, and Ponnurangam Kumaraguru

In Proceedings of NetSci-X 2020: Sixth International Winter School and Conference on Network Science, Aug 2020

Users create accounts on multiple social networks to get connected to their friends across these networks. We refer to these user accounts as user identities. Since users join multiple social networks, there will be cases where a pair of user identities across two different social networks belong to the same individual. We refer to such pairs as Cross-Network Linkages (CNLs). In this work, we model the social network as a graph to explore whether we can obtain an effective social network graph representation such that node embeddings of users belonging to CNLs are closer in embedding space than other nodes, using only the network information. To this end, we propose a modular and flexible node embedding framework, referred to as NeXLink, which comprises three steps. First, we obtain local node embeddings by preserving the local structure of nodes within the same social network. Second, we learn the global node embeddings by preserving the global structure, which is present in the form of common friendship exhibited by nodes involved in CNLs across social networks. Third, we combine the local and global node embeddings, which preserve local and global structures to facilitate the detection of CNLs across social networks. We evaluate our proposed framework on an augmented (synthetically generated) dataset of 63,713 nodes and 817,090 edges and a real-world dataset of 3,338 Twitter-Foursquare node pairs. Our approach achieves an average Hit@1 rate of 98% for detecting CNLs across social networks and significantly outperforms previous state-of-the-art methods.
@inproceedings{kaushal_nexlink_2020,
  author = {Kaushal, Rishabh and Singh, Shubham and Kumaraguru, Ponnurangam},
  title = {NeXLink: Node Embedding Framework for Cross-Network Linkages Across Social Networks},
  booktitle = {Proceedings of NetSci-X 2020: Sixth International Winter School and Conference on Network Science},
  year = {2020},
  publisher = {Springer International Publishing},
  address = {Cham},
  pages = {61--75},
  isbn = {978-3-030-38965-9},
}

2019

ACM SAC
KidsGUARD: Fine Grained Approach for Child Unsafe Video Representation and Detection

Shubham Singh, Rishabh Kaushal, Arun Balaji Buduru, and Ponnurangam Kumaraguru

In Proceedings of the 34th ACM/SIGAPP Symposium on Applied Computing, Limassol, Cyprus, Aug 2019

More and more videos are being uploaded to video sharing platforms, and a significant number of viewers on these platforms are children. At times, these videos have violent or sexually explicit scenes (referred to as child unsafe) to catch children’s attention. To evade moderation, malicious video uploaders typically limit the child unsafe content to only a few frames in the video. Hence, a fine-grained approach, referred to as KidsGUARD, to detect sparsely present child unsafe content is required. Prior approaches to content moderation either flag the entire video as inappropriate or use hand-crafted features derived from video frames. In this work, we leverage a Long Short Term Memory (LSTM) based autoencoder to learn effective video representations of video descriptors obtained using a VGG16 Convolutional Neural Network (CNN). Encoded video representations are fed into an LSTM classifier for detection of sparse child unsafe video content. To evaluate this approach, we create a dataset of 109,835 video clips curated specifically for child unsafe content. We find that the deep learning approach (1) detects fine-grained child unsafe video content with a granularity of 1 second, (2) identifies even sparsely located child unsafe video content by achieving a high recall of 81% at a high precision of 80%, and (3) outperforms baseline video encoding approaches such as Fisher Vector (FV) and Vector of Locally Aggregated Descriptors (VLAD).
@inproceedings{singh_kidsguard_2019,
  author = {Singh, Shubham and Kaushal, Rishabh and Buduru, Arun Balaji and Kumaraguru, Ponnurangam},
  title = {KidsGUARD: Fine Grained Approach for Child Unsafe Video Representation and Detection},
  year = {2019},
  isbn = {9781450359337},
  publisher = {Association for Computing Machinery},
  address = {New York, NY, USA},
  url = {https://doi.org/10.1145/3297280.3297487},
  doi = {10.1145/3297280.3297487},
  booktitle = {Proceedings of the 34th ACM/SIGAPP Symposium on Applied Computing},
  pages = {2104--2111},
  numpages = {8},
  keywords = {child safety, video analysis, social media analysis},
  location = {Limassol, Cyprus},
  series = {SAC '19},
}