Overview
With the rise of generative AI (GenAI), the need for large and diverse user bases to participate in AI evaluation and auditing has grown. GenAI developers are increasingly adopting crowdsourcing approaches to test and audit their AI products and services. However, how to design and deploy responsible and effective crowdsourcing pipelines for AI auditing and evaluation remains an open question.
This one-day workshop at HCOMP 2024 aims to take a step towards bridging this gap. Our interdisciplinary team of organizers will work with workshop participants to explore several key questions: how to improve output quality and worker productivity in crowdsourcing tasks for GenAI evaluation, which differ from tasks involving discriminative AI systems; how to guide crowds in auditing problematic AI-generated content while managing the psychological impact on workers; how to ensure marginalized voices are heard; and how to set up responsible and effective crowdsourcing pipelines for real-world GenAI evaluation. We hope this workshop will produce a research agenda and best practices for designing responsible crowd-based approaches to AI auditing and evaluation.