|
22 | 22 | <div class="hero-layout"> |
23 | 23 | <div class="hero-main"> |
24 | 24 | <div class="eyebrow">Project Website</div> |
25 | | - <h1>Generative AI for Data Management</h1> |
| 25 | + <h1>Generative AI for Multi-modal Data Management</h1> |
26 | 26 | <p class="subtitle"> |
27 | 27 | Generative models as first-class operators in future data systems |
28 | 28 | </p> |
@@ -157,6 +157,8 @@ <h3>Publications</h3> |
157 | 157 | <li><a href="#needle-arxiv-2024">NEEDLE</a></li> |
158 | 158 | <li><a href="#chameleon-vldb-2024">Chameleon</a></li> |
159 | 159 | <li><a href="#needledb-corr-2026">NeedleDB</a></li> |
| 160 | + <h5>Beyond GenAI</h5> |
| 161 | + <li><a href="#crowdsourcing-edbt-2024">Crowdsourcing for Image Data Coverage</a></li> |
160 | 162 | </ol> |
161 | 163 | <p class="toc-note">This list will be updated as project papers are added.</p> |
162 | 164 | </aside> |
@@ -270,6 +272,40 @@ <h3>NeedleDB: A Generative-AI Based System for Accurate and Efficient Image Retr |
270 | 272 | </div> |
271 | 273 | </div> |
272 | 274 | </article> |
| 275 | + |
| 276 | + <article class="paper-card" id="crowdsourcing-edbt-2024"> |
| 277 | + <div class="paper-tag">EDBT 2024</div> |
| 278 | + <h3>Data Coverage for Detecting Representation Bias in Image Datasets: A Crowdsourcing Approach</h3> |
| 279 | + <p class="paper-meta"> |
| 280 | + Melika Mousavi, Nima Shahbazi, and Abolfazl Asudeh · <em>International Conference on Extending Database Technology (EDBT)</em>, 2024 |
| 281 | + </p> |
| 282 | + <div class="paper-feature"> |
| 283 | + <figure class="paper-figure"> |
| 284 | + <img src="imgs/crowdsourcing.jpg" alt="Illustration for the data coverage paper on crowdsourcing-based detection of representation bias in image datasets"> |
| 285 | + </figure> |
| 286 | + |
| 287 | + <div class="paper-feature-copy"> |
| 288 | + <p> |
| 289 | + Existing machine learning models often fail on minority groups because the datasets used to train them do not adequately represent those groups. This is especially challenging in social and image datasets, where the relevant protected or demographic attributes may not be explicitly available, making it difficult to determine whether the data sufficiently covers different populations. |
| 290 | + </p> |
| 291 | + <p> |
| 292 | + This paper studies how to identify representation bias in image datasets without relying on explicit attribute values. Using the notion of <em>data coverage</em>, it develops multiple crowdsourcing-based approaches for detecting whether a dataset lacks proper representation for a given group. The core method is a divide-and-conquer algorithm with search-space pruning, designed to efficiently discover coverage gaps while keeping the human labeling effort manageable. |
| 293 | + </p> |
| 294 | + <p> |
| 295 | + Beyond the base algorithm, the work provides a theoretical analysis, including a tight upper bound that establishes near-optimality. It also introduces heuristics that reduce the cost of coverage detection for both intersectional and non-intersectional groups, and it shows that relying on pre-trained predictors alone is not sufficient for dependable bias detection in this setting. |
| 296 | + </p> |
| 297 | + <p> |
| 298 | + Finally, the paper extends the framework to make use of existing predictive models when possible, reducing crowdsourcing cost without fully trusting those models as the final answer. Extensive experiments, including live studies on Amazon Mechanical Turk, validate both the problem formulation and the practical performance of the proposed algorithms. |
| 299 | + </p> |
| 300 | + <p class="paper-citation"> |
| 301 | + Citation: Mousavi, Melika, Nima Shahbazi, and Abolfazl Asudeh. 2024. <em>Data Coverage for Detecting Representation Bias in Image Datasets: A Crowdsourcing Approach</em>. International Conference on Extending Database Technology (EDBT). |
| 302 | + </p> |
| 303 | + <p class="paper-links"> |
| 304 | + <a href="https://openproceedings.org/2024/conf/edbt/Data_Coverage_EDBT_CRV.pdf" target="_blank" rel="noopener noreferrer">Paper</a> |
| 305 | + </p> |
| 306 | + </div> |
| 307 | + </div> |
| 308 | + </article> |
273 | 309 | </div> |
274 | 310 | </div> |
275 | 311 | </div> |