Commit d0a9761

1 parent 76e2e41 commit d0a9761

3 files changed

Lines changed: 38 additions & 2 deletions

genai4dm/imgs/crowdsourcing.jpg

219 KB

genai4dm/index.html

Lines changed: 37 additions & 1 deletion
@@ -22,7 +22,7 @@
 <div class="hero-layout">
 <div class="hero-main">
 <div class="eyebrow">Project Website</div>
-<h1>Generative AI for Data Management</h1>
+<h1>Generative AI for Multi-modal Data Management</h1>
 <p class="subtitle">
 Generative models as first-class operators in future data systems
 </p>
@@ -157,6 +157,8 @@ <h3>Publications</h3>
 <li><a href="#needle-arxiv-2024">NEEDLE</a></li>
 <li><a href="#chameleon-vldb-2024">Chameleon</a></li>
 <li><a href="#needledb-corr-2026">NeedleDB</a></li>
+<h5>Beyond GenAI</h5>
+<li><a href="#crowdsourcing-edbt-2024">Crowdsourcing for Image Data Coverage</a></li>
 </ol>
 <p class="toc-note">This list will be updated as project papers are added.</p>
 </aside>
@@ -270,6 +272,40 @@ <h3>NeedleDB: A Generative-AI Based System for Accurate and Efficient Image Retr
 </div>
 </div>
 </article>
+
+<article class="paper-card" id="crowdsourcing-edbt-2024">
+<div class="paper-tag">EDBT 2024</div>
+<h3>Data Coverage for Detecting Representation Bias in Image Datasets: A Crowdsourcing Approach</h3>
+<p class="paper-meta">
+Melika Mousavi, Nima Shahbazi, and Abolfazl Asudeh · <em>International Conference on Extending Database Technology (EDBT)</em>, 2024
+</p>
+<div class="paper-feature">
+<figure class="paper-figure">
+<img src="imgs/crowdsourcing.jpg" alt="Illustration for the data coverage paper on crowdsourcing-based detection of representation bias in image datasets">
+</figure>
+
+<div class="paper-feature-copy">
+<p>
+Existing machine learning models often fail on minority groups because the datasets used to train them do not adequately represent those groups. This is especially challenging in social and image datasets, where the relevant protected or demographic attributes may not be explicitly available, making it difficult to determine whether the data sufficiently covers different populations.
+</p>
+<p>
+This paper studies how to identify representation bias in image datasets without relying on explicit attribute values. Using the notion of <em>data coverage</em>, it develops multiple crowdsourcing-based approaches for detecting whether a dataset lacks proper representation for a given group. The core method is a divide-and-conquer algorithm with search-space pruning, designed to efficiently discover coverage gaps while keeping the human labeling effort manageable.
+</p>
+<p>
+Beyond the base algorithm, the work provides a theoretical analysis, including a tight upper bound that establishes the algorithm's near-optimality. It also introduces heuristics that reduce the cost of coverage detection across both intersectional and non-intersectional groups, and it shows that relying only on pre-trained predictors is not sufficient for dependable bias detection in this setting.
+</p>
+<p>
+Finally, the paper extends the framework to make use of existing predictive models when possible, reducing crowdsourcing cost without fully trusting those models as the final answer. Extensive experiments, including live studies on Amazon Mechanical Turk, validate both the problem formulation and the practical performance of the proposed algorithms.
+</p>
+<p class="paper-citation">
+Citation: Mousavi, Melika, Nima Shahbazi, and Abolfazl Asudeh. 2024. <em>Data Coverage for Detecting Representation Bias in Image Datasets: A Crowdsourcing Approach</em>. International Conference on Extending Database Technology (EDBT).
+</p>
+<p class="paper-links">
+<a href="https://openproceedings.org/2024/conf/edbt/Data_Coverage_EDBT_CRV.pdf" target="_blank" rel="noopener noreferrer">Paper</a>
+</p>
+</div>
+</div>
+</article>
 </div>
 </div>
 </div>
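
The paper summary added in genai4dm/index.html mentions a divide-and-conquer algorithm with search-space pruning but gives no details. The sketch below is one plausible reading of that idea, not the paper's actual method. It assumes the crowd can answer a set question ("does at least one of these images show a member of the group?") and that a group counts as covered once a threshold number of members has been found. The names `isCovered`, `containsGroup`, `threshold`, and the `inGroup` field are all hypothetical, and the crowd is simulated by a local function.

```javascript
// Hypothetical sketch: every name here is illustrative, not taken from the paper.
// containsGroup(subset) stands in for one crowd question:
// "does at least one image in this subset show a member of the group?"
function isCovered(items, containsGroup, threshold) {
  let found = 0;   // group members confirmed so far
  let queries = 0; // number of (simulated) crowd questions asked

  function search(subset) {
    // Stop early once enough members are found, or nothing is left to check.
    if (found >= threshold || subset.length === 0) return;
    queries += 1;
    // Pruning: a "no" answer rules out the whole subset at once.
    if (!containsGroup(subset)) return;
    // A "yes" answer on a single image confirms one group member.
    if (subset.length === 1) {
      found += 1;
      return;
    }
    // Divide: split the subset in half and search each half.
    const mid = Math.floor(subset.length / 2);
    search(subset.slice(0, mid));
    search(subset.slice(mid));
  }

  search(items);
  return { covered: found >= threshold, found, queries };
}

// Simulated data: 16 images, two of which belong to the group.
const images = Array.from({ length: 16 }, (_, i) => ({
  id: i,
  inGroup: i === 3 || i === 12,
}));

// Simulated crowd answer, always correct in this sketch.
const oracle = (subset) => subset.some((img) => img.inGroup);

const result = isCovered(images, oracle, 2);
console.log(result.covered, result.found);
```

A real deployment would have to handle noisy or conflicting worker answers, for example by aggregating several responses per question, which this sketch ignores.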

js/indexstart.js

Lines changed: 1 addition & 1 deletion
@@ -21,7 +21,7 @@ st+= ' <div class="dropdown-menu" aria-labelledby="fundedProjectsDropdown
 st+= ' <a class="dropdown-item" target="_blank" href="'+path+'nsf-iis-2348919/index.html">NSF IIS-2348919 | Fairness-aware Data Structures</a>';
 st+= ' <a class="dropdown-item" target="_blank" href="'+path+'efficient-llm-inference/index.html">Efficient LLM Inference</a>';
 st+= ' <a class="dropdown-item" target="_blank" href="'+path+'trustworthy-llms-and-llm-agents/index.html">Trustworthy LLMs and LLM Agents</a>';
-st+= ' <a class="dropdown-item" target="_blank" href="'+path+'genai4dm/index.html">Generative AI for Data Management</a>';
+st+= ' <a class="dropdown-item" target="_blank" href="'+path+'genai4dm/index.html">Generative AI for Multimodal Data Management</a>';
 st+= ' </div>';
 st += ' </li> ';
 //st += ' <li class="nav-item active"> ';
