@@ -216,7 +216,8 @@ <h2 class="title is-3 mt-3">Abstract</h2>
216216 < h2 class ="title is-2 "> Model Architecture</ h2 >
217217 < img class ="mt-4 " src ="./static/images/pipeline.svg ">
218218 < div class ="content has-text-justified my-4 ">
219- < b > Overview of the proposed method</ b > : < b > (a)</ b > Given an object point cloud \(P\), our network
219+ < b > Overview of the proposed method</ b > : < b > (a)</ b > Given an object point cloud \(P\), our
220+ network
220221 encodes
221222 geometric features into dense feature maps. Next,
222223 randomly initialized dual-arm grasps \(H\) are used to transform a fixed query cloud into query
@@ -246,7 +247,8 @@ <h4 class="title is-4 has-text-centered">\(SE(3) \times SE(3) \longleftrightarro
246247 </ div >
247248
248249 < div class ="column is-full-width ">
249- < b > Denoising in the dual-arm grasp space:</ b > Additionally, dual-arm grasp poses are represented as pairs of rigid-body transformations
250+ < b > Denoising in the dual-arm grasp space:</ b > Additionally, dual-arm grasp poses are represented as
251+ pairs of rigid-body transformations
250252 in \(SE(3) \times SE(3)\), which are mapped into a \(12\text{D}\) Euclidean space for diffusion and
251253 back.
252254 Each \(SE(3)\) element is
@@ -267,54 +269,56 @@ <h4 class="title is-4 has-text-centered">\(SE(3) \times SE(3) \longleftrightarro
267269 < h4 class ="title is-4 has-text-centered "> Denoising using Classifier Guidance</ h4 >
268270
269271 < div class ="columns has-text-centered ">
270- < div class ="column is-full-width mt-2 ">
271- < video autoplay loop muted poster ="" preload ="none " style ="width:100%; ">
272- < source src ="./static/videos/only_graph_cropped3.mp4 ">
273- </ video >
274- </ div >
272+ < div class ="column is-full-width mt-2 ">
273+ < video autoplay loop muted poster ="" preload ="none " style ="width:100%; ">
274+ < source src ="./static/videos/only_graph_cropped3.mp4 ">
275+ </ video >
276+ </ div >
275277 </ div >
276-
278+
277279 <!-- Colormap bar -->
278280 < div class ="columns has-text-centered my-5 ">
279- < div class ="column is-full-width ">
280- < div style ="
281+ < div class ="column is-full-width ">
282+ < div style ="
281283 background: linear-gradient(to right, rgb(255, 85, 85), rgb(63, 255, 63));
282284 height: 12px;
283285 border-radius: 30px;
284286 margin: 0 auto;
285287 width: 70%;
286288 position: relative; ">
289+ </ div >
290+ < div
291+ style ="display: flex; justify-content: space-between; width: 70%; margin: 5px auto 0 auto; font-size: 0.9rem; ">
292+ < span style ="color: rgb(182, 1, 1); font-weight: 500; "> Noisy Grasp Pairs</ span >
293+ < span style ="color: rgb(45, 150, 45); font-weight: 500; "> Stable Grasp Pairs</ span >
294+ </ div >
287295 </ div >
288- < div style ="display: flex; justify-content: space-between; width: 70%; margin: 5px auto 0 auto; font-size: 0.9rem; ">
289- < span style ="color: rgb(182, 1, 1); font-weight: 500; "> Noisy Grasp Pairs</ span >
290- < span style ="color: rgb(45, 150, 45); font-weight: 500; "> Stable Grasp Pairs</ span >
291- </ div >
292- </ div >
293296 </ div >
294-
297+
295298 < div class ="content has-text-justified my-4 ">
296- < b > Overview of the denoising process:</ b > The above clip shows the joint denoising process step by step. As the
297- time progresses, the < span style ="color: rgb(182, 1, 1); "> Energy \((E_\alpha)\)</ span > gradually
298- decreases, which means grasps are moving towards the object and
299- not just floating in free space. At the same time, the < span
300- style ="color:rgb(11, 33, 158) "> Force-Closure Probability \((C_{\beta}^{\text{fc}})\)</ span > steadily
301- increases,
302- highlighting how the grasp becomes more stable and reliable over time. Finally, in the later stages of
303- denoising, colliding grasps are
304- refined for a small number of iterations using < span style ="color:rgb(45, 150, 45) "> Collision Classifier
305- \((C_{\gamma}^{\text{col}})\)</ span > , resulting in dual-arm grasps that are force-closure stable as
306- well as collision-free.
299+ < b > Overview of the denoising process:</ b > The above clip shows the joint denoising process step by step.
300+ As
301+ time progresses, the < span style ="color: rgb(182, 1, 1); "> Energy \((E_\alpha)\)</ span > gradually
302+ decreases, which means grasps are moving towards the object and
303+ not just floating in free space. At the same time, the < span
304+ style ="color:rgb(11, 33, 158) "> Force-Closure Probability \((C_{\beta}^{\text{fc}})\)</ span > steadily
305+ increases,
306+ highlighting how the grasp becomes more stable and reliable over time. Finally, in the later stages of
307+ denoising, colliding grasps are
308+ refined for a small number of iterations using < span style ="color:rgb(45, 150, 45) "> Collision Classifier
309+ \((C_{\gamma}^{\text{col}})\)</ span > , resulting in dual-arm grasps that are force-closure stable as
310+ well as collision-free.
307311 </ div >
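The guided update described above can be illustrated with a minimal sketch. It is not the authors' implementation: the quadratic `energy`, the logistic `fc_prob`, the vectors `target` and `w`, and all step sizes are hypothetical stand-ins for the learned \(E_\alpha\) and \(C_{\beta}^{\text{fc}}\), and the late-stage collision refinement with \(C_{\gamma}^{\text{col}}\) is left out. It only shows how a gradient step that lowers an energy while raising a classifier's log-probability behaves on 12D grasp vectors.

```python
import numpy as np

# Hypothetical toy models: the real E_alpha and C_beta^fc are learned networks.

def energy(h, target):
    """Toy stand-in for E_alpha: squared distance to a made-up 'good grasp' point."""
    return 0.5 * np.sum((h - target) ** 2, axis=-1)

def energy_grad(h, target):
    """Analytic gradient of the toy energy with respect to h."""
    return h - target

def fc_prob(h, w):
    """Toy stand-in for C_beta^fc: a logistic classifier on the 12D grasp vector."""
    return 1.0 / (1.0 + np.exp(-(h @ w)))

def log_fc_prob_grad(h, w):
    """Gradient of log(sigmoid(h . w)) with respect to h."""
    return (1.0 - fc_prob(h, w))[:, None] * w

def denoise(h, target, w, steps=200, step_size=0.05, guidance=1.0, seed=0):
    """Annealed Langevin-style updates with classifier guidance (assumed schedule)."""
    rng = np.random.default_rng(seed)
    for k in range(steps):
        # Noise is annealed linearly to zero so the last step is deterministic.
        noise_scale = np.sqrt(2.0 * step_size) * (1.0 - (k + 1) / steps)
        # Descend the energy while ascending the classifier log-probability.
        grad = energy_grad(h, target) - guidance * log_fc_prob_grad(h, w)
        h = h - step_size * grad + noise_scale * rng.standard_normal(h.shape)
    return h

rng = np.random.default_rng(42)
target = np.full(12, 0.5)                        # hypothetical low-energy grasp
w = np.ones(12) / np.sqrt(12.0)                  # hypothetical classifier weights
h_init = 3.0 * rng.standard_normal((64, 12))     # randomly initialized grasp pairs
h_final = denoise(h_init, target, w)

print("mean energy:", energy(h_init, target).mean(), "->", energy(h_final, target).mean())
print("mean fc prob:", fc_prob(h_init, w).mean(), "->", fc_prob(h_final, w).mean())
```

In this toy setting the mean energy falls and the mean classifier probability rises over the run, which mirrors the two trends plotted in the clip.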
308-
309-
312+
313+
310314 </ div >
311315 </ section >
312316
313317 < section class ="section " style ="background-color: rgb(252, 252, 252); ">
314318 < div class ="container is-max-desktop ">
315319 < div class ="columns has-text-centered ">
316320 < div class ="column is-full-width ">
317- < h2 class ="title is-3 ">
321+ < h2 class ="title is-2 ">
318322 Real Life Results < sup style ="font-size: 15px; "> †</ sup >
319323 </ h2 >
320324
@@ -370,6 +374,158 @@ <h4 class="title is-5">(e) Saucepan</h4>
370374 </ div >
371375 </ section >
372376
377+ < section class ="section " style ="background-color:#fff; ">
378+ < div class ="container is-max-desktop ">
379+ < h2 class ="title is-2 has-text-centered "> Quantitative Results</ h2 >
380+ < p class ="has-text-centered is-size-6 mb-4 ">
381+ Comparison on our evaluation set (< span class ="icon "> < i class ="fas fa-arrow-up "> </ i > </ span > higher is
382+ better, < span class ="icon "> < i class ="fas fa-arrow-down "> </ i > </ span > lower is better).
383+ </ p >
384+
385+ < h3 class ="title is-5 mt-5 pt-5 "> 1. Comparison with Baselines</ h3 >
386+
387+ <!-- Main comparison -->
388+ < div class ="table-container ">
389+ < table class ="table is-striped is-hoverable is-fullwidth ">
390+ < thead >
391+ < tr >
392+ < th > Method</ th >
393+ < th class ="has-text-right "> FCE (%) < span class ="icon "> < i
394+ class ="fas fa-arrow-up "> </ i > </ span > </ th >
395+ < th class ="has-text-right "> GSR (%) < span class ="icon "> < i
396+ class ="fas fa-arrow-up "> </ i > </ span > </ th >
397+ < th class ="has-text-right "> GCR (%) < span class ="icon "> < i
398+ class ="fas fa-arrow-down "> </ i > </ span > </ th >
399+ </ tr >
400+ </ thead >
401+ < tbody >
402+ < tr class ="">
403+ < td > < b > DAGDiff (ours)</ b > </ td >
404+ < td class ="has-text-right "> < b > 60.14</ b > </ td >
405+ < td class ="has-text-right "> < b > 72.50</ b > </ td >
406+ < td class ="has-text-right "> < b > 15.10</ b > </ td >
407+ </ tr >
408+ < tr >
409+ < td > CGDF</ td >
410+ < td class ="has-text-right "> 35.14</ td >
411+ < td class ="has-text-right "> 56.25</ td >
412+ < td class ="has-text-right "> 30.55</ td >
413+ </ tr >
414+ < tr >
415+ < td > VCGS</ td >
416+ < td class ="has-text-right "> 16.85</ td >
417+ < td class ="has-text-right "> 23.36</ td >
418+ < td class ="has-text-right "> 74.73</ td >
419+ </ tr >
420+ < tr >
421+ < td > UniDiffGrasp</ td >
422+ < td class ="has-text-right "> 10.10</ td >
423+ < td class ="has-text-right "> 31.68</ td >
424+ < td class ="has-text-right "> 59.90</ td >
425+ </ tr >
426+ < tr >
427+ < td > RoboBrainGrasp-KP</ td >
428+ < td class ="has-text-right "> 9.80</ td >
429+ < td class ="has-text-right "> 27.85</ td >
430+ < td class ="has-text-right "> 66.30</ td >
431+ </ tr >
432+ < tr >
433+ < td > RoboBrainGrasp-BB</ td >
434+ < td class ="has-text-right "> 7.12</ td >
435+ < td class ="has-text-right "> 27.81</ td >
436+ < td class ="has-text-right "> 70.26</ td >
437+ </ tr >
438+ </ tbody >
439+ </ table >
440+ </ div >
441+
442+ <!-- Dual-Afford (zero-shot) block -->
443+ < h3 class ="title is-5 mt-5 pt-5 "> 2. Zero-Shot on Dual-Afford Objects< sup > †</ sup > </ h3 >
444+ < div class ="table-container ">
445+ < table class ="table is-narrow is-striped is-fullwidth is-hoverable ">
446+ < thead >
447+ < tr >
448+ < th > Method</ th >
449+ < th class ="has-text-right "> FCE (%) < span class ="icon is-small "> < i
450+ class ="fas fa-arrow-up "> </ i > </ span > </ th >
451+ < th class ="has-text-right "> GSR (%) < span class ="icon is-small "> < i
452+ class ="fas fa-arrow-up "> </ i > </ span > </ th >
453+ < th class ="has-text-right "> GCR (%) < span class ="icon is-small "> < i
454+ class ="fas fa-arrow-down "> </ i > </ span > </ th >
455+ </ tr >
456+ </ thead >
457+ < tbody >
458+ < tr >
459+ < td > < b > Ours-DA</ b > < sup > †</ sup > </ td >
460+ < td class ="has-text-right "> < b > 56.45</ b > </ td >
461+ < td class ="has-text-right "> < b > 68.80</ b > </ td >
462+ < td class ="has-text-right "> < b > 18.59</ b > </ td >
463+ </ tr >
464+ < tr >
465+ < td > Dual-Afford< sup > ††</ sup > </ td >
466+ < td class ="has-text-right "> –</ td >
467+ < td class ="has-text-right "> 54.33</ td >
468+ < td class ="has-text-right "> –</ td >
469+ </ tr >
470+ </ tbody >
471+ </ table >
472+ </ div >
473+
474+
475+ < p class ="is-size-7 has-text-grey mt-2 ">
476+ < sup > †</ sup > Evaluated on Dual-Afford objects in a zero-shot setting. < br >
477+ < sup > ††</ sup > Values reported directly from the Dual-Afford paper.
478+ </ p >
479+
480+
481+ < h3 class ="title is-5 mt-5 pt-5 "> 3. Real-World Dual-Arm Grasp Results</ h3 >
482+
483+ < div class ="table-container ">
484+ < table class ="table is-striped is-hoverable is-fullwidth has-text-centered ">
485+ < thead >
486+ < tr >
487+ < th > Object</ th >
488+ < th > Tray</ th >
489+ < th > Bucket</ th >
490+ < th > Saucepan</ th >
491+ < th > Frypan</ th >
492+ < th > Drone</ th >
493+ </ tr >
494+ </ thead >
495+ < tbody >
496+ < tr >
497+ < th > Success</ th >
498+ < td > 6/10</ td >
499+ < td > 8/10</ td >
500+ < td > 7/10</ td >
501+ < td > 6/10</ td >
502+ < td > 5/10</ td >
503+ </ tr >
504+ </ tbody >
505+ </ table >
506+ </ div >
507+
508+ < div class ="content has-text-justified my-4 ">
509+ < b > Quantitative Results:</ b > DAGDiff outperforms every baseline in the main comparison, achieving the highest
510+ < u > Force-Closure Evaluation (FCE)</ u > and < u > Grasp Success Rate (GSR)</ u > scores and the lowest < u > Grasp
511+ Collision Rate (GCR)</ u > , which indicates more physically valid and robust dual-arm grasps. In zero-shot transfer
512+ to Dual-Afford objects, DAGDiff generalizes without task-specific retraining and reaches a higher GSR than the value
513+ reported in the Dual-Afford paper. Real-world experiments on unseen objects such as trays, buckets, and pans
514+ succeed in 5 to 8 out of 10 trials per object, suggesting that grasps from DAGDiff’s classifier-guided diffusion
515+ transfer beyond simulation. Most real-world failures stem from noisy point-cloud estimation, which
516+ degrades the generated grasps.
517+ </ div >
518+ </ div >
519+ </ section >
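As a quick check on the tables above, the following sketch recomputes two summary figures from the reported numbers: the margin of DAGDiff over CGDF (the strongest baseline on all three metrics) and the pooled real-world success rate. The variable and function names are illustrative only, and pooling the five objects into one rate is an assumption of this sketch, not a figure reported on the page.

```python
# Numbers copied from the tables above; names are illustrative only.
baseline_table = {
    "DAGDiff": {"FCE": 60.14, "GSR": 72.50, "GCR": 15.10},
    "CGDF": {"FCE": 35.14, "GSR": 56.25, "GCR": 30.55},
}
real_world_successes = {"Tray": 6, "Bucket": 8, "Saucepan": 7, "Frypan": 6, "Drone": 5}
TRIALS_PER_OBJECT = 10

def margin_over(ours, other):
    """Difference in percentage points per metric (negative GCR means fewer collisions)."""
    return {metric: round(ours[metric] - other[metric], 2) for metric in ours}

margins = margin_over(baseline_table["DAGDiff"], baseline_table["CGDF"])

# Pool all real-world trials into a single success rate (an assumption of this sketch).
total_success = sum(real_world_successes.values())
total_trials = TRIALS_PER_OBJECT * len(real_world_successes)
overall_rate = total_success / total_trials

print(margins)
print(f"{total_success}/{total_trials} = {overall_rate:.0%}")
```

On these numbers DAGDiff leads CGDF by 25.00 points in FCE and 16.25 points in GSR, with a GCR that is 15.45 points lower, and 32 of the 50 real-world trials succeed.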
520+
521+
522+ < section class ="section " id ="BibTeX " style ="margin-bottom: 1rem; ">
523+ < div class ="container is-max-desktop content ">
524+ < h2 class ="title "> BibTeX</ h2 >
525+ < pre > < code > Will be updated</ code > </ pre >
526+ </ div >
527+ </ section >
528+
373529 <!-- <section class="section" id="BibTeX" style="margin-bottom: 1rem;">
374530 <div class="container is-max-desktop content">
375531 <h2 class="title">BibTeX</h2>