@@ -157,7 +157,7 @@ <h1 class="title is-1 publication-title"><span class="gradient-text">DAGDiff</sp
 <div class="container is-max-desktop">
 <div class="columns has-text-centered">
 <div class="column is-full-width">
-<h2 class="title is-3">Video Explanation</h2>
+<h2 class="title is-2">Video Explanation</h2>
 <div class="columns is-centered video-container">
 <video controls muted poster="./static/images/video_thumbnail.png" preload="none"
 src="./static/videos/intro_video.mp4">
@@ -172,14 +172,14 @@ <h2 class="title is-3">Video Explanation</h2>
 <div class="container is-max-desktop">
 <div class="columns is-centered has-text-centered">
 <div class="column is-four-fifths">
-<h2 class="title is-3 mt-3">Abstract</h2>
+<h2 class="title is-2 mt-3">Abstract</h2>
 <div class="content has-text-justified">
 Reliable dual-arm grasping is essential for manipulating large and complex objects but remains a
 challenging problem due to stability, collision, and generalization requirements. Prior methods
 typically decompose the task into two independent grasp proposals, relying on region priors or
 heuristics that limit generalization and provide no principled guarantee of stability. We
-propose DAGDiff, an end-to-end framework that directly denoises to grasp pairs in the $SE(3)
-\times SE(3)$ space. Our key insight is that stability and collision can be enforced more
+propose DAGDiff, an end-to-end framework that directly denoises to grasp pairs in the \(SE(3)
+\times SE(3)\) space. Our key insight is that stability and collision can be enforced more
 effectively by guiding the diffusion process with classifier signals, rather than relying on
 explicit region detection or object priors. To this end, DAGDiff integrates geometry-,
 stability-, and collision-aware guidance terms that steer the generative process toward grasps
@@ -209,14 +209,14 @@ <h2 class="title is-3 mt-3">Abstract</h2>
 
 
 
-<section class="section" style="background-color: rgb(255, 255, 255); margin-bottom:20px">
+<section class="section" style="background-color: rgb(255, 255, 255); margin-bottom:0px">
 <div class="container is-max-desktop">
 <div class="columns has-text-centered">
 <div class="column is-full-width">
-<h2 class="title is-3">Model Architecture</h2>
-<img src="./static/images/pipeline.svg">
+<h2 class="title is-2">Model Architecture</h2>
+<img class="mt-4" src="./static/images/pipeline.svg">
 <div class="content has-text-justified my-4">
-<b>Overview of the proposed method</b>: <b>(a)</b> Given an object point cloud P, our network
+<b>Overview of the proposed method</b>: <b>(a)</b> Given an object point cloud \(P\), our network
 encodes
 geometric features into dense feature maps. Next,
 randomly initialized dual-arm grasps \(H\) are used to transform a fixed query cloud into query
@@ -240,115 +240,135 @@ <h2 class="title is-3">Model Architecture</h2>
 
 <h4 class="title is-4 has-text-centered">\(SE(3) \times SE(3) \longleftrightarrow \mathbb{R}^{12}\)</h4>
 
-<div class="columns has-text-justified mt-4">
+<div class="columns has-text-justified mt-2 mb-5">
 <div class="column is-full-width is-flex is-justify-content-center is-align-items-center"">
 <img src=" ./static/images/logmap2.svg">
 </div>
 
 <div class="column is-full-width">
-Additionally, dual-arm grasp poses are represented as pairs of rigid-body transformations
+<b>Denoising in the dual-arm grasp space:</b> Dual-arm grasp poses are represented as pairs of rigid-body transformations
 in \(SE(3) \times SE(3)\), which are mapped into a \(12\text{D}\) Euclidean space for diffusion and
 back.
 Each \(SE(3)\) element is
 first projected into its \(6\text{D}\) Lie algebra representation via the <u>logarithmic map</u>
 \((\operatorname{Logmap_{2}})\), and
-concatenated to form a vector in \(\mathbb{R}^{12}\).
-<br /><br />
+concatenated to form a vector in \(\mathbb{R}^{12}\).
+<br /><br />
 The diffusion process is then carried out in
 this Euclidean space. To obtain valid grasp poses, the <u>exponential map</u>
 \((\operatorname{Expmap_{2}})\) maps vectors in \(\mathbb{R}^{12}\) back to
 \(SE(3) \times SE(3)\). This bidirectional mapping enables diffusion while ensuring grasps remain
 consistent with rigid-body motion.
 
 </div>
+</div>
+<hr />
 
+<h4 class="title is-4 has-text-centered">\(\text{Denoising using Classifier Guidance}\)</h4>
 
+<div class="columns has-text-centered">
+<div class="column is-full-width mt-2">
+<video autoplay loop muted poster="" preload="none" style="width:100%;">
+<source src="./static/videos/only_graph_cropped3.mp4">
+</video>
+</div>
+</div>
+
+<!-- Colormap bar -->
+<div class="columns has-text-centered my-5">
+<div class="column is-full-width">
+<div style="
+background: linear-gradient(to right, rgb(255, 85, 85), rgb(63, 255, 63));
+height: 12px;
+border-radius: 30px;
+margin: 0 auto;
+width: 70%;
+position: relative;">
+</div>
+<div style="display: flex; justify-content: space-between; width: 70%; margin: 5px auto 0 auto; font-size: 0.9rem;">
+<span style="color: rgb(182, 1, 1); font-weight: 500;">Noisy Grasp Pairs</span>
+<span style="color: rgb(45, 150, 45); font-weight: 500;">Stable Grasp Pairs</span>
+</div>
+</div>
 </div>
+
+<div class="content has-text-justified my-4">
+<b>Overview of the denoising process:</b> The clip above shows the joint denoising process step by step. As
+time progresses, the <span style="color: rgb(182, 1, 1);">Energy \((E_\alpha)\)</span> gradually
+decreases, which means the grasps move toward the object
+rather than floating in free space. At the same time, the <span
+style="color:rgb(11, 33, 158)">Force-Closure Probability \((C_{\beta}^{\text{fc}})\)</span> steadily
+increases,
+showing that the grasps become more stable and reliable over time. Finally, in the later stages of
+denoising, colliding grasps are
+refined for a small number of iterations using the <span style="color:rgb(45, 150, 45)">Collision Classifier
+\((C_{\gamma}^{\text{col}})\)</span>, resulting in dual-arm grasps that are both force-closure stable
+and collision-free.
+</div>
+
+
 </div>
 </section>
 
-<!--
 <section class="section" style="background-color: rgb(252, 252, 252);">
 <div class="container is-max-desktop">
 <div class="columns has-text-centered">
 <div class="column is-full-width">
-<h2 class="title is-3">Results (Coming soon)</h2>
-</div>
-</div>
-</div>
-</section> -->
+<h2 class="title is-3">
+Real Life Results <sup style="font-size: 15px;">†</sup>
+</h2>
+
+<p class="is-size-7 has-text-grey mt-4 has-text-right">
+<sup>†</sup> Unseen object categories
+</p>
+
+<div class="columns is-multiline is-centered">
+<div class="column is-half my-5">
+<video autoplay loop muted poster="" preload="none"
+style="width:100%; border: 2px solid #ddd; border-radius: 10px;">
+<source src="./static/videos/real_life_bucket.webm">
+</video>
+<h4 class="title is-5">(a) Bucket</h4>
+</div>
 
-<section class="section" style="background-color: rgb(255, 255, 255);">
-<div class="container is-max-desktop">
-<div class="columns has-text-centered">
-<div class="column is-full-width">
-<video autoplay loop muted poster="" preload="none" style="width:100%;">
-<source src="./static/videos/only_graph.webm">
-</video>
-</div>
-</div>
-</div>
-</section>
+<div class="column is-half my-5">
+<video autoplay loop muted poster="" preload="none"
+style="width:100%; border: 2px solid #ddd; border-radius: 10px;">
+<source src="./static/videos/real_life_tray.webm">
+</video>
+<h4 class="title is-5">(b) Tray</h4>
+</div>
 
-<section class="section" style="background-color: rgb(252, 252, 252);">
-<div class="container is-max-desktop">
-<div class="columns has-text-centered">
-<div class="column is-full-width">
-<h2 class="title is-3">
-Real Life Results <sup style="font-size: 15px;">†</sup>
-</h2>
-
-<p class="is-size-7 has-text-grey mt-4 has-text-right">
-<sup>†</sup> Unseen object categories
-</p>
-
-<div class="columns is-multiline is-centered">
-<div class="column is-half my-5">
-<video autoplay loop muted poster="" preload="none"
-style="width:100%; border: 2px solid #ddd; border-radius: 10px;">
-<source src="./static/videos/real_life_bucket.webm">
-</video>
-<h4 class="title is-5">(a) Bucket</h4>
-</div>
-
-<div class="column is-half my-5">
-<video autoplay loop muted poster="" preload="none"
-style="width:100%; border: 2px solid #ddd; border-radius: 10px;">
-<source src="./static/videos/real_life_tray.webm">
-</video>
-<h4 class="title is-5">(b) Tray</h4>
-</div>
-
-<div class="column is-half my-5">
-<video autoplay loop muted poster="" preload="none"
-style="width:100%; border: 2px solid #ddd; border-radius: 10px;">
-<source src="./static/videos/real_life_drone.webm">
-</video>
-<h4 class="title is-5">(c) Drone</h4>
-</div>
-
-<div class="column is-half my-5">
-<video autoplay loop muted poster="" preload="none"
-style="width:100%; border: 2px solid #ddd; border-radius: 10px;">
-<source src="./static/videos/real_life_frypan.webm">
-</video>
-<h4 class="title is-5">(d) Frypan</h4>
-</div>
-
-<div class="column is-half my-5">
-<video autoplay loop muted poster="" preload="none"
-style="width:100%; border: 2px solid #ddd; border-radius: 10px;">
-<source src="./static/videos/real_life_saucepan.webm">
-</video>
-<h4 class="title is-5">(e) Saucepan</h4>
+<div class="column is-half my-5">
+<video autoplay loop muted poster="" preload="none"
+style="width:100%; border: 2px solid #ddd; border-radius: 10px;">
+<source src="./static/videos/real_life_drone.webm">
+</video>
+<h4 class="title is-5">(c) Drone</h4>
+</div>
+
+<div class="column is-half my-5">
+<video autoplay loop muted poster="" preload="none"
+style="width:100%; border: 2px solid #ddd; border-radius: 10px;">
+<source src="./static/videos/real_life_frypan.webm">
+</video>
+<h4 class="title is-5">(d) Frypan</h4>
+</div>
+
+<div class="column is-half my-5">
+<video autoplay loop muted poster="" preload="none"
+style="width:100%; border: 2px solid #ddd; border-radius: 10px;">
+<source src="./static/videos/real_life_saucepan.webm">
+</video>
+<h4 class="title is-5">(e) Saucepan</h4>
+</div>
+</div>
+
+<!-- footnote -->
 </div>
-</div>
-
-<!-- footnote -->
 </div>
-</div>
 </div>
-</section>
+</section>
 
 <!-- <section class="section" id="BibTeX" style="margin-bottom: 1rem;">
 <div class="container is-max-desktop content">
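The Model Architecture text in this hunk describes \(\operatorname{Logmap_{2}}\) and \(\operatorname{Expmap_{2}}\) only in words. What follows is a minimal pure-Python sketch of one standard way to realize such maps: the closed-form \(SE(3)\) logarithm and exponential (Rodrigues' formula plus the left Jacobian \(V\)) applied to each arm, with the two 6D twists concatenated into \(\mathbb{R}^{12}\). Several details are assumptions and do not come from the page: each pose is a (3×3 rotation, translation) pair, each twist is ordered (\(\rho\), \(\omega\)) with translation first, both rotation angles stay below \(\pi\), and the names `logmap2`/`expmap2` merely mirror the page's notation. This is not the authors' implementation.

```python
import math
from typing import List, Tuple

Mat3 = List[List[float]]
Pose = Tuple[Mat3, List[float]]  # assumed layout: (3x3 rotation R, translation t)


def hat(w: List[float]) -> Mat3:
    """3-vector -> skew-symmetric matrix [w]_x."""
    x, y, z = w
    return [[0.0, -z, y], [z, 0.0, -x], [-y, x, 0.0]]


def vee(W: Mat3) -> List[float]:
    """Skew-symmetric matrix -> 3-vector (inverse of hat)."""
    return [W[2][1], W[0][2], W[1][0]]


def matmul(A: Mat3, B: Mat3) -> Mat3:
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def matvec(A: Mat3, v: List[float]) -> List[float]:
    return [sum(A[i][k] * v[k] for k in range(3)) for i in range(3)]


def poly(b: float, c: float, W: Mat3, W2: Mat3) -> Mat3:
    """Return I + b*W + c*W^2."""
    return [[(1.0 if i == j else 0.0) + b * W[i][j] + c * W2[i][j] for j in range(3)]
            for i in range(3)]


def se3_exp(xi: List[float]) -> Pose:
    """6D twist (rho, omega) -> (R, t) via Rodrigues' formula and the left Jacobian V."""
    rho, omega = xi[:3], xi[3:]
    theta = math.sqrt(sum(w * w for w in omega))
    W = hat(omega)
    W2 = matmul(W, W)
    if theta < 1e-8:  # Taylor expansion near the identity rotation
        R, V = poly(1.0, 0.5, W, W2), poly(0.5, 1.0 / 6.0, W, W2)
    else:
        s, c = math.sin(theta), math.cos(theta)
        R = poly(s / theta, (1.0 - c) / theta**2, W, W2)
        V = poly((1.0 - c) / theta**2, (theta - s) / theta**3, W, W2)
    return R, matvec(V, rho)


def se3_log(R: Mat3, t: List[float]) -> List[float]:
    """(R, t) -> 6D twist (rho, omega); assumes the rotation angle is below pi."""
    cos_theta = max(-1.0, min(1.0, (R[0][0] + R[1][1] + R[2][2] - 1.0) / 2.0))
    theta = math.acos(cos_theta)
    # (R - R^T) / 2 equals (sin(theta) / theta) * W for a Rodrigues rotation.
    skew = [[(R[i][j] - R[j][i]) / 2.0 for j in range(3)] for i in range(3)]
    if theta < 1e-8:
        W = skew
        W2 = matmul(W, W)
        V_inv = poly(-0.5, 1.0 / 12.0, W, W2)
    else:
        W = [[theta / math.sin(theta) * skew[i][j] for j in range(3)] for i in range(3)]
        W2 = matmul(W, W)
        s, c = math.sin(theta), math.cos(theta)
        V_inv = poly(-0.5, (1.0 - theta * s / (2.0 * (1.0 - c))) / theta**2, W, W2)
    return matvec(V_inv, t) + vee(W)  # list concatenation: [rho, omega]


def logmap2(pose_a: Pose, pose_b: Pose) -> List[float]:
    """SE(3) x SE(3) -> R^12: concatenate the two 6D twists."""
    return se3_log(*pose_a) + se3_log(*pose_b)


def expmap2(x: List[float]) -> Tuple[Pose, Pose]:
    """R^12 -> SE(3) x SE(3): split into two 6D twists and exponentiate each."""
    return se3_exp(x[:6]), se3_exp(x[6:])
```

Sending a 12D vector through `expmap2` and then `logmap2` should recover it up to floating-point error while both rotation angles stay below \(\pi\); near \(\pi\) the \(1/\sin\theta\) factor in `se3_log` becomes ill-conditioned and a dedicated branch would be needed.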
@@ -385,4 +405,4 @@ <h2 class="title">BibTeX</h2>
 
 </body>
 
-</html>
+</html>
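The added "Overview of the denoising process" paragraph names three guidance signals: an energy \(E_\alpha\) that should fall, a force-closure probability \(C_{\beta}^{\text{fc}}\) that should rise, and a collision classifier \(C_{\gamma}^{\text{col}}\) that is applied only during a short final refinement. The sketch that follows shows one hedged way such classifier guidance could be combined in the \(\mathbb{R}^{12}\) grasp space. Everything in it is an assumption and none of it is the DAGDiff sampler. The learned denoiser is stubbed as a hypothetical `score` callable. The step size, weights, linear noise schedule and 0.5 collision threshold are illustrative. Gradients come from central finite differences so the code runs without an autodiff library.

```python
import math
import random
from typing import Callable, List

Vec = List[float]


def numerical_grad(f: Callable[[Vec], float], x: Vec, eps: float = 1e-5) -> Vec:
    """Central finite-difference gradient of a scalar function f at x."""
    grad = []
    for i in range(len(x)):
        x_plus, x_minus = list(x), list(x)
        x_plus[i] += eps
        x_minus[i] -= eps
        grad.append((f(x_plus) - f(x_minus)) / (2.0 * eps))
    return grad


def guided_denoise(
    x: Vec,
    score: Callable[[Vec], Vec],          # hypothetical stand-in for the learned denoiser
    energy: Callable[[Vec], float],       # E_alpha: lower means closer to the object
    log_p_fc: Callable[[Vec], float],     # log C_beta^fc: force-closure log-probability
    p_collision: Callable[[Vec], float],  # C_gamma^col: probability that the pair collides
    steps: int = 200,
    collision_steps: int = 20,
    step_size: float = 0.05,
    w_fc: float = 1.0,
    w_col: float = 1.0,
    noise_scale: float = 0.0,
    threshold: float = 0.5,
    seed: int = 0,
) -> Vec:
    rng = random.Random(seed)
    # Stage 1: follow the base score, descend the energy, ascend the force-closure log-probability.
    for k in range(steps):
        s = score(x)
        g_energy = numerical_grad(energy, x)
        g_fc = numerical_grad(log_p_fc, x)
        # Linearly annealed noise that reaches zero at the final step (assumed schedule).
        sigma = noise_scale * (1.0 - (k + 1) / steps)
        x = [
            xi + step_size * (si - ge + w_fc * gf) + sigma * rng.gauss(0.0, 1.0)
            for xi, si, ge, gf in zip(x, s, g_energy, g_fc)
        ]

    # Stage 2: only while the pair is still predicted to collide, step up log(1 - C_col).
    def log_p_free(y: Vec) -> float:
        return math.log(max(1e-12, 1.0 - p_collision(y)))

    for _ in range(collision_steps):
        if p_collision(x) < threshold:
            break  # predicted collision-free: stop refining
        g_free = numerical_grad(log_p_free, x)
        x = [xi + step_size * w_col * gi for xi, gi in zip(x, g_free)]
    return x
```

With toy quadratic terms pulling all 12 coordinates toward a target and a sigmoid collision model on the first coordinate, the first stage lowers the energy and the second stage runs only until the collision probability drops below the threshold, mirroring the two-phase behaviour described in the paragraph. The resulting vector would then be mapped back to \(SE(3) \times SE(3)\) with an exponential map.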