
feat(Chore): Figma MCP improvements #1542

Open
Marcosld wants to merge 4 commits into master from figma-mcp-skill-updates


@Marcosld Marcosld commented Apr 28, 2026

Introduction

The objective of this task was to investigate the Figma MCP (with and without Code Connect).

We started with a batch of manual tests. They made clear that implementing Figma designs from a screenshot gives far worse results than implementing them with the MCP.
Then, using the MCP, we looked for common implementation flaws and, based on them, wrote a doc aimed specifically at LLMs implementing Figma designs (hence this PR). The main findings were that the LLM sometimes didn't drill down into the Figma layouts and missed props: for example, instead of using a slot for the prices it usually put all the text in the description. It also missed many of the text sizes, sometimes tried to use the literal hex colors from the MCP output, and sometimes ignored the literal spacing values from the MCP. The doc tries to prevent all of these common flaws.

We also found a core flaw when centering text, and added instructions to avoid it to the global skill doc.

Some of these tests suggested that results were essentially the same with and without Code Connect when implementing Figma designs with Mistica.

To make the investigation more rigorous, we then ran a set of systematic tests comparing different conditions, detailed below.

The previous investigation pointed us towards these 3 conditions:

  • Iteration 1: With / without the Figma-specific doc, with raw Figma files.
  • Iteration 2: With / without the Figma-specific doc, with polished Figma files (auto layout, components, etc.)
  • Iteration 3: With / without the Figma-specific doc, with polished Figma files and Code Connect (note we didn't test with a fully functioning Code Connect setup, but with a subset of the components that can be checked out in this branch: https://github.com/Telefonica/mistica-web/tree/iceballos-figma-code-connect)

As you can see, further investigation with Code Connect should probably be carried out in the future.

Summary

Systematic tests showed that LLMs are rather inconsistent when carrying out large tasks like implementing these Figma designs. The variation was enormous: a lot of hallucination, explicit rules in the doc being skipped, native primitives being used (the LLM made excuses like "being under pressure"), and even cases where parts of the page were not implemented at all. Also, despite having clear values output by the MCP, LLMs "forget" them fairly frequently. One of the subagents even got stuck in one of the tasks and had to be relaunched. These tests were carried out with Opus xhigh, which makes us think that LLMs are probably not suited to big tasks without supervision, and that they behave far better when given small tasks and clear objectives. In the future we would like to investigate splitting these large tasks into small subtasks via a skill or similar.

Patterns

  • The new figma-mcp.md doc improves implementation consistently. Beta's wins come from doc-driven behaviors: drilling into composite nodes with disableCodeConnect: true, picking modern card props (imageSrc / slot / headline) over deprecated ones (media={<Image>} / extra / description), mapping Tag types per item, and better centering.
  • Layout primitives are the most stubborn gap. Grid / GridLayout for asymmetric splits, Boxed for tile containers, <Align x="center"> for centering: all are consistently missed in favour of <div style={{display:"grid"}}>, <Inline> without fullWidth, or <div style={{textAlign:"center"}}>. The skill's instructions on these are explicit; both agents still fall back to CSS escape hatches randomly.
  • Polished Figma is the second biggest lever. Iter-1 → iter-2 deltas are large. Both stable and beta picked up Chip, BoxedRowList/BoxedRow, Slideshow, MediaCard, content-switching Tabs, and Tag type mapping per item.
  • Code Connect results are inconsistent, and more testing should probably be carried out. Iter-3 case 1 and case 2 ended in ties or stable wins. Iter-3 case 3 reopened the gap (perhaps because it is a more complex page).
  • Guessing. Agents systematically missed card aspect ratios when multiple aspect ratios were present in the page, using the same one for every section of cards.
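
One way to guard against the aspect-ratio miss is to derive the ratio per card from the frame size the MCP reports, instead of reusing a single value page-wide. The following is a minimal sketch, assuming each card node exposes a width and height in pixels. The helper name nearestAspectRatio and the candidate list are hypothetical: only "1:1" appears in the test runs, so the real set of values accepted by mediaAspectRatio should be checked against the Mistica docs.

```typescript
// Hypothetical helper: snap a Figma frame size to the closest named aspect ratio.
// The candidate list is an assumption for illustration; only "1:1" comes from the test runs.
const CANDIDATE_RATIOS: string[] = ["1:1", "16:9", "7:10"];

// Turns "16:9" into 16 / 9.
const ratioToNumber = (ratio: string): number => {
    const parts = ratio.split(":");
    return Number(parts[0]) / Number(parts[1]);
};

// `candidates` must be non-empty, otherwise reduce() throws.
const nearestAspectRatio = (
    width: number,
    height: number,
    candidates: string[] = CANDIDATE_RATIOS
): string => {
    const target = width / height;
    // Compare in log space so that 2:1 and 1:2 are equally far from 1:1.
    const distance = (ratio: string): number =>
        Math.abs(Math.log(ratioToNumber(ratio) / target));
    return candidates.reduce((best, current) =>
        distance(current) < distance(best) ? current : best
    );
};
```

Calling it once per card section (rather than once per page) is the point: two sections with different frame sizes then resolve to different ratios.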

Random content

Despite having explicit instructions and clear outputs from Figma, both versions randomly missed primitives or components.

  • Both versions occasionally invent reviewer names, review bodies, spec rows, and rating-filter values. The disableCodeConnect: true drill-down step is skipped intermittently.
  • Tag type selection was sometimes random. MediaCard prop choice sometimes regressed to deprecated props and even to a custom implementation. The choice between Slideshow and a static <Image> also varied.
  • Variant selector primitive: BoxedRowList/BoxedRow (right) vs custom <button> wrapping <Boxed> (wrong).
  • Chevron rotation: LLMs randomly rotated up/down chevrons instead of using right/left ones.

Iterations

We used 3 Figma designs in each iteration. With 3 iterations and 2 conditions that makes a total of 18 tests (they were super heavy, and I ran out of tokens twice). All the Figma designs can be checked out here (raw vs polished): https://www.figma.com/design/a0kSW0iLff0omlQbUoaJQO/IA-x-Design?node-id=93-1039&m=dev.


Iteration 1: MCP doc vs MCP no doc

Original (unpolished) Figma files; no Code Connect. Beta's only material advantage is the new doc/figma-mcp.md rule file shipped in 16.62.0-beta.1.

1.1 — Case 1 (product list)

| Category | Stable 16.61.0 | Beta 16.62.0-beta.1 |
| --- | --- | --- |
| Pagination control | ✗ raw <button> w/ inline style | ✗ same. No Pagination primitive in either version. |
| Outer sidebar + content 2-col layout | ✗ <div style={{display:"grid"}}> | ✗ same. No primitive for asymmetric 2-col layouts. |
| Product grid | ✗ raw CSS grid repeat(auto-fill, minmax) | ✓ <Grid columns={3} gap={24}> |
| Inline-colored text | ✗ <span style={{color}}> | ✓ <Text2 as="span" medium> |
| Right-align of pagination | ✗ <div style={{marginLeft:"auto"}}> | ✓ Inline space="between" (with empty <span/> spacer) |
| Sidebar wrapper | ✗ extra <aside> around <FiltersSidebar /> | ✓ no extra wrapper |
| Prev/next chevrons | ✓ IconChevronLeftRegular / IconChevronRightRegular | ✗ single IconChevronDownRegular rotated via <span style={{transform:rotate(...)}}> |
| Score | 1 / 7 | 5 / 7 |

Net. Beta uses Mistica primitives in 5/7 categories vs 1/7 for stable, and lands closer to the Figma visually (correct column count, real price slot, correctly sized stars). Stable's only win is using the proper directional chevrons instead of rotating one. Shared unavoidable escape hatches: page-number <button> (no Pagination).

1.2 — Case 2 (PDP)

| Category | Stable 16.61.0 | Beta 16.62.0-beta.1 |
| --- | --- | --- |
| Asymmetric 2-col layout (gallery + purchase aside) | ✗ <Grid columns={2}> | ✓ <GridLayout template="6+6"> |
| Tabs content switching | ✗ <Tabs> rendered but selectedTab ignored; sections stacked | ✓ selectedTab drives {selectedTab === 0 && <InformacionTab />} |
| Gallery prev/next chevrons | ✗ absolute-positioned <div>s with chevron icons inside | ✓ <IconButton small Icon={IconChevronLeftRegular}> |
| Gallery thumbnails | ✗ <div style={{flex:1}}> flex hack wrapping Boxed | ✗ raw <button style={...}> ThumbnailButton |
| Capacity selector (128 GB / 256 GB / 512 GB) | ✗ raw <button> w/ inline style | ✗ same. No "pill toggle" primitive. |
| Color/finish swatches | ✗ custom <button> + 5 hex literals | ✗ custom <button> + 5 hex literals |
| Variant selector (iPhone 13 mini / 14) | ✓ RadioGroup + RadioButton inside Boxed | ✓ same |
| Companion cards ("Comprados juntos") | ✗ Boxed + Box + Stack + Image (no card primitive) | ✗ same |
| Star rating in header | ✓ InfoRating value={4.5} withHalfValue | ✓ InfoRating value={4} |
| Two-image strip in Información section | ✗ <div style={{flex:1}}> × 2 inside Inline | n/a (beta puts gallery in another tab) |
| "Envío" / "Recogida en tienda" rows w/ right-aligned action | ✗ labels left-aligned, not stretched | ✓ <Inline space="between"> stretches each row |
| Score | 2 / 11 | 5 / 11 |

Net. Beta uses Mistica primitives in 5/11 vs 2/11. Beta adds GridLayout, content-switching Tabs, and Inline space="between", probably because the figma-mcp.md doc steers primitive selection. Shared errors: color swatches (no primitive, hex literals), companion cards defaulting to Boxed instead of MediaCard on this run.
Note that the beta version's screen is smaller because it implemented the tab logic.
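
The asymmetric-layout rows above come down to turning two Figma column widths into a GridLayout template. A rough sketch of that translation follows, assuming a 12-column grid and ignoring gutters. The helper pickGridLayoutTemplate is hypothetical, and the template list contains only the two values that appear in these test runs; the library may accept others.

```typescript
// Templates seen in the test runs; other values may exist in the library.
const KNOWN_TEMPLATES = ["6+6", "3+9"] as const;
type Template = (typeof KNOWN_TEMPLATES)[number];

// Hypothetical helper: map two column widths (in px) to a known template.
// Returns undefined when no known template fits, so the caller does not guess.
const pickGridLayoutTemplate = (
    leftWidth: number,
    rightWidth: number
): Template | undefined => {
    const total = leftWidth + rightWidth;
    if (total <= 0) {
        return undefined;
    }
    // Express the left column as a share of an assumed 12-column grid.
    const leftColumns = Math.round((leftWidth / total) * 12);
    const candidate = `${leftColumns}+${12 - leftColumns}`;
    return KNOWN_TEMPLATES.find((template) => template === candidate);
};
```

Returning undefined for an unknown split is deliberate: it forces an explicit decision instead of a silent fallback to a raw CSS grid.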

1.3 — Case 3

| Category | Stable 16.61.0 | Beta 16.62.0-beta.1 |
| --- | --- | --- |
| Tech product card ("Xbox Series S", …) | ✗ <DataCard> with raw <img> as asset | ✓ <MediaCard imageSrc=… mediaAspectRatio="1:1"> |
| Tech card price block | ✗ extra={…} (deprecated; slot= is the replacement) | ✓ slot={<Stack>…<Text4 medium>{price}</Text4>…</Stack>} |
| Tech card headline tag type | ✗ Tag type="promo" for all four cards including "Novedad" | ✓ Tag type={tag.type} per item (Novedad → active, Exclusivo online → promo) |
| "Lo mejor en tecnología" row layout | ✗ <Grid columns={4}> (static) | ✓ <Carousel itemsPerPage={{mobile:1, tablet:2, desktop:4}}> |
| Category circle (icon-on-tinted-circle) | ✓ <Circle size={56} backgroundColor=…> | ✗ inline <div style={{borderRadius:"50%"}}> (skipped Mistica Circle) |
| Centering category icon | ✗ <Inline alignItems="center"> + <div style={{margin:"0 auto"}}> | ✓ <Align x="center"> |
| Centering value-prop columns | ✗ <div style={{display:"flex", justifyContent:"center"}}> | ✓ <Align x="center"> for icon + each text line |
| Search pill in MainNavigationBar topSlot | ✗ custom <div style={{...}}> w/ inline icon | n/a (beta skipped the wide search pill) |
| Footer social icons | ✗ <span style={{color:"#fff", fontWeight:700, …}}>{letter}</span> | ✗ <div style={{borderRadius:"50%"}}> colored dots. No brand SVGs in either. |
| Footer Movistar logo color | ✗ color="#fff" literal | ✓ <MovistarLogo> (no override; skin handles inverse) |
| MainNavigationBar right-side actions | ✓ <NavigationBarActionGroup> + NavigationBarAction | ✓ same, with <Avatar size={32} initials="ES"> |
| Top fibra hero | ✓ <CoverHero backgroundImage=…> | ✓ <CoverHero backgroundImage="/images/hero-fibra.png"> |
| Movistar Plus+ hero band | ✓ <CoverHero> inside ResponsiveLayout variant="negative" | ✓ <CoverHero backgroundImage=…> |
| Promo cover cards row | ✓ <CoverCard> × 3 in Grid w/ height={456} matching Figma | ✗ <CoverCard> × 3 with no aspect-ratio / height (renders shorter than designed) |
| Score | 7 / 15 | 10 / 15 |

Net. Beta uses Mistica primitives in 10/15 vs 7/15. Wins are concentrated in card composition (MediaCard over DataCard+<img>, slot= over deprecated extra=, correct Tag types per item, Carousel over a static Grid, Align for centering, no hardcoded white). Stable's structural advantages are real but localized: it pinned the promo cover cards' height to match Figma and used Circle for category icons — beta missed both.
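
The per-item Tag mapping that beta got right can be written as an explicit lookup rather than left to the agent's judgement. This is an illustrative sketch only: the two label-to-type pairs come from the table above, while the helper name tagTypeFor and the choice to return undefined for unknown labels are assumptions.

```typescript
// Tag types that appear in this PR; the library may define more.
type TagType = "active" | "promo" | "info";

// Label-to-type pairs observed in the Figma designs used for the tests.
const TAG_TYPE_BY_LABEL = new Map<string, TagType>([
    ["Novedad", "active"],
    ["Exclusivo online", "promo"],
]);

// Hypothetical helper: returns undefined for unknown labels so the caller
// has to read the type from the Figma node instead of guessing one.
const tagTypeFor = (label: string): TagType | undefined =>
    TAG_TYPE_BY_LABEL.get(label.trim());
```

With a table like this, "Novedad" can no longer silently become promo for every card, which was the stable-side failure in this case.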


Iteration 2: MCP doc polished figmas vs MCP no doc polished figma

Same designs as iteration-1, rebuilt for agent compatibility (real designer assets where applicable, explicit placeholder cards where not, tighter Code Connect mappings, cleaner DOM structure).

2.1 — Case 1 (product list)

| Category | Stable 16.61.0 | Beta 16.62.0-beta.1 |
| --- | --- | --- |
| Hero (Pixel 7 + image) | ✓ <Hero> with imageSrc= and headline={<Tag type="info">} | ✓ same with <Tag type="active"> |
| Filter chips row | ✓ <Chip active={…} onPress={…}> per option | ✓ same |
| Filter sidebar checkboxes | ✓ <Checkbox> | ✓ <Checkbox> |
| Rating filter (radio + stars) | ✓ <RadioGroup> + <RadioButton> + <InfoRating> | ✓ same |
| Asymmetric sidebar + grid layout | ✗ <Grid columns={12}> + <GridItem columnSpan={3}> / columnSpan={9} | ✗ <Grid columns={4}> + raw <div style={{gridColumn:"span 3"}}> (worse: skipped GridItem). Neither uses GridLayout. |
| Filters header + product paging alignment | ✗ both headers crammed onto one row | ✗ same misalignment |
| Product card primitive | ✓ <MediaCard imageSrc=… mediaAspectRatio="1:1" headline={<Tag>} title=… description=…> | ✗ hand-rolled <div> w/ <img> (agent invented a wrong constraint about MediaCard not exposing the slot) |
| Product card price hierarchy | ✗ flat description="Por solo:\n64,90 € / iva incl." blob | ✓ PriceSlot with Text2 for labels + Text4 medium digits |
| Product card tag types | ✗ Tag type="info" for "Novedad" (wrong); promo for "Exclusivo online" (right) | ✓ active for "Novedad" (matches Figma), promo for "Exclusivo online" |
| Pagination | ✗ custom IconButton + page-number <button> | ✗ same custom approach |
| Score | 5 / 10 | 6 / 10 |

Net. Polishing the Figma raises the floor for both versions vs iteration-1. Both used Hero, Chip, Checkbox, RadioGroup+RadioButton+InfoRating without prompting. The version-attributable gap that remains is in typography and tag semantics: beta consistently picks the right Tag type per item and steps the text presets correctly across price blocks; stable used the wrong tag type and flat description=. Notable inversion: beta hand-rolled the product card on this run while stable correctly used MediaCard. Visible defects: wrong filter-header alignment in both versions; beta's rating filter repeats value={3} 3 times, breaking the filter visually.

2.2 — Case 2 (PDP)

| Category | Stable 16.61.0 | Beta 16.62.0-beta.1 |
| --- | --- | --- |
| Hero gallery (main image + thumbnails) | ✗ <Slideshow> correct, but the thumbnail row does not render in the page (broken layout) | ✓ <Slideshow withBullets> with placeholder thumbnails visible and clickable |
| Información tab image gallery | ✗ static Inline of two <Image>s | ✓ <Carousel> for the image strip |
| Asymmetric 2-col layout | ✗ <Grid columns={12}> + <GridItem columnSpan={7/5}> | ✓ <GridLayout template="6+6"> |
| Variant selector | ✓ <RadioGroup> + <BoxedRowList> + <BoxedRow> | ✓ same |
| Capacity selector | ✓ <Chip> | ✓ same |
| Color/finish swatches | ✗ <div> + 5 hex literals | ✗ <button> + 6 hex literals. No swatch primitive. |
| Companion cards | ✓ <MediaCard> with deprecated media={<Image>} | ✓ <MediaCard imageSrc=… title=… pretitle=… slot=…> (modern API) |
| Companion card price block | ✗ flat description="..." blob | ✓ slot={<Stack>…<Text4 medium>{price}</Text4>…</Stack>} |
| Tabs content switching | ✓ <Tabs> driving conditional content | ✓ same |
| Star-rating distribution bars | ✗ <ProgressBar> correct primitive but renders broken | ✗ <div> shim, also visually wrong. Neither version renders this section correctly. |
| Stars wrapper around InfoRating | ✗ extra <div style={{display:"inline-flex"}}> wrapper | ✓ used InfoRating directly |
| Score | 4 / 11 | 9 / 11 |

Net. Both versions picked up Chip, BoxedRowList/BoxedRow, Slideshow, MediaCard, content-switching Tabs, perhaps from the cleaner Figma — no library change required. Where versions still differ, the gap is consistent with iteration-1: beta uses the modern slot-style props, GridLayout, Carousel, and imageSrc=; stable uses the deprecated media= / flat-description props, Grid+columnSpan, and a static Inline of images. Two visible defects to flag: stable's hero thumbnail row doesn't render in the page; neither version renders the rating distribution bars correctly.

2.3 — Case 3

| Category | Stable 16.61.0 | Beta 16.62.0-beta.1 |
| --- | --- | --- |
| Top navigation | ✓ <MainNavigationBar logo={<MovistarLogo>}> | ✓ same, with <Avatar size={32} src=…> |
| Top fibra hero | ✓ <Slideshow> wrapping <CoverHero> | ✓ <CoverHero backgroundImage=…> |
| Category icons (6 items) | ✓ <Circle size={56}> containing the icon | ✓ <Image src=… circular width={80}> |
| Promo cover cards (Gamepass / Fibra / Novedades) | ✓ <CoverCard> × 3 | ✓ same |
| Tech product cards in "Lo mejor en tecnología" | ✓ <MediaCard imageSrc=… mediaAspectRatio="1:1" headline={<Tag>} pretitle=… description=…> | ✓ <MediaCard imageSrc=… headline={<Tag>} slot={<PriceSlot/>}> |
| Tech card price hierarchy | ✗ flat description=… blob | ✓ slot={<PriceSlot>} with Text4 medium digits |
| Tech card tag types | ✗ "Novedad" mapped to Tag type="promo" (red, wrong); other tags mapped inconsistently | ✓ "Novedad" → active, "Exclusivo online" → promo, correctly per item |
| Movistar Plus+ section composition | ✗ bare <CoverHero> with title + button only. The 3 Originals/Series/Deporte CoverCards are not in the file at all | ✓ <CoverHero extra={<Grid columns={3}>{3× <CoverCard>}</Grid>}> matching Figma |
| Footer Movistar logo on brand bg | ✓ <MovistarLogo size={52} color={…textPrimaryInverse}> | ✗ raw asset PNG (brand-logo.png); rendered slightly cropped |
| Centering / no raw HTML escape hatches | ✓ zero raw <div>/<span>/<button>/<img> and zero style={{}} blocks in 455 lines | ✗ two centering hacks + inline-styled blocks for category icon container |
| Score | 7 / 10 | 8 / 10 |

Net. This is the test where stable looks cleanest at the file level — zero raw HTML / inline styles in 455 lines of App.tsx. Beta still has the structural edge on three categories that map to specific Mistica idioms: slot= for the tech-card price block, correct Tag type per item, and <CoverHero extra={…}> to nest the Movistar Plus+ mini-cards. Stable did not render the three CoverCards at all — there is no CoverCard for Movistar Originals / Mejores series y películas / Todo el deporte anywhere in stable's App.tsx, so a whole row of the design is missing from the page.
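
The "zero raw HTML / inline styles" observation above is the kind of check that could be automated as a verification step. Here is a minimal sketch, assuming the generated file is available as a string. The function name findEscapeHatches and the report shape are hypothetical, and regex scanning is only a heuristic (it also matches inside comments and string literals); a real check would walk the TSX AST.

```typescript
// Counts of the escape hatches that the comparison tables penalise.
type EscapeHatchReport = {
    rawTags: number;
    inlineStyles: number;
    hexLiterals: number;
};

const countMatches = (source: string, pattern: RegExp): number =>
    (source.match(pattern) ?? []).length;

// Hypothetical checker for a generated TSX source string.
const findEscapeHatches = (source: string): EscapeHatchReport => ({
    // Opening tags of the native elements called out in the tables.
    rawTags: countMatches(source, /<(div|span|button|img)\b/g),
    // JSX inline style objects: style={{ ... }}
    inlineStyles: countMatches(source, /\bstyle=\{\{/g),
    // Quoted 3-, 4-, 6- or 8-digit hex colours such as "#fff".
    hexLiterals: countMatches(
        source,
        /["']#(?:[0-9a-fA-F]{3,4}|[0-9a-fA-F]{6}|[0-9a-fA-F]{8})["']/g
    ),
});
```

A report with all three counters at zero corresponds to the stable result in this case; any non-zero counter could be fed back to the agent as a concrete item to fix.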


Iteration 3: MCP doc polished figma partial code connect vs MCP no doc polished figma partial code connect

Same polished designs as iteration-2. Difference: Figma Code Connect mappings are now enabled, so the MCP returns <CodeConnectSnippet> hints indicating which Mistica components the designer chose.

3.1 — Case 1 (product list)

| Category | Stable 16.61.0 | Beta 16.62.0-beta.1 |
| --- | --- | --- |
| Hero (Pixel 7 + image) | ✓ <Hero> with imageSrc= and headline={<Tag>} | ✓ same |
| Filter chips row | ✓ <Chip> | ✓ same |
| Filter sidebar checkboxes | ✓ <Checkbox> | ✓ <Checkbox> |
| Rating filter (radio + stars) | ✓ <RadioGroup>+<RadioButton>+<InfoRating> | ✓ same |
| Asymmetric sidebar + grid layout | ✓ <GridLayout template="3+9"> (improvement vs iter-2) | ✓ same GridLayout template="3+9" |
| Filters header + product paging alignment | ✓ sidebar/grid headers in their own columns (improvement vs iter-2) | ✓ same |
| Product card primitive | ✓ <MediaCard imageSrc=… mediaAspectRatio="1:1" headline={<Tag>} slot={<PriceSlot/>}> | ✓ same |
| Product card price slot | ✓ slot= with Text2/Text3 regular/Text2 medium | ✓ same |
| Product card tag types | ✓ per item: active / promo | ✓ same |
| Product grid layout (3 cols × 2 rows) | ✓ <Grid columns={3} gap={24}>{products.map(p => <ProductCard/>)}</Grid> | ✗ chunks rows manually into <Stack>{rows.map(row => <Inline>{row.map(p => <div style={{flex:1}}><ProductCard/></div>)}</Inline>)}</Stack> (raw flex-hack wrappers) |
| Pagination | ✗ custom IconButton + page-number <button> | ✗ same |
| Score | 9 / 11 | 8 / 11 |

Net. Code Connect on the polished file pushed both versions up to a similar quality bar — but this run actually tipped slightly in stable's favour because beta went off-script on the product grid layout, hand-chunking rows into <Stack>+<Inline> with <div style={{flex:1}}> wrappers instead of using Grid columns={3}. The version-attributable items beta usually wins (modern card props, slot-style price block, correct tag types per item) all hit ✓ on stable too this run, so the only structural delta left was the grid choice — and beta lost it. Persistent shared miss: no Pagination primitive in either version.

3.2 — Case 2 (PDP)

| Category | Stable 16.61.0 | Beta 16.62.0-beta.1 |
| --- | --- | --- |
| Hero gallery (main image + thumbnails) | ✓ <Slideshow> for the main image, with proper bullets/controls | ✗ hand-rolled: state-driven <Image> + raw <button> thumbnails (regression vs iter-2) |
| Asymmetric 6+6 layout | ✓ <GridLayout template="6+6"> (improvement vs iter-2) | ✓ same |
| Variant selector | ✗ custom VariantBoxedRow: raw <button> wrapping <Boxed> + hand-rolled <span> radio (regression vs iter-2) | ✓ <RadioGroup> + <BoxedRowList> + <BoxedRow> |
| Capacity selector | ✓ <Chip> | ✓ same |
| Color/finish swatches | ✗ raw <button> + 6 hex literals | ✗ same. No swatch primitive. |
| Companion cards ("Comprados juntos") | ✓ <MediaCard imageSrc=… mediaAspectRatio="1:1" slot={<PriceSlot/>}> (modern API) | ✗ <MediaCard media={<Image>}>: uses the deprecated media={<Image>} prop instead of imageSrc= (regression vs iter-2) |
| Companion card price block (slot=) | ✓ slot= (digits use raw <Text size={20}>; should be Text4 medium) | ✓ slot= with Text3 regular for digits |
| Tabs content switching | ✓ <Tabs> with conditional content | ✓ same |
| Información-tab image gallery | ✓ <Carousel itemsPerPage={…}> | ✓ <Carousel> |
| Star-rating distribution bars | ✓ <ProgressBar progressPercent={…}> (renders correctly this iteration) | ✓ <ProgressBar> |
| Tag type for "Nuevo" header | ✓ Tag type="active" (correct) | ✓ Tag type="active" |
| Accordion all-sections-open | ✓ <Accordion defaultIndex={[0, 1, 2]}> | ✓ same |
| Score | 10 / 12 | 10 / 12 |

Net. Enabling Code Connect on the polished file closed the version-attributable quality gap for this design — both scored 10/12, with non-overlapping defects. Two role reversals: stable correctly used imageSrc= on MediaCard while beta regressed to deprecated media={<Image>}; stable hand-rolled the variant selector while beta correctly used BoxedRowList/BoxedRow; beta hand-rolled the hero gallery while stable used Slideshow correctly. Cost-wise, Code Connect was a net negative for stable (longer reads, more tool calls, one stalled run that had to be re-spawned) and a net positive for beta (fewer tool calls, faster wall-clock).

3.3 — Case 3

| Category | Stable 16.61.0 | Beta 16.62.0-beta.1 |
| --- | --- | --- |
| Top nav (logo + actions) | ✗ <MainNavigationBar> correct, but logo slot is plain text ("Movistar"); no <Avatar> ("Label" placeholder) | ✓ <MainNavigationBar logo={<MovistarLogo>} + <Avatar size={24} initials="L"> |
| Top fibra hero | ✓ <CoverHero backgroundImage=…> | ✓ same |
| Categories: row layout primitive | ✗ raw <div style={{display:"grid", gridTemplateColumns:"repeat(6, 1fr)"}}> | ✗ <Inline space={24}> without fullWidth, doesn't span page. Should have been Mistica Grid. |
| Categories: tile container | ✗ raw <div> w/ inline border/padding/radius (no <Boxed>) | ✗ same <div> shape, no <Boxed> |
| Promo cover cards: row layout | ✗ raw CSS grid | ✗ <Inline> without fullWidth |
| Promo cover cards: card primitive | ✓ <CoverCard> × 3 | ✓ <CoverCard> × 3 |
| Tech product cards: row layout | ✗ raw CSS grid | ✗ <Inline> without fullWidth |
| Tech product cards: card primitive | ✓ <MediaCard> | ✓ <MediaCard> |
| Tech card price block | ✗ deprecated extra={…} (digits do use Text4 medium) | ✓ modern slot={<ProductPriceSlot>} with Text4 medium + Text2 |
| Tech card tag types | ✓ per item: active / promo (improvement vs iter-2) | ✓ same |
| Movistar Plus+ section composition | ✓ <CoverHero extra={…}> (improvement vs iter-2 where stable dropped the 3 cards) | ✓ same |
| Movistar Plus+ extra row layout (3 mini cards) | ✗ raw CSS grid | ✗ <Inline> without fullWidth |
| Value props row ("Un referente global") | ✗ raw <div style={{display:"grid"}}> | ✗ <Inline space={24}> without fullWidth |
| Value props: text + icon centering | ✗ pure CSS <div style={{textAlign:"center"}}> for section title and per-block. Did not reach for Mistica | ✗ beta tried (used Text textAlign="center" per block) but wrapped icon in <div style={{display:"flex", justifyContent:"center"}}> and titles in <div style={{textAlign:"center"}}>; missed <Align x="center"> despite the skill spelling it out |
| Footer Movistar logo on brand bg | ✗ rendered as plain text "Movistar" | ✓ <MovistarLogo> |
| Score | 5 / 15 | 8 / 15 |

Net. The recurring story for iter-3 landing page is layout primitives: every row in the page (categories, promo covers, tech products, Movistar Plus+ extra cards, value props) wants Mistica Grid or GridLayout, and neither version reaches for it on any of those rows. Stable falls back to raw CSS grid every time; beta falls back to <Inline> without fullWidth (closer in spirit, still not the right primitive). The version-attributable wins for beta this run are concentrated in three places: modern slot= (vs stable's deprecated extra=), <MovistarLogo> and <Avatar> in the nav, <MovistarLogo> in the footer. The category-tile container is the most consistent shared miss — both versions hand-roll a <div> with inline border/padding/radius instead of using <Boxed>, exactly the primitive that exists for this. The value-props section is the most explicit failure case: skill explicitly recommends <Align x="center">; beta missed it; stable didn't even attempt and went straight to CSS.

Results

Quality scores at a glance

Score = number of categories where the implementation used the right Mistica primitive instead of html+css (denominator varies by design and iteration as the analysis got finer-grained).

| Design | Iter-1 stable | Iter-1 beta | Iter-2 stable | Iter-2 beta | Iter-3 stable | Iter-3 beta |
| --- | --- | --- | --- | --- | --- | --- |
| case 1 (product list) | 1/7 | 5/7 | 5/10 | 6/10 | 9/11 | 8/11 |
| case 2 (PDP) | 2/11 | 5/11 | 4/11 | 9/11 | 10/12 | 10/12 |
| case 3 | 7/15 | 10/15 | 7/10 | 8/10 | 5/15 | 8/15 |
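
Because the denominators differ between designs and iterations, the fractions above are easier to compare once they are normalised. The sketch below turns each stable/beta pair into a difference in percentage points; the helper names are hypothetical and the data is copied from the table.

```typescript
// One row per design: [stable, beta] score pairs for iterations 1 to 3.
type ScoreRow = { design: string; pairs: [string, string][] };

const SCORES: ScoreRow[] = [
    { design: "case 1 (product list)", pairs: [["1/7", "5/7"], ["5/10", "6/10"], ["9/11", "8/11"]] },
    { design: "case 2 (PDP)", pairs: [["2/11", "5/11"], ["4/11", "9/11"], ["10/12", "10/12"]] },
    { design: "case 3", pairs: [["7/15", "10/15"], ["7/10", "8/10"], ["5/15", "8/15"]] },
];

// Turns "5/7" into 5 / 7.
const toRatio = (score: string): number => {
    const parts = score.split("/");
    return Number(parts[0]) / Number(parts[1]);
};

// Beta minus stable, in percentage points, rounded to the nearest integer.
const deltaInPoints = (stable: string, beta: string): number =>
    Math.round((toRatio(beta) - toRatio(stable)) * 100);

SCORES.forEach(({ design, pairs }) => {
    const deltas = pairs.map(([stable, beta]) => deltaInPoints(stable, beta));
    console.log(design, deltas);
});
```

A positive value means beta used the right primitive in a larger share of categories than stable; a negative value means stable did.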

Tokens

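| Design | Iter-1 stable | Iter-1 beta | Iter-2 stable | Iter-2 beta | Iter-3 stable | Iter-3 beta |
| --- | --- | --- | --- | --- | --- | --- |
| case 1 tokens | 108k | 156k | 100k | 139k | (rate-limited) | (rate-limited) |
| case 2 tokens | 114k | 133k | 147k | 166k | 190k | 181k |
| case 3 tokens | 134k | 185k | 125k | 152k | 153k | 163k |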

Duration

| Design | Iter-1 stable | Iter-1 beta | Iter-2 stable | Iter-2 beta | Iter-3 stable | Iter-3 beta |
| --- | --- | --- | --- | --- | --- | --- |
| case 1 | 4m 53s | 7m 47s | 5m 7s | 6m 58s | (rate-limited) | (rate-limited) |
| case 2 | 7m 51s | 11m 32s | 9m 27s | 9m 9s | 15m 45s | 7m 53s |
| case 3 | 7m 36s | 8m 22s | 7m 7s | 10m 20s | 10m 25s | 9m 43s |
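
To compare wall-clock times across runs, the durations above first need converting to seconds. A small sketch with hypothetical helper names follows; it deliberately throws on cells such as "(rate-limited)" so that missing runs are not silently treated as zero.

```typescript
// Parses durations written as "15m 45s" into seconds.
const toSeconds = (duration: string): number => {
    const match = /^(\d+)m\s+(\d+)s$/.exec(duration.trim());
    if (!match) {
        throw new Error(`Unrecognised duration: ${duration}`);
    }
    return Number(match[1]) * 60 + Number(match[2]);
};

// Ratio of beta wall-clock time to stable wall-clock time (1 means equal).
const betaToStableRatio = (stable: string, beta: string): number =>
    toSeconds(beta) / toSeconds(stable);
```

For iter-3 case 2, for instance, betaToStableRatio("15m 45s", "7m 53s") divides 473 s by 945 s, so beta took roughly half the time of stable on that run.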

Conclusions

  • It is worth spending some time polishing the Figma files.
  • The Figma-specific doc improves the implemented code.
  • Code quality is not production grade. Developers must iterate or do manual polishing.
  • Bigger Figma designs tend to produce less consistent results.
  • We should investigate Code Connect further.
  • We should investigate having the LLM carry out verification steps.
  • Improvements in Mistica could help agents produce better designs: perhaps backgrounds in boxes, ResponsiveLayout and so on.

IMO LLMs don't seem to be ready for these big tasks. The bigger the task, the more they tend to get lost and be inconsistent. Probably due to their predictive nature, randomness hurts when a lot of decisions have to be made. We would probably be better off giving LLMs smaller tasks and keeping them under supervision.

ref: WEB-2435

@Marcosld Marcosld self-assigned this Apr 28, 2026

github-actions Bot commented Apr 28, 2026

Size stats

| | master | this branch | diff |
| --- | --- | --- | --- |
| Total JS | 16.1 MB | 16.1 MB | +8 B |
| JS without icons | 2.01 MB | 2.01 MB | +8 B |
| Lib overhead | 92.5 kB | 92.5 kB | 0 B |
| Lib overhead (gzip) | 19.9 kB | 19.9 kB | 0 B |


github-actions Bot commented Apr 28, 2026

Accessibility report
✔️ No issues found

ℹ️ You can run this locally by executing yarn audit-accessibility.

@Marcosld Marcosld force-pushed the figma-mcp-skill-updates branch from 84eb045 to 4d3d95c Compare April 28, 2026 10:41
@Marcosld Marcosld marked this pull request as ready for review April 30, 2026 10:35
Copilot AI review requested due to automatic review settings April 30, 2026 10:36
@Marcosld Marcosld changed the title from "figma mcp skill updates" to "feat(Chore): Figma MCP improvements" Apr 30, 2026

Copilot AI left a comment


Pull request overview

This PR updates the Mistica documentation to improve LLM-driven Figma (via MCP) implementations by adding a dedicated translation/rules doc, reinforcing key “no escape hatches” principles, and improving layout/text guidance.

Changes:

  • Add new doc/figma-mcp.md with mandatory translation rules for implementing from Figma MCP output (incl. Code Connect guidance and asset handling).
  • Consolidate “Critical rules” in doc/llms.md and make doc/patterns.md point to it as the single source of truth; add an explicit “re-apply rules during debugging” rule.
  • Enhance layout/text docs: add an Inline nesting pattern to doc/layout.md and add centering guidance to doc/components.md.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| doc/patterns.md | Replaces duplicated critical rules with a link to llms.md; minor example indentation cleanup. |
| doc/llms.md | Adds rule about re-applying critical rules during debugging and links the new Figma MCP doc in the references. |
| doc/layout.md | Improves Inline docs formatting and adds an Inline nesting example/pattern. |
| doc/figma-mcp.md | New documentation defining MCP-to-Mistica translation rules, Code Connect drill-down workflow, and asset handling guidance. |
| doc/components.md | Adds centering guidance combining <Align> and textAlign. |


Comment thread doc/components.md Outdated
Comment thread doc/layout.md

github-actions Bot commented Apr 30, 2026

Deploy preview for mistica-web ready!

Project: mistica-web
Status: ✅ Deploy successful!
Preview URL: https://mistica-1hxthd2n0-flows-projects-65bb050e.vercel.app
Latest Commit: 807c1c4

Deployed with vercel-action

@yceballost yceballost requested review from aweell and removed request for yceballost May 4, 2026 18:09
@yceballost

Introduction

The objective of this task was investigating figma MCP (with and without code connect).

First, we started by carrying out a bunch of manual tests. They made clear that implementing figma designs with a screenshot was far worse than implementing them with the MCP. Then, using the MCP we looked for common implementation flaws. Based on those flaws we created a specific doc targeting LLMs trying to implement figma designs (hence this PR). We mainly found that the LLM didn't dig down into the figma layouts sometimes and missed props. For example, instead of using a slot for the prices it usually puts all the text in the description. On the other hand, it missed a lot of the text sizings and sometimes it tried to use the literal hex colors in the MCP output. Also sometimes it ignored the literal spacing values from the MCP. We wrote doc to try to avoid all these common flaws.

We also found a core flaw while trying to center texts. We added instructions to avoid it to the global skill doc.

Some of these tests made us think that figma essentially behaved the same with vs without code connect when implementing figma designs with mistica.

In order to systematize our investigation, we followed by carrying out some systematic tests comparing different conditions, that will be detailed below.

Previous investigation pointed us towards these 3 conditions:

  • Iteration 1: With / without figma specific doc with raw figmas.
  • Iteration 2: With / without figma specific doc with polished figmas (auto layout, components, etc)
  • Iteration 3: With / without figma specific doc with polished figmas and code connect (note we didn't test with fully functioning code connect doc, but with a subset of the components that can be checkout out in this branch: iceballos-figma-code-connect)

As you can see further investigation should probably be carried out in the future with code connect.

Summary

Systematic tests proved LLMs are rather inconsistent when carrying out large tasks like implementing these figmas. The variation was enormous: a lot of hallucination, skipping explicit rules in the doc, using native primitives (llm made excuses like "being under pressure") or even cases where it skipped implementing parts of the page entirely. Also despite having clear values spit out by the MCP, llms "forget" them fairly frequently. One of the subagents even got stuck in one of the tasks and had to be relaunched. These tests were carryed out with Opus xhigh, so this makes us think that LLMs probably aren't suited for big tasks without surveillance and they behave far better when given small tasks and clear objectives. We would like to investigate splitting these large tasks in small subtasks in the future via skill or similar.

Patterns

  • New figma-mcp.md doc improves implementation consistently. Beta's wins come from doc-driven behaviors: drilling into composite nodes with disableCodeConnect: true, picking modern card props (imageSrc / slot / headline) over deprecated ones (media={<Image>} / extra / description), mapping Tag types per item, and better centering.
  • Layout primitives are the most stubborn gap. Grid / GridLayout for asymmetric splits, Boxed for tile containers, <Align x="center"> for centering — all are consistently missed in favour of <div style={{display:"grid"}}>, <Inline> without fullWidth, or <div style={{textAlign:"center"}}>. The skill's instructions on these are explicit; both agents still fall back to CSS escape hatches randomly.
  • Polished Figma is the second biggest lever. Iter-1 → iter-2 deltas are large. Both stable and beta picked up Chip, BoxedRowList/BoxedRow, Slideshow, MediaCard, content-switching Tabs, and Tag type mapping per item.
  • Code Connect results are inconsistent. More testing should probably be carried out. Iter-3 case 1 and case 2 ended in ties or stable wins; iter-3 case 3 reopened the gap (perhaps because it is a more complex page).
  • Guessing. Agents systematically missed the cards' aspect ratio when multiple aspect ratios were present in the page, using the same one for every section of cards.
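
The CSS escape hatches listed above (raw HTML elements and inline style blocks) are the kind of thing that can be counted mechanically when comparing runs. A minimal sketch, assuming a hypothetical helper `countEscapeHatches` that scans a TSX source string with regular expressions; it is not part of Mistica or of this PR, and a regex scan is only a rough approximation of what an AST-based lint rule would do.

```typescript
// Hypothetical helper (not part of Mistica): rough count of the escape
// hatches tracked in this report, using plain regex matching on the source.
type EscapeHatchReport = {
    rawElements: Record<string, number>;
    inlineStyles: number;
    total: number;
};

const RAW_TAGS = ['div', 'span', 'button', 'img'] as const;

const countEscapeHatches = (source: string): EscapeHatchReport => {
    const rawElements: Record<string, number> = {};
    let rawTotal = 0;
    for (const tag of RAW_TAGS) {
        // Matches opening tags such as <div>, <div style=...> or <img/>.
        // Closing tags (</div>) and components (<Divider>) do not match.
        const matches = source.match(new RegExp(`<${tag}(?=[\\s>/])`, 'g')) ?? [];
        rawElements[tag] = matches.length;
        rawTotal += matches.length;
    }
    // Every inline style object, e.g. style={{textAlign: "center"}}.
    const inlineStyles = (source.match(/style=\{\{/g) ?? []).length;
    return {rawElements, inlineStyles, total: rawTotal + inlineStyles};
};
```

Running it over each generated App.tsx would give a number comparable to the "zero raw HTML / zero style blocks" observation made for one of the stable runs below.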

Random content

Despite having explicit instructions and clear outputs from Figma, both versions randomly missed primitives or components.

  • Both versions occasionally invent reviewer names, review bodies, spec rows, and rating-filter values. The disableCodeConnect: true drill-down step is skipped intermittently.
  • Tag type selection was sometimes random. MediaCard prop choice sometimes regressed to deprecated props or even a custom implementation. The choice between Slideshow and a static <Image> also varied between runs.
  • Variant selector primitive: BoxedRowList/BoxedRow (right) vs custom <button> wrapping <Boxed> (wrong).
  • Chevron rotation: LLMs randomly rotated up/down chevrons instead of using right/left ones.

Iterations

We used 3 Figma designs in each iteration. With 3 iterations and 2 conditions, that makes a total of 18 tests (they were super heavy, and I ran out of tokens twice). All the Figma designs can be checked out here (raw vs polished): figma.com/design/a0kSW0iLff0omlQbUoaJQO/IA-x-Design?node-id=93-1039&m=dev.

Iteration 1: MCP doc vs MCP no doc

Original (unpolished) Figma files; no Code Connect. Beta's only material advantage is the new doc/figma-mcp.md rule file shipped in 16.62.0-beta.1.

1.1 — Case 1 (product list)

[Screenshots: Stable 16.61.0 vs Beta 16.62.0-beta.1]

| | Stable 16.61.0 | Beta 16.62.0-beta.1 |
|---|---|---|
| Pagination control | ✗ raw <button> w/ inline style | ✗ same. No Pagination primitive in either version. |
| Outer sidebar + content 2-col layout | ✗ <div style={{display:"grid"}}> | ✗ same. No primitive for asymmetric 2-col layouts. |
| Product grid | ✗ raw CSS grid repeat(auto-fill, minmax) | ✓ <Grid columns={3} gap={24}> |
| Inline-colored text | ✗ <span style={{color}}> | ✓ <Text2 as="span" medium> |
| Right-align of pagination | ✗ <div style={{marginLeft:"auto"}}> | ✓ Inline space="between" (with empty <span/> spacer) |
| Sidebar wrapper | ✗ extra <aside> around <FiltersSidebar /> | ✓ no extra wrapper |
| Prev/next chevrons | ✓ IconChevronLeftRegular / IconChevronRightRegular | ✗ single IconChevronDownRegular rotated via <span style={{transform:rotate(...)}}> |
| Score | 1 / 7 | 5 / 7 |

Net. Beta uses Mistica primitives in 5/7 categories vs 1/7 for stable, and lands closer to the Figma visually (correct column count, real price slot, correctly sized stars). Stable's only win is using the proper directional chevrons instead of rotating one. Shared unavoidable escape hatches: page-number <button> (no Pagination).

1.2 — Case 2 (PDP)

[Screenshots: Stable 16.61.0 vs Beta 16.62.0-beta.1]

| | Stable 16.61.0 | Beta 16.62.0-beta.1 |
|---|---|---|
| Asymmetric 2-col layout (gallery + purchase aside) | ✗ <Grid columns={2}> | ✓ <GridLayout template="6+6"> |
| Tabs content switching | ✗ <Tabs> rendered but selectedTab ignored; sections stacked | ✓ selectedTab drives {selectedTab === 0 && <InformacionTab />} |
| Gallery prev/next chevrons | ✗ absolute-positioned <div>s with chevron icons inside | ✓ <IconButton small Icon={IconChevronLeftRegular}> |
| Gallery thumbnails | ✗ <div style={{flex:1}}> flex hack wrapping Boxed | ✗ raw <button style={...}> ThumbnailButton |
| Capacity selector (128 GB / 256 GB / 512 GB) | ✗ raw <button> w/ inline style | ✗ same. No "pill toggle" primitive. |
| Color/finish swatches | ✗ custom <button> + 5 hex literals | ✗ custom <button> + 5 hex literals |
| Variant selector (iPhone 13 mini / 14) | ✓ RadioGroup + RadioButton inside Boxed | ✓ same |
| Companion cards ("Comprados juntos") | ✗ Boxed + Box + Stack + Image (no card primitive) | ✗ same |
| Star rating in header | ✓ InfoRating value={4.5} withHalfValue | ✓ InfoRating value={4} |
| Two-image strip in Información section | ✗ <div style={{flex:1}}> × 2 inside Inline | n/a: beta puts gallery in another tab |
| "Envío" / "Recogida en tienda" rows w/ right-aligned action | ✗ labels left-aligned, not stretched | ✓ <Inline space="between"> stretches each row |
| Score | 2 / 11 | 5 / 11 |

Net. Beta uses Mistica primitives in 5/11 vs 2/11. Beta adds GridLayout, content-switching Tabs and Inline space="between", probably because the figma-mcp.md doc steers primitive selection. Shared errors: color swatches (no primitive, hex literals) and companion cards defaulting to Boxed instead of MediaCard on this run. Note that the beta version's screen is smaller because it implemented the tab logic.

1.3 — Case 3

[Screenshots: Stable 16.61.0 vs Beta 16.62.0-beta.1]

| | Stable 16.61.0 | Beta 16.62.0-beta.1 |
|---|---|---|
| Tech product card ("Xbox Series S", …) | ✗ <DataCard> with raw <img> as asset | ✓ <MediaCard imageSrc=… mediaAspectRatio="1:1"> |
| Tech card price block | ✗ extra={…} (deprecated; slot= is the replacement) | ✓ slot={<Stack>…<Text4 medium>{price}</Text4>…</Stack>} |
| Tech card headline tag type | ✗ Tag type="promo" for all four cards including "Novedad" | ✓ Tag type={tag.type} per item (Novedad → active, Exclusivo online → promo) |
| "Lo mejor en tecnología" row layout | ✗ <Grid columns={4}> (static) | ✓ <Carousel itemsPerPage={{mobile:1, tablet:2, desktop:4}}> |
| Category circle (icon-on-tinted-circle) | ✓ <Circle size={56} backgroundColor=…> | ✗ inline <div style={{borderRadius:"50%"}}> (skipped Mistica Circle) |
| Centering category icon | ✗ <Inline alignItems="center"> + <div style={{margin:"0 auto"}}> | ✓ <Align x="center"> |
| Centering value-prop columns | ✗ <div style={{display:"flex", justifyContent:"center"}}> | ✓ <Align x="center"> for icon + each text line |
| Search pill in MainNavigationBar topSlot | ✗ custom <div style={{...}}> w/ inline icon | n/a: beta skipped the wide search pill |
| Footer social icons | ✗ <span style={{color:"#fff", fontWeight:700, …}}>{letter}</span> | ✗ <div style={{borderRadius:"50%"}}> colored dots. No brand SVGs in either. |
| Footer Movistar logo color | ✗ color="#fff" literal | ✓ <MovistarLogo> (no override; skin handles inverse) |
| MainNavigationBar right-side actions | ✓ <NavigationBarActionGroup> + NavigationBarAction | ✓ same, with <Avatar size={32} initials="ES"> |
| Top fibra hero | ✓ <CoverHero backgroundImage=…> | ✓ <CoverHero backgroundImage="/images/hero-fibra.png"> |
| Movistar Plus+ hero band | ✓ <CoverHero> inside ResponsiveLayout variant="negative" | ✓ <CoverHero backgroundImage=…> |
| Promo cover cards row | ✓ <CoverCard> × 3 in Grid w/ height={456} matching Figma | ✗ <CoverCard> × 3 with no aspect-ratio / height (renders shorter than designed) |
| Score | 7 / 15 | 10 / 15 |

Net. Beta uses Mistica primitives in 10/15 vs 7/15. Wins are concentrated in card composition (MediaCard over DataCard+<img>, slot= over deprecated extra=, correct Tag types per item, Carousel over a static Grid, Align for centering, no hardcoded white). Stable's structural advantages are real but localized: it pinned the promo cover cards' height to match Figma and used Circle for category icons — beta missed both.
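
The per-item Tag mapping that beta got right in this case can be expressed as a plain lookup. A minimal sketch, assuming a local `TagType` union that only covers the three values mentioned in this report (the real Mistica Tag may accept more) and a hypothetical helper `tagTypeFor`; neither name comes from the PR.

```typescript
// Assumption: local union restricted to the Tag types named in this report.
type TagType = 'active' | 'promo' | 'info';

// Label-to-type pairs as reported for the Figma designs used in the tests.
const TAG_TYPE_BY_LABEL: Record<string, TagType> = {
    Novedad: 'active',
    'Exclusivo online': 'promo',
};

// Hypothetical helper: returns undefined for labels the report does not cover,
// so the caller has to go back to the Figma node instead of guessing.
const tagTypeFor = (label: string): TagType | undefined => TAG_TYPE_BY_LABEL[label];
```

Returning undefined for unknown labels is a deliberate choice here: the failure seen in stable was a single type applied to every card, which a silent default would reproduce.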

Iteration 2: MCP doc polished figmas vs MCP no doc polished figma

Same designs as iteration-1, rebuilt for agent compatibility (real designer assets where applicable, explicit placeholder cards where not, tighter Code Connect mappings, cleaner DOM structure).

2.1 — Case 1 (product list)

[Screenshots: Stable 16.61.0 vs Beta 16.62.0-beta.1]

| | Stable 16.61.0 | Beta 16.62.0-beta.1 |
|---|---|---|
| Hero (Pixel 7 + image) | ✓ <Hero> with imageSrc= and headline={<Tag type="info">} | ✓ same with <Tag type="active"> |
| Filter chips row | ✓ <Chip active={…} onPress={…}> per option | ✓ same |
| Filter sidebar checkboxes | ✓ <Checkbox> | ✓ <Checkbox> |
| Rating filter (radio + stars) | ✓ <RadioGroup> + <RadioButton> + <InfoRating> | ✓ same |
| Asymmetric sidebar + grid layout | ✗ <Grid columns={12}> + <GridItem columnSpan={3}> / columnSpan={9} | ✗ <Grid columns={4}> + raw <div style={{gridColumn:"span 3"}}> (worse: skipped GridItem). Neither uses GridLayout. |
| Filters header + product paging alignment | ✗ both headers crammed onto one row | ✗ same misalignment |
| Product card primitive | ✓ <MediaCard imageSrc=… mediaAspectRatio="1:1" headline={<Tag>} title=… description=…> | ✗ hand-rolled <div> w/ <img> (agent invented a wrong constraint about MediaCard not exposing the slot) |
| Product card price hierarchy | ✗ flat description="Por solo:\n64,90 € / iva incl." blob | ✓ PriceSlot with Text2 for labels + Text4 medium digits |
| Product card tag types | ✗ Tag type="info" for "Novedad" (wrong); promo for "Exclusivo online" (right) | ✓ active for "Novedad" (matches Figma), promo for "Exclusivo online" |
| Pagination | ✗ custom IconButton + page-number <button> | ✗ same custom approach |
| Score | 5 / 10 | 6 / 10 |

Net. Polishing the Figma raises the floor for both versions vs iteration-1. Both used Hero, Chip, Checkbox, RadioGroup+RadioButton+InfoRating without prompting. The version-attributable gap that remains is in typography and tag semantics: beta consistently picks the right Tag type per item and steps the text presets correctly across price blocks; stable used the wrong tag type and flat description=. Notable inversion: beta hand-rolled the product card on this run while stable correctly used MediaCard. Visible defects: wrong filter-header alignment in both versions; beta's rating filter repeats value={3} 3 times, breaking the filter visually.

2.2 — Case 2 (PDP)

[Screenshots: Stable 16.61.0 vs Beta 16.62.0-beta.1]

| | Stable 16.61.0 | Beta 16.62.0-beta.1 |
|---|---|---|
| Hero gallery (main image + thumbnails) | ✗ <Slideshow> correct, but the thumbnail row does not render in the page (broken layout) | ✓ <Slideshow withBullets> with placeholder thumbnails visible and clickable |
| Información tab image gallery | ✗ static Inline of two <Image>s | ✓ <Carousel> for the image strip |
| Asymmetric 2-col layout | ✗ <Grid columns={12}> + <GridItem columnSpan={7/5}> | ✓ <GridLayout template="6+6"> |
| Variant selector | ✓ <RadioGroup> + <BoxedRowList> + <BoxedRow> | ✓ same |
| Capacity selector | ✓ <Chip> | ✓ same |
| Color/finish swatches | ✗ <div> + 5 hex literals | ✗ <button> + 6 hex literals. No swatch primitive. |
| Companion cards | ✓ <MediaCard> with deprecated media={<Image>} | ✓ <MediaCard imageSrc=… title=… pretitle=… slot=…> (modern API) |
| Companion card price block | ✗ flat description="..." blob | ✓ slot={<Stack>…<Text4 medium>{price}</Text4>…</Stack>} |
| Tabs content switching | ✓ <Tabs> driving conditional content | ✓ same |
| Star-rating distribution bars | ✗ <ProgressBar> correct primitive but renders broken | ✗ <div> shim, also visually wrong. Neither version renders this section correctly. |
| Stars wrapper around InfoRating | ✗ extra <div style={{display:"inline-flex"}}> wrapper | ✓ used InfoRating directly |
| Score | 4 / 11 | 9 / 11 |

Net. Both versions picked up Chip, BoxedRowList/BoxedRow, Slideshow, MediaCard, content-switching Tabs, perhaps from the cleaner Figma — no library change required. Where versions still differ, the gap is consistent with iteration-1: beta uses the modern slot-style props, GridLayout, Carousel, and imageSrc=; stable uses the deprecated media= / flat-description props, Grid+columnSpan, and a static Inline of images. Two visible defects to flag: stable's hero thumbnail row doesn't render in the page; neither version renders the rating distribution bars correctly.

2.3 — Case 3

[Screenshots: Stable 16.61.0 vs Beta 16.62.0-beta.1]

| | Stable 16.61.0 | Beta 16.62.0-beta.1 |
|---|---|---|
| Top navigation | ✓ <MainNavigationBar logo={<MovistarLogo>}> | ✓ same, with <Avatar size={32} src=…> |
| Top fibra hero | ✓ <Slideshow> wrapping <CoverHero> | ✓ <CoverHero backgroundImage=…> |
| Category icons (6 items) | ✓ <Circle size={56}> containing the icon | ✓ <Image src=… circular width={80}> |
| Promo cover cards (Gamepass / Fibra / Novedades) | ✓ <CoverCard> × 3 | ✓ same |
| Tech product cards in "Lo mejor en tecnología" | ✓ <MediaCard imageSrc=… mediaAspectRatio="1:1" headline={<Tag>} pretitle=… description=…> | ✓ <MediaCard imageSrc=… headline={<Tag>} slot={<PriceSlot/>}> |
| Tech card price hierarchy | ✗ flat description=… blob | ✓ slot={<PriceSlot>} with Text4 medium digits |
| Tech card tag types | ✗ "Novedad" mapped to Tag type="promo" (red, wrong); other tags mapped inconsistently | ✓ "Novedad" → active, "Exclusivo online" → promo, correctly per item |
| Movistar Plus+ section composition | ✗ bare <CoverHero> with title + button only. The 3 Originals/Series/Deporte CoverCards are not in the file at all | ✓ <CoverHero extra={<Grid columns={3}>{3× <CoverCard>}</Grid>}> matching Figma |
| Footer Movistar logo on brand bg | ✓ <MovistarLogo size={52} color={…textPrimaryInverse}> | ✗ raw asset PNG (brand-logo.png); rendered slightly cropped |
| Centering / no raw HTML escape hatches | ✓ zero raw <div>/<span>/<button>/<img> and zero style={{}} blocks in 455 lines | ✗ two centering hacks + inline-styled blocks for category icon container |
| Score | 7 / 10 | 8 / 10 |

Net. This is the test where stable looks cleanest at the file level — zero raw HTML / inline styles in 455 lines of App.tsx. Beta still has the structural edge on three categories that map to specific Mistica idioms: slot= for the tech-card price block, correct Tag type per item, and <CoverHero extra={…}> to nest the Movistar Plus+ mini-cards. Stable did not render the three CoverCards at all — there is no CoverCard for Movistar Originals / Mejores series y películas / Todo el deporte anywhere in stable's App.tsx, so a whole row of the design is missing from the page.

Iteration 3: MCP doc polished figma partial code connect vs MCP no doc polished figma partial code connect

Same polished designs as iteration-2. Difference: Figma Code Connect mappings are now enabled, so the MCP returns <CodeConnectSnippet> hints indicating which Mistica components the designer chose.

3.1 — Case 1 (product list)

[Screenshots: Stable 16.61.0 vs Beta 16.62.0-beta.1]

| | Stable 16.61.0 | Beta 16.62.0-beta.1 |
|---|---|---|
| Hero (Pixel 7 + image) | ✓ <Hero> with imageSrc= and headline={<Tag>} | ✓ same |
| Filter chips row | ✓ <Chip> | ✓ same |
| Filter sidebar checkboxes | ✓ <Checkbox> | ✓ <Checkbox> |
| Rating filter (radio + stars) | ✓ <RadioGroup>+<RadioButton>+<InfoRating> | ✓ same |
| Asymmetric sidebar + grid layout | ✓ <GridLayout template="3+9"> (improvement vs iter-2) | ✓ same GridLayout template="3+9" |
| Filters header + product paging alignment | ✓ sidebar/grid headers in their own columns (improvement vs iter-2) | ✓ same |
| Product card primitive | ✓ <MediaCard imageSrc=… mediaAspectRatio="1:1" headline={<Tag>} slot={<PriceSlot/>}> | ✓ same |
| Product card price slot | ✓ slot= with Text2/Text3 regular/Text2 medium | ✓ same |
| Product card tag types | ✓ per item: active / promo | ✓ same |
| Product grid layout (3 cols × 2 rows) | ✓ <Grid columns={3} gap={24}>{products.map(p => <ProductCard/>)}</Grid> | ✗ chunks rows manually into <Stack>{rows.map(row => <Inline>{row.map(p => <div style={{flex:1}}><ProductCard/></div>)}</Inline>)}</Stack> (raw flex-hack wrappers) |
| Pagination | ✗ custom IconButton + page-number <button> | ✗ same |
| Score | 9 / 11 | 8 / 11 |

Net. Code Connect on the polished file pushed both versions up to a similar quality bar — but this run actually tipped slightly in stable's favour because beta went off-script on the product grid layout, hand-chunking rows into <Stack>+<Inline> with <div style={{flex:1}}> wrappers instead of using Grid columns={3}. The version-attributable items beta usually wins (modern card props, slot-style price block, correct tag types per item) all hit ✓ on stable too this run, so the only structural delta left was the grid choice — and beta lost it. Persistent shared miss: no Pagination primitive in either version.

3.2 — Case 2 (PDP)

[Screenshots: Stable 16.61.0 vs Beta 16.62.0-beta.1]

| | Stable 16.61.0 | Beta 16.62.0-beta.1 |
|---|---|---|
| Hero gallery (main image + thumbnails) | ✓ <Slideshow> for the main image, with proper bullets/controls | ✗ hand-rolled: state-driven <Image> + raw <button> thumbnails (regression vs iter-2) |
| Asymmetric 6+6 layout | ✓ <GridLayout template="6+6"> (improvement vs iter-2) | ✓ same |
| Variant selector | ✗ custom VariantBoxedRow: raw <button> wrapping <Boxed> + hand-rolled <span> radio (regression vs iter-2) | ✓ <RadioGroup> + <BoxedRowList> + <BoxedRow> |
| Capacity selector | ✓ <Chip> | ✓ same |
| Color/finish swatches | ✗ raw <button> + 6 hex literals | ✗ same. No swatch primitive. |
| Companion cards ("Comprados juntos") | ✓ <MediaCard imageSrc=… mediaAspectRatio="1:1" slot={<PriceSlot/>}> (modern API) | ✗ <MediaCard media={<Image>}>: uses the deprecated media={<Image>} prop instead of imageSrc= (regression vs iter-2) |
| Companion card price block (slot=) | ✓ slot= (digits use raw <Text size={20}>; should be Text4 medium) | ✓ slot= with Text3 regular for digits |
| Tabs content switching | ✓ <Tabs> with conditional content | ✓ same |
| Información-tab image gallery | ✓ <Carousel itemsPerPage={…}> | ✓ <Carousel> |
| Star-rating distribution bars | ✓ <ProgressBar progressPercent={…}> (renders correctly this iteration) | ✓ <ProgressBar> |
| Tag type for "Nuevo" header | ✓ Tag type="active" (correct) | ✓ Tag type="active" |
| Accordion all-sections-open | ✓ <Accordion defaultIndex={[0, 1, 2]}> | ✓ same |
| Score | 10 / 12 | 10 / 12 |

Net. Enabling Code Connect on the polished file closed the version-attributable quality gap for this design: both scored 10/12, with non-overlapping defects. Three role reversals: stable correctly used imageSrc= on MediaCard while beta regressed to the deprecated media={<Image>}; stable hand-rolled the variant selector while beta correctly used BoxedRowList/BoxedRow; beta hand-rolled the hero gallery while stable used Slideshow correctly. Cost-wise, Code Connect was a net negative for stable (longer reads, more tool calls, one stalled run that had to be re-spawned) and a net positive for beta (fewer tool calls, faster wall-clock).

3.3 — Case 3

[Screenshots: Stable 16.61.0 vs Beta 16.62.0-beta.1]

| | Stable 16.61.0 | Beta 16.62.0-beta.1 |
|---|---|---|
| Top nav (logo + actions) | ✗ <MainNavigationBar> correct, but logo slot is plain text ("Movistar"); no <Avatar> ("Label" placeholder) | ✓ <MainNavigationBar logo={<MovistarLogo>} + <Avatar size={24} initials="L"> |
| Top fibra hero | ✓ <CoverHero backgroundImage=…> | ✓ same |
| Categories: row layout primitive | ✗ raw <div style={{display:"grid", gridTemplateColumns:"repeat(6, 1fr)"}}> | ✗ <Inline space={24}> without fullWidth, doesn't span page. Should have been Mistica Grid. |
| Categories: tile container | ✗ raw <div> w/ inline border/padding/radius (no <Boxed>) | ✗ same <div> shape, no <Boxed> |
| Promo cover cards: row layout | ✗ raw CSS grid | ✗ <Inline> without fullWidth |
| Promo cover cards: card primitive | ✓ <CoverCard> × 3 | ✓ <CoverCard> × 3 |
| Tech product cards: row layout | ✗ raw CSS grid | ✗ <Inline> without fullWidth |
| Tech product cards: card primitive | ✓ <MediaCard> | ✓ <MediaCard> |
| Tech card price block | ✗ deprecated extra={…} (digits do use Text4 medium) | ✓ modern slot={<ProductPriceSlot>} with Text4 medium + Text2 |
| Tech card tag types | ✓ per item: active / promo (improvement vs iter-2) | ✓ same |
| Movistar Plus+ section composition | ✓ <CoverHero extra={…}> (improvement vs iter-2 where stable dropped the 3 cards) | ✓ same |
| Movistar Plus+ extra row layout (3 mini cards) | ✗ raw CSS grid | ✗ <Inline> without fullWidth |
| Value props row ("Un referente global") | ✗ raw <div style={{display:"grid"}}> | ✗ <Inline space={24}> without fullWidth |
| Value props: text + icon centering | ✗ pure CSS <div style={{textAlign:"center"}}> for section title and per-block. Did not reach for Mistica | ✗ beta tried (used Text textAlign="center" per block) but wrapped icon in <div style={{display:"flex", justifyContent:"center"}}> and titles in <div style={{textAlign:"center"}}>; missed <Align x="center"> despite the skill spelling it out |
| Footer Movistar logo on brand bg | ✗ rendered as plain text "Movistar" | ✓ <MovistarLogo> |
| Score | 5 / 15 | 8 / 15 |

Net. The recurring story for the iter-3 landing page is layout primitives: every row in the page (categories, promo covers, tech products, Movistar Plus+ extra cards, value props) wants Mistica Grid or GridLayout, and neither version reaches for it on any of those rows. Stable falls back to raw CSS grid every time; beta falls back to <Inline> without fullWidth (closer in spirit, still not the right primitive). The version-attributable wins for beta this run are concentrated in three places: modern slot= (vs stable's deprecated extra=), <MovistarLogo> and <Avatar> in the nav, and <MovistarLogo> in the footer. The category-tile container is the most consistent shared miss: both versions hand-roll a <div> with inline border/padding/radius instead of using <Boxed>, exactly the primitive that exists for this. The value-props section is the most explicit failure case: the skill explicitly recommends <Align x="center">; beta missed it, and stable didn't even attempt it and went straight to CSS.

Results

Quality scores at a glance

Score = number of categories where the implementation used the right Mistica primitive instead of html+css (denominator varies by design and iteration as the analysis got finer-grained).

| Design | Iter-1 stable | Iter-1 beta | Iter-2 stable | Iter-2 beta | Iter-3 stable | Iter-3 beta |
|---|---|---|---|---|---|---|
| case 1 (product list) | 1/7 | 5/7 | 5/10 | 6/10 | 9/11 | 8/11 |
| case 2 (PDP) | 2/11 | 5/11 | 4/11 | 9/11 | 10/12 | 10/12 |
| case 3 | 7/15 | 10/15 | 7/10 | 8/10 | 5/15 | 8/15 |
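
To compare conditions within an iteration, the three cases can be pooled by summing hits and categories. A minimal sketch using only the numbers from the table above; the `pooled` helper and the data layout are illustrative assumptions, and because the denominators change between iterations the pooled ratios are only comparable within the same iteration.

```typescript
type Score = {hits: number; total: number};
type Condition = 'stable' | 'beta';

// Scores copied from the table above, in order: case 1, case 2, case 3.
const scores: Record<string, Record<Condition, Score[]>> = {
    'iter-1': {
        stable: [{hits: 1, total: 7}, {hits: 2, total: 11}, {hits: 7, total: 15}],
        beta: [{hits: 5, total: 7}, {hits: 5, total: 11}, {hits: 10, total: 15}],
    },
    'iter-2': {
        stable: [{hits: 5, total: 10}, {hits: 4, total: 11}, {hits: 7, total: 10}],
        beta: [{hits: 6, total: 10}, {hits: 9, total: 11}, {hits: 8, total: 10}],
    },
    'iter-3': {
        stable: [{hits: 9, total: 11}, {hits: 10, total: 12}, {hits: 5, total: 15}],
        beta: [{hits: 8, total: 11}, {hits: 10, total: 12}, {hits: 8, total: 15}],
    },
};

// Sums hits and categories across the three cases of one condition.
const pooled = (rows: Score[]): Score =>
    rows.reduce(
        (acc, row) => ({hits: acc.hits + row.hits, total: acc.total + row.total}),
        {hits: 0, total: 0}
    );

const summary = Object.entries(scores).map(([iteration, byCondition]) => ({
    iteration,
    stable: pooled(byCondition.stable),
    beta: pooled(byCondition.beta),
}));
// iter-1: stable 10/33, beta 20/33
// iter-2: stable 16/31, beta 23/31
// iter-3: stable 24/38, beta 26/38
```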

Tokens

| | Iter-1 stable | Iter-1 beta | Iter-2 stable | Iter-2 beta | Iter-3 stable | Iter-3 beta |
|---|---|---|---|---|---|---|
| case 1 tokens | 108k | 156k | 100k | 139k | (rate-limited) | (rate-limited) |
| case 2 tokens | 114k | 133k | 147k | 166k | 190k | 181k |
| case 3 tokens | 134k | 185k | 125k | 152k | 153k | 163k |
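
The token cost of the doc can be read off the table as beta's relative overhead over stable. A minimal sketch with the table values in thousands of tokens; `betaOverhead` and `overheadFor` are hypothetical helpers, and the rate-limited runs are kept as null rather than guessed.

```typescript
type TokenPair = {stable: number | null; beta: number | null};

// Thousands of tokens per iteration (iter-1, iter-2, iter-3), copied from the table.
// null marks the rate-limited runs.
const tokensByCase: Record<string, TokenPair[]> = {
    'case 1': [{stable: 108, beta: 156}, {stable: 100, beta: 139}, {stable: null, beta: null}],
    'case 2': [{stable: 114, beta: 133}, {stable: 147, beta: 166}, {stable: 190, beta: 181}],
    'case 3': [{stable: 134, beta: 185}, {stable: 125, beta: 152}, {stable: 153, beta: 163}],
};

// Relative overhead of beta over stable: 0.39 means beta used 39% more tokens.
const betaOverhead = ({stable, beta}: TokenPair): number | null =>
    stable === null || beta === null ? null : (beta - stable) / stable;

// Overhead per iteration for one case, e.g. overheadFor('case 2').
const overheadFor = (caseName: string): Array<number | null> =>
    (tokensByCase[caseName] ?? []).map(betaOverhead);
```

With these numbers, beta used more tokens than stable in every measured run except iter-3 case 2, where the overhead is slightly negative.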

Duration

| | Iter-1 stable | Iter-1 beta | Iter-2 stable | Iter-2 beta | Iter-3 stable | Iter-3 beta |
|---|---|---|---|---|---|---|
| case 1 | 4m 53s | 7m 47s | 5m 7s | 6m 58s | (rate-limited) | (rate-limited) |
| case 2 | 7m 51s | 11m 32s | 9m 27s | 9m 9s | 15m 45s | 7m 53s |
| case 3 | 7m 36s | 8m 22s | 7m 7s | 10m 20s | 10m 25s | 9m 43s |
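
Durations in the "Xm Ys" form used above are easier to compare once converted to seconds. A minimal sketch, assuming a hypothetical `parseDuration` helper that returns null for cells it cannot parse, such as the rate-limited runs.

```typescript
// Hypothetical helper: converts "15m 45s" into seconds, or null when the
// cell is not a duration (e.g. "(rate-limited)").
const parseDuration = (text: string): number | null => {
    const match = /^(\d+)m\s+(\d+)s$/.exec(text.trim());
    if (!match) {
        return null;
    }
    return Number(match[1]) * 60 + Number(match[2]);
};

// Iter-3 case 2, copied from the table: the largest wall-clock gap in the runs.
const stableSeconds = parseDuration('15m 45s'); // 945
const betaSeconds = parseDuration('7m 53s'); // 473
```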

Conclusions

  • It is worth spending some time polishing the Figma files.
  • The Figma-specific doc improves the implemented code.
  • Code quality is not production grade. Developers must iterate or do manual polishing.
  • Bigger Figma designs tend to produce less consistent results.
  • We should perform further investigation with Code Connect.
  • We should perform further investigation into having the LLM carry out verification steps.
  • Improvements in Mistica could help agents produce better designs: perhaps background in boxes, ResponsiveLayout and so on.

IMO LLMs don't seem to be ready for these big tasks. The bigger the task, the more they tend to get lost and be inconsistent. Probably due to their predictive nature, randomness has a negative effect when a lot of decisions have to be made. We would probably be better off giving LLMs smaller tasks and keeping them under supervision.

ref: WEB-2435

1.2 case is visually worse than the stable version 🤔

@Marcosld
Contributor Author

@yceballost Some things are worse and some things are better in the stable version. All the images failed downloading from figma tho, that makes it look weird.

@github-actions

Size stats

| | master | this branch | diff |
|---|---|---|---|
| Total JS | 16.1 MB | 16.1 MB | 0 B |
| JS without icons | 2.02 MB | 2.02 MB | 0 B |
| Lib overhead | 92.5 kB | 92.5 kB | 0 B |
| Lib overhead (gzip) | 19.9 kB | 19.9 kB | 0 B |
