How to analyze various "como" constructions in Spanish? More frequently SCONJ or ADP? #1247

@AngledLuffa

How should we analyze the various uses of "como"? I identified this as one of the big differences between the annotations in AnCora and GSD.


  1. comparative:

tan alto como tú

tan rápido como antes

in AnCora: usually SCONJ
in GSD: varies between SCONJ and ADP

  2. como VERB:

Hazlo como quieras
Lo hizo como le dijeron

in AnCora: SCONJ
in GSD: SCONJ, also some ADP

  3. examples:

Animales como perros y gatos
Países como España y Francia

in AnCora: usually SCONJ
in GSD: often ADP
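
A rough way to put numbers on the "usually" and "often" above is to tally the UPOS and DEPREL of every "como" token in each treebank. This is only a sketch: it assumes standard tab-separated CoNLL-U input, the function name `como_tag_counts` is made up for illustration, and the two sample rows are abridged from the AnCora and GSD examples quoted further down in this issue.

```python
from collections import Counter


def como_tag_counts(conllu_text):
    """Count (UPOS, DEPREL) pairs for every token whose lemma is 'como'."""
    counts = Counter()
    for line in conllu_text.splitlines():
        # Skip blank separator lines and sentence-level comments.
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        # Only plain integer IDs: skips multiword ranges (3-4) and empty nodes (3.1).
        if len(cols) != 10 or not cols[0].isdigit():
            continue
        if cols[2].lower() == "como":
            counts[(cols[3], cols[7])] += 1
    return counts


def row(*cols):
    """Build one tab-separated CoNLL-U token line."""
    return "\t".join(cols)


# Two rows abridged from the examples in this issue (one AnCora, one GSD).
SAMPLE = "\n".join([
    row("13", "como", "como", "SCONJ", "cs", "_", "14", "mark", "14:mark", "_"),
    "",
    row("17", "como", "como", "ADP", "_", "_", "18", "case", "_", "_"),
])

print(como_tag_counts(SAMPLE))
```

Running the same function over the full AnCora and GSD files and comparing the two counters would give a first picture of the disagreement, though it does not separate the three usages listed above.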


I have no strong opinions, but it'd be nice to unify them, regardless. I suppose having fewer tags in general would be easier to understand or model, but at the same time, ADP to connect two NPs in the "example" form of como is pretty appealing. Such as in English, I suppose:

1       His     his     PRON    PRP$    Case=Gen|Gender=Masc|Number=Sing|Person=3|Poss=Yes|PronType=Prs 3       nmod:poss       3:nmod:poss     _
2       military        military        ADJ     JJ      Degree=Pos      3       amod    3:amod  _
3       intelligence    intelligence    NOUN    NN      Number=Sing     5       nsubj   5:nsubj _
4       has     have    AUX     VBZ     Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin   5       aux     5:aux   _
5       captured        capture VERB    VBN     Tense=Past|VerbForm=Part        0       root    0:root  _
6       major   major   ADJ     JJ      Degree=Pos      7       amod    7:amod  _
7       figures figure  NOUN    NNS     Number=Plur     5       obj     5:obj   _
8       like    like    ADP     IN      _       9       case    9:case  _     <------
9       Abu     Abu     PROPN   NNP     Number=Sing     7       nmod    7:nmod:like     _
10      Zubayda Zubayda PROPN   NNP     Number=Sing     9       flat    9:flat  _
11      and     and     CCONJ   CC      _       12      cc      12:cc   _
12      Khalid  Khalid  PROPN   NNP     Number=Sing     9       conj    7:nmod:like|9:conj:and  _
13      Shaykh  Shaykh  PROPN   NNP     Number=Sing     12      flat    12:flat _
14      Muhammad        Muhammad        PROPN   NNP     Number=Sing     12      flat    12:flat SpaceAfter=No

So one proposal would be to change "thing como thing" to case/ADP, as in GSD and in English, and change the others to SCONJ, as in AnCora. Happy to hear other suggestions, though.
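
To get a feel for how many tokens that proposal would touch, something like the following could list "como" tokens that attach to a nominal head but are not already ADP/case. It is a heuristic sketch, not a conversion script: the names `read_sentences`, `retag_candidates`, and `NOMINAL` are invented here, tab-separated CoNLL-U is assumed, and the check will also flag comparatives such as "tan alto como tú" and the "tanto X como Y" pattern, so the output would still need manual review.

```python
NOMINAL = {"NOUN", "PROPN", "PRON"}


def read_sentences(conllu_text):
    """Yield each sentence as a dict mapping token ID -> list of 10 columns."""
    sent = {}
    for line in conllu_text.splitlines():
        if not line.strip():
            # A blank line ends the current sentence.
            if sent:
                yield sent
                sent = {}
            continue
        if line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) == 10 and cols[0].isdigit():
            sent[cols[0]] = cols
    if sent:
        yield sent


def retag_candidates(conllu_text):
    """Return (id, upos, deprel, head_form) for 'como' tokens whose head is
    nominal and which are not already annotated as ADP/case."""
    out = []
    for sent in read_sentences(conllu_text):
        for tid, cols in sent.items():
            if cols[2].lower() != "como":
                continue
            head = sent.get(cols[6])
            if head is None:
                continue  # head is root or outside the fragment
            if head[3] in NOMINAL and (cols[3], cols[7]) != ("ADP", "case"):
                out.append((tid, cols[3], cols[7], head[1]))
    return out


def row(*cols):
    """Build one tab-separated CoNLL-U token line."""
    return "\t".join(cols)


# Fragments abridged from the examples in this issue (MISC shortened to "_").
SAMPLE = "\n".join([
    row("13", "como", "como", "SCONJ", "cs", "_", "14", "mark", "14:mark", "_"),
    row("14", "Toussaint", "Toussaint", "PROPN", "np00000", "_", "8", "nmod", "8:nmod", "_"),
    "",
    row("17", "como", "como", "ADP", "_", "_", "18", "case", "_", "_"),
    row("18", "Luciano", "luciano", "PROPN", "_", "_", "16", "nmod", "_", "_"),
])

for candidate in retag_candidates(SAMPLE):
    print(candidate)
```

On these fragments only the AnCora-style token is reported, since the GSD one is already ADP/case.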

AnCora example of como for examples:

12      la      el      DET     da0fs0  Definite=Def|Gender=Fem|Number=Sing|PronType=Art        13      det     13:det  Entity=(NOCOREF:Gen--4-gstype:gen,HomoDD
13      mayoría mayoría NOUN    ncfs000 Gender=Fem|Number=Sing  15      nmod    15:nmod _
14      de      de      ADP     sps00   _       13      case    13:case _
15      ellas   él      PRON    pp3fp000        Case=Acc,Nom|Gender=Fem|Number=Plur|Person=3|PronType=Prs       16      nsubj   16:nsubj        ArgTem=arg1:tem|Entity=NOCOREF:Gen)
16      carece  carecer VERB    vmip3s0 Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin   6       conj    6:conj  _
17      de      de      ADP     sps00   _       18      case    18:case _
18      servicios       servicio        NOUN    ncmp000 Gender=Masc|Number=Plur 16      obl:arg 16:obl:arg      ArgTem=arg2:atr|Entity=(NOCOREF:Gen--1-gstype:gen
19      básicos básico  ADJ     aq0mp0  Gender=Masc|Number=Plur 18      amod    18:amod SpaceAfter=No
20      ,       ,       PUNCT   fc      PunctType=Comm  21      punct   21:punct        _
21      tales   tal     PRON    pd0cp000        Number=Plur|PronType=Dem        18      appos   18:appos        _
22      como    como    SCONJ   cs      _       23      mark    23:mark _
23      luz     luz     NOUN    ncfs000 Gender=Fem|Number=Sing  21      nmod    21:nmod SpaceAfter=No
24      ,       ,       PUNCT   fc      PunctType=Comm  25      punct   25:punct        _
25      agua    agua    NOUN    ncfs000 Gender=Fem|Number=Sing  23      conj    23:conj _
26      potable potable ADJ     aq0cs0  Number=Sing     25      amod    25:amod _
27      o       o       CCONJ   cc      _       28      cc      28:cc   _
28      asistencia      asistencia      NOUN    ncfs000 Gender=Fem|Number=Sing  23      conj    23:conj _
29      hospitalaria    hospitalario    ADJ     aq0fs0  Gender=Fem|Number=Sing  28      amod    28:amod Entity=NOCOREF:Gen)|SpaceAfter=No

(although that's possibly different given the tales)

or

6       los     el      DET     da0mp0  Definite=Def|Gender=Masc|Number=Plur|PronType=Art       8       det     8:det   Entity=(NOCOREF:Gen--3-gstype:gen,HomoDD
7       principales     principal       ADJ     aq0cp0  Number=Plur     8       amod    8:amod  _
8       ideólogos       ideólogo        NOUN    ncmp000 Gender=Masc|Number=Plur 4       nmod    4:nmod  ArgTem=arg0:agt
9       de      de      ADP     sps00   _       11      case    11:case _
10      la      el      DET     da0fs0  Definite=Def|Gender=Fem|Number=Sing|PronType=Art        11      det     11:det  Entity=(CESSCASTAA200007085671c12--2-CorefType:ident,gstype:gen
11      independencia   independencia   NOUN    ncfs000 Gender=Fem|Number=Sing  8       nmod    8:nmod  Entity=CESSCASTAA200007085671c12)|SpaceAfter=No
12      ,       ,       PUNCT   fc      PunctType=Comm  14      punct   14:punct        _
13      como    como    SCONJ   cs      _       14      mark    14:mark _
14      Toussaint       Toussaint       PROPN   np00000 _       8       nmod    8:nmod  MWE=Toussaint_Louverture|MWEPOS=PROPN|Entity=(NOCOREF:Spec.person-person-1-gstype:spec
15      Louverture      Louverture      PROPN   _       _       14      flat    14:flat Entity=NOCOREF:Spec.person)|SpaceAfter=No

GSD examples, note the different annotation on tales as well:

# sent_id = es-train-002-s285
# text = Son muchos los platos que contienen especias tales como la canela, pimentón, menta, azafrán, la pimienta negra, el comino etc.
1       Son     ser     AUX     _       Mood=Ind|Number=Plur|Person=3|Tense=Pres|VerbForm=Fin   4       cop     _       _
2       muchos  mucho   PRON    _       Gender=Masc|Number=Plur|NumType=Card|PronType=Ind       4       nsubj   _       _
3       los     el      DET     _       Definite=Def|Gender=Masc|Number=Plur|PronType=Art       4       det     _       _
4       platos  plato   NOUN    _       Gender=Masc|Number=Plur 0       root    _       _
5       que     que     SCONJ   _       _       6       mark    _       _
6       contienen       contener        VERB    _       Mood=Ind|Number=Plur|Person=3|Tense=Pres|VerbForm=Fin   4       acl:relcl       _       _
7       especias        especia NOUN    _       Gender=Fem|Number=Plur  6       obj     _       _
8       tales   tal     ADJ     _       Number=Plur     7       amod    _       _
9       como    como    ADP     _       _       11      case    _       _
10      la      el      DET     _       Definite=Def|Gender=Fem|Number=Sing|PronType=Art        11      det     _       _
11      canela  canela  NOUN    _       Gender=Fem|Number=Sing  8       nmod    _       SpaceAfter=No

example w/o tales:

# sent_id = es-train-002-s298
# text = Sin embargo, no todos comparten este punto de vista y ha grabado con renombrados artistas como Luciano Pavarotti y Ruggiero Ricci.
12      ha      haber   AUX     _       Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin   13      aux     _       _
13      grabado grabar  VERB    _       Gender=Masc|Number=Sing|Tense=Past|VerbForm=Part        6       conj    _       _
14      con     con     ADP     _       _       16      case    _       _
15      renombrados     renombrado      ADJ     _       Gender=Masc|Number=Plur|VerbForm=Part   16      amod    _       _
16      artistas        artista NOUN    _       Number=Plur     13      obl     _       _
17      como    como    ADP     _       _       18      case    _       _
18      Luciano luciano PROPN   _       Gender=Masc|Number=Sing 16      nmod    _       _
19      Pavarotti       pavarotti       PROPN   _       _       18      flat    _       _
20      y       y       CCONJ   _       _       21      cc      _       _
21      Ruggiero        ruggiero        PROPN   _       _       18      conj    _       _
22      Ricci   ricci   PROPN   _       _       21      flat    _       SpaceAfter=No

in the "tanto X como Y" usage, in AnCora, it is not even analyzed in a particularly useful manner:

20      tanto   tanto   PRON    _       Gender=Masc|Number=Sing|NumType=Card|PronType=Dem       23      nmod    _       _
21      en      en      ADP     _       _       23      case    _       _
22      el      el      DET     _       Definite=Def|Gender=Masc|Number=Sing|PronType=Art       23      det     _       _
23      ámbito  ámbito  NOUN    _       Gender=Masc|Number=Sing 17      obl     _       _
24      de      de      ADP     _       _       26      case    _       _
25      la      el      DET     _       Definite=Def|Gender=Fem|Number=Sing|PronType=Art        26      det     _       _
26      osmosis osmosis NOUN    _       Gender=Fem      23      nmod    _       _
27      inversa inverso ADJ     _       Gender=Fem|Number=Sing  26      amod    _       _
28      como    como    CCONJ   _       _       20      dep     _       _
29      de      de      ADP     _       _       31      case    _       _
30      la      el      DET     _       Definite=Def|Gender=Fem|Number=Sing|PronType=Art        31      det     _       _
31      descalcificación        descalcificación        NOUN    _       Gender=Fem|Number=Sing  28      nmod    _       SpaceAfter=No

here maybe it's supposed to be SCONJ/mark instead of CCONJ/dep? well, the CCONJ is pointed at a completely different word anyway

... further analyzing tanto X como Y, it seems GSD consistently does use CCONJ, such as

# sent_id = es-train-001-s338
# text = Suelo comprar sus productos y son exquisitos, tanto su carne como sus embutidos.
9       tanto   tanto   PRON    _       Gender=Masc|Number=Sing|NumType=Card|PronType=Dem       11      nmod    _       _
10      su      su      DET     _       Number=Sing|Person=3|Poss=Yes|PronType=Prs      11      det     _       _
11      carne   carne   NOUN    _       Gender=Fem|Number=Sing  2       parataxis       _       _
12      como    como    CCONJ   _       _       14      case    _       _
13      sus     su      DET     _       Number=Plur|Person=3|Poss=Yes|PronType=Prs      14      det     _       _
14      embutidos       embutido        NOUN    _       Gender=Masc|Number=Plur 9       nmod    _       SpaceAfter=No

whereas AnCora usually uses SCONJ, and the CCONJ example I found earlier was a weird outlier

# sent_id = 3LB-CAST-211_C-1-s9
# text = En los casi tres años del gobierno actual, el Ejecutivo y la oposición habían mantenido una tensa discrepancia sobre los asuntos de Estado, hasta situaciones de enfrentamiento verbal tanto en el Parlamento como en los medios de comunicación social.
32      tanto   tanto   ADV     rg      _       35      cc      35:cc   _
33      en      en      ADP     sps00   _       35      case    35:case _
34      el      el      DET     da0ms0  Definite=Def|Gender=Masc|Number=Sing|PronType=Art       35      det     35:det  Entity=(NOCOREF:Spec.organization-organization-2-gstype:spec
35      Parlamento      Parlamento      PROPN   np0000o _       28      nmod    28:nmod ArgTem=argM:loc|Entity=NOCOREF:Spec.organization)
36      como    como    SCONJ   cs      _       39      cc      39:cc   _
37      en      en      ADP     sps00   _       39      case    39:case _
38      los     el      DET     da0mp0  Definite=Def|Gender=Masc|Number=Plur|PronType=Art       39      det     39:det  _
39      medios  medio   NOUN    ncmp000 Gender=Masc|Number=Plur 35      conj    35:conj _
40      de      de      ADP     sps00   _       41      case    41:case _
41      comunicación    comunicación    NOUN    ncfs000 Gender=Fem|Number=Sing  39      nmod    39:nmod _
42      social  social  ADJ     aq0cs0  Number=Sing     41      amod    41:amod SpaceAfter=No

so I guess just throw that on the pile of different uses of como which are analyzed differently in the two treebanks
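
For the "tanto X como Y" case specifically, a small script along these lines could pull out how each treebank tags the pair. Again a sketch under assumptions: tab-separated CoNLL-U input, an invented function name `tanto_como_pairs`, and a deliberately naive pairing rule (the nearest preceding "tanto" in the same sentence), which will misfire on sentences where the two words are unrelated.

```python
def tanto_como_pairs(conllu_text):
    """For each 'tanto ... como' in a sentence, return
    ((tanto_upos, tanto_deprel), (como_upos, como_deprel))."""
    pairs = []
    tanto = None  # most recent unmatched 'tanto' in the current sentence
    for line in conllu_text.splitlines():
        if not line.strip():
            tanto = None  # sentence boundary: forget any pending 'tanto'
            continue
        if line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 10 or not cols[0].isdigit():
            continue
        lemma = cols[2].lower()
        if lemma == "tanto":
            tanto = cols
        elif lemma == "como" and tanto is not None:
            pairs.append(((tanto[3], tanto[7]), (cols[3], cols[7])))
            tanto = None
    return pairs


def row(*cols):
    """Build one tab-separated CoNLL-U token line."""
    return "\t".join(cols)


# Rows abridged from the GSD and AnCora examples above (FEATS/MISC shortened).
SAMPLE = "\n".join([
    row("9", "tanto", "tanto", "PRON", "_", "_", "11", "nmod", "_", "_"),
    row("11", "carne", "carne", "NOUN", "_", "_", "2", "parataxis", "_", "_"),
    row("12", "como", "como", "CCONJ", "_", "_", "14", "case", "_", "_"),
    "",
    row("32", "tanto", "tanto", "ADV", "rg", "_", "35", "cc", "35:cc", "_"),
    row("35", "Parlamento", "Parlamento", "PROPN", "np0000o", "_", "28", "nmod", "28:nmod", "_"),
    row("36", "como", "como", "SCONJ", "cs", "_", "39", "cc", "39:cc", "_"),
])

for pair in tanto_como_pairs(SAMPLE):
    print(pair)
```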
