How should we analyze the various uses of "como"? I identified this as one of the big differences between the annotations in AnCora and GSD.
- comparative:
tan alto como tú
tan rápido como antes
in AnCora: usually SCONJ
in GSD: varies between SCONJ and ADP
- como VERB:
Hazlo como quieras
Lo hizo como le dijeron
in AnCora: SCONJ
in GSD: SCONJ, also some ADP
- examples:
Animales como perros y gatos
Países como España y Francia
in AnCora: usually SCONJ
in GSD: often ADP
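To put numbers on the "usually" and "often" above, something like the following could tally how each treebank tags como. This is only a minimal sketch: `como_tag_counts` is a hypothetical helper, and it assumes standard tab-separated 10-column CoNLL-U input (the fragments pasted in this issue have lost their tabs).

```python
from collections import Counter


def como_tag_counts(conllu_text):
    """Count (UPOS, DEPREL) pairs for every token whose form is 'como'."""
    counts = Counter()
    for line in conllu_text.splitlines():
        # Skip blank lines and sentence-level comments.
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        # Keep only plain token lines; this skips multiword ranges (3-4)
        # and empty nodes (3.1), whose IDs are not plain integers.
        if len(cols) != 10 or not cols[0].isdigit():
            continue
        if cols[1].lower() == "como":
            counts[(cols[3], cols[7])] += 1
    return counts


# Tiny hand-built sample in the GSD style, for illustration only.
sample = (
    "# text = artistas como Luciano\n"
    "1\tartistas\tartista\tNOUN\t_\t_\t0\troot\t_\t_\n"
    "2\tcomo\tcomo\tADP\t_\t_\t3\tcase\t_\t_\n"
    "3\tLuciano\tluciano\tPROPN\t_\t_\t1\tnmod\t_\t_\n"
)
print(como_tag_counts(sample))
```

Running it over the full text of each treebank's training file and comparing the two counters would show where the annotations diverge most.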
I have no strong opinions, but it'd be nice to unify them either way. I suppose having fewer tags in general would be easier to understand or model, but at the same time, using ADP to connect two NPs in the "example" use of como is pretty appealing. English "like" is annotated that way, for instance:
1 His his PRON PRP$ Case=Gen|Gender=Masc|Number=Sing|Person=3|Poss=Yes|PronType=Prs 3 nmod:poss 3:nmod:poss _
2 military military ADJ JJ Degree=Pos 3 amod 3:amod _
3 intelligence intelligence NOUN NN Number=Sing 5 nsubj 5:nsubj _
4 has have AUX VBZ Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin 5 aux 5:aux _
5 captured capture VERB VBN Tense=Past|VerbForm=Part 0 root 0:root _
6 major major ADJ JJ Degree=Pos 7 amod 7:amod _
7 figures figure NOUN NNS Number=Plur 5 obj 5:obj _
8 like like ADP IN _ 9 case 9:case _ <------
9 Abu Abu PROPN NNP Number=Sing 7 nmod 7:nmod:like _
10 Zubayda Zubayda PROPN NNP Number=Sing 9 flat 9:flat _
11 and and CCONJ CC _ 12 cc 12:cc _
12 Khalid Khalid PROPN NNP Number=Sing 9 conj 7:nmod:like|9:conj:and _
13 Shaykh Shaykh PROPN NNP Number=Sing 12 flat 12:flat _
14 Muhammad Muhammad PROPN NNP Number=Sing 12 flat 12:flat SpaceAfter=No
So one proposal would be to change "thing como thing" to case/ADP, as in GSD and in English, and change the others to SCONJ, as in AnCora. Happy to hear other suggestions, though.
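As a rough illustration of what that proposal could look like as a rule, here is a sketch. Everything in it is an assumption made for the example: the function name, the token-dict layout, and especially the test for "thing como thing" (como's head is a nominal attached as nmod to another nominal). It would not catch the "tales como" pattern, where the noun hangs off tales, and it does not try to settle other uses of como.

```python
NOMINAL = {"NOUN", "PROPN", "PRON"}


def proposed_como_tags(tokens):
    """Return {token id: (UPOS, DEPREL)} for each 'como' under the proposal.

    tokens: list of dicts with integer 'id' and 'head', plus 'form',
    'upos' and 'deprel' (a hypothetical layout, not a library format).
    """
    by_id = {t["id"]: t for t in tokens}
    result = {}
    for tok in tokens:
        if tok["form"].lower() != "como":
            continue
        head = by_id.get(tok["head"])
        grand = by_id.get(head["head"]) if head else None
        # Heuristic for the "example" use: nominal nmod of a nominal.
        is_example = (
            head is not None
            and grand is not None
            and head["upos"] in NOMINAL
            and head["deprel"] == "nmod"
            and grand["upos"] in NOMINAL
        )
        result[tok["id"]] = ("ADP", "case") if is_example else ("SCONJ", "mark")
    return result


example_use = [
    {"id": 1, "form": "artistas", "upos": "NOUN", "head": 0, "deprel": "root"},
    {"id": 2, "form": "como", "upos": "SCONJ", "head": 3, "deprel": "mark"},
    {"id": 3, "form": "Luciano", "upos": "PROPN", "head": 1, "deprel": "nmod"},
]
clause_use = [
    {"id": 1, "form": "Hazlo", "upos": "VERB", "head": 0, "deprel": "root"},
    {"id": 2, "form": "como", "upos": "ADP", "head": 3, "deprel": "case"},
    {"id": 3, "form": "quieras", "upos": "VERB", "head": 1, "deprel": "advcl"},
]
print(proposed_como_tags(example_use))
print(proposed_como_tags(clause_use))
```

A real conversion would need manual review of whatever the heuristic misses, but a first pass like this would at least show how many tokens each treebank would have to change.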
AnCora example of como for examples:
12 la el DET da0fs0 Definite=Def|Gender=Fem|Number=Sing|PronType=Art 13 det 13:det Entity=(NOCOREF:Gen--4-gstype:gen,HomoDD
13 mayoría mayoría NOUN ncfs000 Gender=Fem|Number=Sing 15 nmod 15:nmod _
14 de de ADP sps00 _ 13 case 13:case _
15 ellas él PRON pp3fp000 Case=Acc,Nom|Gender=Fem|Number=Plur|Person=3|PronType=Prs 16 nsubj 16:nsubj ArgTem=arg1:tem|Entity=NOCOREF:Gen)
16 carece carecer VERB vmip3s0 Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin 6 conj 6:conj _
17 de de ADP sps00 _ 18 case 18:case _
18 servicios servicio NOUN ncmp000 Gender=Masc|Number=Plur 16 obl:arg 16:obl:arg ArgTem=arg2:atr|Entity=(NOCOREF:Gen--1-gstype:gen
19 básicos básico ADJ aq0mp0 Gender=Masc|Number=Plur 18 amod 18:amod SpaceAfter=No
20 , , PUNCT fc PunctType=Comm 21 punct 21:punct _
21 tales tal PRON pd0cp000 Number=Plur|PronType=Dem 18 appos 18:appos _
22 como como SCONJ cs _ 23 mark 23:mark _
23 luz luz NOUN ncfs000 Gender=Fem|Number=Sing 21 nmod 21:nmod SpaceAfter=No
24 , , PUNCT fc PunctType=Comm 25 punct 25:punct _
25 agua agua NOUN ncfs000 Gender=Fem|Number=Sing 23 conj 23:conj _
26 potable potable ADJ aq0cs0 Number=Sing 25 amod 25:amod _
27 o o CCONJ cc _ 28 cc 28:cc _
28 asistencia asistencia NOUN ncfs000 Gender=Fem|Number=Sing 23 conj 23:conj _
29 hospitalaria hospitalario ADJ aq0fs0 Gender=Fem|Number=Sing 28 amod 28:amod Entity=NOCOREF:Gen)|SpaceAfter=No
(although that's possibly different given the tales)
or
6 los el DET da0mp0 Definite=Def|Gender=Masc|Number=Plur|PronType=Art 8 det 8:det Entity=(NOCOREF:Gen--3-gstype:gen,HomoDD
7 principales principal ADJ aq0cp0 Number=Plur 8 amod 8:amod _
8 ideólogos ideólogo NOUN ncmp000 Gender=Masc|Number=Plur 4 nmod 4:nmod ArgTem=arg0:agt
9 de de ADP sps00 _ 11 case 11:case _
10 la el DET da0fs0 Definite=Def|Gender=Fem|Number=Sing|PronType=Art 11 det 11:det Entity=(CESSCASTAA200007085671c12--2-CorefType:ident,gstype:gen
11 independencia independencia NOUN ncfs000 Gender=Fem|Number=Sing 8 nmod 8:nmod Entity=CESSCASTAA200007085671c12)|SpaceAfter=No
12 , , PUNCT fc PunctType=Comm 14 punct 14:punct _
13 como como SCONJ cs _ 14 mark 14:mark _
14 Toussaint Toussaint PROPN np00000 _ 8 nmod 8:nmod MWE=Toussaint_Louverture|MWEPOS=PROPN|Entity=(NOCOREF:Spec.person-person-1-gstype:spec
15 Louverture Louverture PROPN _ _ 14 flat 14:flat Entity=NOCOREF:Spec.person)|SpaceAfter=No
GSD examples; note the different annotation of tales as well:
# sent_id = es-train-002-s285
# text = Son muchos los platos que contienen especias tales como la canela, pimentón, menta, azafrán, la pimienta negra, el comino etc.
1 Son ser AUX _ Mood=Ind|Number=Plur|Person=3|Tense=Pres|VerbForm=Fin 4 cop _ _
2 muchos mucho PRON _ Gender=Masc|Number=Plur|NumType=Card|PronType=Ind 4 nsubj _ _
3 los el DET _ Definite=Def|Gender=Masc|Number=Plur|PronType=Art 4 det _ _
4 platos plato NOUN _ Gender=Masc|Number=Plur 0 root _ _
5 que que SCONJ _ _ 6 mark _ _
6 contienen contener VERB _ Mood=Ind|Number=Plur|Person=3|Tense=Pres|VerbForm=Fin 4 acl:relcl _ _
7 especias especia NOUN _ Gender=Fem|Number=Plur 6 obj _ _
8 tales tal ADJ _ Number=Plur 7 amod _ _
9 como como ADP _ _ 11 case _ _
10 la el DET _ Definite=Def|Gender=Fem|Number=Sing|PronType=Art 11 det _ _
11 canela canela NOUN _ Gender=Fem|Number=Sing 8 nmod _ SpaceAfter=No
Example without tales:
# sent_id = es-train-002-s298
# text = Sin embargo, no todos comparten este punto de vista y ha grabado con renombrados artistas como Luciano Pavarotti y Ruggiero Ricci.
12 ha haber AUX _ Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin 13 aux _ _
13 grabado grabar VERB _ Gender=Masc|Number=Sing|Tense=Past|VerbForm=Part 6 conj _ _
14 con con ADP _ _ 16 case _ _
15 renombrados renombrado ADJ _ Gender=Masc|Number=Plur|VerbForm=Part 16 amod _ _
16 artistas artista NOUN _ Number=Plur 13 obl _ _
17 como como ADP _ _ 18 case _ _
18 Luciano luciano PROPN _ Gender=Masc|Number=Sing 16 nmod _ _
19 Pavarotti pavarotti PROPN _ _ 18 flat _ _
20 y y CCONJ _ _ 21 cc _ _
21 Ruggiero ruggiero PROPN _ _ 18 conj _ _
22 Ricci ricci PROPN _ _ 21 flat _ SpaceAfter=No
In the "tanto X como Y" usage, in AnCora, it is not even analyzed in a particularly useful manner:
20 tanto tanto PRON _ Gender=Masc|Number=Sing|NumType=Card|PronType=Dem 23 nmod _ _
21 en en ADP _ _ 23 case _ _
22 el el DET _ Definite=Def|Gender=Masc|Number=Sing|PronType=Art 23 det _ _
23 ámbito ámbito NOUN _ Gender=Masc|Number=Sing 17 obl _ _
24 de de ADP _ _ 26 case _ _
25 la el DET _ Definite=Def|Gender=Fem|Number=Sing|PronType=Art 26 det _ _
26 osmosis osmosis NOUN _ Gender=Fem 23 nmod _ _
27 inversa inverso ADJ _ Gender=Fem|Number=Sing 26 amod _ _
28 como como CCONJ _ _ 20 dep _ _
29 de de ADP _ _ 31 case _ _
30 la el DET _ Definite=Def|Gender=Fem|Number=Sing|PronType=Art 31 det _ _
31 descalcificación descalcificación NOUN _ Gender=Fem|Number=Sing 28 nmod _ SpaceAfter=No
Here, maybe it's supposed to be SCONJ/mark instead of CCONJ/dep? Then again, the CCONJ is attached to a completely different word anyway.
... Looking further at tanto X como Y, GSD seems to use CCONJ consistently, for example:
# sent_id = es-train-001-s338
# text = Suelo comprar sus productos y son exquisitos, tanto su carne como sus embutidos.
9 tanto tanto PRON _ Gender=Masc|Number=Sing|NumType=Card|PronType=Dem 11 nmod _ _
10 su su DET _ Number=Sing|Person=3|Poss=Yes|PronType=Prs 11 det _ _
11 carne carne NOUN _ Gender=Fem|Number=Sing 2 parataxis _ _
12 como como CCONJ _ _ 14 case _ _
13 sus su DET _ Number=Plur|Person=3|Poss=Yes|PronType=Prs 14 det _ _
14 embutidos embutido NOUN _ Gender=Masc|Number=Plur 9 nmod _ SpaceAfter=No
whereas AnCora usually uses SCONJ, and the CCONJ example I found earlier was a weird outlier:
# sent_id = 3LB-CAST-211_C-1-s9
# text = En los casi tres años del gobierno actual, el Ejecutivo y la oposición habían mantenido una tensa discrepancia sobre los asuntos de Estado, hasta situaciones de enfrentamiento verbal tanto en el Parlamento como en los medios de comunicación social.
32 tanto tanto ADV rg _ 35 cc 35:cc _
33 en en ADP sps00 _ 35 case 35:case _
34 el el DET da0ms0 Definite=Def|Gender=Masc|Number=Sing|PronType=Art 35 det 35:det Entity=(NOCOREF:Spec.organization-organization-2-gstype:spec
35 Parlamento Parlamento PROPN np0000o _ 28 nmod 28:nmod ArgTem=argM:loc|Entity=NOCOREF:Spec.organization)
36 como como SCONJ cs _ 39 cc 39:cc _
37 en en ADP sps00 _ 39 case 39:case _
38 los el DET da0mp0 Definite=Def|Gender=Masc|Number=Plur|PronType=Art 39 det 39:det _
39 medios medio NOUN ncmp000 Gender=Masc|Number=Plur 35 conj 35:conj _
40 de de ADP sps00 _ 41 case 41:case _
41 comunicación comunicación NOUN ncfs000 Gender=Fem|Number=Sing 39 nmod 39:nmod _
42 social social ADJ aq0cs0 Number=Sing 41 amod 41:amod SpaceAfter=No
So I guess that just goes on the pile of uses of como which are analyzed differently in the two treebanks.
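For checking the tanto X como Y pattern across a whole treebank, a quick tally along these lines might help. It is a sketch under stated assumptions: `tanto_como_upos` is a hypothetical helper, sentences are assumed to be already parsed into (form, UPOS) pairs, and only the literal form "tanto" is matched, so inflected forms such as tanta or tantos would be missed.

```python
from collections import Counter


def tanto_como_upos(sentences):
    """Count the UPOS given to 'como' when 'tanto' occurs earlier in the sentence.

    sentences: iterable of lists of (form, upos) pairs, one list per sentence.
    """
    counts = Counter()
    for sent in sentences:
        seen_tanto = False
        for form, upos in sent:
            low = form.lower()
            if low == "tanto":
                seen_tanto = True
            elif low == "como" and seen_tanto:
                counts[upos] += 1
                # Pair each 'tanto' with at most one following 'como'.
                seen_tanto = False
    return counts


# Shortened versions of the two examples above.
gsd_like = [("tanto", "PRON"), ("su", "DET"), ("carne", "NOUN"),
            ("como", "CCONJ"), ("sus", "DET"), ("embutidos", "NOUN")]
ancora_like = [("tanto", "ADV"), ("en", "ADP"), ("el", "DET"),
               ("Parlamento", "PROPN"), ("como", "SCONJ"), ("en", "ADP"),
               ("los", "DET"), ("medios", "NOUN")]
print(tanto_como_upos([gsd_like]))
print(tanto_como_upos([ancora_like]))
```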