This is similar to another issue: #1.
I ran into it while solving ProntoQA with pyke. A typical example is ProntoQA_10:
{
  "id": "ProntoQA_10",
  "context": "Every impus is earthy. Each impus is a jompus. Jompuses are small. Jompuses are rompuses. Rompuses are not amenable. Rompuses are wumpuses. Wumpuses are wooden. Wumpuses are zumpuses. Every zumpus is temperate. Every zumpus is a dumpus. Dumpuses are dull. Dumpuses are vumpuses. Every vumpus is not shy. Every yumpus is sweet. Vumpuses are numpuses. Numpuses are not sweet. Numpuses are tumpuses. Fae is a wumpus.",
  "question": "Is the following statement true or false? Fae is sweet.",
  "options": [
    "A) True",
    "B) False"
  ],
  "answer": "B",
  "explanation": [
    "Fae is a wumpus.",
    "Wumpuses are zumpuses.",
    "Fae is a zumpus.",
    "Every zumpus is a dumpus.",
    "Fae is a dumpus.",
    "Dumpuses are vumpuses.",
    "Fae is a vumpus.",
    "Vumpuses are numpuses.",
    "Fae is a numpus.",
    "Numpuses are not sweet.",
    "Fae is not sweet."
  ]
}
The translation from natural language into the pyke program also looks correct:
fact1
    foreach
        facts.Impus($x, True)
    assert
        facts.Earthy($x, True)

fact2
    foreach
        facts.Impus($x, True)
    assert
        facts.Jompus($x, True)

fact3
    foreach
        facts.Jompus($x, True)
    assert
        facts.Small($x, True)

fact4
    foreach
        facts.Jompus($x, True)
    assert
        facts.Rompus($x, True)

fact5
    foreach
        facts.Rompus($x, True)
    assert
        facts.Amenable($x, False)

fact6
    foreach
        facts.Rompus($x, True)
    assert
        facts.Wumpus($x, True)

fact7
    foreach
        facts.Wumpus($x, True)
    assert
        facts.Wooden($x, True)

fact8
    foreach
        facts.Wumpus($x, True)
    assert
        facts.Zumpus($x, True)

fact9
    foreach
        facts.Zumpus($x, True)
    assert
        facts.Temperate($x, True)

fact10
    foreach
        facts.Zumpus($x, True)
    assert
        facts.Dumpus($x, True)

fact11
    foreach
        facts.Dumpus($x, True)
    assert
        facts.Dull($x, True)

fact12
    foreach
        facts.Dumpus($x, True)
    assert
        facts.Vumpus($x, True)

fact13
    foreach
        facts.Vumpus($x, True)
    assert
        facts.Shy($x, False)

fact14
    foreach
        facts.Yumpus($x, True)
    assert
        facts.Sweet($x, True)

fact15
    foreach
        facts.Vumpus($x, True)
    assert
        facts.Numpus($x, True)

fact16
    foreach
        facts.Numpus($x, True)
    assert
        facts.Sweet($x, False)

fact17
    foreach
        facts.Numpus($x, True)
    assert
        facts.Tumpus($x, True)
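
As a sanity check that does not involve pyke at all, here is a minimal forward-chaining sketch in plain Python. It assumes the only initial fact is Wumpus(Fae, True) (taken from "Fae is a wumpus." in the context) and encodes the 17 rules above as tuples. The names `RULES` and `forward_chain` and the tuple encoding are illustrative only and are not part of pyke's API.

```python
# Each rule is ((premise_predicate, premise_value), (conclusion_predicate, conclusion_value)),
# mirroring fact1..fact17 above. This is a hand-written encoding, not pyke syntax.
RULES = [
    (("Impus", True), ("Earthy", True)),
    (("Impus", True), ("Jompus", True)),
    (("Jompus", True), ("Small", True)),
    (("Jompus", True), ("Rompus", True)),
    (("Rompus", True), ("Amenable", False)),
    (("Rompus", True), ("Wumpus", True)),
    (("Wumpus", True), ("Wooden", True)),
    (("Wumpus", True), ("Zumpus", True)),
    (("Zumpus", True), ("Temperate", True)),
    (("Zumpus", True), ("Dumpus", True)),
    (("Dumpus", True), ("Dull", True)),
    (("Dumpus", True), ("Vumpus", True)),
    (("Vumpus", True), ("Shy", False)),
    (("Yumpus", True), ("Sweet", True)),
    (("Vumpus", True), ("Numpus", True)),
    (("Numpus", True), ("Sweet", False)),
    (("Numpus", True), ("Tumpus", True)),
]


def forward_chain(initial_facts, rules):
    """Apply the rules repeatedly until no new (predicate, entity, value) fact appears."""
    facts = set(initial_facts)
    changed = True
    while changed:
        changed = False
        for (p_pred, p_val), (c_pred, c_val) in rules:
            # Iterate over a snapshot so the set can grow during the pass.
            for pred, entity, val in list(facts):
                if pred == p_pred and val == p_val:
                    new_fact = (c_pred, entity, c_val)
                    if new_fact not in facts:
                        facts.add(new_fact)
                        changed = True
    return facts


# Assumed initial fact: "Fae is a wumpus."
derived = forward_chain({("Wumpus", "Fae", True)}, RULES)

if ("Sweet", "Fae", False) in derived:
    answer = "B"  # "Fae is sweet" is False
elif ("Sweet", "Fae", True) in derived:
    answer = "A"  # "Fae is sweet" is True
else:
    answer = "unknown"

print(answer)  # prints "B"
```

With this encoding, Sweet(Fae, False) is derived through the Wumpus, Zumpus, Dumpus, Vumpus, Numpus chain, and Sweet(Fae, True) is never derived because nothing makes Fae a yumpus. So the rules themselves appear to support answer B.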
However, when this program is given to pyke, the predicted answer is A rather than the correct answer B.
Is there a problem with pyke? Is pyke actually producing the correct result here?