Pre-parse sentences and save in DB #101

@XanderVertegaal

Description

Currently, LangPro both parses and proves each problem it receives. Parsing in particular is slow, so we can save a lot of time by pre-parsing every sentence in the corpus and storing the raw Prolog output for the CCG tree in the DB. The other parse trees can be derived from the CCG tree very quickly, so they do not need to be stored.

When the user hits 'Parse and Proof' in the annotator, the backend will first retrieve the parse results from the DB (if they are available) and send them to LangPro, which then only has to derive the other three parse trees and run the prover.
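
As a rough illustration of that lookup step, here is a minimal sketch in Python. Every name in it (`ParseStore`, `build_prove_payload`, the `ccg_trees` payload key) is hypothetical and does not exist in the codebase; the store is an in-memory stand-in for the DB, and the sketch assumes stored results are only forwarded when every sentence of the problem has one.

```python
from typing import Optional


class ParseStore:
    """In-memory stand-in for the DB table holding pre-parsed CCG trees."""

    def __init__(self) -> None:
        self._trees: dict[str, str] = {}

    def save(self, sentence: str, ccg_prolog: str) -> None:
        self._trees[sentence] = ccg_prolog

    def get(self, sentence: str) -> Optional[str]:
        return self._trees.get(sentence)


def build_prove_payload(sentences: list[str], store: ParseStore) -> dict:
    """Build the request body for LangPro, attaching stored parses if complete."""
    payload: dict = {"sentences": sentences}
    trees = [store.get(s) for s in sentences]
    if trees and all(t is not None for t in trees):
        # Hypothetical key: LangPro would bypass its parser when this is present.
        payload["ccg_trees"] = trees
    return payload
```

If any sentence is missing from the store, the payload carries no parse results, so LangPro would fall back to parsing and proving as it does today.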

This requires:

  • a new endpoint on the LangPro container (e.g. /api/parse) that takes a problem as input and returns only the parse output.
  • either a new endpoint on the LangPro container or an update to the existing /api/prove route, so that it can receive parse results and run only the prover (bypassing the parser). If we implement a new endpoint, its output should have the same format as that of /api/prove.
  • an asynchronous worker that can be triggered to go over all problems, send them to LangPro one by one to be parsed, and store the results in the DB.
  • DB models for the stored parse results.
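
To make the worker and storage requirements a bit more concrete, here is a minimal sketch of a single pre-parsing pass. It is not a proposed implementation: it uses the standard-library `sqlite3` module instead of Django models only to stay self-contained, the table and column names are made up, results are keyed by sentence text as an assumption, and `parse_fn` is a stand-in for the call to the proposed /api/parse endpoint.

```python
import sqlite3
from typing import Callable, Iterable

# Hypothetical schema: one row per sentence, holding the raw Prolog CCG output.
SCHEMA = """
CREATE TABLE IF NOT EXISTS parse_result (
    sentence   TEXT PRIMARY KEY,
    ccg_prolog TEXT NOT NULL
)
"""


def preparse_all(
    conn: sqlite3.Connection,
    sentences: Iterable[str],
    parse_fn: Callable[[str], str],
) -> int:
    """Parse every sentence without a stored result; return how many were parsed."""
    conn.execute(SCHEMA)
    parsed = 0
    for sentence in sentences:
        row = conn.execute(
            "SELECT 1 FROM parse_result WHERE sentence = ?", (sentence,)
        ).fetchone()
        if row is not None:
            continue  # already stored, e.g. by an earlier run
        ccg_prolog = parse_fn(sentence)  # stand-in for a request to /api/parse
        conn.execute(
            "INSERT INTO parse_result (sentence, ccg_prolog) VALUES (?, ?)",
            (sentence, ccg_prolog),
        )
        conn.commit()  # commit per sentence so an interrupted run keeps its progress
        parsed += 1
    return parsed
```

Skipping sentences that already have a stored result means the pass can be re-triggered safely after an interruption or when new problems are added.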

For the async worker, we can use a Celery/RabbitMQ setup. If we're comfortable upgrading to Django 6.0, there is also the new 'Tasks' module to consider.

Labels: enhancement