Skip to content

Add onboarding_report.csv generation#1097

Closed
TaperChipmunk32 wants to merge 2 commits into
masterfrom
onboarding-report
Closed

Add onboarding_report.csv generation#1097
TaperChipmunk32 wants to merge 2 commits into
masterfrom
onboarding-report

Conversation

@TaperChipmunk32

@TaperChipmunk32 TaperChipmunk32 commented Jun 17, 2026

Copy link
Copy Markdown
Collaborator

This PR adds a onboarding_report.csv file that is generated at the end of an Onboarding Request with basic information on each of the projects all compiled in one file, rather than being spread across the command line output and various other output files.

  • Added a OnboardingReport class that is used by each OnboardingProject to store and format the information needed for the csv file
  • Added the generate_report and create_report functions to OnboardingProject and OnboardingRequest respectively to create the csv file using the values from the OnboardingReportfor each project in the Request
  • Added additional return values to extract_corpora and check_versification to pass along detected_versification, versification_error_count, and terms_countto theOnboardingProject` for the Onboarding Report.

This PR also addresses two small bugs:

  • Added error catching when the main project fails to upload/Settings file is invalid
  • Correcting a typo from corpus_stats.csv to corpus-stats.csv

This change is Reviewable

@TaperChipmunk32 TaperChipmunk32 marked this pull request as draft June 19, 2026 16:24
@TaperChipmunk32

Copy link
Copy Markdown
Collaborator Author

Switching this to a draft to wait for the next machine release to add NormalizationForm and Language to the report.

@TaperChipmunk32 TaperChipmunk32 marked this pull request as ready for review June 24, 2026 13:15
@TaperChipmunk32 TaperChipmunk32 marked this pull request as draft June 24, 2026 13:34

@benjaminking benjaminking left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry it took so long to get to this.

@benjaminking reviewed 3 files and all commit messages, and made 5 comments.
Reviewable status: all files reviewed, 4 unresolved discussions.


silnlp/common/extract_corpora.py line 79 at r2 (raw file):

    versification_error_output_path: Optional[str] = None,
    environment: SilNlpEnv = SilNlpEnv.create_standard_environment(),
) -> Tuple[Path | None, List[int], int, int]:

When you have a return type that gets this complex, it's usually good to think about creating a class to represent the return value. The class gives you some semantics about what the different parts mean, which helps to reduce programming errors. For example, it would be really easy to accidentally mix up the terms count and the versification error count, but less easy with a class.


silnlp/common/extract_corpora.py line 96 at r2 (raw file):

                LOGGER.info(f"Identified parent project {parent_project_dir.name}.")

            matching, detected_versification, versification_error_count = check_versification(

It would similarly be good to have a class the represents the output of check_versification.


silnlp/common/onboard_project.py line 431 at r2 (raw file):

    def generate_report(self) -> dict:
        self.report.name_on_bucket = self.project_name

Are you familiar with the "Builder" pattern? It's a software design pattern used for creating a complex object (like OnboardingReport), and I think it would be a good fit for what you're doing here. It would let you encapsulate the logic for building the report inside of the OnboardingReport class. Accessing class members outside of the class is always a risk for bugs as the software evolves.


silnlp/common/onboard_project.py line 493 at r2 (raw file):

        self.main_project: OnboardingProject = OnboardingProject(
            project_name=main_project_name,
            project_type="Request Project",

The project type might be a good use case for an enum, since there's a small set of possible values it could take.

@TaperChipmunk32

Copy link
Copy Markdown
Collaborator Author

@benjaminking I forgot to mention this, but PR is a newer version of this PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants