This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
EbrcModelCommon contains shared data type definitions and structured searches for VEuPathDB (Vector and Eukaryotic Pathogen Database) sites. It provides common infrastructure for representing datasets, managing user datasets, and injecting data into WDK (Workflow Development Kit) model XML files.
This project is specialized by:
This is a multi-module Maven project with Ant build support:
```bash
# Build all modules
mvn clean install

# Build a specific module (run from the repository root)
(cd DatasetPresenter && mvn clean install)
(cd Model && mvn clean install)
```

```bash
# Requires environment variables: GUS_HOME, PROJECT_HOME
# Requires WEBAPP_PROP_FILE with property: webappTargetDir=<path>
# Build entire project
bld EbrcModelCommon
```

Note: Changes to WDK model XML files require a Tomcat restart.
- DatasetPresenter - Core dataset presentation framework
  - Location: `DatasetPresenter/src/main/java/org/apidb/apicommon/datasetPresenter/`
  - Parses dataset XML configurations and manages dataset metadata
  - Provides `DatasetPresenter` class: holds dataset properties, contacts, publications, history
  - Provides `DatasetInjector` abstract class: injects datasets into the WDK presentation layer via templates
- Model - Dataset-specific implementations and WDK definitions
  - Location: `Model/src/main/java/org/apidb/apicommon/model/`
  - Contains concrete `DatasetInjector` subclasses for specific dataset types (GeneOntology, UniProt, InterPro, etc.)
  - Contains `UserDatasetTypeHandler` implementations for user-uploaded datasets (ISASimpleTypeHandler, BiomTypeHandler)
  - WDK model XML files: `Model/lib/wdk/`
  - Dataset template files: `Model/lib/dst/`
- `Model/bin/` - Scripts for dataset processing and tuning manager operations:
  - `propertiesFromDatasets` - Extract dataset properties
  - `jbrowseFromDatasets` - Generate JBrowse configurations
  - `buildDatasetPresentersTT` - Build tuning tables for dataset presenters
  - `updateResourcesWithPubmed` - Update dataset references with PubMed data
- `Model/lib/wdk/` - WDK model XML definitions
  - `model/records/` - Record class definitions (datasets, datasources, user datasets)
  - `model/questions/` - Search definitions and queries
  - `ontology/` - Category ontologies for organizing searches
- `Model/lib/xml/` - Configuration files
  - `datasetPresenters/global.xml` - Global dataset presenter definitions
  - `tuningManager/*.xml` - Database tuning table configurations
  - `datasetClass/classes.xml` - Dataset class API definitions
- `Model/lib/dst/` - Dataset template files (DST format)
  - Templates use `${propertyName}` syntax for variable substitution
  - Templates are injected into WDK XML anchor files at build time
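The following Java sketch illustrates the `${propertyName}` substitution described above by filling placeholders from a property map. The class name `DstTemplateSketch` is hypothetical. The choice to fail on an undeclared property is also an assumption. The real DatasetPresenter code may handle missing properties differently.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper illustrating ${propertyName} substitution in a DST template.
// Not the DatasetPresenter implementation.
final class DstTemplateSketch {

  // Matches ${name}, capturing the property name (letters, digits, underscore, dot).
  private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{([A-Za-z0-9_.]+)\\}");

  static String substitute(String template, Map<String, String> properties) {
    Matcher m = PLACEHOLDER.matcher(template);
    StringBuilder out = new StringBuilder();
    while (m.find()) {
      String name = m.group(1);
      String value = properties.get(name);
      if (value == null) {
        // Assumption: an unknown property is an error rather than being left in place.
        throw new IllegalArgumentException("No value for property: " + name);
      }
      // quoteReplacement stops '$' or '\' in values being read as group references.
      m.appendReplacement(out, Matcher.quoteReplacement(value));
    }
    m.appendTail(out);
    return out.toString();
  }
}
```

For example, `substitute("<q name=\"${datasetName}Q\"/>", Map.of("datasetName", "exampleDataset"))` yields `<q name="exampleDatasetQ"/>`.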
WDK model XML files in `Model/lib/wdk/` are middleware configuration files that define how the WDK (Workflow Development Kit) converts RDBMS data into objects. These XML files act as an ORM-like layer, mapping database queries to object representations.
The schema is defined at: https://github.com/VEuPathDB/WDK/blob/master/Model/lib/rng/wdkModel.rng
The schema defines the root `<wdkModel>` element, which can contain:
- `<modelName>` - Model identification (displayName, version, releaseDate, buildNumber)
- `<import>` - Import external model XML files
- `<paramSet>` - Parameter definitions for searches: `stringParam`, `numberParam`, `dateParam`, `enumParam`, `filterParam`, `datasetParam`
  - Parameters can have validation rules and display options (select, checkbox, treeBox, typeAhead)
- `<querySet>` - SQL and process queries that retrieve data from the RDBMS:
  - Query types: attribute, table, ID, transform, utility, summary, vocabulary
  - Support for complex SQL with multiple columns and nested queries
  - Define the SQL-to-object mapping for data retrieval
- `<recordClassSet>` - Object definitions representing data entities (genes, datasets, etc.):
  - Attributes - Define 1-to-1 properties (single values per record, mapped from query columns)
  - Tables - Model tabular data (1-to-many relationships, collections of related data)
  - Also includes views and primary key definitions
  - Defines the structure of objects created from database data
  - Specifies how data is displayed and accessed in the application
- `<questionSet>` - User-facing search interfaces:
  - Links parameters to queries to create complete search workflows
  - Defines search categories and display properties
  - Represents the user's entry point to query the RDBMS through objects
- Platform-specific configuration - Uses `includeProjects`/`excludeProjects` attributes for site-specific features
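The include/exclude behavior can be sketched as a simple predicate. This sketch makes two assumptions: both attributes hold comma-separated project IDs, and an empty `includeProjects` means "all projects". Confirm the exact rules in the WDK source before relying on this. `ProjectFilterSketch` is a hypothetical name.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical predicate: is an element with these attributes active for a given project?
// Assumes comma-separated project lists; not the WDK's actual implementation.
final class ProjectFilterSketch {

  private static Set<String> parse(String list) {
    if (list == null || list.isBlank()) return Set.of();
    return Arrays.stream(list.split(","))
        .map(String::trim)
        .filter(s -> !s.isEmpty())
        .collect(Collectors.toSet());
  }

  static boolean isIncluded(String projectId, String includeProjects, String excludeProjects) {
    Set<String> include = parse(includeProjects);
    Set<String> exclude = parse(excludeProjects);
    // Assumed: an empty include list applies to every project, unless explicitly excluded.
    boolean included = include.isEmpty() || include.contains(projectId);
    return included && !exclude.contains(projectId);
  }
}
```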
The WDK XML configuration defines a complete middleware layer:
- Queries (`<querySet>`) - SQL statements that retrieve data from the RDBMS
- Records (`<recordClassSet>`) - Object definitions that structure the query results
- Questions (`<questionSet>`) - User interfaces that execute queries and return record objects
- Parameters (`<paramSet>`) - Input values that filter and customize queries
Example flow: User selects parameters → Question executes query with parameters → Query retrieves RDBMS data → Records structure data as objects → Application displays objects to user
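The following Java sketch makes the flow concrete using plain in-memory types. Every type here (`Query`, `Question`, `RecordInstance`) is hypothetical and stands in for the WDK's real classes. The point is only the direction of data:

- Parameters go into a question.
- The question runs its query.
- Each result row becomes a record.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical, in-memory stand-ins for the WDK flow; not WDK API.
final class WdkFlowSketch {

  // A "query" turns parameter values into rows (column name -> value).
  interface Query extends Function<Map<String, String>, List<Map<String, Object>>> {}

  // A "record" is an object built from one result row.
  record RecordInstance(String primaryKey, Map<String, Object> attributes) {}

  // A "question" binds user parameters to a query and wraps each row as a record.
  record Question(String name, Query query, String primaryKeyColumn) {
    List<RecordInstance> run(Map<String, String> params) {
      return query.apply(params).stream()
          .map(row -> new RecordInstance(String.valueOf(row.get(primaryKeyColumn)), row))
          .collect(Collectors.toList());
    }
  }
}
```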
When modifying WDK XML files, ensure they conform to this schema. The root element should be `<wdkModel>`, and imports are typically used to organize related definitions.
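The JDK's built-in DOM parser can run a quick sanity check for the root-element rule. This sketch only confirms that the root is `<wdkModel>`. Full validation against `wdkModel.rng` needs a separate RELAX NG validator (such as Jing), which is not part of the standard `javax.xml` APIs. `WdkRootCheck` is a hypothetical name.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;

// Sketch: check that a WDK model file's root element is <wdkModel>.
// Root-element check only; this does not validate against wdkModel.rng.
final class WdkRootCheck {

  static boolean hasWdkModelRoot(InputStream xml)
      throws ParserConfigurationException, SAXException, IOException {
    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    // Parse the document and read the tag name of its root element.
    String rootName = factory.newDocumentBuilder().parse(xml).getDocumentElement().getTagName();
    return "wdkModel".equals(rootName);
  }

  // Convenience overload for in-memory XML; wraps parser errors in an unchecked exception.
  static boolean hasWdkModelRoot(String xml) {
    try {
      return hasWdkModelRoot(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    } catch (ParserConfigurationException | SAXException | IOException e) {
      throw new IllegalArgumentException("Could not parse XML", e);
    }
  }
}
```

To check a real file, pass the `InputStream` overload a stream opened with `Files.newInputStream` on a file under `Model/lib/wdk/`.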
- DatasetPresenter (XML) → defines dataset metadata, display properties, contacts, publications
- DatasetInjector (Java) → declares required properties, adds WDK model references, injects templates
- Templates (DST files) → text fragments with property placeholders injected into WDK XML
- Template Instances → tuple of (property values, template) created by injector
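The relationship between these pieces can be sketched as follows. A template instance pairs a template with the property values that fill it, and injection splices the filled text into an anchor file. All names here, and the anchor marker format (`<!-- INJECT: name -->`), are hypothetical. The real anchor convention is defined by the DatasetPresenter code and the anchor files themselves.

```java
import java.util.Map;

// Hypothetical model of a template instance: (property values, template) -> text.
// The anchor marker syntax used in inject() is an assumption for illustration only.
final class TemplateInstanceSketch {

  record Template(String name, String body) {}

  record TemplateInstance(Template template, Map<String, String> propertyValues) {
    // Naive ${name} replacement; a real implementation should also report unknown placeholders.
    String render() {
      String text = template.body();
      for (Map.Entry<String, String> e : propertyValues.entrySet()) {
        text = text.replace("${" + e.getKey() + "}", e.getValue());
      }
      return text;
    }
  }

  // Inserts the rendered text on a new line directly after a hypothetical anchor comment.
  static String inject(String anchorFileContent, String anchorName, TemplateInstance instance) {
    String marker = "<!-- INJECT: " + anchorName + " -->";
    int at = anchorFileContent.indexOf(marker);
    if (at < 0) {
      throw new IllegalArgumentException("Anchor not found: " + anchorName);
    }
    int end = at + marker.length();
    return anchorFileContent.substring(0, end) + "\n" + instance.render()
        + anchorFileContent.substring(end);
  }
}
```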
Key method flow in DatasetInjector subclasses:
- `getPropertiesDeclaration()` - Declare required properties
- `addModelReferences()` - Add WDK references (questions, tables, etc.) via `addWdkReference()`
- `injectTemplates()` - Inject template instances via `injectTemplate(templateName)`
Example: GeneOntology.java adds references to GO search questions and tables.
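The three-step contract can be illustrated with a simplified, self-contained stand-in for the abstract class. `MiniDatasetInjector` and its helper signatures are hypothetical simplifications; the real `DatasetInjector` has its own signatures and bookkeeping. The subclass shows where each kind of work goes. The record class, question and template names it uses are placeholders.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified, hypothetical stand-in for DatasetInjector; not the real class.
abstract class MiniDatasetInjector {
  final List<String> wdkReferences = new ArrayList<>();
  final List<String> injectedTemplates = new ArrayList<>();

  // Step 1: properties this dataset type requires, as {name, description} pairs.
  abstract String[][] getPropertiesDeclaration();

  // Step 2: WDK references (questions, tables, ...) the dataset should link to.
  abstract void addModelReferences();

  // Step 3: template instances to inject, by template name.
  abstract void injectTemplates();

  void addWdkReference(String recordClass, String type, String name) {
    wdkReferences.add(recordClass + ":" + type + ":" + name);
  }

  void injectTemplate(String templateName) {
    injectedTemplates.add(templateName);
  }
}

// Example subclass in the spirit of GeneOntology.java; all names are placeholders.
class ExampleGoInjector extends MiniDatasetInjector {
  @Override
  String[][] getPropertiesDeclaration() {
    return new String[][] {{"exampleProperty", "Hypothetical property required by this type"}};
  }

  @Override
  void addModelReferences() {
    addWdkReference("ExampleRecordClass", "question", "ExampleQuestions.ByGoTerm");
    addWdkReference("ExampleRecordClass", "table", "GoTerms");
  }

  @Override
  void injectTemplates() {
    injectTemplate("exampleGoTemplate");
  }
}
```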
Located in Model/src/main/java/org/apidb/apicommon/model/userdataset/
User dataset handlers process user-uploaded data:
- `ISASimpleTypeHandler` - ISA (Investigation/Study/Assay) format datasets
- `BiomTypeHandler` - BIOM format microbiome data

Each handler implements:
- `getCompatibility()` - Validate user dataset
- `getInstallInAppDbCommand()` - Generate installation commands for app database
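A handler's two responsibilities can be sketched with a hypothetical, simplified interface. The real `UserDatasetTypeHandler` signatures differ: its arguments, return types and the way compatibility results are reported are all its own. Here, compatibility is reduced to a boolean plus a reason. The install command is just a list of command-line tokens with a placeholder script name. The rule restricting BIOM data to one site is also an assumption.

```java
import java.nio.file.Path;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified handler contract; not the real UserDatasetTypeHandler API.
interface SimpleTypeHandler {

  record Compatibility(boolean compatible, String reason) {}

  // Decide whether the uploaded dataset can be used on this site.
  Compatibility getCompatibility(Map<String, String> datasetMeta, String projectId);

  // Build the command used to load the dataset's files into the app database.
  List<String> getInstallInAppDbCommand(String datasetId, Path dataDir);
}

// Example implementation for a BIOM-like type; "exampleInstallBiom" is a placeholder script.
class ExampleBiomHandler implements SimpleTypeHandler {
  @Override
  public Compatibility getCompatibility(Map<String, String> datasetMeta, String projectId) {
    if (!"biom".equals(datasetMeta.get("type_name"))) {
      return new Compatibility(false, "Not a BIOM dataset");
    }
    // Assumption: BIOM data is only usable on a microbiome-focused site.
    boolean supported = "MicrobiomeDB".equals(projectId);
    return new Compatibility(supported,
        supported ? "ok" : "BIOM datasets are not supported on " + projectId);
  }

  @Override
  public List<String> getInstallInAppDbCommand(String datasetId, Path dataDir) {
    return List.of("exampleInstallBiom", datasetId, dataDir.toString());
  }
}
```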
A new <recordClass> named userDatasetRecord is being created to model the VDI Control Schema, which tracks information about user-uploaded datasets. The database schema is defined at:
https://github.com/VEuPathDB/VdiSchema/blob/main/Main/lib/sql/Postgres/createVdiControlTables.sql
VDI Schema Overview:
The VDI schema uses PostgreSQL tables with parameterized naming (`VDI_CONTROL_:VAR1` prefix). Key table groups:
- Core Dataset Tables:
  - `dataset` - Primary metadata (dataset_id, owner, type_name, type_version, category, is_public, accessibility, deleted_status, creation_date)
  - `dataset_meta` - Descriptive information (name, summary, description, program_name, project_name, attribution)
  - `sync_control` - Synchronization timestamps for shares, data, and metadata updates
- Installation Tracking:
  - `dataset_install_message` - Installation status per install_type (status, message, updated timestamp)
  - `dataset_install_activity` - Install process heartbeats to track active installs and detect interruptions
- Access Control:
  - `dataset_visibility` - Maps dataset_id to user_id for owners and accepted share offers
  - `dataset_project` - Associates datasets with project_ids
- Rich Metadata Tables:
  - `dataset_publication` - External IDs (PubMed, DOI), citations, is_primary flag
  - `dataset_contact` - Author/contact information (is_primary, name, email, affiliation, country)
  - `dataset_organism` - Experimental or host organisms (species, strain)
  - `dataset_dependency` - Dependencies with identifiers and versions
  - `dataset_hyperlink` - Related URLs with descriptions
  - `dataset_funding_award` - Funding agency and award numbers
  - `dataset_characteristics` - Study design, participant ages, sample years
  - Categorical tables: `dataset_country`, `dataset_species`, `dataset_disease`, `dataset_sample_type`
  - External identifiers: `dataset_doi`, `dataset_bioproject_id`, `dataset_link`
- Convenience View:
  - `AvailableUserDatasets` - Shows datasets visible to a user that are fully installed and not deleted
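The rule behind `AvailableUserDatasets` (visible to the user, fully installed, not deleted) can be expressed as a predicate over in-memory rows. This is a sketch of the logic only, not the view's SQL, and the row types are hypothetical. It assumes "fully installed" means every install type reports a status of `complete`. Check that against `createVdiControlTables.sql`.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical in-memory rows; the real view is defined in SQL in the VDI schema.
final class AvailableDatasetsSketch {

  record Dataset(String datasetId, int deletedStatus) {}

  /**
   * @param visibility    dataset_id -> user_ids (owners and accepted shares), as in dataset_visibility
   * @param installStatus dataset_id -> (install_type -> status), as in dataset_install_message
   */
  static List<String> availableFor(
      long userId,
      List<Dataset> datasets,
      Map<String, Set<Long>> visibility,
      Map<String, Map<String, String>> installStatus) {
    return datasets.stream()
        .filter(d -> d.deletedStatus() == 0) // 0 = Active
        .filter(d -> visibility.getOrDefault(d.datasetId(), Set.of()).contains(userId))
        .filter(d -> {
          Map<String, String> statuses = installStatus.getOrDefault(d.datasetId(), Map.of());
          // Assumption: "complete" marks a finished install for each install_type.
          return !statuses.isEmpty() && statuses.values().stream().allMatch("complete"::equals);
        })
        .map(Dataset::datasetId)
        .collect(Collectors.toList());
  }
}
```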
Dataset Lifecycle:
- `deleted_status`: 0 = Active, 1 = Deleted and Uninstalled, 2 = Deleted but not yet Uninstalled
- `accessibility`: 'public', 'protected', 'private'
- Installation tracked through heartbeat mechanism in `dataset_install_activity`
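The `deleted_status` codes map naturally onto an enum. The sketch below also shows one way a heartbeat can detect an interrupted install: if the last heartbeat in `dataset_install_activity` is older than some timeout, the install is presumed dead. The enum name and the 10-minute timeout are assumptions, not part of the VDI schema.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical Java view of VDI lifecycle values; codes match the deleted_status list above.
final class VdiLifecycleSketch {

  enum DeletedStatus {
    ACTIVE(0),
    DELETED_AND_UNINSTALLED(1),
    DELETED_NOT_YET_UNINSTALLED(2);

    final int code;

    DeletedStatus(int code) {
      this.code = code;
    }

    static DeletedStatus fromCode(int code) {
      for (DeletedStatus s : values()) {
        if (s.code == code) return s;
      }
      throw new IllegalArgumentException("Unknown deleted_status: " + code);
    }
  }

  // Assumed threshold; the real system's timeout may differ.
  static final Duration HEARTBEAT_TIMEOUT = Duration.ofMinutes(10);

  // An install is presumed interrupted if its last heartbeat is older than the timeout.
  static boolean isInstallInterrupted(Instant lastHeartbeat, Instant now) {
    return Duration.between(lastHeartbeat, now).compareTo(HEARTBEAT_TIMEOUT) > 0;
  }
}
```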
Mapping VDI Schema to WDK Record Class:
The userDatasetRecord will model the VDI schema using:
- Attributes (1-to-1 properties): Data from the `dataset` and `dataset_meta` tables will become record attributes, since each dataset has exactly one set of core metadata (e.g., dataset_id, owner, type_name, name, summary, description, is_public, accessibility, creation_date).
- Tables (1-to-many relationships): The supporting metadata tables will become WDK tables, since a dataset can have multiple of each:
  - Publications (`dataset_publication`) - A dataset can have multiple publications
  - Contacts (`dataset_contact`) - Multiple authors/contacts per dataset
  - Organisms (`dataset_organism`) - Multiple experimental or host organisms
  - Hyperlinks (`dataset_hyperlink`) - Multiple related URLs
  - Countries, species, diseases - Multiple categorical values
  - Dependencies, funding awards, sample types, etc.
This provides an object-oriented interface where the record represents a single user dataset with scalar properties (attributes) and collections of related data (tables).
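In Java terms, the resulting object looks roughly like the hypothetical record below. It has scalar fields for the attributes drawn from `dataset`/`dataset_meta`, and lists for the WDK tables. Field names follow the column names listed above. The nested row types are simplified and omit most columns, and the numeric `owner` type is an assumption.

```java
import java.time.LocalDateTime;
import java.util.List;

// Hypothetical shape of one userDatasetRecord; not generated WDK code.
record UserDatasetView(
    // Attributes (1-to-1): one value per dataset, from dataset and dataset_meta.
    String datasetId,
    long owner, // assumed numeric user ID
    String typeName,
    String name,
    String summary,
    boolean isPublic,
    String accessibility,
    LocalDateTime creationDate,
    // Tables (1-to-many): zero or more rows per dataset.
    List<Publication> publications,
    List<Contact> contacts,
    List<Organism> organisms) {

  record Publication(String externalId, String citation, boolean isPrimary) {}

  record Contact(String name, String email, boolean isPrimary) {}

  record Organism(String species, String strain) {}

  // Convenience: the first contact flagged as primary, or null if none is flagged.
  Contact primaryContact() {
    return contacts.stream().filter(Contact::isPrimary).findFirst().orElse(null);
  }
}
```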
Implementation Pattern - Creating UserDatasetRecordClass:
The implementation should follow the pattern established in datasetRecords.xml and datasetQueries.xml, creating two new parallel files:
- `userDatasetRecords.xml` - Defines the record class structure:
  - `<recordClass name="UserDatasetRecordClass">` - Main record class definition
  - `<primaryKey>` - References an alias query (likely using `dataset_id` from the VDI schema)
  - `<idAttribute>` - Display attribute for the primary key
  - `<reporter>` elements - Standard reporters (attributesTabular, tableTabular, fullRecord, xml, json)
  - `<attributeQueryRef>` elements - Reference queries from userDatasetQueries.xml for 1-to-1 properties:
    - Core dataset metadata (dataset_id, owner, type_name, type_version, category, etc.)
    - Dataset descriptive info (name, summary, description, etc.)
    - Accessibility and status (is_public, accessibility, deleted_status, creation_date)
    - Each `<attributeQueryRef>` contains `<columnAttribute>` elements mapping query columns to attributes
  - `<table>` elements - Reference table queries for 1-to-many relationships:
    - Publications (`<table name="Publications" queryRef="UserDatasetTables.Publications">`)
    - Contacts (`<table name="Contacts" queryRef="UserDatasetTables.Contacts">`)
    - Organisms, Hyperlinks, Countries, Species, etc.
    - Each table contains `<columnAttribute>`, `<linkAttribute>`, or `<textAttribute>` elements
- `userDatasetQueries.xml` - Defines SQL queries to retrieve data from the VDI Control Schema:
  - `<querySet name="UserDatasetAttributes" queryType="attribute">` - Attribute queries:
    - SQL queries joining the `dataset` and `dataset_meta` tables
    - Queries for sync_control, install status, visibility, etc.
    - Each `<sqlQuery>` has `<column>` elements and a `<sql>` element containing the query
  - `<querySet name="UserDatasetTables" queryType="table">` - Table queries:
    - SQL queries for each supporting metadata table: `dataset_publication`, `dataset_contact`, `dataset_organism`, etc.
    - Queries should join on the `dataset_id` foreign key
Key Patterns from DatasetRecordClass:
- Attribute Queries: Join core tables, return one row per dataset_id

```xml
<attributeQueryRef ref="UserDatasetAttributes.CoreMetadata">
  <columnAttribute name="dataset_id" internal="true"/>
  <columnAttribute name="owner" displayName="Owner"/>
  <columnAttribute name="type_name" displayName="Type"/>
  <!-- etc. -->
</attributeQueryRef>
```

- Table Queries: Return multiple rows per dataset_id for 1-to-many relationships

```xml
<table name="Publications" displayName="Publications" queryRef="UserDatasetTables.Publications">
  <columnAttribute name="dataset_id" internal="true"/>
  <columnAttribute name="pmid" displayName="PubMed ID"/>
  <linkAttribute name="pubmed_link" displayName="Link">
    <!-- link markup -->
  </linkAttribute>
</table>
```

- Link Attributes: Use `<linkAttribute>` for URLs, with `<displayText>` and `<url>` child elements
- Internal Attributes: Use `internal="true"` for columns needed for joins but not displayed
- Project Filtering: Use `includeProjects`/`excludeProjects` attributes for site-specific features
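The cardinality difference between the two query kinds can be checked on sample query results before wiring them into the record class. This hypothetical helper does two things:

- It verifies that an attribute query returns at most one row per `dataset_id`.
- It groups table-query rows by `dataset_id`, the way a record's table would present them.

The row maps stand in for JDBC result rows.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical checks on query output shape; row maps stand in for result-set rows.
final class QueryShapeSketch {

  // Attribute queries must not return more than one row for the same dataset_id.
  static void requireOneRowPerDataset(List<Map<String, Object>> rows) {
    Set<Object> seen = new HashSet<>();
    for (Map<String, Object> row : rows) {
      Object id = row.get("dataset_id");
      if (!seen.add(id)) {
        throw new IllegalStateException("Duplicate dataset_id in attribute query: " + id);
      }
    }
  }

  // Table queries may return many rows per dataset_id; group them per record, keeping row order.
  static Map<Object, List<Map<String, Object>>> groupByDataset(List<Map<String, Object>> rows) {
    return rows.stream()
        .collect(Collectors.groupingBy(
            r -> r.get("dataset_id"), LinkedHashMap::new, Collectors.toList()));
  }
}
```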
Both files must be imported in `Model/lib/wdk/ebrcModelCommon.xml`:

```xml
<import file="model/records/userDatasetRecords.xml"/>
<import file="model/records/userDatasetQueries.xml"/>
```

The tuning manager (`Model/lib/xml/tuningManager/`) defines database materialized views (tuning tables) that optimize queries. These are built from dataset metadata and external data sources.
Common tuning tables:
- `DatasetPresenter` - Main dataset metadata table
- `StudyIdDatasetId` - Maps EDA study IDs to dataset IDs
- Various record-specific attribute tables
Dataset classes are defined in Model/lib/xml/datasetClass/classes.xml. Each class represents a type of data (e.g., RNA-Seq, microarray, proteomics) and specifies:
- Properties available for that class
- How the data integrates with the site
- Display templates and visualization options
- WDK (Workflow Development Kit) - Core dependency providing model infrastructure
- ReFlow - Workflow management
- FgpUtil - Utility libraries (core, xml, cli, db)
- Apache Commons (Digester, CLI, Codec)
- Jackson - JSON processing
- Log4j - Logging
- To add a new dataset type:
  - Create a `DatasetInjector` subclass in `Model/src/main/java/org/apidb/apicommon/model/datasetInjector/`
  - Implement the three abstract methods
  - Add a dataset presenter XML entry in `Model/lib/xml/datasetPresenters/global.xml`
  - Create templates in `Model/lib/dst/` if needed
  - Build and restart Tomcat
- To modify WDK model definitions:
  - Edit XML files in `Model/lib/wdk/`
  - Rebuild the project with `bld EbrcModelCommon`
  - Restart the Tomcat instance (changes require a reload)
- To add/modify user dataset handlers:
  - Create/edit the handler in `Model/src/main/java/org/apidb/apicommon/model/userdataset/`
  - Implement the required `UserDatasetTypeHandler` methods
  - Register the handler in the WDK configuration