Commit 016fdd9

Copilot and etj authored
[Fixes #13512] Create a task that loads required thesauri automatically (#14187)
* Add autoload subcommand to thesaurus management command, task in tasks.py, and entrypoint call
* Add documentation for thesaurus autoload command and boot-time thesauri loading
* Move autoload_thesauri into its own thesaurus_subcommands/autoload.py
* Log improvements, lang selection
* Apply suggestions from code review
* Fix mutable default argument for langs parameter in load_thesaurus
* Fix patch apps.get_app_configs in correct autoload module
* Fix code formatting
* Improve load final log

Co-authored-by: etj <717359+etj@users.noreply.github.com>
Co-authored-by: Emanuele Tajariol <etj@geo-solutions.it>
1 parent ac65364 commit 016fdd9

8 files changed

Lines changed: 298 additions & 23 deletions

docs/src/admin/thesauri/thesauri.md

Lines changed: 62 additions & 3 deletions
@@ -92,6 +92,7 @@ GeoNode provides a single command (``thesaurus``) with multiple actions:
 * ``list``: list existing thesauri
 * ``load``: load a RDF file
 * ``dump``: dump a thesaurus into a file
+* ``autoload``: automatically discover and load all thesauri shipped by installed apps
 
 .. code-block::
 
@@ -102,12 +103,13 @@ GeoNode provides a single command (``thesaurus``) with multiple actions:
 [--format {json-ld,n3,nt,pretty-xml,sorted-xml,trig,ttl,xml}] [--default-lang LANG] [--version]
 [-v {0,1,2,3}] [--settings SETTINGS] [--pythonpath PYTHONPATH] [--traceback] [--no-color]
 [--force-color] [--skip-checks]
-[{list,load,dump}]
+[{list,load,dump,autoload}]
 
-Handles thesaurus commands ['list', 'load', 'dump']
+Handles thesaurus commands ['list', 'load', 'dump', 'autoload']
 
 positional arguments:
-{list,load,dump} thesaurus operation to run
+{list,load,dump,autoload}
+thesaurus operation to run
 
 options:
 -h, --help show this help message and exit
@@ -227,6 +229,63 @@ In order to only export the entries we edited, we'll issue the command::
 python manage.py thesaurus dump -i labels-i18n --include "proj1_*" --include "*_ovr" -f labels-i18n.proj1.rdf
 
 
+### Auto-loading thesauri: ``thesaurus autoload``
+
+The ``autoload`` subcommand scans every installed Django app for a ``thesauri/`` directory
+at the top level of the app package, then loads all ``.rdf`` files it finds there.
+This is how GeoNode and third-party apps can ship thesauri that are loaded automatically at start-up.
+
+```bash
+python manage.py thesaurus autoload
+```
+
+For each ``.rdf`` file discovered, the command runs the equivalent of ``thesaurus load --action update``,
+so the operation is **idempotent**: running it multiple times will not create duplicates; instead,
+existing records are updated and missing ones are created.
+
+**Convention for app-provided thesauri**
+
+Place one or more ``.rdf`` files inside a ``thesauri/`` directory at the root of your app package:
+
+```
+my_geonode_app/
+    thesauri/
+        my_vocabulary.rdf
+        another_vocab.rdf
+    models.py
+    ...
+```
+
+All ``.rdf`` files in that directory are picked up automatically whenever ``thesaurus autoload``
+(or ``invoke loadthesauri``) is executed.
+
+!!! note
+    The ``autoload`` command is automatically run during GeoNode's Docker container start-up sequence (see [Initialization at boot](#initialization-at-boot)).
+
+
+## Initialization at boot { #initialization-at-boot }
+
+When GeoNode starts (e.g. via the Docker entrypoint), the following initialization steps are executed in order:
+
+1. **Database migrations** – applies any pending schema migrations.
+2. **Fixtures** – loads default OAuth2 apps, admin user, and site data (only on first boot or when ``FORCE_REINIT=true``).
+3. **Static files** – collects static assets.
+4. **Thesauri autoload** – runs ``thesaurus autoload`` to load or update all ``.rdf`` files found in any installed app's ``thesauri/`` directory. This step runs on **every** boot so that thesaurus updates shipped with an upgraded app are applied automatically.
+
+To run the thesaurus autoload step manually:
+
+```bash
+# Inside the GeoNode container
+python manage.py thesaurus autoload
+```
+
+Or using the invoke task:
+
+```bash
+invoke loadthesauri
+```
+
+
 ## Configuring a Thesaurus
 
 
docs/src/setup/docker/vanilla-docker-installation.md

Lines changed: 9 additions & 0 deletions
@@ -81,6 +81,15 @@ Executing UWSGI server uwsgi --ini /usr/src/app/uwsgi.ini for Production
 [uWSGI] getting INI configuration from /usr/src/app/uwsgi.ini
 ```
 
+The container performs these initialization steps before starting the application server:
+
+1. **Database migrations** – applies any pending schema migrations.
+2. **Fixtures** – loads default OAuth2 apps, admin user and site data (only on first boot or when ``FORCE_REINIT=true``).
+3. **Static files** – collects static assets.
+4. **Thesauri autoload** – scans all installed apps for a ``thesauri/`` directory and loads (or updates) any ``.rdf`` files found there. This makes sure thesauri shipped by GeoNode apps are always up-to-date.
+
+See [Thesauri – Initialization at boot](../../../admin/thesauri/thesauri.md#initialization-at-boot) for more details on the thesaurus autoload step.
+
 To exit just hit `CTRL+C`.
 
 This message means that the GeoNode containers have been started. Browsing to `http://localhost/` will show the GeoNode home page. You should be able to successfully log with the credentials of admin user which are defined in the .env file and start using it right away.

entrypoint.sh

Lines changed: 1 addition & 0 deletions
@@ -56,6 +56,7 @@ else
     fi
 
     invoke statics
+    invoke loadthesauri
 
     echo "Executing UWSGI server $cmd for Production"
 fi

geonode/base/management/commands/thesaurus.py

Lines changed: 15 additions & 2 deletions
@@ -2,6 +2,7 @@
 from django.core.management.base import BaseCommand, CommandError
 
 from geonode.base.management.command_utils import setup_logger
+from geonode.base.management.commands.thesaurus_subcommands.autoload import autoload_thesauri
 from geonode.base.management.commands.thesaurus_subcommands.dump import (
     dump_thesaurus,
     DUMP_FORMATS,
@@ -16,7 +17,8 @@
 COMMAND_LIST = "list"
 COMMAND_DUMP = "dump"
 COMMAND_LOAD = "load"
-COMMANDS = [COMMAND_LIST, COMMAND_LOAD, COMMAND_DUMP]
+COMMAND_AUTOLOAD = "autoload"
+COMMANDS = [COMMAND_LIST, COMMAND_LOAD, COMMAND_DUMP, COMMAND_AUTOLOAD]
 
 
 class Command(BaseCommand):
@@ -41,6 +43,12 @@ def add_arguments(self, parser):
             choices=ACTIONS,
             help="Actions to run upon data loading (default: create)",
         )
+        load_group.add_argument(
+            "--langs",
+            dest="langs",
+            action="append",
+            help="Only import labels for the requested languages; can be repeated",
+        )
 
         dump_group = parser.add_argument_group('Params for "dump" subcommand')
         dump_group.add_argument("-o", "--out", nargs="?", help="Full path to the output file to be created")
@@ -99,6 +107,8 @@ def handle(self, *args, **options):
             input_file = options.get("file")
             action = options.get("action")
             identifier = options.get("identifier")
+            lang = options.get("lang")
+            langs = options.get("langs") or []
 
             if not input_file:
                 raise CommandError("'load' command requires the <file> parameter.")
@@ -107,7 +117,10 @@ def handle(self, *args, **options):
                 action = ACTION_CREATE
                 logger.info(f"Missing action param: setting actions as '{action}'")
 
-            load_thesaurus(input_file, identifier, action)
+            load_thesaurus(input_file, identifier, action, default_lang=lang, langs=langs)
+
+        elif subcommand == COMMAND_AUTOLOAD:
+            autoload_thesauri()
 
         else:
             raise CommandError(f"Unknown subcommand: {subcommand}")
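
Illustrative note, not part of the commit: the new `--langs` option uses `action="append"`, and `load_thesaurus` then splits comma-separated items, so the flag can apparently be repeated, given a comma-separated list, or both. The sketch below reproduces that behaviour with plain `argparse`. The helper names `build_parser`, `normalize_langs` and `parse_langs` are hypothetical; only the option definition and the list comprehension are taken from the diff.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Stand-alone parser for illustration; the real option lives on the
    # Django management command's "load" argument group.
    parser = argparse.ArgumentParser(prog="thesaurus-langs-demo")
    parser.add_argument(
        "--langs",
        dest="langs",
        action="append",
        help="Only import labels for the requested languages; can be repeated",
    )
    return parser


def normalize_langs(langs):
    # Same expression as in load_thesaurus: explode comma-separated items
    # into a flat list, dropping empty entries and surrounding whitespace.
    return [lang.strip() for item in (langs or []) for lang in item.split(",") if lang.strip()]


def parse_langs(argv):
    options = build_parser().parse_args(argv)
    return normalize_langs(options.langs)


if __name__ == "__main__":
    print(parse_langs(["--langs", "en, it", "--langs", "fr"]))
```

When the flag is omitted, `argparse` leaves `langs` as `None`, which the `(langs or [])` guard turns into an empty list, meaning no language filter is applied.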

geonode/base/management/commands/thesaurus_subcommands/autoload.py

Lines changed: 39 additions & 0 deletions
@@ -0,0 +1,39 @@
+import os
+
+from django.apps import apps
+
+from geonode.base.management.command_utils import setup_logger
+from geonode.base.management.commands.thesaurus_subcommands.load import load_thesaurus, ACTION_UPDATE
+
+logger = setup_logger()
+
+
+def autoload_thesauri():
+    """
+    Discover and load all thesauri (.rdf files) found in a `thesauri/` directory
+    within each installed Django app. Uses the `update` action so existing entries
+    are updated and new ones are created without duplicates.
+    """
+    loaded = 0
+    for app_config in apps.get_app_configs():
+        thesauri_dir = os.path.join(app_config.path, "thesauri")
+        logger.debug(f"Looking for auto thesaurus in app '{app_config.name}' path: {thesauri_dir}")
+        if not os.path.isdir(thesauri_dir):
+            continue
+        try:
+            rdf_files = [f for f in os.listdir(thesauri_dir) if f.lower().endswith(".rdf")]
+        except OSError as e:
+            logger.error(
+                f"Failed to scan thesauri directory for app '{app_config.name}' at '{thesauri_dir}': {e}",
+                exc_info=True,
+            )
+            continue
+        for rdf_file in sorted(rdf_files):
+            rdf_path = os.path.join(thesauri_dir, rdf_file)
+            logger.info(f"Autoloading thesaurus from app '{app_config.name}': {rdf_path}")
+            try:
+                load_thesaurus(rdf_path, identifier=None, action=ACTION_UPDATE, log_details=False)
+                loaded += 1
+            except Exception as e:
+                logger.error(f"Failed to load thesaurus '{rdf_path}': {e}", exc_info=True)
+    logger.info(f"Autoload complete: {loaded} thesaurus file(s) loaded.")
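
Illustrative note, not part of the commit: the discovery step in `autoload_thesauri` can be tried without Django by passing plain directory paths instead of app configs. In this minimal sketch, `find_rdf_files` and `demo` are hypothetical names and the temporary directory layout is made up for the example. The matching rule (a `thesauri/` subdirectory, case-insensitive `.rdf` suffix, sorted file names) mirrors the code above.

```python
import os
import tempfile


def find_rdf_files(app_paths):
    """Return the .rdf files found in a `thesauri/` folder of each given path."""
    found = []
    for app_path in app_paths:
        thesauri_dir = os.path.join(app_path, "thesauri")
        if not os.path.isdir(thesauri_dir):
            # Apps without a thesauri/ directory are simply skipped.
            continue
        names = [f for f in os.listdir(thesauri_dir) if f.lower().endswith(".rdf")]
        found.extend(os.path.join(thesauri_dir, name) for name in sorted(names))
    return found


def demo():
    # Build two fake "apps": one ships thesauri, the other does not.
    with tempfile.TemporaryDirectory() as root:
        app_a = os.path.join(root, "app_a")
        app_b = os.path.join(root, "app_b")
        os.makedirs(os.path.join(app_a, "thesauri"))
        os.makedirs(app_b)
        for name in ("b.RDF", "a.rdf", "notes.txt"):
            with open(os.path.join(app_a, "thesauri", name), "w"):
                pass
        return [os.path.basename(p) for p in find_rdf_files([app_a, app_b])]


if __name__ == "__main__":
    print(demo())
```

Only files directly inside `thesauri/` are considered; like the committed code, the sketch does not recurse into subdirectories.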

geonode/base/management/commands/thesaurus_subcommands/load.py

Lines changed: 39 additions & 18 deletions
@@ -45,13 +45,16 @@
 FAKE_BASE_URI = "http://automatically/added/uri/"
 
 
-def load_thesaurus(input_file, identifier: str, action: str = ACTION_CREATE):
+def load_thesaurus(input_file, identifier: str, action: str = ACTION_CREATE, default_lang: str = None, langs: List[str] = None, log_details=True):
     g = Graph()
 
     # if the input_file is an UploadedFile object rather than a file path the Graph.parse()
     # method may not have enough info to correctly guess the type; in this case supply the
     # name, which should include the extension, to guess_format manually...
 
+    # explodes list of comma separated langs into single list of langs
+    langs = [lang.strip() for item in (langs or []) for lang in item.split(",") if lang.strip()]
+
     filename = input_file.name if isinstance(input_file, UploadedFile) else input_file
     rdf_format = guess_format(filename)
     if not identifier:
@@ -65,7 +68,7 @@ def load_thesaurus(input_file, identifier: str, action: str = ACTION_CREATE):
     if scheme is None:
         raise CommandError("ConceptScheme not found in file")
 
-    default_lang = getattr(settings, "THESAURUS_DEFAULT_LANG", None)
+    default_lang = default_lang or getattr(settings, "THESAURUS_DEFAULT_LANG", None) or getattr(settings, "LANGUAGE_CODE", 'en')
 
     available_titles = [t
         for t in itertools.chain(g.objects(scheme, DC.title), g.objects(scheme, DCTERMS.title))
@@ -81,15 +84,21 @@ def load_thesaurus(input_file, identifier: str, action: str = ACTION_CREATE):
         Thesaurus,
         {"identifier": identifier},
         {"date": date_issued, "description": description, "title": thesaurus_title, "about": str(scheme)},
-        {"card_min": 0, "card_max": 0, "facet": False}
+        {"card_min": 0, "card_max": 0, "facet": False},
+        log_details
     )
 
-    tl_cnt = tl_add = 0
+    tl_cnt = tl_add = tl_skp = 0
     tk_cnt = tk_add = 0
-    tkl_cnt = tkl_add = 0
+    tkl_cnt = tkl_add = tkl_skp = 0
 
     for lang in available_titles:
         if lang.language is not None:
+            tl_cnt += 1
+            if langs and lang.language not in langs:
+                logger.debug(f"Skipping thesaurus label for language '{lang.language}' not in requested langs {langs}")
+                tl_skp += 1
+                continue
             thesaurus_label, c = _run_action(
                 action,
                 ThesaurusLabel,
@@ -99,8 +108,8 @@ def load_thesaurus(input_file, identifier: str, action: str = ACTION_CREATE):
                 },
                 {"label": lang.value},
                 {},
+                log_details
             )
-            tl_cnt += 1
             tl_add += 1 if c else 0
 
     for concept in g.subjects(RDF.type, SKOS.Concept):
@@ -115,7 +124,8 @@ def load_thesaurus(input_file, identifier: str, action: str = ACTION_CREATE):
         available_labels = [t for t in g.objects(concept, SKOS.prefLabel) if isinstance(t, Literal)]
         alt_label = value_for_language(available_labels, default_lang) or about
 
-        logger.info(f" - Parsed Concept -> about:'{about}' alt:'{alt_label}' pref:'{str(pref)}' ")
+        if log_details:
+            logger.info(f" - Parsed Concept -> about:'{about}' alt:'{alt_label}' pref:'{str(pref)}' ")
 
         tk, c = _run_action(
             action,
@@ -126,14 +136,21 @@ def load_thesaurus(input_file, identifier: str, action: str = ACTION_CREATE):
             },
             {"alt_label": alt_label},
             {},
+            log_details
         )
         tk_cnt += 1
         tk_add += 1 if c else 0
 
         for _, pref_label in preferredLabel(g, concept):
+            tkl_cnt += 1
             lang = pref_label.language
+            if langs and lang not in langs:
+                logger.debug(f"Skipping label for language '{lang}' not in requested langs {langs}")
+                tkl_skp += 1
+                continue
             label = str(pref_label)
-            logger.info(f" - Label {lang}: {label}")
+            if log_details:
+                logger.info(f" - Label {lang}: {label}")
 
             tkl, c = _run_action(
                 action,
@@ -144,25 +161,26 @@ def load_thesaurus(input_file, identifier: str, action: str = ACTION_CREATE):
                 },
                 {"label": label},
                 {},
+                log_details
             )
-            tkl_cnt += 1
             tkl_add += 1 if c else 0
 
-    logger.warning(f"Thesaurus added: {cr_t}")
-    logger.warning(f"ThesaurusLabel added: {tl_add:3}/{tl_cnt:3}")
-    logger.warning(f"ThesaurusKeyword added: {tk_add:3}/{tk_cnt:3}")
-    logger.warning(f"ThesaurusKeywordLabel added: {tkl_add:3}/{tkl_cnt:3}")
+    logger.warning(f"Thesaurus added: {cr_t}")
+    logger.warning(f"ThesaurusLabel: found: {tl_cnt:3} - added: {tl_add:3} - skipped: {tl_skp:3}")
+    logger.warning(f"ThesaurusKeyword: found: {tk_cnt:3} - added: {tk_add:3}")
+    logger.warning(f"ThesaurusKeywordLabel: found: {tkl_cnt:3} - added: {tkl_add:3} - skipped: {tkl_skp:3}")
 
 
-def _run_action(action: str, model: type[models.Model], pk_dict, upd_dict, create_dict) -> tuple[models.Model, bool]:
+def _run_action(action: str, model: type[models.Model], pk_dict, upd_dict, create_dict, log_details) -> tuple[models.Model, bool]:
     def update_or_create(defaults=upd_dict, create_defaults=create_dict, **pk_dict):
         # this signature is available since django 5
         obj, created = model.objects.get_or_create(defaults=upd_dict | create_dict, **pk_dict)
 
         if not created:
             rows = model.objects.filter(pk=obj.pk).update(**upd_dict)
             if rows != 1:
-                logger.error(f"UPDATED {rows} rows for {model.__name__} -> {pk_dict}")
+                if log_details:
+                    logger.error(f"UPDATED {rows} rows for {model.__name__} -> {pk_dict}")
 
         return obj, created
 
@@ -176,14 +194,17 @@ def update_or_create(defaults=upd_dict, create_defaults=create_dict, **pk_dict):
     elif action == ACTION_UPDATE:
         obj, created = update_or_create(defaults=upd_dict, create_defaults=create_dict, **pk_dict)
         if created:
-            logger.info(f"{model.__name__} -> Created id:{pk_dict}")
+            if log_details:
+                logger.info(f"{model.__name__} -> Created id:{pk_dict}")
         else:
-            logger.info(f"{model.__name__} -> Updated id:{pk_dict} DATA:{upd_dict}")
+            if log_details:
+                logger.info(f"{model.__name__} -> Updated id:{pk_dict} DATA:{upd_dict}")
 
     elif action == ACTION_APPEND:
         obj, created = model.objects.get_or_create(defaults=upd_dict | create_dict, **pk_dict)
         if created:
-            logger.info(f"{model.__name__} -> Created {pk_dict}")
+            if log_details:
+                logger.info(f"{model.__name__} -> Created {pk_dict}")
     else:
         raise CommandError("No valid action found")
 