
Commit 39b334b

Merge branch 'develop' into feature/dropdown-imroved
2 parents 8a874c2 + 6d766d3 commit 39b334b

10 files changed

Lines changed: 244 additions & 135 deletions

CITATION.cff

Lines changed: 2 additions & 2 deletions
```diff
@@ -35,5 +35,5 @@ keywords:
 - elasticsearch
 - natural language processing
 license: MIT
-version: 5.30.0
-date-released: '2026-04-09'
+version: 5.30.1
+date-released: '2026-05-19'
```

backend/requirements.txt

Lines changed: 2 additions & 2 deletions
```diff
@@ -96,7 +96,7 @@ humanize==4.13.0
     # via flower
 ianalyzer-readers==0.4.0
     # via -r requirements.in
-idna==3.10
+idna==3.15
     # via requests
 iniconfig==2.1.0
     # via pytest
@@ -252,7 +252,7 @@ tzdata==2025.2
     # via
     #   kombu
     #   pandas
-urllib3==2.6.3
+urllib3==2.7.0
     # via
     #   django-revproxy
     #   elastic-transport
```

backend/visualization/tests/test_visualization_views.py

Lines changed: 14 additions & 1 deletion
```diff
@@ -1,5 +1,7 @@
 from visualization.query import MATCH_ALL
+from visualization.views import TERM_FREQUENCY_SIZE_LIMIT
 import pytest
+import math
 from rest_framework import status
 
 @pytest.fixture()
@@ -86,13 +88,24 @@ def test_date_term_frequency(transactional_db, admin_client, date_term_frequency
     post_response = admin_client.post('/api/visualization/date_term_frequency', date_term_frequency_body, content_type='application/json')
     assert post_response.status_code == 400
 
-def test_term_frequency_limit(transactional_db, admin_client, date_term_frequency_body, aggregate_term_frequency_body, index_small_mock_corpus, celery_worker):
+def test_date_term_frequency_limit(transactional_db, admin_client, date_term_frequency_body, index_small_mock_corpus, celery_worker):
     for bin in date_term_frequency_body['bins']:
         bin['size'] = 1000000
     post_response = admin_client.post('/api/visualization/date_term_frequency', date_term_frequency_body, content_type='application/json')
     assert post_response.status_code == 400
 
+    for bin in date_term_frequency_body['bins']:
+        bin['size'] = math.ceil(TERM_FREQUENCY_SIZE_LIMIT / len(date_term_frequency_body['bins']))
+    post_response = admin_client.post('/api/visualization/date_term_frequency', date_term_frequency_body, content_type='application/json')
+    assert post_response.status_code == 200
+
+def test_aggregate_term_frequency_limit(transactional_db, admin_client, aggregate_term_frequency_body, index_small_mock_corpus, celery_worker):
     for bin in aggregate_term_frequency_body['bins']:
         bin['size'] = 1000000
     post_response = admin_client.post('/api/visualization/aggregate_term_frequency', aggregate_term_frequency_body, content_type='application/json')
     assert post_response.status_code == 400
+
+    for bin in aggregate_term_frequency_body['bins']:
+        bin['size'] = math.ceil(TERM_FREQUENCY_SIZE_LIMIT / len(aggregate_term_frequency_body['bins']))
+    post_response = admin_client.post('/api/visualization/aggregate_term_frequency', aggregate_term_frequency_body, content_type='application/json')
+    assert post_response.status_code == 200
```
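The new `*_limit` tests probe both sides of the cap: 1000000 documents per bin must be rejected, while `math.ceil(TERM_FREQUENCY_SIZE_LIMIT / len(bins))` per bin must be accepted. The standalone sketch below reproduces that arithmetic without Django. It assumes `TERM_FREQUENCY_SIZE_LIMIT` is 10000 (inferred from the `10.000 + N(bins)` docstring in `views.py`) and a threshold of `limit + len(bins)`, as in `validate_term_frequency_size`; the helper names are hypothetical.

```python
import math

# Assumption: 10000, inferred from the `10.000 + N(bins)` docstring in views.py.
TERM_FREQUENCY_SIZE_LIMIT = 10_000


def near_limit_total(n_bins: int, limit: int = TERM_FREQUENCY_SIZE_LIMIT) -> int:
    """Total size when every bin requests ceil(limit / n_bins), as the new tests do."""
    return n_bins * math.ceil(limit / n_bins)


def is_accepted(total: int, n_bins: int, limit: int = TERM_FREQUENCY_SIZE_LIMIT) -> bool:
    """Mirror of the shared check: a request fails only if total > limit + n_bins."""
    return total <= limit + n_bins


for n in (1, 2, 3, 7, 12, 64):
    # Rounding up adds at most n - 1 documents, so the "200" case always passes.
    assert is_accepted(near_limit_total(n), n)
    # 1000000 documents per bin, the "400" case, is always rejected.
    assert not is_accepted(1_000_000 * n, n)

print(near_limit_total(3))  # 10002
```

With three bins, for example, the rounded-up total is 10002: above the bare limit that the timeline view compared against before this commit, but within `limit + len(bins)`.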

backend/visualization/views.py

Lines changed: 8 additions & 5 deletions
```diff
@@ -20,6 +20,12 @@
 `10.000 + N(bins)`.
 '''
 
+def validate_term_frequency_size(bins):
+    max_size = TERM_FREQUENCY_SIZE_LIMIT + len(bins)
+    if sum(bin['size'] for bin in bins) > max_size:
+        raise ValidationError(detail='Maximum size exceeded')
+
+
 
 class WordcloudView(APIView):
     '''
@@ -121,8 +127,7 @@ def post(self, request, *args, **kwargs):
                 raise ParseError(
                     detail=f'key {key} is not present for all bins in request data')
 
-        if sum(bin['size'] for bin in bins) > TERM_FREQUENCY_SIZE_LIMIT:
-            raise ValidationError(detail='Maximum size exceeded')
+        validate_term_frequency_size(bins)
 
         try:
             group = tasks.timeline_term_frequency_tasks(
@@ -153,9 +158,7 @@ def post(self, request, *args, **kwargs):
                 raise ParseError(
                     detail=f'key {key} is not present for all bins in request data')
 
-        max_size = TERM_FREQUENCY_SIZE_LIMIT + len(bins)
-        if sum(bin['size'] for bin in bins) > max_size:
-            raise ValidationError(detail='Maximum size exceeded')
+        validate_term_frequency_size(bins)
 
         try:
             group = tasks.histogram_term_frequency_tasks(
```
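After this change both views delegate to `validate_term_frequency_size`, so the view that calls `timeline_term_frequency_tasks` now applies the same `limit + len(bins)` threshold as the one that calls `histogram_term_frequency_tasks`. A framework-free sketch of that rule follows. It raises a hypothetical `SizeLimitError` in place of Django REST framework's `ValidationError`, assumes a limit of 10000, and adds the totals to the message for illustration (the real view only says "Maximum size exceeded").

```python
from typing import Mapping, Sequence

# Assumption: 10000, inferred from the `10.000 + N(bins)` docstring in views.py.
TERM_FREQUENCY_SIZE_LIMIT = 10_000


class SizeLimitError(ValueError):
    """Hypothetical stand-in for rest_framework.exceptions.ValidationError."""


def validate_term_frequency_size(bins: Sequence[Mapping[str, int]]) -> None:
    """Raise if the summed 'size' of all bins exceeds the limit plus one per bin."""
    max_size = TERM_FREQUENCY_SIZE_LIMIT + len(bins)
    requested = sum(b['size'] for b in bins)
    if requested > max_size:
        raise SizeLimitError(f'Maximum size exceeded: {requested} > {max_size}')


# Two bins may request 10002 documents in total, but not 10003.
validate_term_frequency_size([{'size': 5001}, {'size': 5001}])
try:
    validate_term_frequency_size([{'size': 5001}, {'size': 5002}])
except SizeLimitError as err:
    print(err)  # Maximum size exceeded: 10003 > 10002
```

Keeping the check in one function means both endpoints reject exactly the same requests, which is what the two split tests now verify separately.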

documentation/Distrobox development setup.md

Lines changed: 206 additions & 0 deletions
@@ -0,0 +1,206 @@
# Distrobox development setup

This is a guide for setting up a [distrobox](https://distrobox.it/) container with Ubuntu for developing Textcavator. These are notes from my (Luka's) experience; modify as needed.

### Distrobox vs. Docker Compose containers

This repository also includes a container setup using Docker Compose. The Docker Compose setup is designed for maximum isolation: it separates the application from the host system and uses separate containers for the different subsystems (frontend, backend, database engines).

By contrast, distrobox is a system for running privileged containers that are tightly integrated with the host system. Also, this guide creates a single container that includes Textcavator and all its dependencies, instead of isolating subsystems.

Generally speaking, distrobox offers some but not all of the benefits of containerisation, and some but not all of the drawbacks. Which option you prefer is up to you.

### Using host services

This guide installs PostgreSQL, Elasticsearch, Kibana, and Redis in the container; these run in the background while the container is up.

Because distrobox containers are not isolated from the host system, you can also connect to services running on the host instead of running them in the container. For instance, if your host system is already running PostgreSQL, it's probably not necessary (or useful) to install it in the container.

(Of course, if *all* of these services are already running on the host, there is really no point in setting up a container at all.)
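To decide which services to install in the container, you can first check which ones are already listening on the host. The Python sketch below only attempts a TCP connection to each service's default port (5432, 9200, 5601 and 6379 are assumed unchanged); an open port shows that *something* is listening, not which program it is.

```python
import socket

# Default ports of the services this guide installs (assumed unchanged on the host).
DEFAULT_PORTS = {
    'PostgreSQL': 5432,
    'Elasticsearch': 9200,
    'Kibana': 5601,
    'Redis': 6379,
}


def port_is_open(port: int, host: str = 'localhost', timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable or timed out
        return False


if __name__ == '__main__':
    for name, port in DEFAULT_PORTS.items():
        state = 'something is listening' if port_is_open(port) else 'nothing found'
        print(f'{name} (port {port}): {state}')
```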

## Prerequisites

You need to install [distrobox](https://distrobox.it/). If you want a GUI manager, you may also consider installing [DistroShelf](https://flathub.org/en-GB/apps/com.ranfdev.DistroShelf).

## Container setup

If your container will include PostgreSQL (see below), create it with:

```sh
distrobox create --name textcavator --image ubuntu:24.04 --init --pre-init-hooks "mkdir /var/run/postgresql && chown postgres /var/run/postgresql"
```

Otherwise, you can leave out the pre-init hook:

```sh
distrobox create --name textcavator --image ubuntu:24.04 --init
```

After creating it, enter the container with:

```sh
distrobox enter textcavator
```

The first-time setup will take a while.

## Install basic packages

Run inside the container:

```sh
sudo apt update
sudo apt upgrade
sudo apt install nano git git-flow python3-pip python3-virtualenv
```

## PostgreSQL

Running PostgreSQL is the most precarious part of the setup. If you're developing other projects that use PostgreSQL, I recommend against installing it like this, as there is little benefit. Instead, run PostgreSQL on your host system, or run it inside a separate Docker/Podman container.

If you are going to use PostgreSQL, make sure you included the pre-init hook when creating the container (see above). Then enter the container to install PostgreSQL.

To prevent an error during installation, first open the following file:

```sh
sudo nano /usr/sbin/policy-rc.d
```

Change the file contents to `exit 0`, save and close.

Then install PostgreSQL 16 with:

```sh
sudo apt install postgresql
```

For convenience, the following commands let you run `psql` without switching to the `postgres` user. Substitute `johndoe` with your own username.

```sh
sudo -u postgres createuser johndoe
sudo -u postgres createdb -O johndoe johndoe
sudo -u postgres psql
```

In the psql prompt, run:

```sql
alter user johndoe superuser;
```

Use `exit` to quit.

To check that everything is working, stop and restart the container, then type `psql` on the command line. This should open the psql prompt.

The default port for PostgreSQL is 5432, but if that port is occupied (usually because PostgreSQL is already running on the host), it will use a different port. Check the port with:

```sh
grep "port =" /etc/postgresql/16/main/postgresql.conf
```
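If you prefer to script this check, the sketch below reads `postgresql.conf` (the path used in the `grep` command above) in Python. Unlike a plain `grep`, it skips commented-out lines, accepts the optional `=` that `postgresql.conf` allows, and, as PostgreSQL does, lets a later `port` setting override an earlier one.

```python
import re
from pathlib import Path

# Path used in this guide (PostgreSQL 16 on Ubuntu 24.04).
CONF_PATH = Path('/etc/postgresql/16/main/postgresql.conf')
DEFAULT_PORT = 5432

# Matches an uncommented `port = N` line (the equals sign is optional in postgresql.conf).
PORT_LINE = re.compile(r'\s*port\b\s*=?\s*(\d+)')


def configured_port(conf_text: str) -> int:
    """Return the effective port setting, or PostgreSQL's default if none is set."""
    port = DEFAULT_PORT
    for line in conf_text.splitlines():
        match = PORT_LINE.match(line)
        if match:
            port = int(match.group(1))  # a later setting overrides an earlier one
    return port


if __name__ == '__main__' and CONF_PATH.exists():
    print(configured_port(CONF_PATH.read_text()))
```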

If the port is not 5432, open (or create) `backend/ianalyzer/local_settings.py` in this repository to override your database configuration. Copy the `DATABASES` declaration from `backend/ianalyzer/settings.py` and change the port number.
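For example, the override could look like the sketch below. Every value except the port is a placeholder: copy the real entries from the `DATABASES` declaration in `settings.py`, which this sketch does not reproduce. The port 5433 is only an example of what the check above might report.

```python
# backend/ianalyzer/local_settings.py (sketch)
# All values except PORT are placeholders: copy them from the DATABASES
# declaration in backend/ianalyzer/settings.py.
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.postgresql',  # placeholder, copy from settings.py
        'NAME': 'database-name',   # placeholder
        'USER': 'database-user',   # placeholder
        'PASSWORD': '',            # placeholder
        'HOST': 'localhost',       # placeholder
        'PORT': '5433',            # example: the port found in postgresql.conf
    }
}
```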

## Node and Yarn

See [nodejs.org](https://nodejs.org/en/download) for instructions. Choose Node 22 / Linux / nvm / yarn in the options, and run the commands shown.

## Elasticsearch

Install Elasticsearch (see the [Elasticsearch documentation](https://www.elastic.co/guide/en/elasticsearch/reference/8.17/deb.html)):

```sh
wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | sudo gpg --dearmor -o /usr/share/keyrings/elasticsearch-keyring.gpg
echo "deb [signed-by=/usr/share/keyrings/elasticsearch-keyring.gpg] https://artifacts.elastic.co/packages/8.x/apt stable main" | sudo tee /etc/apt/sources.list.d/elastic-8.x.list
sudo apt-get update
sudo apt-get install elasticsearch
sudo systemctl daemon-reload
sudo systemctl enable elasticsearch.service
```
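
Elasticsearch is now installed but not yet started. Before starting it, disable the security options:

```sh
sudo nano /etc/elasticsearch/elasticsearch.yml
```

Change the `xpack.security.enabled:` option to `false`, save and close. Then start Elasticsearch with `sudo systemctl restart elasticsearch.service` (`restart` also starts the service if it is not running yet).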
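With security disabled and the service running, Elasticsearch should answer plain HTTP requests without credentials. The sketch below is one way to confirm that from Python using only the standard library; it assumes the default address `http://localhost:9200` and reads the `version.number` field from the JSON that Elasticsearch returns on its root endpoint.

```python
import json
import urllib.request

ES_URL = 'http://localhost:9200'  # assumed default address; plain HTTP without security


def elasticsearch_version(url: str = ES_URL, timeout: float = 5.0) -> str:
    """Return the version number reported by the Elasticsearch root endpoint."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        info = json.load(response)
    return info['version']['number']


if __name__ == '__main__':
    try:
        print('Elasticsearch', elasticsearch_version())
    # OSError covers URLError/HTTPError and connection errors;
    # ValueError and KeyError cover replies that are not the expected JSON.
    except (OSError, ValueError, KeyError) as err:
        print('Elasticsearch not reachable:', err)
```

If security is still enabled, the request fails with an HTTP 401 error instead of printing a version.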

Depending on your total RAM, Elasticsearch may reserve a large amount of memory, which is probably not what you want. (See the [advanced configuration docs](https://www.elastic.co/guide/en/elasticsearch/reference/8.17/advanced-configuration.html) for more about this step.) Create a file to override the default memory settings:

```sh
sudo nano /etc/elasticsearch/jvm.options.d/memory.options
```

Copy-paste the following to set the JVM heap size to 4 GB. Save and close.

```
-Xms4g
-Xmx4g
```

## Kibana (optional)

Kibana provides a GUI for Elasticsearch. Textcavator does not depend on Kibana, but we recommend installing it too, as it's useful for troubleshooting, testing queries, managing indices, etc.

(The steps below should be done *after* the Elasticsearch installation.)

```sh
sudo apt-get install kibana
sudo systemctl enable kibana.service
```

Then open the Kibana configuration file:

```sh
sudo nano /etc/kibana/kibana.yml
```

Set `pid.file:` to `/var/run/kibana.pid`, save and close.

When your container is running, you can open Kibana by going to `http://localhost:5601` in your browser.

## Redis

Install Redis ([APT installation instructions](https://redis.io/docs/latest/operate/oss_and_stack/install/install-stack/apt/)):

```sh
sudo apt-get install lsb-release curl gpg
curl -fsSL https://packages.redis.io/gpg | sudo gpg --dearmor -o /usr/share/keyrings/redis-archive-keyring.gpg
sudo chmod 644 /usr/share/keyrings/redis-archive-keyring.gpg
echo "deb [signed-by=/usr/share/keyrings/redis-archive-keyring.gpg] https://packages.redis.io/deb $(lsb_release -cs) main" | sudo tee /etc/apt/sources.list.d/redis.list
sudo apt-get update
sudo apt-get install redis
```
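To confirm that Redis is running, you can send it a `PING` and expect `+PONG` back (`redis-cli ping` does the same from the shell). The sketch below does this over a raw socket using the RESP wire format, assuming the default address `localhost:6379`.

```python
import socket

REDIS_ADDRESS = ('localhost', 6379)  # assumed default host and port


def redis_ping(address: tuple[str, int] = REDIS_ADDRESS, timeout: float = 2.0) -> bool:
    """Send PING as a RESP array and return True if the reply is +PONG."""
    with socket.create_connection(address, timeout=timeout) as conn:
        conn.sendall(b'*1\r\n$4\r\nPING\r\n')
        reply = conn.recv(64)
    return reply.startswith(b'+PONG')


if __name__ == '__main__':
    try:
        print('Redis is up' if redis_ping() else 'Unexpected reply')
    except OSError as err:  # refused, unreachable or timed out
        print('Redis not reachable:', err)
```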

## Google Chrome

Chrome is used for browser testing in the frontend. Install it with:

```sh
wget -q -O - https://dl-ssl.google.com/linux/linux_signing_key.pub | sudo apt-key add -
sudo sh -c 'echo "deb https://dl.google.com/linux/chrome/deb/ stable main" >> /etc/apt/sources.list.d/google.list'
sudo apt-get update
sudo apt-get install google-chrome-stable
```

## IDE (optional)

You could use a code editor installed on your host system, but I find it more convenient to install the editor in the container. See the [distrobox guide to integrating VSCode](https://distrobox.it/posts/integrate_vscode_distrobox/).

### VSCode

To install VSCode, follow the [VSCode installation instructions](https://code.visualstudio.com/docs/setup/linux#_install-vs-code-on-linux) and choose the option to install the `.deb` package. Then export the application.

For convenience, I include a flag to always open the Textcavator repository:

```sh
distrobox-export --app code --extra-flags "/path/to/this/repository/ --foreground"
```

### VSCodium

See the [VSCodium installation instructions](https://vscodium.com/#install-on-debian-ubuntu-deb-package). Then export with:

```sh
distrobox-export --app codium --extra-flags "/path/to/this/repository/ --foreground"
```

documentation/First-time-setup.md

Lines changed: 1 addition & 1 deletion
```diff
@@ -11,7 +11,7 @@ These are instructions to set up an Textcavator server. If you are going to deve
 * [Node.js](https://nodejs.org/). See [.nvmrc](/.nvmrc) for the recommended version.
 * [Yarn](https://yarnpkg.com/)
 
-The documentation includes a [recipe for installing the prerequisites on Debian 10](./documentation/Local-Debian-Textcavator-setup.md)
+The documentation includes a [recipe for installing the prerequisites in a distrobox container](./Distrobox%20development%20setup.md).
 
 ## First-time setup
 
```