Commit 8478a33

Merge pull request #10 from CompNet/dev
Renard 0.7.0
2 parents: bd10e65 + 72c0423

87 files changed

Lines changed: 18774 additions & 2100 deletions

Makefile

Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
+.PHONY: test
+test:
+uv run --group dev python -m pytest tests
+
+.PHONE: ui
+ui:
+uv run --extra ui python -m renard.ui

README.md

Lines changed: 26 additions & 3 deletions
@@ -9,11 +9,16 @@ Renard (Relationship Extraction from NARrative Documents) is a modular library f
 
 # Installation
 
-You can install the latest version using pip:
+Currently, Renard supports Python>=3.9,<=3.12. You can install the
+latest version using pip:
 
 > pip install renard-pipeline
 
-Currently, Renard supports Python>=3.9,<=3.12
+If you have a GPU, there are accelerated versions for Nvidia CUDA and
+AMD ROCm:
+
+> pip install renard-pipeline[cuda128]
+> pip install renard-pipeline[rocm63]
 
 
 # Documentation
@@ -67,7 +72,25 @@ see [the "Contributing" section of the documentation](https://compnet.github.io/
 
 > uv run python -m pytest tests
 
-Expensive tests are disabled by default. These can be run by setting the environment variable `RENARD_TEST_ALL` to `1`.
+Alternatively, the project Makefile has a test target:
+
+> make test
+
+Expensive tests are disabled by default. These can be run by setting the environment variable `RENARD_TEST_SLOW` to `1`.
+
+
+
+# Renard UI
+
+Since version 0.7, Renard has a web interface powered by gradio. First, install the additional dependencies:
+
+> uv sync --group ui
+
+Then, simply run:
+
+> make ui
+
+And open your browser at http://127.0.0.1:7860
 
 
 # How to cite
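The README gives the supported interpreter range as Python>=3.9,<=3.12, and `pyproject.toml` expresses the same range as `requires-python = ">=3.9,<3.13"`. The following is a minimal sketch of a startup guard that checks the running interpreter against that range. The constant and function names are hypothetical and are not part of Renard.

```python
import sys
from typing import Optional, Sequence

# Supported range as stated in the README (Python>=3.9,<=3.12);
# pyproject.toml spells the same bound as requires-python = ">=3.9,<3.13".
MIN_SUPPORTED = (3, 9)
MAX_SUPPORTED = (3, 12)


def python_is_supported(version: Optional[Sequence[int]] = None) -> bool:
    """Return True if `version` (default: the running interpreter) is supported."""
    if version is None:
        version = sys.version_info
    # Compare only (major, minor). A full tuple such as (3, 12, 7)
    # would otherwise compare greater than (3, 12) and be rejected.
    major_minor = tuple(version[:2])
    return MIN_SUPPORTED <= major_minor <= MAX_SUPPORTED
```

Truncating to `(major, minor)` is what makes the README's `<=3.12` and pyproject's `<3.13` agree: a patch release such as 3.12.7 is accepted, and 3.13.0 is not.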

docs/contributing.rst

Lines changed: 4 additions & 6 deletions
@@ -7,7 +7,7 @@ issue if you encounter a problem or want to discuss a specific
 feature. If you want to contribute a patch:
 
 1. Check that your code matches our code quality guidelines and that
-all existing tests are passing with ``RENARD_TEST_ALL=1``.
+all existing tests are passing with ``RENARD_TEST_SLOW=1``.
 2. Create a Github pull request with your patch, explaining the
 rationale behind it and giving a high level overview of your
 code. Mention the relevant issue if applicable.
@@ -36,9 +36,7 @@ the ``tests`` directory. We use ``pytest`` to test code, and also use
 ``hypothesis`` when applicable. If you open a patch, make sure that
 all tests are passing. In particular, do not rely on the CI, as it
 does not run time costly tests! Check for yourself locally, using
-``RENARD_TEST_ALL=1 python -m pytest tests``. Note that there are
+``RENARD_TEST_SLOW=1 python -m pytest tests``. Note that there are
 specific tests and environment variable for optional dependencies such
-as *stanza* (``RENARD_TEST_STANZA_OPTDEP``). These must be explicitely
-set to ``1`` if you want to test optional dependencies, as
-``RENARD_TEST_ALL=1`` does not enable test on these optional
-dependencies.
+as *stanza* (``RENARD_TEST_OPTDEP_STANZA``). These must be explicitely
+set to ``1`` if you want to test optional dependencies.
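The contributing guide describes tests that run only when `RENARD_TEST_SLOW` or `RENARD_TEST_OPTDEP_STANZA` is set to `1`. The diff does not show how Renard's test suite reads these variables. What follows is a hedged sketch of one common way to gate tests on them using only the standard library. It relies on `unittest.skipUnless`, which pytest also honours on `TestCase` classes. The helper functions and test names are hypothetical, and the sketch treats the two variables as independent.

```python
import importlib.util
import os
import unittest


def env_flag(name: str) -> bool:
    """Return True only when the environment variable `name` equals "1"."""
    return os.environ.get(name) == "1"


def module_available(module: str) -> bool:
    """Return True when `module` is importable, without actually importing it."""
    return importlib.util.find_spec(module) is not None


class SlowTests(unittest.TestCase):
    # Hypothetical expensive test, skipped unless RENARD_TEST_SLOW=1.
    @unittest.skipUnless(env_flag("RENARD_TEST_SLOW"), "set RENARD_TEST_SLOW=1")
    def test_full_pipeline_on_a_novel(self):
        ...


class StanzaTests(unittest.TestCase):
    # Hypothetical optional-dependency test: it needs the opt-in variable
    # AND an installed stanza package.
    @unittest.skipUnless(
        env_flag("RENARD_TEST_OPTDEP_STANZA") and module_available("stanza"),
        "set RENARD_TEST_OPTDEP_STANZA=1 and install stanza",
    )
    def test_stanza_pipeline(self):
        import stanza  # noqa: F401  (imported only when the test runs)
        ...
```

`skipUnless` evaluates its condition when the test module is imported, so the variable has to be set before pytest starts, for example `RENARD_TEST_SLOW=1 python -m pytest tests`.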

docs/installation.rst

Lines changed: 5 additions & 1 deletion
@@ -5,7 +5,11 @@ Installation
 Using Pip
 =========
 
-Simply use ``pip install renard-pipeline``.
+For the simplest case, use ``pip install renard-pipeline``. By default, this installs the CPU version of PyTorch. If you want GPU support to accelerate inference:
+
+- CUDA 12.8: ``pip install renard-pipeline[cuda128]``
+- ROCm 6.3: ``pip install renard-pipeline[rocm63]``
+
 
 Note that for some modules, you might need to install additional
 libraries:
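According to the installation docs, the default install pulls the CPU build of PyTorch, and the `cuda128` and `rocm63` extras provide GPU builds. The check below can confirm which build actually ended up in an environment. It is a small sketch that relies only on the standard PyTorch attributes `torch.version.cuda`, `torch.version.hip` and `torch.cuda.is_available()`. The function name is hypothetical.

```python
def describe_torch_build() -> str:
    """Describe the installed PyTorch build: CPU-only, CUDA or ROCm."""
    try:
        import torch
    except ImportError:
        return "PyTorch is not installed"
    # ROCm wheels set torch.version.hip; CUDA wheels set torch.version.cuda;
    # CPU-only wheels leave both as None.
    hip = getattr(torch.version, "hip", None)
    if hip:
        # On ROCm, GPUs are exposed through the torch.cuda API as well.
        return f"ROCm build (HIP {hip}), GPU visible: {torch.cuda.is_available()}"
    if torch.version.cuda:
        return f"CUDA {torch.version.cuda} build, GPU visible: {torch.cuda.is_available()}"
    return "CPU-only build"


if __name__ == "__main__":
    print(describe_torch_build())
```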

docs/pipeline.rst

Lines changed: 44 additions & 21 deletions
@@ -75,17 +75,22 @@ For simplicity, one can use one of the preconfigured pipelines:
 
 .. code-block:: python
 
-from renard.pipeline.preconfigured import bert_pipeline
+from renard.pipeline.preconfigured import co_occurence_pipeline
 
 with open("./my_doc.txt") as f:
 text = f.read()
 
-pipeline = bert_pipeline(
-graph_extractor_kwargs={"co_occurrences_dist": (1, "sentences")}
-)
+pipeline = co_occurrence_pipeline()
 out = pipeline(text)
 
 
+The following preconfigured pipelines are available:
+
+- :func:`.co_occurrence_pipeline`
+- :func:`.conversational_pipeline`
+- :func:`.relational_pipeline`
+
+
 Pipeline Output: the Pipeline State
 ===================================
 
@@ -137,7 +142,7 @@ Tokenization
 Tokenization is the task of cutting text in *tokens*. It is usually
 the first task to apply to a text. 2 tokenizer are available:
 
-- :class:`.NLTKTokenizer`
+- :class:`.NLTKTokenizer` is the tokenizer from NLTK.
 - :class:`.StanfordCoreNLPPipeline` does contain a tokenizer as part
 of its full NLP pipeline.
 
@@ -148,16 +153,19 @@ Named Entity Recognition
 Named entity recognition (NER) detects entities occurences in the
 text. 3 modules are available:
 
-- :class:`.NLTKNamedEntityRecognizer`
-- :class:`.BertNamedEntityRecognizer`
+- :class:`.NLTKNamedEntityRecognizer` is a lightweight NER module from
+NLTK, based on POS tagging and rules.
+- :class:`.BertNamedEntityRecognizer` is a NER module employing a
+finetuned BERT model.
 - :class:`.StanfordCoreNLPPipeline` contains a NER model as part of
 its full NLP pipeline.
 
 
 Coreference Resolution
 ----------------------
 
-- :class:`.SpacyCorefereeCoreferenceResolver`
+- :class:`.SpacyCorefereeCoreferenceResolver` uses the spacy coreferee
+module.
 - :class:`.BertCoreferenceResolver`, using the Tibert library.
 - :class:`.StanfordCoreNLPPipeline` can execute a coreference
 resolution model as part of its pipeline.
@@ -166,14 +174,14 @@ Coreference Resolution
 Quote Detection
 ---------------
 
-- :class:`.QuoteDetector`
+- :class:`.QuoteDetector` detect quotes using simple logic.
 
 
 Sentiment Analysis
 ------------------
 
 - :class:`.NLTKSentimentAnalyzer` leverages NLTK's Vader for sentiment
-analysis
+analysis.
 
 
 Characters Extraction
@@ -183,21 +191,36 @@ Characters extraction (or alias resolution) extract characters from
 occurences detected using NER. This is done by assigning each mention
 to a unique character.
 
-- :class:`.NaiveCharacterUnifier`
-- :class:`.GraphRulesCharacterUnifier`
+- :class:`.NaiveCharacterUnifier` assigns each mention with a unique
+form to a character.
+- :class:`.GraphRulesCharacterUnifier` uses a set of rules to assign
+each mention to a character.
+
+
+Relation Extraction
+-------------------
+
+- :class:`.GenerativeRelationExtractor` is currently in development
+and should not be used.
 
 
 Speaker Attribution
 -------------------
 
-- :class:`.BertSpeakerDetector`
+- :class:`.BertSpeakerDetector` detects speaker using a finetuned BERT
+model.
 
 
 Graph Extraction
 ----------------
 
-- :class:`.CoOccurrencesGraphExtractor`
-- :class:`.ConversationalGraphExtractor`
+- :class:`.CoOccurrencesGraphExtractor` extracts a graph of
+co-occurrence between characters.
+- :class:`.ConversationalGraphExtractor` extracts a conversational
+graph: either conversation between characters, or of character
+mentions.
+- :class:`.RelationalGraphExtractor` extracts a relational graph,
+where the relation between each character is typed.
 
 
 Dynamic Graphs
@@ -240,8 +263,9 @@ When executing the above block of code, the output attribute
 >>> out.character_network
 [<networkx.classes.graph.Graph object at 0x7fd9e9115900>]
 
-See :class:`.CoOccurrencesGraphExtractor` for more details on the
-usage of the ``dynamic`` and ``dynamic_window`` arguments.
+Both :class:`.CoOccurrencesGraphExtractor` and
+:class:`.ConversationalGraphExtractor` support dynamic networks. See
+their documentation for more details.
 
 Plot and export functions work as one would expect
 intuitively. :meth:`.PipelineState.plot_graph` allow to visualize the
@@ -255,10 +279,9 @@ dynamic graph to the Gephi format.
 Custom Segmentation
 -------------------
 
-The ``dynamic_window`` parameter of
-:class:`.CoOccurencesGraphExtractor` determines the segmentation of
-the dynamic networks, in number of interactions. In the example above,
-a new graph will be created for each 20 interactions.
+The ``dynamic_window`` parameter determines the segmentation of the
+dynamic networks, in number of interactions. In the example above, a
+new graph will be created for each 20 interactions.
 
 While one can rely on the arguments of the graph extractor of the
 pipeline to determine the dynamic window, Renard allows to specify a
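The pipeline docs describe co-occurrence graph extraction and dynamic networks whose segmentation is controlled by `dynamic_window`, counted in interactions (one graph per 20 interactions in the docs' example). What follows is a dependency-free sketch of that idea, not Renard's implementation. It rests on three assumptions:

- An interaction is a pair of distinct characters mentioned in the same sentence, close in spirit to the removed example's `"co_occurrences_dist": (1, "sentences")`.
- A `collections.Counter` keyed by unordered pairs stands in for a weighted networkx graph.
- All function names are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import Hashable, List, Sequence, Tuple

Interaction = Tuple[Hashable, Hashable]


def sentence_interactions(sentences: Sequence[Sequence[Hashable]]) -> List[Interaction]:
    """One interaction per pair of distinct characters mentioned in the same sentence."""
    interactions: List[Interaction] = []
    for characters in sentences:
        # Deduplicate, then sort so the pair order is deterministic.
        for a, b in combinations(sorted(set(characters), key=str), 2):
            interactions.append((a, b))
    return interactions


def co_occurrence_weights(interactions: Sequence[Interaction]) -> Counter:
    """Static graph: count interactions per unordered pair of characters."""
    return Counter(frozenset(pair) for pair in interactions)


def dynamic_co_occurrence(interactions: Sequence[Interaction], window: int) -> List[Counter]:
    """Dynamic graph: one weighted edge set per block of `window` consecutive interactions."""
    if window <= 0:
        raise ValueError("window must be a positive number of interactions")
    return [
        co_occurrence_weights(interactions[start : start + window])
        for start in range(0, len(interactions), window)
    ]
```

For example, with the sentences `[["Alice", "Bob"], ["Bob", "Carol", "Alice"], ["Carol"]]` there are four interactions. `dynamic_co_occurrence(..., 2)` then yields two edge sets. The first holds the two Alice/Bob interactions. The second holds one Alice/Carol and one Bob/Carol interaction.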

pyproject.toml

Lines changed: 80 additions & 15 deletions
@@ -1,6 +1,6 @@
 [project]
 name = "renard-pipeline"
-version = "0.6.5"
+version = "0.7.0"
 description = "Relationships Extraction from NARrative Documents"
 authors = [
 {name = "Arthur Amalvy", email = "arthur.amalvy@univ-avignon.fr"},
@@ -9,20 +9,23 @@ license = { text = "GPL-3.0-only" }
 readme = "README.md"
 requires-python = ">=3.9,<3.13"
 dependencies = [
-"torch>=2.0.0,!=2.0.1",
-"transformers>=4.37",
-"nltk>=3.9",
-"tqdm>=4.62",
-"networkx>=3.0",
-"more-itertools>=10.5",
-"nameparser>=1.1",
-"matplotlib>=3.5",
-"pandas>=2.0",
-"pytest>=8.3.0",
-"tibert>=0.5",
-"grimbert>=0.1",
-"datasets>=3.0",
+"torch>=2.7.0",
+"transformers>=4.57.1",
+"nltk>=3.9.1",
+"tqdm>=4.67.1",
+"networkx>=3.2",
+"more-itertools>=10.7",
+"nameparser>=1.1.3",
+"matplotlib>=3.9",
+"pytest>=8.4.1",
+"tibert>=0.5.2",
+"grimbert>=0.1.5",
+"datasets>=4.0.0",
 "rank-bm25>=0.2.2",
+"accelerate>=1.10.1",
+"scikit-learn>=1.6.1",
+"tiktoken>=0.12.0",
+"protobuf>=6.33.2",
 ]
 
 [build-system]
@@ -43,4 +46,66 @@ dev = [
 "Sphinx>=4.3",
 "sphinx-rtd-theme>=1.0.0",
 "sphinx-autodoc-typehints>=1.12.0",
-]
+]
+
+[project.optional-dependencies]
+ui = [
+"gradio>=4.44.1",
+"pyvis>=0.3.2",
+]
+cpu = [
+"torch>=2.7.1",
+]
+cuda128 = [
+"torch>=2.7.1",
+]
+rocm63 = [
+"torch>=2.7.1",
+"pytorch-triton-rocm>=3.1.0",
+]
+rocm64 = [
+"torch>=2.7.1",
+"pytorch-triton-rocm>=3.1.0",
+]
+
+[tool.uv]
+conflicts = [
+[
+{ extra = "cpu" },
+{ extra = "cuda128" },
+{ extra = "rocm63" },
+{ extra = "rocm64" },
+],
+]
+
+[tool.uv.sources]
+torch = [
+{ index = "pytorch-cpu", extra = "cpu" },
+{ index = "pytorch-cuda128", extra = "cuda128" },
+{ index = "pytorch-rocm63", extra = "rocm63" },
+{ index = "pytorch-rocm64", extra = "rocm64" },
+]
+pytorch-triton-rocm = [
+{ index = "pytorch-rocm63", extra = "rocm63" },
+{ index = "pytorch-rocm64", extra = "rocm64" },
+]
+
+[[tool.uv.index]]
+name = "pytorch-cpu"
+url = "https://download.pytorch.org/whl/cpu"
+explicit = true
+
+[[tool.uv.index]]
+name = "pytorch-cuda128"
+url = "https://download.pytorch.org/whl/cu128"
+explicit = true
+
+[[tool.uv.index]]
+name = "pytorch-rocm63"
+url = "https://download.pytorch.org/whl/rocm6.3"
+explicit = true
+
+[[tool.uv.index]]
+name = "pytorch-rocm64"
+url = "https://download.pytorch.org/whl/rocm6.4"
+explicit = true
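The new `[tool.uv]` tables make the `cpu`, `cuda128`, `rocm63` and `rocm64` extras mutually exclusive. They also route `torch` to a matching PyTorch wheel index for each extra. The sketch below mirrors that declaration in plain Python to make the resolution rule explicit. uv performs the real resolution, and the mapping and function names are hypothetical. These source mappings are uv configuration and are not part of the published package metadata. With plain pip, an extra most likely contributes only its `torch>=2.7.1` requirement, and the wheel comes from pip's configured index.

```python
from typing import Iterable, Optional

# Mirrors the [[tool.uv.index]] and [tool.uv.sources] entries for torch.
TORCH_INDEX_BY_EXTRA = {
    "cpu": "https://download.pytorch.org/whl/cpu",
    "cuda128": "https://download.pytorch.org/whl/cu128",
    "rocm63": "https://download.pytorch.org/whl/rocm6.3",
    "rocm64": "https://download.pytorch.org/whl/rocm6.4",
}


def torch_index_for(extras: Iterable[str]) -> Optional[str]:
    """Return the PyTorch index selected by the requested extras.

    Returns None when no torch-variant extra is requested (torch then comes
    from the default index). Raises ValueError when several are requested,
    as the [tool.uv] conflicts group forbids that combination.
    """
    # Unrelated extras such as "ui" are simply ignored here.
    chosen = sorted(set(extras).intersection(TORCH_INDEX_BY_EXTRA))
    if len(chosen) > 1:
        raise ValueError(f"conflicting extras: {', '.join(chosen)}")
    return TORCH_INDEX_BY_EXTRA[chosen[0]] if chosen else None
```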

renard/pipeline/character_unification.py

Lines changed: 6 additions & 4 deletions
@@ -21,10 +21,14 @@ class Character:
 mentions: List[Mention]
 gender: Gender = Gender.UNKNOWN
 
-def longest_name(self) -> str:
+def longest_name(self) -> Optional[str]:
+if len(self.names) == 0:
+return None
 return max(self.names, key=len)
 
-def shortest_name(self) -> str:
+def shortest_name(self) -> Optional[str]:
+if len(self.names) == 0:
+return None
 return min(self.names, key=len)
 
 def most_frequent_name(self) -> Optional[str]:
@@ -236,7 +240,6 @@ def __call__(
 
 # * link nodes based on several rules
 for name1, name2 in combinations(G.nodes(), 2):
-
 # preprocess name when needed
 pname1 = self._preprocess_name(name1)
 pname2 = self._preprocess_name(name2)
@@ -294,7 +297,6 @@ def try_remove_edges(edges):
 pass
 
 for name1, name2 in combinations(G.nodes(), 2):
-
 # preprocess names when needed
 pname1 = self._preprocess_name(name1)
 pname2 = self._preprocess_name(name2)
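The `character_unification.py` change makes `longest_name()` and `shortest_name()` return `None` for a character without names. The guard is needed because `max()` and `min()` raise `ValueError` on an empty collection. The self-contained sketch below reproduces that guard on a hypothetical `SketchCharacter` class. Its `names` attribute is assumed to be a set of strings, and the real `Character` has more fields. The sketch also includes a toy version of the "one character per unique mention form" strategy that the docs attribute to `NaiveCharacterUnifier`. None of this is Renard's code.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class SketchCharacter:
    """Hypothetical stand-in for renard's Character, reduced to a set of names."""

    names: Set[str] = field(default_factory=set)

    def longest_name(self) -> Optional[str]:
        # max() raises ValueError on an empty collection, hence the guard.
        if len(self.names) == 0:
            return None
        return max(self.names, key=len)

    def shortest_name(self) -> Optional[str]:
        # Same guard for min().
        if len(self.names) == 0:
            return None
        return min(self.names, key=len)


def naive_unify(mention_forms: List[str]) -> List[SketchCharacter]:
    """Toy version of the naive strategy: one character per distinct mention form."""
    characters: Dict[str, SketchCharacter] = {}
    for form in mention_forms:
        if form not in characters:
            characters[form] = SketchCharacter(names={form})
    # Dicts preserve insertion order, so characters follow first appearance.
    return list(characters.values())
```

Code calling the real methods now has to handle the `None` case, for example with `character.longest_name() or "<unnamed>"`.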
