Commit 393ac3b

updated man page
1 parent 3bb2923 commit 393ac3b

1 file changed

Lines changed: 39 additions & 21 deletions

docs/frog.1
@@ -1,11 +1,11 @@
1-
.TH frog 1 "2023 jan 31"
1+
.TH frog 1 "2023 feb 22"
22

33
.SH NAME
44
frog \- Dutch Natural Language Toolkit
55
.SH SYNOPSIS
6-
frog [options]
6+
frog [\-t] test\-file
77

8-
frog \-t test\-file
8+
frog [options]
99

1010
.SH DESCRIPTION
1111
Frog is an integration of memory\(hy-based natural language processing (NLP)
@@ -25,7 +25,7 @@ you can use
2525
to select the 'config-file' for an installed language 'lang'
2626
.RE
2727

28-
.BR \-\-debug =<modele><level>,...
28+
.BR \-\-debug =<module><level>,...
2929
.RS
3030
set debug level per module, indicated by a single letter:
3131
Tagger (T), Tokenizer (t), Lemmatizer (l), Morphological Analyzer (a),
@@ -35,11 +35,14 @@ or Parser (p). Different modules must be separated by commas.
3535
(e.g. \-\-debug=l5,n3 sets the level for the Lemmatizer to 5 and for the NER
3636
to 3 )
3737

38+
Debugging lines are written to a file
39+
.BR frog.<number>.debug
3840
.RE
41+
The name of that file is given at the end of the run.
3942

4043
.BR \-d " <level>"
4144
.RS
42-
set global debug level. (for all modules)
45+
set a global debug level for all modules at once.
4346
.RE
4447

4548
.BR \-\-deep\(hymorph
@@ -75,7 +78,12 @@ The first language in the list will be the default, unspecified languages are
7578
assumed to be of that default.
7679

7780
e.g. \-\-language=nld,eng,por
78-
means: detect Dutch, English and Portuguese, with Dutch being the default.
81+
means: detect Dutch, English and Portuguese, with Dutch being the default,
82+
using TextCat. Mainly useful for XML processing.
83+
84+
Specifying an unsupported language is a fatal error. However, you can add the
85+
special language 'und' which ensures that sentences in an unknown language
86+
will be labeled as such, and processed no further.
7987

8088
.B IMPORTANT
8189
Frog can at the moment handle only one language at a time, as determined by the
@@ -115,23 +123,24 @@ from the inputfilename(s) with '.out' appended.
115123
.BR \-\-retry
116124
.RS
117125
assume a re-run on the same input file(s). Frog will only process those files
118-
that haven't been processed yet. This is accomplished by looking at the output
119-
file names. (so this has no effect if neither \-o, \-\-outputdir, \-X or
120-
\-\-xmldir is used)
126+
that haven't been processed yet.
121127
.RE
122128

123129

124130
.BR \-\-skip =[tlacnmp]
125131
.RS
126132
skip parts of the process: Tokenizer (t), Lemmatizer (l), Morphological
127-
Analyzer (a), Chunker (c), Named Entity Recognition (n), Multi-Word Units (m) or Parser (p).
133+
Analyzer (a), Chunker (c), Named Entity Recognition (n), Multi-Word Units (m)
134+
or Parser (p).
135+
136+
The Tagger cannot be skipped.
128137

129138
Skipping the Multiword Unit implies disabling the Parser too.
130139
.RE
131140

132141
.BR \-\-alpino
133142
.RS
134-
Use a locally installed Alpino parser
143+
Use a locally installed Alpino parser. Disables our built-in dependency parser.
135144
.RE
136145

137146
.BR \-\-alpino =server
@@ -154,9 +163,14 @@ Run Frog as a server on 'port'
154163
.RS
155164
process 'file'.
156165

157-
\-t can be omitted. Frog will run on any <file> found on the command-line.
166+
This option can be omitted. Frog will run on any <file> found on the
167+
command-line.
158168
Wildcards are allowed too. When NO files are specified, Frog will start in
159169
interactive mode.
170+
171+
Files with the extension '.gz' or '.bz2' are handled too. The corresponding
172+
output-files will be compressed using the same compression, except
173+
when an explicit output filename is specified.
160174
.RE
161175

162176
.BR \-x " <xmlfile>"
@@ -165,13 +179,20 @@ process 'xmlfile', which is supposed to be in FoLiA format! If 'xmlfile' is
165179
empty, and
166180
.BR \-\-testdir =<dir>
167181
is provided, all '.xml' files in 'dir' will be processed as FoLiA XML.
182+
183+
This option can be omitted. Frog will process files with the 'xml' extension
184+
as FoLiA files.
185+
186+
Files with the extension '.xml.gz' or '.xml.bz2' are handled too. The
187+
corresponding output-files will be compressed using the same compression,
188+
except when an explicit output filename is specified.
168189
.RE
169190

170191
.BR \-X " <xmlfile>"
171192
.RS
172193
When 'xmlfile' is specified, create a FoLiA XML output file with that name.
173194

174-
When 'xmlfile' is empty, generate XML output for every inputfile.
195+
When 'xmlfile' is empty, generate FoLiA XML output for every inputfile.
175196
.RE
176197

177198
.BR \-\-textclass "=<cls>"
@@ -182,7 +203,6 @@ is given, use 'cls' to find AND store text in the FoLiA document(s).
182203
Using \-\-inputclass and \-\-outputclass is in general a better choice.
183204
.RE
184205

185-
186206
.BR \-\-inputclass "=<cls>"
187207
.RS
188208
use 'cls' to find text in the FoLiA input document(s).
@@ -196,16 +216,11 @@ Preferably this is another class then the inputclass.
196216

197217
.BR \-\-testdir =<dir>
198218
.RS
199-
process all files in 'dir'. When the input mode is XML, only '.xml' files are
200-
teken from 'dir'. see also
219+
process all files in 'dir'. When the input mode is XML, only '.xml' files,
220+
'.xml.gz' or '.xml.bz2' files are taken from 'dir'. see also
201221
.B \-\-outputdir
202222
.RE
203223

204-
.BR \-\-tmpdir =<dir>
205-
.RS
206-
location to store intermediate files. Default /tmp. NOT USED!
207-
.RE
208-
209224
.BR \-\-uttmarker =<mark>
210225
.RS
211226
assume all utterances are separated by 'mark'. (the default is none).
@@ -308,3 +323,6 @@ Antal van den Bosch
308323
e\-mail: lamasoftware@science.ru.nl
309324
.SH SEE ALSO
310325
.BR ucto (1)
326+
.BR mblem (1)
327+
.BR mbma (1)
328+
.BR ner (1)
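
The option syntax documented in this diff can be illustrated with a small sketch. The helper below is hypothetical and not part of Frog: `build_frog_args` and its parameters are names invented here. It only assembles an argument list from the forms shown in the man page text (`--debug=<module><level>,...`, `--skip=[tlacnmp]`, `--language=nld,eng,por`) and does not run Frog.

```python
import shlex

# Letters accepted by --skip according to the man page text: [tlacnmp].
# The Tagger (T) is not in this set because it cannot be skipped.
SKIP_LETTERS = set("tlacnmp")


def build_frog_args(files, debug=None, skip="", languages=None):
    """Return an argument list for a frog run (hypothetical helper)."""
    args = ["frog"]
    if debug:
        # --debug=<module><level>,...  e.g. {"l": 5, "n": 3} -> --debug=l5,n3
        pairs = [f"{module}{level}" for module, level in debug.items()]
        args.append("--debug=" + ",".join(pairs))
    if skip:
        unknown = set(skip) - SKIP_LETTERS
        if unknown:
            raise ValueError(f"unknown --skip letters: {sorted(unknown)}")
        args.append(f"--skip={skip}")
    if languages:
        # The first language in the list is the default.
        args.append("--language=" + ",".join(languages))
    # -t may be omitted: frog runs on any file named on the command line.
    args.extend(files)
    return args


if __name__ == "__main__":
    cmd = build_frog_args(
        ["test-file"],
        debug={"l": 5, "n": 3},
        languages=["nld", "eng", "por", "und"],
    )
    print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run` on a system where Frog is installed. Whether a given combination of options is accepted is up to Frog itself; the sketch only checks the `--skip` letters listed above.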
