-.TH frog 1 "2023 jan 31 "
+.TH frog 1 "2023 feb 22 "
 
 .SH NAME
 frog \- Dutch Natural Language Toolkit
 .SH SYNOPSIS
-frog [options]
+frog [\-t] test\-file
 
-frog \-t test\-file
+frog [options]
 
 .SH DESCRIPTION
 Frog is an integration of memory\(hy -based natural language processing (NLP)
@@ -25,7 +25,7 @@ you can use
 to select the 'config-file' for an installed language 'lang'
 .RE
 
-.BR \-\-debug =<modele><level>,...
+.BR \-\-debug =<module><level>,...
 .RS
 set debug level per module, indicated by a single letter:
 Tagger (T), Tokenizer (t), Lemmatizer (l), Morphological Analyzer (a),
@@ -35,11 +35,14 @@ or Parser (p). Different modules must be separated by commas.
 (e.g. \-\-debug=l5,n3 sets the level for the Lemmatizer to 5 and for the NER
 to 3)
 
+Debugging lines are written to a file
+.BR frog.<number>.debug
 .RE
+The name of that file is given at the end of the run.
 
 .BR \-d " <level>"
 .RS
-set global debug level. ( for all modules)
+set a global debug level for all modules at once.
 .RE
 
 .BR \-\-deep\(hymorph
@@ -75,7 +78,12 @@ The first language in the list will be the default, unspecified languages are
 assumed to be of that default.
 
 e.g. \-\-language=nld,eng,por
-means: detect Dutch, English and Portuguese, with Dutch being the default.
+means: detect Dutch, English and Portuguese, with Dutch being the default,
+using TextCat. Mainly useful for XML processing.
+
+Specifying an unsupported language is a fatal error. However, you can add the
+special language 'und', which ensures that sentences in an unknown language
+are labeled as such and processed no further.
 
 .B IMPORTANT
 Frog can at the moment handle only one language at a time, as determined by the
@@ -115,23 +123,24 @@ from the inputfilename(s) with '.out' appended.
 .BR \-\-retry
 .RS
 assume a re-run on the same input file(s). Frog will only process those files
-that haven't been processed yet. This is accomplished by looking at the output
-file names. (so this has no effect if neither \-o, \-\-outputdir, \-X or
-\-\-xmldir is used)
+that haven't been processed yet.
 .RE
 
 
 .BR \-\-skip =[tlacnmp]
 .RS
 skip parts of the process: Tokenizer (t), Lemmatizer (l), Morphological
-Analyzer (a), Chunker (c), Named Entity Recognition (n), Multi-Word Units (m) or Parser (p).
+Analyzer (a), Chunker (c), Named Entity Recognition (n), Multi-Word Units (m)
+or Parser (p).
+
+The Tagger cannot be skipped.
 
 Skipping the Multiword Unit implies disabling the Parser too.
 .RE
 
 .BR \-\-alpino
 .RS
-Use a locally installed Alpino parser
+Use a locally installed Alpino parser. Disables our built-in Dependency parser.
 .RE
 
 .BR \-\-alpino =server
@@ -154,9 +163,14 @@ Run Frog as a server on 'port'
 .RS
 process 'file'.
 
-\-t can be omitted. Frog will run on any <file> found on the command-line.
+This option can be omitted. Frog will run on any <file> found on the
+command-line.
 Wildcards are allowed too. When NO files are specified, Frog will start in
 interactive mode.
+
+Files with the extension '.gz' or '.bz2' are handled too. The corresponding
+output-files will be compressed using the same compression, except
+when an explicit output filename is specified.
 .RE
 
 .BR \-x " <xmlfile>"
@@ -165,13 +179,20 @@ process 'xmlfile', which is supposed to be in FoLiA format! If 'xmlfile' is
 empty, and
 .BR \-\-testdir =<dir>
 is provided, all '.xml' files in 'dir' will be processed as FoLiA XML.
+
+This option can be omitted. Frog will process files with the 'xml' extension
+as FoLiA files.
+
+Files with the extension '.xml.gz' or '.xml.bz2' are handled too. The
+corresponding output-files will be compressed using the same compression,
+except when an explicit output filename is specified.
 .RE
 
 .BR \-X " <xmlfile>"
 .RS
 When 'xmlfile' is specified, create a FoLiA XML output file with that name.
 
-When 'xmlfile' is empty, generate XML output for every inputfile.
+When 'xmlfile' is empty, generate FoLiA XML output for every inputfile.
 .RE
 
 .BR \-\-textclass " =<cls>"
@@ -182,7 +203,6 @@ is given, use 'cls' to find AND store text in the FoLiA document(s).
 Using \-\-inputclass and \-\-outputclass is in general a better choice.
 .RE
 
-
 .BR \-\-inputclass " =<cls>"
 .RS
 use 'cls' to find text in the FoLiA input document(s).
@@ -196,16 +216,11 @@ Preferably this is another class then the inputclass.
 
 .BR \-\-testdir =<dir>
 .RS
-process all files in 'dir'. When the input mode is XML, only '.xml' files are
-teken from 'dir'. see also
+process all files in 'dir'. When the input mode is XML, only '.xml',
+'.xml.gz' or '.xml.bz2' files are taken from 'dir'. See also
 .B \-\-outputdir
 .RE
 
-.BR \-\-tmpdir =<dir>
-.RS
-location to store intermediate files. Default /tmp. NOT USED!
-.RE
-
 .BR \-\-uttmarker =<mark>
 .RS
 assume all utterances are separated by 'mark'. (the default is none).
@@ -308,3 +323,6 @@ Antal van den Bosch
 e\-mail: lamasoftware@science.ru.nl
 .SH SEE ALSO
 .BR ucto (1)
+.BR mblem (1)
+.BR mbma (1)
+.BR ner (1)