Commit fe7a380

Update documentation according to Gemini suggestion.
1 parent d8bbd39 commit fe7a380

5 files changed

Lines changed: 180 additions & 162 deletions

README.pod

Lines changed: 61 additions & 54 deletions
Original file line number | Diff line number | Diff line change
@@ -1,11 +1,12 @@
11
=head1 NAME
22

3-
Algorithm::LibLinear - A Perl binding for LIBLINEAR, a library for classification/regression using linear SVM and logistic regression.
3+
Algorithm::LibLinear - A Perl binding for LIBLINEAR, a library for classification, regression, and outlier detection using linear Support Vector Machines (SVM) and logistic regression
44

55
=head1 SYNOPSIS
66

77
use Algorithm::LibLinear;
8-
# Constructs a model for L2-regularized L2 loss support vector classification.
8+
9+
# Instantiate a learner for L2-regularized L2-loss support vector classification (SVC).
910
my $learner = Algorithm::LibLinear->new(
1011
cost => 1,
1112
epsilon => 0.01,
@@ -15,15 +16,20 @@ Algorithm::LibLinear - A Perl binding for LIBLINEAR, a library for classificatio
1516
+{ label => -1, weight => 1, },
1617
],
1718
);
18-
# Loads a training data set from DATA filehandle.
19+
20+
# Load a training dataset from the DATA filehandle.
1921
my $data_set = Algorithm::LibLinear::DataSet->load(fh => \*DATA);
20-
# Updates training parameter.
22+
23+
# Automatically find optimal parameters.
2124
$learner->find_parameters(data_set => $data_set, num_folds => 5, update => 1);
22-
# Executes cross validation.
25+
26+
# Perform cross-validation to evaluate performance.
2327
my $accuracy = $learner->cross_validation(data_set => $data_set, num_folds => 5);
24-
# Executes training.
28+
29+
# Train the model on the dataset.
2530
my $classifier = $learner->train(data_set => $data_set);
26-
# Determines which (+1 or -1) is the class for the given feature to belong.
31+
32+
# Predict the class label (+1 or -1) for a given feature vector.
2733
my $class_label = $classifier->predict(feature => +{ 1 => 0.38, 2 => -0.5, ... });
2834

2935
__DATA__
@@ -36,138 +42,139 @@ Algorithm::LibLinear - A Perl binding for LIBLINEAR, a library for classificatio
3642

3743
=head1 DESCRIPTION
3844

39-
Algorithm::LibLinear is an XS module that provides features of LIBLINEAR, a fast C library for classification and regression.
45+
C<Algorithm::LibLinear> is an XS binding for LIBLINEAR, a fast C/C++ library for linear classification, regression, and outlier detection.
4046

41-
Current version is based on LIBLINEAR 2.48, released on January 5, 2025.
47+
This version is compatible with LIBLINEAR 2.48, released on January 5, 2025.
4248

4349
=head1 METHODS
4450

45-
=head2 new([bias => -1.0] [, cost => 1] [, epsilon => 0.1] [, loss_sensitivity => 0.1] [, nu => 0.5] [, regularize_bias => 1] [, solver => 'L2R_L2LOSS_SVC_DUAL'] [, weights => []])
51+
=head2 new([bias => -1.0] [, cost => 1] [, epsilon => undef] [, loss_sensitivity => 0.1] [, nu => 0.5] [, regularize_bias => 1] [, recalculate_weights => 0] [, solver => 'L2R_L2LOSS_SVC_DUAL'] [, weights => []])
4652

47-
Constructor. You can set several named parameters:
53+
Constructor. Accepts the following optional named parameters, also accessible via getter methods of the same name.
4854

4955
=over 4
5056

5157
=item bias
5258

53-
Bias term to be added to prediction result (i.e., C<-B> option for LIBLINEAR's C<train> command.).
54-
55-
This parameter makes sense only when its value is positive.
59+
The bias term added to feature vectors (corresponding to the C<-B> option of the LIBLINEAR C<train> command). This term is active only when its value is positive.
5660

5761
=item cost
5862

59-
Penalty cost for misclassification (C<-c>.)
63+
The penalty parameter C (corresponding to the C<-c> option).
6064

6165
=item epsilon
6266

63-
Termination criterion (C<-e>.)
64-
65-
Default value of this parameter depends on the value of C<solver>.
67+
The tolerance of the termination criterion (corresponding to the C<-e> option). The default depends on the chosen C<solver>.
6668

6769
=item loss_sensitivity
6870

69-
Epsilon in loss function of SVR (C<-p>.)
71+
The epsilon parameter (p) in the loss function of Support Vector Regression (SVR), corresponding to the C<-p> option.
7072

7173
=item nu
7274

73-
Nu parameter of one-class SVM (C<-n>.)
75+
The nu parameter for one-class SVM (corresponding to the C<-n> option).
7476

7577
=item regularize_bias
7678

77-
Whether to regularize the bias term (C<-R>, negated.)
79+
A boolean indicating whether to include the bias term in regularization (corresponding to the negation of the C<-R> option). Defaults to true.
80+
81+
=item recalculate_weights
82+
83+
A boolean indicating whether to recalculate the weight vector C<w> at the end of training. This option is valid only for the dual solvers of L2-regularized L1- or L2-loss Support Vector Classifiers (C<L2R_L1LOSS_SVC_DUAL> and C<L2R_L2LOSS_SVC_DUAL>).
7884

7985
=item solver
8086

81-
Kind of solver (C<-s>.)
87+
The solver type to use (corresponding to the C<-s> option).
8288

83-
For classification:
89+
Supported solvers for classification:
8490

8591
=over 4
8692

87-
=item 'L2R_LR' - L2-regularized logistic regression
93+
=item * 'L2R_LR' - L2-regularized logistic regression
8894

89-
=item 'L2R_L2LOSS_SVC_DUAL' - L2-regularized L2-loss SVC (dual problem)
95+
=item * 'L2R_L2LOSS_SVC_DUAL' - L2-regularized L2-loss SVC (dual)
9096

91-
=item 'L2R_L2LOSS_SVC' - L2-regularized L2-loss SVC (primal problem)
97+
=item * 'L2R_L2LOSS_SVC' - L2-regularized L2-loss SVC (primal)
9298

93-
=item 'L2R_L1LOSS_SVC_DUAL' - L2-regularized L1-loss SVC (dual problem)
99+
=item * 'L2R_L1LOSS_SVC_DUAL' - L2-regularized L1-loss SVC (dual)
94100

95-
=item 'MCSVM_CS' - Crammer-Singer multi-class SVM
101+
=item * 'MCSVM_CS' - Crammer and Singer multi-class SVM
96102

97-
=item 'L1R_L2LOSS_SVC' - L1-regularized L2-loss SVC
103+
=item * 'L1R_L2LOSS_SVC' - L1-regularized L2-loss SVC
98104

99-
=item 'L1R_LR' - L1-regularized logistic regression (primal problem)
105+
=item * 'L1R_LR' - L1-regularized logistic regression (primal)
100106

101-
=item 'L1R_LR_DUAL' - L1-regularized logistic regression (dual problem)
107+
=item * 'L2R_LR_DUAL' - L2-regularized logistic regression (dual)
102108

103109
=back
104110

105-
For regression:
111+
Supported solvers for regression:
106112

107113
=over 4
108114

109-
=item 'L2R_L2LOSS_SVR' - L2-regularized L2-loss SVR (primal problem)
115+
=item * 'L2R_L2LOSS_SVR' - L2-regularized L2-loss SVR (primal)
110116

111-
=item 'L2R_L2LOSS_SVR_DUAL' - L2-regularized L2-loss SVR (dual problem)
117+
=item * 'L2R_L2LOSS_SVR_DUAL' - L2-regularized L2-loss SVR (dual)
112118

113-
=item 'L2R_L1LOSS_SVR_DUAL' - L2-regularized L1-loss SVR (dual problem)
119+
=item * 'L2R_L1LOSS_SVR_DUAL' - L2-regularized L1-loss SVR (dual)
114120

115121
=back
116122

117-
For outlier detection:
123+
Supported solvers for outlier detection:
118124

119125
=over 4
120126

121-
=item 'ONECLASS_SVM' - One-class SVM
127+
=item * 'ONECLASS_SVM' - One-class SVM
122128

123129
=back
124130

125131
=item weights
126132

127-
Weights to adjust the cost parameter of different classes (C<-wi>.)
133+
An array reference used to adjust the penalty cost for specific classes (corresponding to the C<-wi> option).
128134

129-
For example,
135+
For example:
130136

131137
my $learner = Algorithm::LibLinear->new(
132138
weights => [
133139
+{ label => 1, weight => 0.5 },
134-
+{ label => 2, weight => 1 },
140+
+{ label => 2, weight => 1.0 },
135141
+{ label => 3, weight => 0.5 },
136142
],
137143
);
138144

139-
is giving a doubling weight for class 2. This means that samples belonging to class 2 have stronger effect than other samples belonging class 1 or 3 on learning.
145+
This configuration doubles the penalty weight for class 2, making samples belonging to class 2 have twice the influence on training compared to those in classes 1 or 3.
140146

141-
This option is useful when the number of training samples of each class is not balanced.
147+
This option is particularly useful for addressing class imbalance in the training dataset.
142148

143149
=back
144150

145151
=head2 cross_validation(data_set => $data_set, num_folds => $num_folds)
146152

147-
Evaluates training parameter using N-fold cross validation method.
148-
Given data set will be split into N parts. N-1 of them will be used as a training set and the rest 1 part will be used as a test set.
149-
The evaluation iterates N times using each different part as a test set. Then average accuracy is returned as result.
153+
Evaluates training performance using N-fold cross-validation.
154+
The dataset is partitioned into N equal-sized folds. For each fold, the model is trained on the remaining N-1 folds and evaluated on the held-out fold.
155+
156+
Returns the average classification accuracy for classification solvers, or the mean squared error (MSE) for regression solvers.
150157

151158
=head2 find_cost_parameter(data_set => $data_set, num_folds => $num_folds [, initial => -1.0] [, update => 0])
152159

153-
Deprecated. Use C<find_parameters> instead.
160+
B<Deprecated.> Use C<find_parameters> instead.
154161

155-
Shorthand alias for C<find_parameters> only works on C<cost> parameter.
156-
Notice that C<loss_sensitivity> is affected too when C<update> is set.
162+
A convenience wrapper around C<find_parameters> that tunes only the C<cost> parameter. Note that if C<update> is enabled, the C<loss_sensitivity> parameter may also be updated in the process.
157163

158164
=head2 find_parameters(data_set => $data_set, num_folds => $num_folds [, initial_cost => -1.0] [, initial_loss_sensitivity => -1.0] [, update => 0])
159165

160-
Finds the best parameters by N-fold cross validation. If C<initial_cost> or C<initial_loss_sensitivity> is a negative, the value is automatically calculated.
161-
Works only for 3 solvers: C<'L2R_LR'>, C<'L2R_L2LOSS_SVC'> and C<'L2R_L2LOSS_SVR'>. Error will be thrown for otherwise.
166+
Finds the optimal hyperparameters using N-fold cross-validation. If C<initial_cost> or C<initial_loss_sensitivity> is negative, the starting value for that parameter's search is determined automatically.
167+
168+
This method is supported only by the C<'L2R_LR'>, C<'L2R_L2LOSS_SVC'>, and C<'L2R_L2LOSS_SVR'> solvers. It throws an exception if called when using any other solver.
162169

163-
When C<update> is set true, the instance is updated to use the found parameters. This behaviour is disabled by default.
170+
If the C<update> parameter is true, the learner instance is updated in-place with the discovered parameters. This is disabled by default.
164171

165-
Return value is an ArrayRef containing 3 values: found C<cost>, found C<loss_sensitivity> (only if solver is C<'L2R_L2LOSS_SVR'>) and mean accuracy of cross validation with the found parameters.
172+
Returns an array reference containing three elements: the optimal C<cost>, the optimal C<loss_sensitivity> (which is C<undef> unless the solver is C<'L2R_L2LOSS_SVR'>), and the evaluation metric (average accuracy or MSE) achieved with these parameters.
166173

167174
=head2 train(data_set => $data_set)
168175

169-
Executes training and returns a trained L<Algorithm::LibLinear::Model> instance.
170-
C<data_set> is same as the C<cross_validation>'s.
176+
Trains a model on the provided dataset and returns an L<Algorithm::LibLinear::Model> instance.
177+
The C<data_set> argument must be an L<Algorithm::LibLinear::DataSet> instance.
171178

172179
=head1 AUTHOR
173180

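The N-fold procedure described under C<cross_validation> in the diff above can be sketched in plain Perl. This is only an illustration under stated assumptions, not the module's implementation: C<train_majority> and C<n_fold_accuracy> are hypothetical helpers, the "learner" is a trivial majority-label predictor standing in for LIBLINEAR, folds are assigned by index rather than by shuffling, and the result is the mean of the per-fold accuracies.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a learner: "training" just picks the most
# frequent label in the training samples.
sub train_majority {
    my (@training) = @_;
    my %count;
    $count{ $_->{label} }++ for @training;
    # Highest count first; ties are broken by label so the result is deterministic.
    my ($majority) = sort { $count{$b} <=> $count{$a} || $a <=> $b } keys %count;
    return $majority;
}

# Hypothetical helper: splits the samples into N folds, trains on N-1 of
# them, evaluates on the held-out fold, and averages the fold accuracies.
sub n_fold_accuracy {
    my ($samples, $num_folds) = @_;
    my @fold_accuracies;
    for my $fold (0 .. $num_folds - 1) {
        my (@training, @held_out);
        for my $i (0 .. $#$samples) {
            # Assumption: sample $i belongs to fold ($i mod N); no shuffling.
            if ($i % $num_folds == $fold) { push @held_out, $samples->[$i] }
            else                          { push @training, $samples->[$i] }
        }
        my $predicted = train_majority(@training);
        my $correct   = grep { $_->{label} == $predicted } @held_out;
        push @fold_accuracies, $correct / @held_out;
    }
    my $sum = 0;
    $sum += $_ for @fold_accuracies;
    return $sum / @fold_accuracies;
}

# Toy data: 12 samples, every third one labelled -1, the rest +1.
my @samples  = map { +{ label => ($_ % 3 == 0 ? -1 : 1) } } 1 .. 12;
my $accuracy = n_fold_accuracy(\@samples, 4);
printf "Average accuracy: %.3f\n", $accuracy;
```

With the real module, the same evaluation is a single call to C<< $learner->cross_validation(data_set => $data_set, num_folds => 5) >>, as shown in the SYNOPSIS.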