
Published by 一个处女座的程序猿 on 2020-07-21 22:41:16

ML with sklearn: A detailed guide to the LogisticRegression function in sklearn.linear_model (introduction and usage)

Contents

Introduction and usage of the LogisticRegression function in sklearn.linear_model

Introduction and usage of the LogisticRegression function in sklearn.linear_model

 

class LogisticRegression (found at: sklearn.linear_model._logistic)

class LogisticRegression(BaseEstimator, LinearClassifierMixin, SparseCoefMixin):
    """
    Logistic Regression (aka logit, MaxEnt) classifier.

    In the multiclass case, the training algorithm uses the one-vs-rest (OvR) scheme if the 'multi_class' option is set to 'ovr', and uses the cross-entropy loss if the 'multi_class' option is set to 'multinomial'. (Currently the 'multinomial' option is supported only by the 'lbfgs', 'sag', 'saga' and 'newton-cg' solvers.)

    This class implements regularized logistic regression using the 'liblinear' library, 'newton-cg', 'sag', 'saga' and 'lbfgs' solvers. **Note that regularization is applied by default**. It can handle both dense and sparse input. Use C-ordered arrays or CSR matrices containing 64-bit floats for optimal performance; any other input format will be converted (and copied).

    The 'newton-cg', 'sag', and 'lbfgs' solvers support only L2 regularization with primal formulation, or no regularization. The 'liblinear' solver supports both L1 and L2 regularization, with a dual formulation only for the L2 penalty. The Elastic-Net regularization is only supported by the 'saga' solver.

    Read more in the :ref:`User Guide <logistic_regression>`.

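Before walking through the parameters, a minimal usage sketch may help; the dataset and hyperparameter values below are illustrative choices, not taken from the excerpted source:

    # A minimal sketch of basic LogisticRegression usage (illustrative values).
    from sklearn.datasets import load_iris
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    X, y = load_iris(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=0)

    # L2 regularization is applied by default (C=1.0); 'lbfgs' is the
    # default solver since 0.22 and handles the multinomial loss.
    clf = LogisticRegression(max_iter=1000, random_state=0)
    clf.fit(X_train, y_train)

    print(clf.coef_.shape)            # (n_classes, n_features), here (3, 4)
    print(clf.score(X_test, y_test))  # mean accuracy on the held-out split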

    Parameters
    ----------
    penalty : {'l1', 'l2', 'elasticnet', 'none'}, default='l2'
        Used to specify the norm used in the penalization. The 'newton-cg', 'sag' and 'lbfgs' solvers support only l2 penalties. 'elasticnet' is only supported by the 'saga' solver. If 'none' (not supported by the liblinear solver), no regularization is applied.

        .. versionadded:: 0.19
           l1 penalty with SAGA solver (allowing 'multinomial' + L1)

    dual : bool, default=False
        Dual or primal formulation. Dual formulation is only implemented for l2 penalty with liblinear solver. Prefer dual=False when n_samples > n_features.

    tol : float, default=1e-4
        Tolerance for stopping criteria.

    C : float, default=1.0
        Inverse of regularization strength; must be a positive float. Like in support vector machines, smaller values specify stronger regularization.

    fit_intercept : bool, default=True
        Specifies if a constant (a.k.a. bias or intercept) should be added to the decision function.

    intercept_scaling : float, default=1
        Useful only when the solver 'liblinear' is used and self.fit_intercept is set to True. In this case, x becomes [x, self.intercept_scaling], i.e. a "synthetic" feature with constant value equal to intercept_scaling is appended to the instance vector. The intercept becomes ``intercept_scaling * synthetic_feature_weight``.

        Note! the synthetic feature weight is subject to l1/l2 regularization as all other features. To lessen the effect of regularization on synthetic feature weight (and therefore on the intercept) intercept_scaling has to be increased.

    class_weight : dict or 'balanced', default=None
        Weights associated with classes in the form ``{class_label: weight}``. If not given, all classes are supposed to have weight one.

        The "balanced" mode uses the values of y to automatically adjust weights inversely proportional to class frequencies in the input data as ``n_samples / (n_classes * np.bincount(y))``.

        Note that these weights will be multiplied with sample_weight (passed through the fit method) if sample_weight is specified.

        .. versionadded:: 0.17
           *class_weight='balanced'*

    random_state : int, RandomState instance, default=None
        Used when ``solver`` == 'sag', 'saga' or 'liblinear' to shuffle the data. See :term:`Glossary <random_state>` for details.

    solver : {'newton-cg', 'lbfgs', 'liblinear', 'sag', 'saga'}, default='lbfgs'
        Algorithm to use in the optimization problem.

        - For small datasets, 'liblinear' is a good choice, whereas 'sag' and 'saga' are faster for large ones.
        - For multiclass problems, only 'newton-cg', 'sag', 'saga' and 'lbfgs' handle multinomial loss; 'liblinear' is limited to one-versus-rest schemes.
        - 'newton-cg', 'lbfgs', 'sag' and 'saga' handle L2 or no penalty
        - 'liblinear' and 'saga' also handle L1 penalty
        - 'saga' also supports 'elasticnet' penalty
        - 'liblinear' does not support setting ``penalty='none'``

        Note that 'sag' and 'saga' fast convergence is only guaranteed on features with approximately the same scale. You can preprocess the data with a scaler from sklearn.preprocessing.

        .. versionadded:: 0.17
           Stochastic Average Gradient descent solver.
        .. versionadded:: 0.19
           SAGA solver.
        .. versionchanged:: 0.22
           The default solver changed from 'liblinear' to 'lbfgs' in 0.22.

    max_iter : int, default=100
        Maximum number of iterations taken for the solvers to converge.

    multi_class : {'auto', 'ovr', 'multinomial'}, default='auto'
        If the option chosen is 'ovr', then a binary problem is fit for each label. For 'multinomial' the loss minimised is the multinomial loss fit across the entire probability distribution, *even when the data is binary*. 'multinomial' is unavailable when solver='liblinear'. 'auto' selects 'ovr' if the data is binary, or if solver='liblinear', and otherwise selects 'multinomial'.

        .. versionadded:: 0.18
           Stochastic Average Gradient descent solver for 'multinomial' case.
        .. versionchanged:: 0.22
           Default changed from 'ovr' to 'auto' in 0.22.

    verbose : int, default=0
        For the liblinear and lbfgs solvers set verbose to any positive number for verbosity.

    warm_start : bool, default=False
        When set to True, reuse the solution of the previous call to fit as initialization, otherwise, just erase the previous solution. Useless for liblinear solver. See :term:`the Glossary <warm_start>`.

        .. versionadded:: 0.17
           *warm_start* to support *lbfgs*, *newton-cg*, *sag*, *saga* solvers.

    n_jobs : int, default=None
        Number of CPU cores used when parallelizing over classes if multi_class='ovr'. This parameter is ignored when the ``solver`` is set to 'liblinear' regardless of whether 'multi_class' is specified or not. ``None`` means 1 unless in a :obj:`joblib.parallel_backend` context. ``-1`` means using all processors. See :term:`Glossary <n_jobs>` for more details.

    l1_ratio : float, default=None
        The Elastic-Net mixing parameter, with ``0 <= l1_ratio <= 1``. Only used if ``penalty='elasticnet'``. Setting ``l1_ratio=0`` is equivalent to using ``penalty='l2'``, while setting ``l1_ratio=1`` is equivalent to using ``penalty='l1'``. For ``0 < l1_ratio < 1``, the penalty is a combination of L1 and L2.
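The compatibility rules among penalty, solver and l1_ratio are easiest to see in code. A small sketch follows; all hyperparameter values are illustrative assumptions, not from the post:

    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.preprocessing import StandardScaler

    X, y = make_classification(n_samples=500, n_features=20, random_state=0)
    # 'sag'/'saga' converge quickly only on features with similar scales.
    X = StandardScaler().fit_transform(X)

    # Elastic-Net is only supported by 'saga'; l1_ratio mixes L1 and L2.
    enet = LogisticRegression(penalty='elasticnet', solver='saga',
                              l1_ratio=0.5, C=0.5, max_iter=5000).fit(X, y)

    # 'liblinear' supports L1; class_weight='balanced' reweights classes
    # inversely proportional to their frequencies.
    l1_clf = LogisticRegression(penalty='l1', solver='liblinear',
                                class_weight='balanced').fit(X, y)

    # L1 and Elastic-Net drive some coefficients exactly to zero.
    print((enet.coef_ == 0).sum(), (l1_clf.coef_ == 0).sum())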
The excerpt then continues with the body of the fit method:

    def fit(self, X, y, sample_weight=None):
        solver = _check_solver(self.solver, self.penalty, self.dual)
        if self.penalty == 'elasticnet':
            if (not isinstance(self.l1_ratio, numbers.Number) or
                    self.l1_ratio < 0 or self.l1_ratio > 1):
                raise ValueError("l1_ratio must be between 0 and 1;"
                                 " got (l1_ratio=%r)" % self.l1_ratio)
        elif self.l1_ratio is not None:
            warnings.warn("l1_ratio parameter is only used when penalty is "
                          "'elasticnet'. Got "
                          "(penalty={})".format(self.penalty))
        if self.penalty == 'none':
            if self.C != 1.0:  # default values
                warnings.warn("Setting penalty='none' will ignore the C and "
                              "l1_ratio parameters")
                # Note that check for l1_ratio is done right above
            C_ = np.inf
            penalty = 'l2'
        else:
            C_ = self.C
            penalty = self.penalty
        if not isinstance(self.max_iter, numbers.Number) or self.max_iter < 0:
            raise ValueError("Maximum number of iteration must be positive;"
                             " got (max_iter=%r)" % self.max_iter)
        if not isinstance(self.tol, numbers.Number) or self.tol < 0:
            raise ValueError("Tolerance for stopping criteria must be "
                             "positive; got (tol=%r)" % self.tol)
        if solver == 'lbfgs':
            _dtype = np.float64
        else:
            _dtype = [np.float64, np.float32]
        X, y = self._validate_data(X, y, accept_sparse='csr', dtype=_dtype,
                                   order="C",
                                   accept_large_sparse=solver != 'liblinear')
        check_classification_targets(y)
        self.classes_ = np.unique(y)
        multi_class = _check_multi_class(self.multi_class, solver,
                                         len(self.classes_))
        if solver == 'liblinear':
            if effective_n_jobs(self.n_jobs) != 1:
                warnings.warn("'n_jobs' > 1 does not have any effect when"
                              " 'solver' is set to 'liblinear'. Got 'n_jobs'"
                              " = {}.".format(effective_n_jobs(self.n_jobs)))
            self.coef_, self.intercept_, n_iter_ = _fit_liblinear(
                X, y, self.C, self.fit_intercept, self.intercept_scaling,
                self.class_weight, self.penalty, self.dual, self.verbose,
                self.max_iter, self.tol, self.random_state,
                sample_weight=sample_weight)
            self.n_iter_ = np.array([n_iter_])
            return self
        if solver in ['sag', 'saga']:
            max_squared_sum = row_norms(X, squared=True).max()
        else:
            max_squared_sum = None
        n_classes = len(self.classes_)
        classes_ = self.classes_
        if n_classes < 2:
            raise ValueError("This solver needs samples of at least 2 classes"
                             " in the data, but the data contains only one"
                             " class: %r" % classes_[0])
        if len(self.classes_) == 2:
            n_classes = 1
            classes_ = classes_[1:]
        if self.warm_start:
            warm_start_coef = getattr(self, 'coef_', None)
        else:
            warm_start_coef = None
        if warm_start_coef is not None and self.fit_intercept:
            warm_start_coef = np.append(warm_start_coef,
                                        self.intercept_[:, np.newaxis],
                                        axis=1)
        self.coef_ = list()
        self.intercept_ = np.zeros(n_classes)
        # Hack so that we iterate only once for the multinomial case.
        if multi_class == 'multinomial':
            classes_ = [None]
            warm_start_coef = [warm_start_coef]
        if warm_start_coef is None:
            warm_start_coef = [None] * n_classes
        path_func = delayed(_logistic_regression_path)
        # The SAG solver releases the GIL so it's more efficient to use
        # threads for this solver.
        if solver in ['sag', 'saga']:
            prefer = 'threads'
        else:
            prefer = 'processes'
        fold_coefs_ = Parallel(n_jobs=self.n_jobs, verbose=self.verbose,
                               **_joblib_parallel_args(prefer=prefer))(
            path_func(X, y, pos_class=class_, Cs=[C_],
                      l1_ratio=self.l1_ratio, fit_intercept=self.fit_intercept,
                      tol=self.tol, verbose=self.verbose, solver=solver,
                      multi_class=multi_class, max_iter=self.max_iter,
                      class_weight=self.class_weight, check_input=False,
                      random_state=self.random_state, coef=warm_start_coef_,
                      penalty=penalty, max_squared_sum=max_squared_sum,
                      sample_weight=sample_weight)
            for (class_, warm_start_coef_) in zip(classes_, warm_start_coef))
        fold_coefs_, _, n_iter_ = zip(*fold_coefs_)
        self.n_iter_ = np.asarray(n_iter_, dtype=np.int32)[:, 0]
        n_features = X.shape[1]
        if multi_class == 'multinomial':
            self.coef_ = fold_coefs_[0][0]
        else:
            self.coef_ = np.asarray(fold_coefs_)
            self.coef_ = self.coef_.reshape(n_classes, n_features +
                                            int(self.fit_intercept))
        if self.fit_intercept:
            self.intercept_ = self.coef_[:, -1]
            self.coef_ = self.coef_[:, :-1]
        return self

    def predict_proba(self, X):
        """
        Probability estimates.

        The returned estimates for all classes are ordered by the label of classes. For a multi_class problem, if multi_class is set to be "multinomial" the softmax function is used to find the predicted probability of each class. Else use a one-vs-rest approach, i.e. calculate the probability of each class assuming it to be positive using the logistic function, and normalize these values across all the classes.

        Parameters
        ----------
        X : array-like of shape (n_samples, n_features)
            Vector to be scored, where `n_samples` is the number of samples and `n_features` is the number of features.

        Returns
        -------
        T : array-like of shape (n_samples, n_classes)
            Returns the probability of the sample for each class in the model, where classes are ordered as they are in ``self.classes_``.
        """
        check_is_fitted(self)

        ovr = (self.multi_class in ["ovr", "warn"] or
               (self.multi_class == 'auto' and (self.classes_.size <= 2 or
                                                self.solver == 'liblinear')))
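To illustrate the ovr-versus-softmax behaviour this docstring describes, here is a hedged sketch against the 0.22-era API shown above; the dataset and settings are illustrative:

    from sklearn.datasets import load_iris
    from sklearn.linear_model import LogisticRegression

    X, y = load_iris(return_X_y=True)

    # Softmax over the multinomial decision function.
    softmax_clf = LogisticRegression(multi_class='multinomial',
                                     solver='lbfgs', max_iter=1000).fit(X, y)
    # One-vs-rest: per-class logistic probabilities, normalized across classes.
    ovr_clf = LogisticRegression(multi_class='ovr',
                                 solver='liblinear').fit(X, y)

    proba = softmax_clf.predict_proba(X[:3])
    print(proba)              # shape (3, n_classes), ordered as in classes_
    print(proba.sum(axis=1))  # each row sums to 1.0
    print(ovr_clf.predict_proba(X[:3]).sum(axis=1))  # also 1.0 after normalization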
