ML之NB:利用朴素贝叶斯NB算法(TfidfVectorizer+不去除停用词)对20类新闻文本数据集进行分类预测、评估
目录
输出结果
设计思路
核心代码
输出结果
设计思路

核心代码
class TfidfVectorizer Found at: sklearn.feature_extraction.text
class TfidfVectorizer(CountVectorizer):
"""Convert a collection of raw documents to a matrix of TF-IDF features.
Equivalent to CountVectorizer followed by TfidfTransformer.
Read more in the :ref:`User Guide `.
Parameters
----------
input : string {'filename', 'file', 'content'}
If 'filename', the sequence passed as an argument to fit is
expected to be a list of filenames that need reading to fetch
the raw content to analyze.
If 'file', the sequence items must have a 'read' method (file-like
object) that is called to fetch the bytes in memory.
Otherwise the input is expected to be the sequence strings or
bytes items are expected to be analyzed directly.
encoding : string, 'utf-8' by default.
If bytes or files are given to analyze, this encoding is used to
decode.
decode_error : {'strict', 'ignore', 'replace'}
Instruction on what to do if a byte sequence is given to analyze that
contains characters not of the given `encoding`. By default, it is
'strict', meaning that a UnicodeDecodeError will be raised. Other
values are 'ignore' and 'replace'.
strip_accents : {'ascii', 'unicode', None}
Remove accents during the preprocessing step.
'ascii' is a fast method that only works on characters that have
an direct ASCII mapping.
'unicode' is a slightly slower method that works on any characters.
None (default) does nothing.
analyzer : string, {'word', 'char'} or callable
Whether the feature should be made of word or character n-grams.
If a callable is passed it is used to extract the sequence of features
out of the raw, unprocessed input.
preprocessor : callable or None (default)
Override the preprocessing (string transformation) stage while
preserving the tokenizing and n-grams generation steps.
tokenizer : callable or None (default)
Override the string tokenization step while preserving the
preprocessing and n-grams generation steps.
Only applies if ``analyzer == 'word'``.
ngram_range : tuple (min_n, max_n)
The lower and upper boundary of the range of n-values for different
n-grams to be extracted. All values of n such that min_n
关注
打赏
最近更新
- 深拷贝和浅拷贝的区别(重点)
- 【Vue】走进Vue框架世界
- 【云服务器】项目部署—搭建网站—vue电商后台管理系统
- 【React介绍】 一文带你深入React
- 【React】React组件实例的三大属性之state,props,refs(你学废了吗)
- 【脚手架VueCLI】从零开始,创建一个VUE项目
- 【React】深入理解React组件生命周期----图文详解(含代码)
- 【React】DOM的Diffing算法是什么?以及DOM中key的作用----经典面试题
- 【React】1_使用React脚手架创建项目步骤--------详解(含项目结构说明)
- 【React】2_如何使用react脚手架写一个简单的页面?