Rasa NLU for Chinese, forked from RasaHQ/rasa_nlu.
Please refer to the official Rasa NLU documentation for the latest guides.
Chinese blog
Files you should have:
- data/total_word_feature_extractor_zh.dat
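Trained from a Chinese corpus with the MITIE wordrep tool (training takes 2-3 days).
To train your own, build the MITIE Wordrep Tool. Note that the Chinese corpus must be tokenized before it is fed into the tool. A domain-specific corpus that closely matches your use case works best.
A model trained on Chinese Wikipedia and Baidu Baike can be downloaded from the Chinese blog.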
- data/examples/rasa/demo-rasa_zh.json
Add as many examples as you can.
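The tokenization step mentioned above for the wordrep corpus could be sketched roughly as follows. This is a minimal illustration, not part of this project: the function names and file names are hypothetical, and it assumes that one space-separated sentence per line is a suitable input for your wordrep run. The tokenizer is passed in as a callable so that `jieba.cut` (or any other segmenter) can be plugged in.

```python
import io


def segment_corpus(lines, cut):
    """Yield each non-empty line as a string of space-separated tokens.

    `cut` is any callable that maps a string to an iterable of tokens,
    for example `jieba.cut` if Jieba is installed (it is not imported here).
    """
    for line in lines:
        text = line.strip()
        if not text:
            continue  # skip blank lines
        tokens = [tok for tok in cut(text) if tok.strip()]
        yield " ".join(tokens)


def segment_file(src_path, dst_path, cut):
    """Write a segmented copy of a UTF-8 text file (hypothetical helper)."""
    with io.open(src_path, encoding="utf-8") as src, \
            io.open(dst_path, "w", encoding="utf-8") as dst:
        for segmented in segment_corpus(src, cut):
            dst.write(segmented + "\n")


# Possible usage, assuming Jieba is installed and the file names exist:
#   import jieba
#   segment_file("corpus_zh.txt", "corpus_zh_segmented.txt", jieba.cut)
```

Passing the tokenizer as an argument keeps the sketch independent of any particular segmenter, so the same code works if you switch to a domain-specific dictionary later.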
Usage:
- Clone this project and run
python setup.py install
- Modify the configuration.
Two pipelines are currently available for Chinese:
MITIE+Jieba (sample_configs/config_jieba_mitie.yml):
language: "zh"
pipeline:
- name: "nlp_mitie"
  model: "data/total_word_feature_extractor_zh.dat"
- name: "tokenizer_jieba"
- name: "ner_mitie"
- name: "ner_synonyms"
- name: "intent_entity_featurizer_regex"
- name: "intent_classifier_mitie"
Recommended: MITIE+Jieba+sklearn (sample_configs/config_jieba_mitie_sklearn.yml):
language: "zh"
pipeline:
- name: "nlp_mitie"
  model: "data/total_word_feature_extractor_zh.dat"
- name: "tokenizer_jieba"
- name: "ner_mitie"
- name: "ner_synonyms"
- name: "intent_entity_featurizer_regex"
- name: "intent_featurizer_mitie"
- name: "intent_classifier_sklearn"
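- (Optional) Use a Jieba user-defined dictionary or switch the Jieba default dictionary:
The value of "user_dicts" can be either a file path or a directory path (sample_configs/config_jieba_mitie_sklearn_plus_dict_path.yml).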
language: "zh"
pipeline:
- name: "nlp_mitie"
  model: "data/total_word_feature_extractor_zh.dat"
- name: "tokenizer_jieba"
  default_dict: "./default_dict.big"
  user_dicts: "./jieba_userdict"
  # user_dicts: "./jieba_userdict/jieba_userdict.txt"
- name: "ner_mitie"
- name: "ner_synonyms"
- name: "intent_entity_featurizer_regex"
- name: "intent_featurizer_mitie"
- name: "intent_classifier_sklearn"
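Because "user_dicts" accepts either a single file or a directory, a loader has to turn that value into a concrete list of dictionary files. The sketch below shows one way this could work; it is an illustration under stated assumptions, not the tokenizer's actual implementation. `resolve_user_dicts` is a hypothetical helper, and treating every regular file in the directory as a dictionary is an assumption.

```python
import glob
import os


def resolve_user_dicts(path):
    """Return the dictionary files named by a `user_dicts`-style value.

    A directory expands to the regular files directly inside it (sorted
    for a stable load order); a file path is returned as a one-item list;
    a path that does not exist yields an empty list.
    """
    if os.path.isdir(path):
        candidates = glob.glob(os.path.join(path, "*"))
        return sorted(p for p in candidates if os.path.isfile(p))
    if os.path.isfile(path):
        return [path]
    return []


# Possible usage, assuming Jieba is installed:
#   import jieba
#   jieba.set_dictionary("./default_dict.big")      # switch the default dictionary
#   for dict_file in resolve_user_dicts("./jieba_userdict"):
#       jieba.load_userdict(dict_file)               # add user-defined words
```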
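- Train the model by running the command below.
If you specify a project name in the configuration file, the model is saved to /models/your_project_name.
Otherwise, the model is saved to /models/default.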
python -m rasa_nlu.train -c sample_configs/config_jieba_mitie_sklearn.yml --data data/examples/rasa/demo-rasa_zh.json --path models
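The file passed to `--data` holds labeled examples, and entity annotations in it are character offsets into the text. Counting offsets by hand is error-prone for Chinese, so a small helper like the one below may be useful when adding examples. It is a sketch: `make_example` and `make_training_data` are hypothetical names, the output file name is made up, and the `rasa_nlu_data` / `common_examples` layout is assumed to match data/examples/rasa/demo-rasa_zh.json.

```python
import json


def make_example(text, intent, entities=()):
    """Build one training example.

    `entities` is an iterable of (value, entity_type) pairs; offsets are
    computed from the first occurrence of each value in `text`.
    """
    annotated = []
    for value, entity_type in entities:
        start = text.find(value)
        if start < 0:
            raise ValueError("entity value not found in text: %r" % value)
        annotated.append({
            "start": start,
            "end": start + len(value),  # end offset is exclusive
            "value": value,
            "entity": entity_type,
        })
    return {"text": text, "intent": intent, "entities": annotated}


def make_training_data(examples):
    """Wrap examples in the assumed Rasa NLU JSON training-data layout."""
    return {"rasa_nlu_data": {"common_examples": list(examples)}}


# Possible usage (the output path is hypothetical):
#   data = make_training_data([
#       make_example("我发烧了该吃什么药?", "medical", [("发烧", "disease")]),
#   ])
#   with open("my_examples_zh.json", "w", encoding="utf-8") as fh:
#       json.dump(data, fh, ensure_ascii=False, indent=2)
```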
- Run the rasa_nlu server:
python -m rasa_nlu.server -c sample_configs/config_jieba_mitie_sklearn.yml --path models
- Open a new terminal. You can now query the server, for example:
$ curl -XPOST localhost:5000/parse -d '{"q":"我发烧了该吃什么药?", "project": "rasa_nlu_test", "model": "model_20170921-170911"}' | python -mjson.tool
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 652 0 552 100 100 157 28 0:00:03 0:00:03 --:--:-- 157
{
    "entities": [
        {
            "end": 3,
            "entity": "disease",
            "extractor": "ner_mitie",
            "start": 1,
            "value": "发烧"
        }
    ],
    "intent": {
        "confidence": 0.5397186422631861,
        "name": "medical"
    },
    "intent_ranking": [
        {
            "confidence": 0.5397186422631861,
            "name": "medical"
        },
        {
            "confidence": 0.16206323981749196,
            "name": "restaurant_search"
        },
        {
            "confidence": 0.1212448457737397,
            "name": "affirm"
        },
        {
            "confidence": 0.10333600028547868,
            "name": "goodbye"
        },
        {
            "confidence": 0.07363727186010374,
            "name": "greet"
        }
    ],
    "text": "我发烧了该吃什么药?"
}
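The same request can be made from Python instead of curl. The sketch below uses only the standard library; the function names are hypothetical, and the URL, project, and model values are simply those from the curl example above, so adjust them to your own setup. Like `curl -d`, `urllib` sends the body with its default content type, which is assumed here to be acceptable to the server because the curl call above does the same.

```python
import json
from urllib.request import Request, urlopen


def build_parse_request(text, project=None, model=None,
                        url="http://localhost:5000/parse"):
    """Build a POST request carrying the same JSON body as the curl call."""
    payload = {"q": text}
    if project:
        payload["project"] = project
    if model:
        payload["model"] = model
    body = json.dumps(payload, ensure_ascii=False).encode("utf-8")
    return Request(url, data=body, method="POST")


def parse(text, **kwargs):
    """Send the request and decode the JSON response (needs a running server)."""
    with urlopen(build_parse_request(text, **kwargs)) as resp:
        return json.loads(resp.read().decode("utf-8"))


def summarize(result):
    """Return (intent name, confidence, [(entity type, value), ...])."""
    intent = result["intent"]
    entities = [(e["entity"], e["value"]) for e in result.get("entities", [])]
    return intent["name"], intent["confidence"], entities


# Possible usage, with the server from the previous step running:
#   result = parse("我发烧了该吃什么药?",
#                  project="rasa_nlu_test", model="model_20170921-170911")
#   print(summarize(result))
```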