NL2SQL-RULE
Content Enhanced BERT-based Text-to-SQL Generation https://arxiv.org/abs/1910.07179
Motivation
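Rationale: incorporate database design rules into text-to-SQL generation:
- We use the match information between table cells and the question string to construct a vector whose length equals the question length. This question vector mainly improves WHERE-VALUE inference, because it injects the knowledge that an answer cell and its header are bound together: once the answer cell is located, the answer column that contains it is located too.
- We use the match information between all table headers and the question string to construct a vector whose length equals the number of table headers. This header vector mainly improves WHERE-COLUMN inference.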
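The construction of the two vectors can be sketched as follows. This is a minimal illustration, not the code in `output_entity.py`: it assumes whitespace tokenization and exact, case-insensitive matching, and the function names `find_span` and `build_knowledge` are hypothetical. The label values (question vector: 1 = first token of a matched cell, 2 = inner token, 3 = last token, 4 = token that matches a header; header vector: 1 = header named in the question, 2 = column contains a cell that appears in the question) are inferred from the sample record in the Data section and may not cover every case the real script handles.

```python
def find_span(tokens, phrase):
    """Return (start, end) of the first occurrence of phrase in tokens, else None."""
    n = len(phrase)
    if n == 0:
        return None
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] == phrase:
            return i, i + n - 1
    return None


def build_knowledge(question_tok, header, rows):
    """Build (question_vec, header_vec) from exact cell and header matches."""
    q = [t.lower() for t in question_tok]
    question_vec = [0] * len(q)
    header_vec = [0] * len(header)

    # Cell matches: mark the span in the question and flag the owning column.
    for row in rows:
        for col, cell in enumerate(row):
            span = find_span(q, str(cell).lower().split())
            if span is None:
                continue
            start, end = span
            for i in range(start, end + 1):
                question_vec[i] = 2   # inner token (assumed label)
            question_vec[start] = 1   # first token of the cell (assumed label)
            question_vec[end] = 3     # last token; a one-token cell ends up as 3 here
            header_vec[col] = 2       # column holds a cell found in the question

    # Header matches: assumed to take precedence over cell matches.
    for col, name in enumerate(header):
        span = find_span(q, name.lower().split())
        if span is None:
            continue
        for i in range(span[0], span[1] + 1):
            question_vec[i] = 4
        header_vec[col] = 1

    return question_vec, header_vec


question_tok = ["Tell", "me", "what", "the", "notes", "are", "for", "South", "Australia"]
header = ["State/territory", "Text/background colour", "Format",
          "Current slogan", "Current series", "Notes"]
rows = [
    ["New South Wales", "black/yellow", "aa·nn·aa", "NEW SOUTH WALES",
     "BX·99·HI", "No slogan on current series"],
    ["South Australia", "black/white", "Snnn·aaa", "SOUTH AUSTRALIA",
     "S000·AZD", "No slogan on current series"],
]
question_vec, header_vec = build_knowledge(question_tok, header, rows)
print(question_vec)  # [0, 0, 0, 0, 4, 0, 0, 1, 3]
print(header_vec)    # [2, 0, 0, 2, 0, 1]
```

With two rows of the sample table, the sketch reproduces the `bertindex_knowledge` and `header_knowledge` values of the sample record. Partial matches such as "South" inside "New South Wales" are deliberately ignored here; whether the real script scores them is not stated in this README.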
Requirements
python 3.6
torch 1.1.0
Run
Step 1
Data preparation: download all the raw data (https://drive.google.com/file/d/1iJvsf38f16el58H4NPINQ7uzal5-V4v4 or https://download.csdn.net/download/guotong1988/13008037) and put it in the `data_and_model` directory.
Then run `data_and_model/output_entity.py`.
Step 2
Training and evaluation: run `train.py`.
Results on BERT-Base-Uncased without execution-guided decoding
Model | Dev logical form accuracy | Dev execution accuracy | Test logical form accuracy | Test execution accuracy |
---|---|---|---|---|
SQLova | 80.6 | 86.5 | 80.0 | 85.5 |
Ours | 84.3 | 90.3 | 83.7 | 89.2 |
Data
A sample data record:
{
"table_id": "1-1000181-1",
"phase": 1,
"question": "Tell me what the notes are for South Australia ",
"question_tok": ["Tell", "me", "what", "the", "notes", "are", "for", "South", "Australia"],
"sql": {
"sel": 5,
"conds": [
[3, 0, "SOUTH AUSTRALIA"]
],
"agg": 0
},
"query": {
"sel": 5,
"conds": [
[3, 0, "SOUTH AUSTRALIA"]
],
"agg": 0
},
"wvi_corenlp": [
[7, 8]
],
"bertindex_knowledge": [0, 0, 0, 0, 4, 0, 0, 1, 3],
"header_knowledge": [2, 0, 0, 2, 0, 1]
}
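In SQLova-style annotations, each pair in `wvi_corenlp` is usually read as the inclusive start and end token indices of a WHERE value inside `question_tok`. Assuming that reading applies here, the span can be recovered as in the sketch below (the helper name `where_value_tokens` is hypothetical):

```python
def where_value_tokens(question_tok, wvi):
    """Return the question tokens covered by each inclusive [start, end] pair."""
    return [question_tok[start:end + 1] for start, end in wvi]


question_tok = ["Tell", "me", "what", "the", "notes", "are", "for", "South", "Australia"]
spans = where_value_tokens(question_tok, [[7, 8]])
print(spans)  # [['South', 'Australia']]
```

Positions 7 and 8 are also the two positions that `bertindex_knowledge` marks with 1 and 3 in this record.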
The corresponding table:
{
"id": "1-1000181-1",
"header": ["State/territory", "Text/background colour", "Format", "Current slogan", "Current series", "Notes"],
"rows": [
["Australian Capital Territory", "blue/white", "Yaa·nna", "ACT · CELEBRATION OF A CENTURY 2013", "YIL·00A", "Slogan screenprinted on plate"],
["New South Wales", "black/yellow", "aa·nn·aa", "NEW SOUTH WALES", "BX·99·HI", "No slogan on current series"],
["New South Wales", "black/white", "aaa·nna", "NSW", "CPX·12A", "Optional white slimline series"],
["Northern Territory", "ochre/white", "Ca·nn·aa", "NT · OUTBACK AUSTRALIA", "CB·06·ZZ", "New series began in June 2011"],
["Queensland", "maroon/white", "nnn·aaa", "QUEENSLAND · SUNSHINE STATE", "999·TLG", "Slogan embossed on plate"],
["South Australia", "black/white", "Snnn·aaa", "SOUTH AUSTRALIA", "S000·AZD", "No slogan on current series"],
["Victoria", "blue/white", "aaa·nnn", "VICTORIA - THE PLACE TO BE", "ZZZ·562", "Current series will be exhausted this year"]
]
}
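To read the `sql` field against this table, the sketch below renders it as a SQL string. It assumes the record follows the WikiSQL convention: `sel` and the first element of each condition are column indices into `header`, `agg` indexes the aggregation list, and the second element of each condition indexes the operator list. The function name `render_sql` and the table name `tbl` are hypothetical, and values are not escaped, so the output is for display only.

```python
# Operator lists as defined by the WikiSQL dataset.
AGG_OPS = ["", "MAX", "MIN", "COUNT", "SUM", "AVG"]
COND_OPS = ["=", ">", "<", "OP"]


def render_sql(sql, header, table_name="tbl"):
    """Render a WikiSQL-style query dict as a human-readable SQL string."""
    column = '"{}"'.format(header[sql["sel"]])
    agg = AGG_OPS[sql["agg"]]
    select = "{}({})".format(agg, column) if agg else column
    query = "SELECT {} FROM {}".format(select, table_name)
    conds = [
        "\"{}\" {} '{}'".format(header[col], COND_OPS[op], value)
        for col, op, value in sql["conds"]
    ]
    if conds:
        query += " WHERE " + " AND ".join(conds)
    return query


header = ["State/territory", "Text/background colour", "Format",
          "Current slogan", "Current series", "Notes"]
sql = {"sel": 5, "conds": [[3, 0, "SOUTH AUSTRALIA"]], "agg": 0}
rendered = render_sql(sql, header)
print(rendered)
# SELECT "Notes" FROM tbl WHERE "Current slogan" = 'SOUTH AUSTRALIA'
```

Run against the table above, the query selects the Notes cell of the row whose Current slogan is SOUTH AUSTRALIA, that is, "No slogan on current series".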
Trained model
https://drive.google.com/open?id=18MBm9qzobTBgWPZlpA2EErCQtsMhlTN2