.. image:: https://github.com/seatgeek/thefuzz/actions/workflows/ci.yml/badge.svg
    :target: https://github.com/seatgeek/thefuzz

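
TheFuzz
=======

Fuzzy string matching like a boss. It uses `Levenshtein Distance <https://en.wikipedia.org/wiki/Levenshtein_distance>`_ to calculate the differences between sequences in a simple-to-use package.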
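
To make the underlying metric concrete, here is a minimal pure-Python sketch of Levenshtein distance using the classic dynamic-programming recurrence. It is illustrative only: the helper name ``levenshtein`` is hypothetical, and TheFuzz delegates the real computation to ``rapidfuzz`` rather than running code like this.

.. code:: python

    def levenshtein(a: str, b: str) -> int:
        """Return the minimum number of single-character edits turning a into b."""
        # previous[j] is the distance between the processed prefix of a and b[:j].
        previous = list(range(len(b) + 1))
        for i, char_a in enumerate(a, start=1):
            current = [i]  # distance from a[:i] to the empty prefix of b
            for j, char_b in enumerate(b, start=1):
                substitution = previous[j - 1] + (char_a != char_b)
                deletion = previous[j] + 1
                insertion = current[j - 1] + 1
                current.append(min(substitution, deletion, insertion))
            previous = current
        return previous[-1]


    print(levenshtein("kitten", "sitting"))  # 3
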

Requirements
============

- Python 3.8 or higher
- `rapidfuzz <https://github.com/maxbachmann/RapidFuzz/>`_

For testing
~~~~~~~~~~~

- pycodestyle
- hypothesis
- pytest

Installation
============

Using pip via PyPI

.. code:: bash

    pip install thefuzz

Using pip via GitHub (over HTTPS, since GitHub no longer serves the unauthenticated ``git://`` protocol)

.. code:: bash

    pip install git+https://github.com/seatgeek/thefuzz.git@0.19.0#egg=thefuzz

Adding to your ``requirements.txt`` file (run ``pip install -r requirements.txt`` afterwards)

.. code:: bash

    git+ssh://git@github.com/seatgeek/thefuzz.git@0.19.0#egg=thefuzz

Manually via Git

.. code:: bash

    git clone https://github.com/seatgeek/thefuzz.git thefuzz
    cd thefuzz
    python setup.py install

Usage
=====

.. code:: python

    >>> from thefuzz import fuzz
    >>> from thefuzz import process

Simple Ratio
~~~~~~~~~~~~

.. code:: python

    >>> fuzz.ratio("this is a test", "this is a test!")
    97
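
As a rough illustration of where a score like 97 can come from, the sketch below derives a 0-100 similarity from the longest common subsequence (LCS), so that only insertions and deletions count against the score. This normalization is an assumption about the kind of calculation involved, not a copy of the library's code. ``lcs_length`` and ``simple_ratio_sketch`` are hypothetical names, and real scores come from ``rapidfuzz``.

.. code:: python

    def lcs_length(a: str, b: str) -> int:
        """Length of the longest common subsequence of a and b."""
        previous = [0] * (len(b) + 1)
        for char_a in a:
            current = [0]
            for j, char_b in enumerate(b, start=1):
                if char_a == char_b:
                    current.append(previous[j - 1] + 1)
                else:
                    current.append(max(previous[j], current[j - 1]))
            previous = current
        return previous[-1]


    def simple_ratio_sketch(a: str, b: str) -> int:
        """Share of characters that line up, scaled to 0-100 and rounded."""
        total = len(a) + len(b)
        if total == 0:
            return 100  # sketch-only choice: two empty strings count as identical
        return round(100 * 2 * lcs_length(a, b) / total)


    print(simple_ratio_sketch("this is a test", "this is a test!"))  # 97

Here 14 of the 29 characters in each direction line up, so the score is ``100 * 28 / 29``, which rounds to 97.
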

Partial Ratio
~~~~~~~~~~~~~

.. code:: python

    >>> fuzz.partial_ratio("this is a test", "this is a test!")
    100
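
One way to picture partial matching is as a sliding window: the shorter string is scored against every same-length slice of the longer one and the best score wins. The sketch below shows that idea with ``difflib.SequenceMatcher`` from the standard library as a stand-in scorer. It is a simplification under that assumption, so its scores will not always agree with ``fuzz.partial_ratio``, and the function names are hypothetical.

.. code:: python

    from difflib import SequenceMatcher


    def similarity(a: str, b: str) -> int:
        """Stand-in scorer: difflib's ratio scaled to 0-100."""
        return round(100 * SequenceMatcher(None, a, b).ratio())


    def partial_ratio_sketch(a: str, b: str) -> int:
        shorter, longer = sorted((a, b), key=len)
        if not shorter:
            return 0  # sketch-only choice for empty input
        width = len(shorter)
        # Score the shorter string against every same-length window of the longer one.
        windows = (longer[i:i + width] for i in range(len(longer) - width + 1))
        return max(similarity(shorter, window) for window in windows)


    print(partial_ratio_sketch("this is a test", "this is a test!"))  # 100
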

Token Sort Ratio
~~~~~~~~~~~~~~~~

.. code:: python

    >>> fuzz.ratio("fuzzy wuzzy was a bear", "wuzzy fuzzy was a bear")
    91
    >>> fuzz.token_sort_ratio("fuzzy wuzzy was a bear", "wuzzy fuzzy was a bear")
    100
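
The jump from 91 to 100 comes from ignoring word order. A minimal sketch of that idea, assuming whitespace tokenization and using ``difflib.SequenceMatcher`` as a stand-in scorer (the library's own preprocessing and scoring may differ, and the function names are hypothetical):

.. code:: python

    from difflib import SequenceMatcher


    def sort_tokens(text: str) -> str:
        """Lowercase, split on whitespace, and rejoin the tokens in sorted order."""
        return " ".join(sorted(text.lower().split()))


    def token_sort_ratio_sketch(a: str, b: str) -> int:
        # After sorting, strings that differ only in word order become identical.
        matcher = SequenceMatcher(None, sort_tokens(a), sort_tokens(b))
        return round(100 * matcher.ratio())


    print(token_sort_ratio_sketch("fuzzy wuzzy was a bear", "wuzzy fuzzy was a bear"))  # 100
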

Token Set Ratio
~~~~~~~~~~~~~~~

.. code:: python

    >>> fuzz.token_sort_ratio("fuzzy was a bear", "fuzzy fuzzy was a bear")
    84
    >>> fuzz.token_set_ratio("fuzzy was a bear", "fuzzy fuzzy was a bear")
    100
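
Token set matching goes one step further and ignores repeated words as well. The sketch below shows one common formulation: split each string into a set of tokens, then compare the shared tokens against each side's shared-plus-leftover tokens and keep the best score. It assumes whitespace tokenization and uses ``difflib.SequenceMatcher`` as a stand-in scorer, so treat it as an illustration rather than the library's implementation. All names are hypothetical.

.. code:: python

    from difflib import SequenceMatcher


    def similarity(a: str, b: str) -> int:
        """Stand-in scorer: difflib's ratio scaled to 0-100."""
        return round(100 * SequenceMatcher(None, a, b).ratio())


    def token_set_ratio_sketch(a: str, b: str) -> int:
        tokens_a = set(a.lower().split())
        tokens_b = set(b.lower().split())
        common = " ".join(sorted(tokens_a & tokens_b))
        only_a = " ".join(sorted(tokens_a - tokens_b))
        only_b = " ".join(sorted(tokens_b - tokens_a))
        # Each combined string is the shared tokens followed by one side's leftovers.
        combined_a = f"{common} {only_a}".strip()
        combined_b = f"{common} {only_b}".strip()
        return max(
            similarity(common, combined_a),
            similarity(common, combined_b),
            similarity(combined_a, combined_b),
        )


    print(token_set_ratio_sketch("fuzzy was a bear", "fuzzy fuzzy was a bear"))  # 100

Because both inputs reduce to the same token set, the duplicated ``fuzzy`` no longer costs anything.
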

Partial Token Sort Ratio
~~~~~~~~~~~~~~~~~~~~~~~~

.. code:: python

    >>> fuzz.token_sort_ratio("fuzzy was a bear", "wuzzy fuzzy was a bear")
    84
    >>> fuzz.partial_token_sort_ratio("fuzzy was a bear", "wuzzy fuzzy was a bear")
    100
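
This scorer can be read as a composition: sort the tokens of both strings, then look for the best partial match between the results. A hedged sketch of that composition, again with ``difflib.SequenceMatcher`` standing in for the real scorer and with hypothetical function names:

.. code:: python

    from difflib import SequenceMatcher


    def sort_tokens(text: str) -> str:
        """Lowercase, split on whitespace, and rejoin the tokens in sorted order."""
        return " ".join(sorted(text.lower().split()))


    def best_window_score(shorter: str, longer: str) -> int:
        """Best stand-in score of shorter against any same-length slice of longer."""
        width = len(shorter)
        scores = [
            SequenceMatcher(None, shorter, longer[i:i + width]).ratio()
            for i in range(len(longer) - width + 1)
        ]
        return round(100 * max(scores))


    def partial_token_sort_ratio_sketch(a: str, b: str) -> int:
        shorter, longer = sorted((sort_tokens(a), sort_tokens(b)), key=len)
        if not shorter:
            return 0  # sketch-only choice for empty input
        return best_window_score(shorter, longer)


    print(partial_token_sort_ratio_sketch("fuzzy was a bear", "wuzzy fuzzy was a bear"))  # 100

After sorting, the inputs become ``a bear fuzzy was`` and ``a bear fuzzy was wuzzy``, so the first window of the longer string matches the shorter one exactly.
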

Process
~~~~~~~

.. code:: python

    >>> choices = ["Atlanta Falcons", "New York Jets", "New York Giants", "Dallas Cowboys"]
    >>> process.extract("new york jets", choices, limit=2)
    [('New York Jets', 100), ('New York Giants', 78)]
    >>> process.extractOne("cowboys", choices)
    ("Dallas Cowboys", 90)
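
Conceptually, ``extract`` scores every choice against the query, sorts by score, and returns the top ``limit`` pairs. The sketch below shows that ranking step only. It assumes a simple case-insensitive ``difflib`` scorer in place of the library's default, so scores other than exact matches will differ from the output above, and ``extract_sketch`` is a hypothetical name.

.. code:: python

    from difflib import SequenceMatcher


    def similarity(a: str, b: str) -> int:
        """Stand-in scorer: case-insensitive difflib ratio scaled to 0-100."""
        return round(100 * SequenceMatcher(None, a.lower(), b.lower()).ratio())


    def extract_sketch(query, choices, limit=5):
        """Return up to `limit` (choice, score) pairs, best score first."""
        scored = [(choice, similarity(query, choice)) for choice in choices]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:limit]


    choices = ["Atlanta Falcons", "New York Jets", "New York Giants", "Dallas Cowboys"]
    top_two = extract_sketch("new york jets", choices, limit=2)
    print(top_two[0])  # ('New York Jets', 100)
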

You can also pass additional parameters to ``extractOne`` to make it use a specific scorer. A typical use case is matching file paths:

.. code:: python

    >>> process.extractOne("System of a down - Hypnotize - Heroin", songs)
    ('/music/library/good/System of a Down/2005 - Hypnotize/01 - Attack.mp3', 86)
    >>> process.extractOne("System of a down - Hypnotize - Heroin", songs, scorer=fuzz.token_sort_ratio)
    ("/music/library/good/System of a Down/2005 - Hypnotize/10 - She's Like Heroin.mp3", 61)
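
The reason the scorer matters is that different scorers can rank the same choices differently. The toy sketch below makes that visible with a pluggable ``scorer`` argument. Everything in it is hypothetical (the function names, the two stand-in scorers built on ``difflib``, and the sample data), and it only mirrors the shape of the ``extractOne`` call, not its implementation.

.. code:: python

    from difflib import SequenceMatcher


    def plain_scorer(a: str, b: str) -> int:
        """Order-sensitive stand-in scorer on a 0-100 scale."""
        return round(100 * SequenceMatcher(None, a, b).ratio())


    def token_sorted_scorer(a: str, b: str) -> int:
        """Same scorer, applied after sorting each string's tokens."""
        def normalize(text: str) -> str:
            return " ".join(sorted(text.split()))
        return plain_scorer(normalize(a), normalize(b))


    def extract_one_sketch(query, choices, scorer=plain_scorer):
        """Return the single best (choice, score) pair under the given scorer."""
        scored = ((choice, scorer(query, choice)) for choice in choices)
        return max(scored, key=lambda pair: pair[1])


    choices = ["bear fuzzy", "fuzzy bat"]  # hypothetical toy data
    print(extract_one_sketch("fuzzy bear", choices)[0])  # fuzzy bat
    print(extract_one_sketch("fuzzy bear", choices, scorer=token_sorted_scorer))  # ('bear fuzzy', 100)

The order-sensitive scorer prefers the choice that shares a long prefix with the query, while the token-sorted scorer finds the choice containing the same words in a different order.
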