Retrieve, Read and LinK: Fast and Accurate Entity Linking and Relation Extraction on an Academic Budget
A blazing fast and lightweight Information Extraction model for Entity Linking and Relation Extraction.
🛠️ Installation
Installation from PyPI
```bash
pip install relik
```
Other installation options
Install with optional dependencies
Install with all the optional dependencies.
```bash
pip install relik[all]
```
Install with optional dependencies for training and evaluation.
```bash
pip install relik[train]
```
Install with optional dependencies for FAISS
The FAISS PyPI package is CPU-only. For GPU support, install FAISS from source or use the conda package.
For CPU:
```bash
pip install relik[faiss]
```
For GPU:
```bash
conda create -n relik python=3.10
conda activate relik
# install pytorch
conda install -y pytorch=2.1.0 pytorch-cuda=12.1 -c pytorch -c nvidia
# GPU
conda install -y -c pytorch -c nvidia faiss-gpu=1.8.0
# or GPU with NVIDIA RAFT
conda install -y -c pytorch -c nvidia -c rapidsai -c conda-forge faiss-gpu-raft=1.8.0
pip install relik
```
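FAISS is used by the retriever for fast nearest-neighbor search over dense vectors. As a rough illustration of what it accelerates (this is a toy brute-force sketch in pure Python, not the FAISS or ReLiK API), the core operation is finding the stored vector most similar to a query:

```python
# Toy illustration of nearest-neighbor search over dense vectors.
# FAISS performs this over millions of vectors with optimized
# (and optionally GPU-accelerated) kernels.

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def nearest(query, index):
    """Return the position of the index vector most similar to the query."""
    return max(range(len(index)), key=lambda i: dot(query, index[i]))

index = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.7, 0.7, 0.0],
]
print(nearest([0.9, 0.1, 0.0], index))  # → 0
```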
Install with optional dependencies for serving the models with FastAPI and Ray.
```bash
pip install relik[serve]
```
Installation from source
```bash
git clone https://github.com/SapienzaNLP/relik.git
cd relik
pip install -e .[all]
```
🤖 Models
New models:
- ReLiK Small for Entity Linking (🆕🤏⚡ Tiny and Fast):
sapienzanlp/relik-entity-linking-small
- ReLiK Small for Closed Information Extraction (🔥 EL + RE):
sapienzanlp/relik-cie-small
- ReLiK Large for Entity Linking (🔥 EL for the wild):
relik-ie/relik-entity-linking-large-robust
- ReLiK Small for Relation Extraction (🔥 RE + NER):
relik-ie/relik-relation-extraction-small-wikipedia-ner
Models from the paper:
- ReLiK Large for Entity Linking (📝 Paper version):
sapienzanlp/relik-entity-linking-large
- ReLiK Base for Entity Linking (📝 Paper version):
sapienzanlp/relik-entity-linking-base
- ReLiK Large for Relation Extraction (📝 Paper version):
sapienzanlp/relik-relation-extraction-nyt-large
A full list of models can be found on 🤗 Hugging Face.
Other model sizes will be available in the future 👀.
🚀 Quick Start
ReLiK is a lightweight and fast model for Entity Linking and Relation Extraction.
It is composed of two main components: a retriever and a reader.
The retriever is responsible for retrieving relevant documents from a large collection,
while the reader is responsible for extracting entities and relations from the retrieved documents.
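The two-stage design can be illustrated with a toy sketch (the functions below are illustrative stand-ins, not the ReLiK API): a retriever ranks documents by similarity to the input, and a reader extracts mentions grounded in what was retrieved.

```python
# Toy illustration of the retrieve-then-read pattern (not the ReLiK API).

def retrieve(query: str, collection: list[str], k: int = 2) -> list[str]:
    """Rank documents by word overlap with the query and keep the top k."""
    q = set(query.lower().split())
    scored = sorted(
        collection,
        key=lambda d: len(q & set(d.lower().split())),
        reverse=True,
    )
    return scored[:k]

def read(query: str, documents: list[str]) -> list[str]:
    """Extract capitalized query tokens that also appear in the documents."""
    doc_words = {w for d in documents for w in d.split()}
    return [w for w in query.split() if w.istitle() and w in doc_words]

collection = [
    "Michael Jordan played basketball in the NBA",
    "The Nile is a river in Africa",
]
docs = retrieve("Michael Jordan was a great player", collection)
entities = read("Michael Jordan was a great player", docs)
print(entities)  # → ['Michael', 'Jordan']
```

The real retriever works over dense embeddings rather than word overlap, and the reader is a trained transformer, but the division of labor is the same.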
ReLiK can be used with the `from_pretrained` method to load a pre-trained pipeline.
Here is an example of how to use ReLiK for Entity Linking:
```python
from relik import Relik
from relik.inference.data.objects import RelikOutput

relik = Relik.from_pretrained("sapienzanlp/relik-entity-linking-large")
relik_out: RelikOutput = relik("Michael Jordan was one of the best players in the NBA.")
```
Output:
RelikOutput(
text="Michael Jordan was one of the best players in the NBA.",
tokens=['Michael', 'Jordan', 'was', 'one', 'of', 'the', 'best', 'players', 'in', 'the', 'NBA', '.'],
id=0,
spans=[
Span(start=0, end=14, label="Michael Jordan", text="Michael Jordan"),
Span(start=50, end=53, label="National Basketball Association", text="NBA"),
],
triplets=[],
candidates=Candidates(
span=[
[
[
{"text": "Michael Jordan", "id": 4484083},
{"text": "National Basketball Association", "id": 5209815},
{"text": "Walter Jordan", "id": 2340190},
{"text": "Jordan", "id": 3486773},
{"text": "50 Greatest Players in NBA History", "id": 1742909},
...
]
]
]
),
)
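The `start`/`end` fields are character offsets into the input text, so each span can be checked or consumed directly by slicing. A minimal sketch, using a stand-in `Span` tuple rather than the actual relik class:

```python
from typing import NamedTuple

# Stand-in for relik's Span object, for illustration only.
class Span(NamedTuple):
    start: int
    end: int
    label: str
    text: str

text = "Michael Jordan was one of the best players in the NBA."
spans = [
    Span(0, 14, "Michael Jordan", "Michael Jordan"),
    Span(50, 53, "National Basketball Association", "NBA"),
]

# start/end are character offsets: slicing the text recovers the mention
for s in spans:
    assert text[s.start:s.end] == s.text

# map each surface mention to its linked entity
links = {s.text: s.label for s in spans}
print(links)  # → {'Michael Jordan': 'Michael Jordan', 'NBA': 'National Basketball Association'}
```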
and for Relation Extraction:
```python
from relik import Relik
from relik.inference.data.objects import RelikOutput

relik = Relik.from_pretrained("sapienzanlp/relik-relation-extraction-nyt-large")
relik_out: RelikOutput = relik("Michael Jordan was one of the best players in the NBA.")
```
Output:
RelikOutput(
text='Michael Jordan was one of the best players in the NBA.',
tokens=['Michael', 'Jordan', 'was', 'one', 'of', 'the', 'best', 'players', 'in', 'the', 'NBA', '.'],
id=0,
spans=[
Span(start=0, end=14, label='--NME--', text='Michael Jordan'),
Span(start=50, end=53, label='--NME--', text='NBA')
],
triplets=[
Triplets(
subject=Span(start=0, end=14, label='--NME--', text='Michael Jordan'),
label='company',
object=Span(start=50, end=53, label='--NME--', text='NBA'),
confidence=1.0
)
],
candidates=Candidates(
span=[],
triplet=[
[
[
{"text": "company", "id": 4, "metadata": {"definition": "company of this person"}},
{"text": "nationality", "id": 10, "metadata": {"definition": "nationality of this person or entity"}},