Retrieve, Read and LinK: Fast and Accurate Entity Linking and Relation Extraction on an Academic Budget
A blazing fast and lightweight Information Extraction model for Entity Linking and Relation Extraction.
🛠️ Installation
Installation from PyPI
```bash
pip install relik
```
Other installation options
Install with optional dependencies
Install with all the optional dependencies.
```bash
pip install relik[all]
```
Install with optional dependencies for training and evaluation.
```bash
pip install relik[train]
```
Install with optional dependencies for FAISS
The FAISS PyPI package is CPU-only. For GPU support, install FAISS from source or use the conda package.
For CPU:
```bash
pip install relik[faiss]
```
For GPU:
```bash
conda create -n relik python=3.10
conda activate relik
# install pytorch
conda install -y pytorch=2.1.0 pytorch-cuda=12.1 -c pytorch -c nvidia
# GPU
conda install -y -c pytorch -c nvidia faiss-gpu=1.8.0
# or GPU with NVIDIA RAFT
conda install -y -c pytorch -c nvidia -c rapidsai -c conda-forge faiss-gpu-raft=1.8.0
pip install relik
```
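FAISS is used by the retriever for fast nearest-neighbor search over dense vectors. As a rough illustration of what it accelerates (this is a toy brute-force sketch in pure Python, not the FAISS or ReLiK API), the core operation is finding the stored vector most similar to a query:

```python
# Toy illustration of nearest-neighbor search over dense vectors.
# FAISS performs this over millions of vectors with optimized
# (and optionally GPU-accelerated) kernels.

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def nearest(query, index):
    """Return the position of the index vector most similar to the query."""
    return max(range(len(index)), key=lambda i: dot(query, index[i]))

index = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.7, 0.7, 0.0],
]
print(nearest([0.9, 0.1, 0.0], index))  # → 0
```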
Install with optional dependencies for serving the models with FastAPI and Ray.
```bash
pip install relik[serve]
```
Installation from source
```bash
git clone https://github.com/SapienzaNLP/relik.git
cd relik
pip install -e .[all]
```
🤖 Models
New models:
- ReLiK Small for Entity Linking (🆕🤏⚡ Tiny and Fast):
sapienzanlp/relik-entity-linking-small
- ReLiK Small for Closed Information Extraction (🔥 EL + RE):
sapienzanlp/relik-cie-small
- ReLiK Large for Entity Linking (🔥 EL for the wild):
relik-ie/relik-entity-linking-large-robust
- ReLiK Small for Relation Extraction (🔥 RE + NER):
relik-ie/relik-relation-extraction-small-wikipedia-ner
Models from the paper:
- ReLiK Large for Entity Linking (📝 Paper version):
sapienzanlp/relik-entity-linking-large
- ReLiK Base for Entity Linking (📝 Paper version):
sapienzanlp/relik-entity-linking-base
- ReLiK Large for Relation Extraction (📝 Paper version):
sapienzanlp/relik-relation-extraction-nyt-large
A full list of models can be found on 🤗 Hugging Face.
Other model sizes will be available in the future 👀.
🚀 Quick Start
ReLiK is a lightweight and fast model for Entity Linking and Relation Extraction.
It is composed of two main components: a retriever and a reader.
The retriever is responsible for retrieving relevant documents from a large collection,
while the reader is responsible for extracting entities and relations from the retrieved documents.
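The two-stage design can be illustrated with a toy sketch (the functions below are illustrative stand-ins, not the ReLiK API): a retriever ranks documents by similarity to the input, and a reader extracts mentions grounded in what was retrieved.

```python
# Toy illustration of the retrieve-then-read pattern (not the ReLiK API).

def retrieve(query: str, collection: list[str], k: int = 2) -> list[str]:
    """Rank documents by word overlap with the query and keep the top k."""
    q = set(query.lower().split())
    scored = sorted(
        collection,
        key=lambda d: len(q & set(d.lower().split())),
        reverse=True,
    )
    return scored[:k]

def read(query: str, documents: list[str]) -> list[str]:
    """Extract capitalized query tokens that also appear in the documents."""
    doc_words = {w for d in documents for w in d.split()}
    return [w for w in query.split() if w.istitle() and w in doc_words]

collection = [
    "Michael Jordan played basketball in the NBA",
    "The Nile is a river in Africa",
]
docs = retrieve("Michael Jordan was a great player", collection)
entities = read("Michael Jordan was a great player", docs)
print(entities)  # → ['Michael', 'Jordan']
```

The real retriever works over dense embeddings rather than word overlap, and the reader is a trained transformer, but the division of labor is the same.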
ReLiK can be used with the `from_pretrained` method to load a pre-trained pipeline.
Here is an example of how to use ReLiK for Entity Linking:
```python
from relik import Relik
from relik.inference.data.objects import RelikOutput

relik = Relik.from_pretrained("sapienzanlp/relik-entity-linking-large")
relik_out: RelikOutput = relik("Michael Jordan was one of the best players in the NBA.")
```
Output:
RelikOutput(
text="Michael Jordan was one of the best players in the NBA.",
tokens=['Michael', 'Jordan', 'was', 'one', 'of', 'the', 'best', 'players', 'in', 'the', 'NBA', '.'],
id=0,
spans=[
Span(start=0, end=14, label="Michael Jordan", text="Michael Jordan"),
Span(start=50, end=53, label="National Basketball Association", text="NBA"),
],
triplets=[],
candidates=Candidates(
span=[
[
[
{"text": "Michael Jordan", "id": 4484083},
{"text": "National Basketball Association", "id": 5209815},
{"text": "Walter Jordan", "id": 2340190},
{"text": "Jordan", "id": 3486773},
{"text": "50 Greatest Players in NBA History", "id": 1742909},
...
]
]
]
),
)
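The `start`/`end` fields are character offsets into the input text, so each span can be checked or consumed directly by slicing. A minimal sketch, using a stand-in `Span` tuple rather than the actual relik class:

```python
from typing import NamedTuple

# Stand-in for relik's Span object, for illustration only.
class Span(NamedTuple):
    start: int
    end: int
    label: str
    text: str

text = "Michael Jordan was one of the best players in the NBA."
spans = [
    Span(0, 14, "Michael Jordan", "Michael Jordan"),
    Span(50, 53, "National Basketball Association", "NBA"),
]

# start/end are character offsets: slicing the text recovers the mention
for s in spans:
    assert text[s.start:s.end] == s.text

# map each surface mention to its linked entity
links = {s.text: s.label for s in spans}
print(links)  # → {'Michael Jordan': 'Michael Jordan', 'NBA': 'National Basketball Association'}
```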
and for Relation Extraction:
```python
from relik import Relik
from relik.inference.data.objects import RelikOutput

relik = Relik.from_pretrained("sapienzanlp/relik-relation-extraction-nyt-large")
relik_out: RelikOutput = relik("Michael Jordan was one of the best players in the NBA.")
```
Output:
RelikOutput(
text='Michael Jordan was one of the best players in the NBA.',
tokens=['Michael', 'Jordan', 'was', 'one', 'of', 'the', 'best', 'players', 'in', 'the', 'NBA', '.'],
id=0,
spans=[
Span(start=0, end=14, label='--NME--', text='Michael Jordan'),
Span(start=50, end=53, label='--NME--', text='NBA')
],
triplets=[
Triplets(
subject=Span(start=0, end=14, label='--NME--', text='Michael Jordan'),
label='company',
object=Span(start=50, end=53, label='--NME--', text='NBA'),
confidence=1.0
)
],
candidates=Candidates(
span=[],
triplet=[
[
[
{"text": "company", "id": 4, "metadata": {"definition": "company of this person"}},
{"text": "nationality", "id": 10, "metadata": {"definition": "nationality of this person or entity"}},