Best way to do traditional NLP?

GLiNER2 and its example usage

nlp
Published

February 3, 2026

We use LLMs for everything nowadays, from adding numbers to finding the meaning of life. But no, this is not another post about Large Language Models, which are often not optimal accuracy-, reliability-, environment-, or cost-wise. It is about good old NLP and about use cases that may look like relics of the previous century but are still very useful, with positive ROI.

While consumed by LLMs, agents, and the like, I accidentally bumped into a transformer-based information extraction system and, as an exercise, tried replacing some of my NLP code for NER and classification with GLiNER2:

Extract entities, classify text, parse structured data, and extract relations—all in one efficient model.

TL;DR: the result was shorter code, faster inference, and a much cleaner pipeline.

So, what was I trying to do?

… well, just look at the code and the benchmarks.

Traditional NER and zero-shot classification

# !pip install spacy transformers numpy torch
# !spacy download en_core_web_lg
# Load NLP
import spacy
from transformers import pipeline
import re
import numpy as np
nlp = spacy.load("en_core_web_lg")
classifier = pipeline(
    "zero-shot-classification",
    model="facebook/bart-large-mnli"
)
docs_to_classify = [
    "John Smith joined Acme Corp in Jan 2021 and worked there until late 2025 as a senior engineer based in Berlin.",
    "It is always sunny in Filadelphia",
    "For us, Anonumoys Inc. based in Warsaw, Poland, traditional NLP is more predictible then LLMs, at least in February of 2026",
]
def classical_nlp(text: str) -> dict:
    # NER
    doc = nlp(text)

    entities = {
        "PERSON": [],
        "ORG": [],
        "GPE": [],
        "DATE": []
    }
    
    for ent in doc.ents:
        if ent.label_ in entities:
            entities[ent.label_].append(ent.text)
    
    # Zero-shot classification; the pipeline returns labels already
    # sorted by score, so argmax picks the top label
    labels = ["employment", "weather", "other"]
    clf = classifier(text, candidate_labels=labels)

    doc_type = clf["labels"][np.argmax(clf["scores"])]

    # Glue it together
    return {
        "entities": {
            "person": entities["PERSON"],
            "company": entities["ORG"],
            "location": entities["GPE"],
            "date": entities["DATE"],
        },
        "category": doc_type,
    }
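The final step of `classical_nlp` is just renaming spaCy's label keys to friendlier ones. Here is that step isolated as a standalone sketch that runs without loading spaCy (`LABEL_MAP` and `remap_entities` are my own illustrative names, not part of spaCy):

```python
# Map spaCy's entity labels onto the friendlier keys used in the
# output dict of classical_nlp above.
LABEL_MAP = {"PERSON": "person", "ORG": "company", "GPE": "location", "DATE": "date"}

def remap_entities(entities: dict) -> dict:
    """Rename spaCy label keys, defaulting to an empty list when a label is absent."""
    return {nice: entities.get(raw, []) for raw, nice in LABEL_MAP.items()}

print(remap_entities({"PERSON": ["John Smith"], "ORG": ["Acme Corp"]}))
# {'person': ['John Smith'], 'company': ['Acme Corp'], 'location': [], 'date': []}
```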

… same with GLiNER2

# !pip install gliner2
# Load GLiNER
from gliner2 import GLiNER2
model = GLiNER2.from_pretrained("fastino/gliner2-base-v1")
You are using a model of type extractor to instantiate a model of type . This is not supported for all configurations of models and can yield errors.
============================================================
🧠 Model Configuration
============================================================
Encoder model      : microsoft/deberta-v3-base
Counting layer     : count_lstm_v2
Token pooling      : first
============================================================
def gliner2(text: str) -> dict:
    schema = (model.create_schema()
        .entities({
            "person": "A human individual",
            "company": "An organization or company",
            "location": "City or country",
            "date": "Date"
        })
        .classification("category", ["employment", "weather", "other"])
    )
    return model.extract(text, schema)

Comparison

classical_nlp(docs_to_classify[0])
{'entities': {'person': ['John Smith'],
  'company': ['Acme Corp'],
  'location': ['Berlin'],
  'date': ['Jan 2021', 'late 2025']},
 'category': 'employment'}
gliner2(docs_to_classify[0])
{'entities': {'person': ['John Smith'],
  'company': ['Acme Corp'],
  'location': ['Berlin'],
  'date': ['late 2025', 'Jan 2021']},
 'category': 'employment'}
classical_nlp(docs_to_classify[1])
{'entities': {'person': [],
  'company': [],
  'location': ['Filadelphia'],
  'date': []},
 'category': 'weather'}
gliner2(docs_to_classify[1])
{'entities': {'person': [],
  'company': [],
  'location': ['Filadelphia'],
  'date': []},
 'category': 'weather'}
classical_nlp(docs_to_classify[2])
{'entities': {'person': [],
  'company': ['Anonumoys Inc.', 'NLP'],
  'location': ['Warsaw', 'Poland'],
  'date': ['February of 2026']},
 'category': 'other'}
gliner2(docs_to_classify[2])
{'entities': {'person': [],
  'company': ['Anonumoys Inc.'],
  'location': ['Warsaw', 'Poland'],
  'date': ['February of 2026']},
 'category': 'other'}
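The two pipelines agree on every document; the only visible difference is the ordering of the `date` list for the first example. A small order-insensitive comparison helper (my own sketch, not from either library) makes that check explicit:

```python
def same_extraction(a: dict, b: dict) -> bool:
    """Compare two extraction results, ignoring the order of entity lists."""
    if a["category"] != b["category"]:
        return False
    ea, eb = a["entities"], b["entities"]
    return ea.keys() == eb.keys() and all(sorted(ea[k]) == sorted(eb[k]) for k in ea)

# The first document's results above differ only in date ordering:
classical = {"entities": {"date": ["Jan 2021", "late 2025"]}, "category": "employment"}
gliner = {"entities": {"date": ["late 2025", "Jan 2021"]}, "category": "employment"}
print(same_extraction(classical, gliner))  # True
```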
%%timeit
classical_nlp(docs_to_classify[2])
727 ms ± 42.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%%timeit
gliner2(docs_to_classify[2])
288 ms ± 29.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
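For reference, the `%%timeit` means above imply roughly a 2.5x speedup (simple arithmetic on the reported numbers; the standard deviations mean the exact ratio should be taken with a grain of salt):

```python
# Mean per-call latency (ms) reported by %%timeit above.
classical_ms, gliner_ms = 727, 288
speedup = classical_ms / gliner_ms
print(f"{speedup:.2f}x")  # 2.52x
```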