NLP with Hugging Face Transformers
Use Hugging Face Transformers for text classification, sentiment analysis, named entity recognition, and text generation in Python.
Hugging Face Transformers provides pre-trained models for NLP tasks — eliminating the need to train from scratch for most applications.
Installation
pip install transformers torch sentencepiece
Sentiment Analysis — Zero Setup
from transformers import pipeline
classifier = pipeline("sentiment-analysis")
texts = [
    "I love this product!",
    "Terrible experience, would not recommend.",
    "It was okay, nothing special.",
]
results = classifier(texts)
# Pair each input sentence with its prediction
for text, result in zip(texts, results):
    print(f"{text!r} -> {result['label']}: {result['score']:.3f}")
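The default English sentiment checkpoint is a two-class model, so a lukewarm sentence such as the third example still comes back as either POSITIVE or NEGATIVE. One possible way to handle this, sketched below, is to relabel low-confidence predictions. The helper name `label_with_threshold`, the 0.75 cutoff, and the sample scores are illustrative assumptions, not output from a real model run.

```python
def label_with_threshold(results, threshold=0.75):
    """Relabel predictions whose score is below the threshold as "UNCERTAIN"."""
    labelled = []
    for result in results:
        label = result["label"] if result["score"] >= threshold else "UNCERTAIN"
        labelled.append({"label": label, "score": result["score"]})
    return labelled


# Hand-written sample in the shape the pipeline returns (not real model output).
sample_results = [
    {"label": "POSITIVE", "score": 0.999},
    {"label": "NEGATIVE", "score": 0.998},
    {"label": "NEGATIVE", "score": 0.62},
]

for item in label_with_threshold(sample_results):
    print(f"{item['label']}: {item['score']:.3f}")
```

The right cutoff depends on the model and the data, so it is worth tuning on a small labelled sample rather than reusing 0.75 as-is.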
Text Classification with Custom Model
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch
model_name = "distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
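def classify(text):
    # Tokenize, truncating inputs longer than the model's 512-token limit
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
    with torch.no_grad():  # inference only, so skip gradient tracking
        outputs = model(**inputs)
    # Convert raw logits to class probabilities
    probs = torch.softmax(outputs.logits, dim=1)
    label = model.config.id2label[probs.argmax().item()]
    confidence = probs.max().item()
    return label, confidence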
label, conf = classify("This movie was absolutely fantastic!")
print(f"{label} ({conf:.2f})")
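To make the softmax step in `classify` concrete, here is a dependency-free sketch of the same arithmetic: raw logits are exponentiated and normalized, and the index of the largest probability is looked up in the label map. The logits and the `id2label` dictionary below are hypothetical values chosen for illustration, not taken from the model.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]  # subtract the max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits and label map, shaped like those of a two-class model.
logits = [-2.0, 3.0]
id2label = {0: "NEGATIVE", 1: "POSITIVE"}

probs = softmax(logits)
best = max(range(len(probs)), key=probs.__getitem__)  # index of the largest probability
print(f"{id2label[best]} ({probs[best]:.2f})")
```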
Named Entity Recognition (NER)
# aggregation_strategy="simple" merges sub-word tokens into whole entities
ner = pipeline("ner", aggregation_strategy="simple")
text = "Apple Inc. was founded by Steve Jobs in Cupertino, California."
entities = ner(text)
for entity in entities:
    print(f"{entity['word']}: {entity['entity_group']} ({entity['score']:.2f})")
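Downstream code often wants the entities organized by type rather than as a flat list. The sketch below groups dictionaries with the same `word`, `entity_group`, and `score` keys used in the loop above. The helper name `group_by_type`, the `min_score` cutoff, and the sample entities are assumptions made for illustration; the sample is hand-written, not real model output.

```python
from collections import defaultdict


def group_by_type(entities, min_score=0.5):
    """Collect entity text by entity type, skipping low-confidence hits."""
    grouped = defaultdict(list)
    for entity in entities:
        if entity["score"] >= min_score:
            grouped[entity["entity_group"]].append(entity["word"])
    return dict(grouped)


# Hand-written sample in the shape of the grouped NER output.
sample_entities = [
    {"entity_group": "ORG", "word": "Apple Inc", "score": 0.99},
    {"entity_group": "PER", "word": "Steve Jobs", "score": 0.99},
    {"entity_group": "LOC", "word": "Cupertino", "score": 0.98},
    {"entity_group": "LOC", "word": "California", "score": 0.40},
]

print(group_by_type(sample_entities))
```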
Text Generation
generator = pipeline("text-generation", model="gpt2")
prompt = "Python is a programming language that"
output = generator(prompt, max_length=50, num_return_sequences=1)
print(output[0]["generated_text"])
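By default the text-generation pipeline returns the prompt followed by the continuation in `generated_text`. If only the new text is wanted, a small post-processing step can remove the prompt. The helper `strip_prompt` and the sample string below are hypothetical and only illustrate the idea.

```python
def strip_prompt(prompt, generated_text):
    """Return only the continuation when the text begins with the prompt."""
    if generated_text.startswith(prompt):
        return generated_text[len(prompt):].lstrip()
    return generated_text  # leave unexpected output untouched


prompt = "Python is a programming language that"
# Hand-written stand-in for output[0]["generated_text"] (not real model output).
sample_text = "Python is a programming language that is widely used for scripting."

print(strip_prompt(prompt, sample_text))
```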
Question Answering
qa = pipeline("question-answering")
context = """
Python was created by Guido van Rossum and first released in 1991.
It emphasizes code readability and supports multiple programming paradigms.
"""
result = qa(question="Who created Python?", context=context)
print(f"Answer: {result['answer']} (confidence: {result['score']:.2f})")
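Besides `answer` and `score`, the question-answering result also carries `start` and `end` character offsets into the context, which makes it possible to show where the answer was found. The sketch below marks that span with brackets. The helper `highlight_answer`, its `window` parameter, and the sample result are assumptions for illustration; the offsets are computed from the context string rather than produced by a model.

```python
def highlight_answer(context, start, end, window=30):
    """Return the answer span in brackets with some surrounding context."""
    left = max(0, start - window)
    right = min(len(context), end + window)
    return f"{context[left:start]}[{context[start:end]}]{context[end:right]}"


context = "Python was created by Guido van Rossum and first released in 1991."
answer = "Guido van Rossum"
start = context.find(answer)

# Hand-built stand-in for the pipeline result (not real model output).
sample_result = {"answer": answer, "start": start, "end": start + len(answer)}

snippet = highlight_answer(context, sample_result["start"], sample_result["end"])
print(snippet)
```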
Fine-Tuning on Custom Data
For domain-specific tasks, fine-tune a pre-trained model:
from transformers import TrainingArguments, Trainer
from datasets import load_dataset
dataset = load_dataset("imdb") # movie reviews
training_args = TrainingArguments(
    output_dir="./results",
    num_train_epochs=3,
    per_device_train_batch_size=16,
    eval_strategy="epoch",
    logging_steps=100,
)
# Define model, tokenizer, data collator, then:
# trainer = Trainer(model=model, args=training_args, train_dataset=..., eval_dataset=...)
# trainer.train()
See Hugging Face docs for full fine-tuning tutorials.
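With evaluation running every epoch, it helps to report a metric alongside the loss. `Trainer` accepts a `compute_metrics` callable that receives the predictions and reference labels. The accuracy sketch below assumes a classification model whose predictions are one row of logits per example, and it is written in plain Python so it runs without extra dependencies; the sample logits and labels are made up for illustration.

```python
def compute_metrics(eval_pred):
    """Compute accuracy from (logits, labels) as passed by the Trainer."""
    logits, labels = eval_pred
    # Predicted class = index of the largest logit in each row
    predictions = [max(range(len(row)), key=lambda i: row[i]) for row in logits]
    correct = sum(int(p == y) for p, y in zip(predictions, labels))
    return {"accuracy": correct / len(labels)}


# Made-up logits for three examples of a two-class task.
sample_logits = [[0.1, 2.0], [1.5, -0.3], [0.2, 0.4]]
sample_labels = [1, 0, 0]

print(compute_metrics((sample_logits, sample_labels)))
```

It would be passed to the trainer as `Trainer(..., compute_metrics=compute_metrics)`.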
Model Hub
Browse 500,000+ models at huggingface.co/models:
| Task | Example Model |
|---|---|
| Sentiment | distilbert-base-uncased-finetuned-sst-2-english |
| Translation | Helsinki-NLP/opus-mt-en-fr |
| Summarization | facebook/bart-large-cnn |
| NER | dslim/bert-base-NER |
| Code generation | bigcode/starcoder2-7b |
Production Tips
- Cache models locally: the first download is slow
- Use a GPU when available: pass device=0 to pipeline
- Batch inputs for throughput
- Set max_length to control memory usage
- Consider distilled models (DistilBERT) for faster inference
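The batching tip can be sketched as a small helper that splits a list of texts into fixed-size chunks, so each chunk goes to a pipeline in one call instead of one text at a time. The helper name `batched` and the batch size of 2 are illustrative assumptions. Pipelines also accept a `batch_size` argument when called, which may be simpler when the whole list fits in memory.

```python
def batched(items, batch_size):
    """Yield consecutive slices of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


texts = ["first review", "second review", "third review", "fourth review", "fifth review"]

for batch in batched(texts, batch_size=2):
    # In real use, each batch would be passed to a pipeline, e.g. classifier(batch)
    print(len(batch), batch)
```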
Related Chapters
- PyTorch Basics — underlying tensor framework
- PyTorch Training — custom training loops
- Scikit-learn Pipelines — classical ML alternative
Hugging Face democratized NLP — tasks that required research teams now take five lines of Python.