JobBERT-v2
| Entity Passport | |
| --- | --- |
| Registry ID | hf-model--techwolf--jobbert-v2 |
| License | MIT |
| Provider | huggingface |
Cite this model
Academic & Research Attribution
```bibtex
@misc{hf_model__techwolf__jobbert_v2,
  author = {TechWolf},
  title = {JobBERT-v2 Model},
  year = {2026},
  howpublished = {\url{https://huggingface.co/techwolf/jobbert-v2}},
  note = {Accessed via Free2AITools Knowledge Fortress}
}
```
Quick Commands
```bash
huggingface-cli download techwolf/jobbert-v2
pip install -U transformers
```
Nexus Index V2.0
Index Insight
FNI V2.0 for JobBERT-v2: Semantic (S:50), Authority (A:0), Popularity (P:54), Recency (R:69), Quality (Q:65).
Technical Deep Dive
SentenceTransformer based on sentence-transformers/all-mpnet-base-v2
This is a sentence-transformers model trained specifically for job title matching and similarity. It is fine-tuned from sentence-transformers/all-mpnet-base-v2 on a large dataset of job titles and their associated skills/requirements. The model maps job titles and descriptions to a 1024-dimensional dense vector space and can be used for semantic job title matching, job similarity search, and related HR/recruitment tasks.
Model Details
Model Description
- Model Type: Sentence Transformer
- Base model: sentence-transformers/all-mpnet-base-v2
- Maximum Sequence Length: 64 tokens
- Output Dimensionality: 1024 dimensions
- Similarity Function: Cosine Similarity
- Training Dataset: 5.5M+ (job title, skills) pairs
- Primary Use Case: Job title matching and similarity
- Performance: achieves 0.6457 MAP on the TalentCLEF benchmark
Model Sources
- Documentation: Sentence Transformers Documentation
- Repository: Sentence Transformers on GitHub
- Hugging Face: Sentence Transformers on Hugging Face
Full Model Architecture
```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 64, 'do_lower_case': False}) with Transformer model: MPNetModel
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Asym(
    (anchor-0): Dense({'in_features': 768, 'out_features': 1024, 'bias': True, 'activation_function': 'torch.nn.modules.activation.Tanh'})
    (positive-0): Dense({'in_features': 768, 'out_features': 1024, 'bias': True, 'activation_function': 'torch.nn.modules.activation.Tanh'})
  )
)
```
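To sanity-check these specifications locally, here is a minimal sketch, assuming the `sentence-transformers` package is installed and the Hugging Face Hub is reachable:

```python
from sentence_transformers import SentenceTransformer

# Load the model and print its module list; it should match the
# architecture shown above (Transformer -> Pooling -> Asym with two
# 768 -> 1024 Dense heads).
model = SentenceTransformer("TechWolf/JobBERT-v2")
print(model)
print(model.max_seq_length)  # expected: 64
```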
Usage
Direct Usage (Sentence Transformers)
First install the required packages:
```bash
pip install -U sentence-transformers
```
Then you can load and use the model with the following code:
```python
import torch
import numpy as np
from tqdm.auto import tqdm
from sentence_transformers import SentenceTransformer
from sentence_transformers.util import batch_to_device, cos_sim

# Load the model
model = SentenceTransformer("TechWolf/JobBERT-v2")

def encode_batch(jobbert_model, texts):
    features = jobbert_model.tokenize(texts)
    features = batch_to_device(features, jobbert_model.device)
    # Route inputs through the 'anchor' branch of the Asym head
    features["text_keys"] = ["anchor"]
    with torch.no_grad():
        out_features = jobbert_model.forward(features)
    return out_features["sentence_embedding"].cpu().numpy()

def encode(jobbert_model, texts, batch_size: int = 8):
    # Sort texts by length and keep track of original indices
    sorted_indices = np.argsort([len(text) for text in texts])
    sorted_texts = [texts[i] for i in sorted_indices]
    embeddings = []
    # Encode in batches
    for i in tqdm(range(0, len(sorted_texts), batch_size)):
        batch = sorted_texts[i:i + batch_size]
        embeddings.append(encode_batch(jobbert_model, batch))
    # Concatenate embeddings and reorder to original indices
    sorted_embeddings = np.concatenate(embeddings)
    original_order = np.argsort(sorted_indices)
    return sorted_embeddings[original_order]

# Example usage
job_titles = [
    'Software Engineer',
    'Senior Software Developer',
    'Product Manager',
    'Data Scientist'
]

# Get embeddings
embeddings = encode(model, job_titles)

# Calculate cosine similarity matrix
similarities = cos_sim(embeddings, embeddings)
print(similarities)
```
The output will be a similarity matrix where each value represents the cosine similarity between two job titles:
```
tensor([[1.0000, 0.8723, 0.4821, 0.5447],
        [0.8723, 1.0000, 0.4822, 0.5019],
        [0.4821, 0.4822, 1.0000, 0.4328],
        [0.5447, 0.5019, 0.4328, 1.0000]])
```
In this example:
- The diagonal values are 1.0000 (perfect similarity with itself)
- 'Software Engineer' and 'Senior Software Developer' have high similarity (0.8723)
- 'Product Manager' and 'Data Scientist' show lower similarity with other roles
- Cosine similarity can range from -1 to 1; in this example all values fall between 0 and 1, and higher values indicate greater similarity
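Building on the snippet above, a small hypothetical follow-on shows how these similarity scores can drive a simple ranking; `rank_titles` is an illustrative helper, not part of the model's API:

```python
# Rank candidate titles against a query title, reusing `model`,
# `encode`, and `cos_sim` from the example above.
def rank_titles(query, candidates):
    query_emb = encode(model, [query])
    cand_embs = encode(model, candidates)
    scores = cos_sim(query_emb, cand_embs)[0]
    return sorted(zip(candidates, scores.tolist()),
                  key=lambda pair: pair[1], reverse=True)

for title, score in rank_titles("Machine Learning Engineer", job_titles):
    print(f"{score:.4f}  {title}")
```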
Example Use Cases
- Job Title Matching: Find similar job titles for standardization or matching (see the sketch after this list)
- Job Search: Match job seekers with relevant positions based on title similarity
- HR Analytics: Analyze job title patterns and similarities across organizations
- Talent Management: Identify similar roles for career development and succession planning
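As a minimal sketch of the standardization use case: the `canonical` taxonomy below is a made-up placeholder, and `standardize` is a hypothetical helper that reuses `encode` and `cos_sim` from the usage example above.

```python
# Map a free-text title onto a small canonical taxonomy by
# nearest-neighbour search over JobBERT-v2 embeddings.
canonical = ["Software Engineer", "Data Scientist", "Product Manager", "Recruiter"]
canonical_embs = encode(model, canonical)

def standardize(raw_title):
    emb = encode(model, [raw_title])
    scores = cos_sim(emb, canonical_embs)[0]
    return canonical[int(scores.argmax())]

print(standardize("Sr. Backend Dev"))  # plausibly "Software Engineer"
```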
Training Details
Training Dataset
generator
- Dataset: 5.5M+ (job title, skills) pairs
- Format: Anchor job titles paired with related skills/requirements
- Training objective: Learn semantic similarity between job titles and their associated skills
- Loss: CachedMultipleNegativesRankingLoss with cosine similarity
Training Hyperparameters
- Batch Size: 2048
- Learning Rate: 5e-05
- Epochs: 1
- FP16 Training: Enabled
- Optimizer: AdamW
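For orientation, here is a hedged sketch of what a setup with this loss and these hyperparameters could look like using the sentence-transformers fit API. This is not the authors' actual training script, and the two example pairs are invented placeholders:

```python
from torch.utils.data import DataLoader
from sentence_transformers import InputExample, SentenceTransformer, losses

# Invented placeholder pairs: anchor job title + associated skills text.
train_examples = [
    InputExample(texts=["Software Engineer", "python; git; agile development"]),
    InputExample(texts=["Data Scientist", "statistics; machine learning; sql"]),
]

model = SentenceTransformer("sentence-transformers/all-mpnet-base-v2")
train_loader = DataLoader(train_examples, shuffle=True, batch_size=2)  # card uses 2048

# CachedMultipleNegativesRankingLoss scores each anchor against all in-batch
# positives (others act as negatives), using gradient caching to keep memory
# bounded at large batch sizes; its default similarity is cosine.
loss = losses.CachedMultipleNegativesRankingLoss(model, mini_batch_size=2)

model.fit(
    train_objectives=[(train_loader, loss)],
    epochs=1,
    optimizer_params={"lr": 5e-5},
    use_amp=True,  # FP16 training
)
```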
Framework Versions
- Python: 3.9.19
- Sentence Transformers: 3.1.0
- Transformers: 4.44.2
- PyTorch: 2.4.1+cu118
- Accelerate: 0.34.2
- Datasets: 3.0.0
- Tokenizers: 0.19.1
Citation
BibTeX
JobBERT-v2 paper
Please cite this paper when using JobBERT-v2:
```bibtex
@article{01K47W55SG7ZRKFG431ESRXC35,
  abstract = {Labor market analysis relies on extracting insights from job advertisements, which provide valuable yet unstructured information on job titles and corresponding skill requirements. While state-of-the-art methods for skill extraction achieve strong performance, they depend on large language models (LLMs), which are computationally expensive and slow. In this paper, we propose ConTeXT-match, a novel contrastive learning approach with token-level attention that is well-suited for the extreme multi-label classification task of skill classification. ConTeXT-match significantly improves skill extraction efficiency and performance, achieving state-of-the-art results with a lightweight bi-encoder model. To support robust evaluation, we introduce Skill-XL, a new benchmark with exhaustive, sentence-level skill annotations that explicitly address the redundancy in the large label space. Finally, we present JobBERT V2, an improved job title normalization model that leverages extracted skills to produce high-quality job title representations. Experiments demonstrate that our models are efficient, accurate, and scalable, making them ideal for large-scale, real-time labor market analysis.},
  author = {Decorte, Jens-Joris and Van Hautte, Jeroen and Develder, Chris and Demeester, Thomas},
  issn = {2169-3536},
  journal = {IEEE Access},
  keywords = {Taxonomy, Contrastive learning, Training, Annotations, Benchmark testing, Training data, Large language models, Computational efficiency, Accuracy, Terminology, Labor market analysis, text encoders, skill extraction, job title normalization},
  language = {eng},
  pages = {133596--133608},
  title = {Efficient text encoders for labor market analysis},
  url = {http://doi.org/10.1109/ACCESS.2025.3589147},
  volume = {13},
  year = {2025},
}
```
Sentence Transformers
```bibtex
@inproceedings{reimers-2019-sentence-bert,
  title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
  author = "Reimers, Nils and Gurevych, Iryna",
  booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
  month = "11",
  year = "2019",
  publisher = "Association for Computational Linguistics",
  url = "https://arxiv.org/abs/1908.10084",
}
```
CachedMultipleNegativesRankingLoss
```bibtex
@misc{gao2021scaling,
  title = {Scaling Deep Contrastive Learning Batch Size under Memory Limited Setup},
  author = {Luyu Gao and Yunyi Zhang and Jiawei Han and Jamie Callan},
  year = {2021},
  eprint = {2101.06983},
  archivePrefix = {arXiv},
  primaryClass = {cs.LG}
}
```
Incomplete Data
Some information about this model is not available. Use with caution and verify details from the original source before relying on this data.
Limitations & Considerations
- Benchmark scores may vary based on evaluation methodology and hardware configuration.
- VRAM requirements are estimates; actual usage depends on quantization and batch size.
- FNI scores are relative rankings and may change as new models are added.
- License: listed as MIT in the upstream metadata; verify licensing terms before commercial use.
Social Proof
AI Summary: Based on Hugging Face metadata. Not a recommendation.
Model Transparency Report
Technical metadata sourced from upstream repositories.
Identity & Source
- id: hf-model--techwolf--jobbert-v2
- slug: techwolf--jobbert-v2
- source: huggingface
- author: TechWolf
- license: MIT
- tags: sentence-transformers, safetensors, mpnet, sentence-similarity, feature-extraction, generated_from_trainer, dataset_size:5579240, loss:cachedmultiplenegativesrankingloss, en, arxiv:1908.10084, arxiv:2101.06983, license:mit, endpoints_compatible, region:us, text-embeddings-inference
Technical Specs
- architecture: null
- params billions: null
- context length: null
- pipeline tag: sentence-similarity
Engagement & Metrics
- downloads: 52,692
- stars: 0
- forks: 0
Data indexed from public sources. Updated daily.