🧠 Model

MMLW Retrieval E5 Large

by sdadas

Nexus Index: 39.7 (Top 100%) · S: 50 · A: 0 · P: 17 · R: 86 · Q: 65

Vital Performance: 356 downloads in the last 30 days · Audited FNI score: 39.7
Params: not specified (size class: Tiny) · Context length: not specified

Entity Passport
Registry ID: hf-model--sdadas--mmlw-retrieval-e5-large
License: Apache-2.0 (commercial use permitted)
Provider: huggingface
📜 Cite this model

Academic & Research Attribution

BibTeX
```bibtex
@misc{hf_model__sdadas__mmlw_retrieval_e5_large,
  author = {sdadas},
  title = {Mmlw Retrieval E5 Large Model},
  year = {2026},
  howpublished = {\url{https://huggingface.co/sdadas/mmlw-retrieval-e5-large}},
  note = {Accessed via Free2AITools Knowledge Fortress}
}
```
APA Style
sdadas. (2026). Mmlw Retrieval E5 Large [Model]. Free2AITools. https://huggingface.co/sdadas/mmlw-retrieval-e5-large

🔬 Technical Deep Dive

Quick Commands

🤗 HF Download

```bash
huggingface-cli download sdadas/mmlw-retrieval-e5-large
```

📦 Install Lib

```bash
pip install -U transformers
```
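
The same download can also be scripted. A minimal sketch using huggingface_hub's snapshot_download; the local_dir argument and the target folder name are assumptions for illustration, not taken from this page:

```python
from huggingface_hub import snapshot_download

# Fetch the full repository snapshot into a local folder.
path = snapshot_download("sdadas/mmlw-retrieval-e5-large",
                         local_dir="./mmlw-retrieval-e5-large")
print(path)  # local path containing the model files
```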

⚖️ Nexus Index V2.0

39.7
Top 100% system impact

  • Semantic (S): 50
  • Authority (A): 0
  • Popularity (P): 17
  • Recency (R): 86
  • Quality (Q): 65

---


MMLW-retrieval-e5-large

MMLW (muszę mieć lepszą wiadomość, Polish for "I must have a better message") is a family of neural text encoders for Polish. This model is optimized for information retrieval tasks. It transforms queries and passages into 1024-dimensional vectors. The model was developed using a two-step procedure:

  • In the first step, it was initialized from a multilingual E5 checkpoint and then trained with a multilingual knowledge-distillation method on a diverse corpus of 60 million Polish-English text pairs. We utilised English FlagEmbeddings (BGE) models as teachers for the distillation.
  • The second step involved fine-tuning the obtained models with a contrastive loss on the Polish MS MARCO training split. To improve the efficiency of contrastive training, we used large batch sizes: 1152 for small, 768 for base, and 288 for large models. Fine-tuning was conducted on a cluster of 12 A100 GPUs. (A sketch of both objectives follows this list.)
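
For illustration only (the training code is not part of this card): a minimal Python sketch of the two objectives described above. It assumes Reimers-style multilingual distillation (the student's embeddings regressed onto a frozen teacher's) and an in-batch-negatives contrastive (InfoNCE) loss; the function names and the temperature value are hypothetical, not taken from the paper.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_emb, teacher_emb):
    # Step 1 (sketch): pull the student's embedding of a Polish or English
    # text toward a frozen English teacher's embedding of the parallel text.
    return F.mse_loss(student_emb, teacher_emb)

def contrastive_loss(query_emb, passage_emb, temperature=0.05):
    # Step 2 (sketch): in-batch negatives. The i-th query's positive is the
    # i-th passage; every other passage in the batch serves as a negative.
    q = F.normalize(query_emb, dim=1)
    p = F.normalize(passage_emb, dim=1)
    logits = q @ p.T / temperature                      # (batch, batch) cosine similarities
    labels = torch.arange(q.size(0), device=q.device)   # positives lie on the diagonal
    return F.cross_entropy(logits, labels)
```

With in-batch negatives, every extra passage in the batch is one more negative for every query, which is why the large batch sizes quoted above directly strengthen the training signal.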

⚠️ 2023-12-26: We have updated the model to a new version with improved results. You can still download the previous version using the v1 tag: `AutoModel.from_pretrained("sdadas/mmlw-retrieval-e5-large", revision="v1")` ⚠️
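
If you load the model through sentence-transformers instead, the same pin can be expressed as below; the revision keyword assumes a sentence-transformers release recent enough to accept it (roughly v2.3 onward):

```python
from sentence_transformers import SentenceTransformer

# Latest (updated) weights:
model = SentenceTransformer("sdadas/mmlw-retrieval-e5-large")

# Previous weights, pinned to the v1 tag (revision kwarg assumed available):
model_v1 = SentenceTransformer("sdadas/mmlw-retrieval-e5-large", revision="v1")
```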

Usage (Sentence-Transformers)

⚠️ Our dense retrievers require the use of specific prefixes and suffixes when encoding texts. For this model, queries should be prefixed with "query: " and passages with "passage: " ⚠️

You can use the model with sentence-transformers (`pip install -U sentence-transformers`) like this:

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.util import cos_sim

# The model requires these prefixes: "query: " for queries, "passage: " for passages.
query_prefix = "query: "
answer_prefix = "passage: "
queries = [query_prefix + "Jak dożyć 100 lat?"]  # "How can you live to be 100?"
answers = [
    answer_prefix + "Trzeba zdrowo się odżywiać i uprawiać sport.",
    answer_prefix + "Trzeba pić alkohol, imprezować i jeździć szybkimi autami.",
    answer_prefix + "Gdy trwała kampania politycy zapewniali, że rozprawią się z zakazem niedzielnego handlu."
]
model = SentenceTransformer("sdadas/mmlw-retrieval-e5-large")

# Encode both sides to 1024-dimensional vectors.
queries_emb = model.encode(queries, convert_to_tensor=True, show_progress_bar=False)
answers_emb = model.encode(answers, convert_to_tensor=True, show_progress_bar=False)

# Pick the passage with the highest cosine similarity to the query.
best_answer = cos_sim(queries_emb, answers_emb).argmax().item()
print(answers[best_answer])
# Trzeba zdrowo się odżywiać i uprawiać sport.  ("Eat healthily and do sport.")
```
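
The model card itself only shows sentence-transformers usage. If you need plain transformers, the sketch below is an assumption, not documented behavior: it mean-pools the last hidden states over non-padding tokens, as the multilingual-E5 base model does; verify the pooling configuration in the repository before relying on it.

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("sdadas/mmlw-retrieval-e5-large")
model = AutoModel.from_pretrained("sdadas/mmlw-retrieval-e5-large")

texts = ["query: Jak dożyć 100 lat?",
         "passage: Trzeba zdrowo się odżywiać i uprawiać sport."]
batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")

with torch.no_grad():
    hidden = model(**batch).last_hidden_state            # (batch, seq_len, 1024)

# Mean-pool over non-padding tokens, then L2-normalize for cosine similarity.
mask = batch["attention_mask"].unsqueeze(-1).float()
emb = (hidden * mask).sum(dim=1) / mask.sum(dim=1)
emb = torch.nn.functional.normalize(emb, p=2, dim=1)
print(emb.shape)  # torch.Size([2, 1024])
```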

Evaluation Results

The model achieves an NDCG@10 of 58.30 on the Polish Information Retrieval Benchmark (PIRB). See the PIRB Leaderboard for detailed results.
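
For reference, NDCG@10 rewards placing relevant passages near the top of a ranking, discounting gains logarithmically by rank and normalizing by the ideal ordering. A minimal sketch of the standard formulation (the exact PIRB evaluation setup may differ):

```python
import math

def dcg_at_k(relevances, k=10):
    # Discounted cumulative gain over the top-k results (rank 1 gets log2(2)).
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))

def ndcg_at_k(relevances, k=10):
    # Normalize by the DCG of the ideal (descending) ordering.
    ideal_dcg = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal_dcg if ideal_dcg > 0 else 0.0

# Binary relevance of the top-10 retrieved passages for one query:
print(ndcg_at_k([1, 0, 1, 0, 0, 0, 0, 0, 0, 0]))  # ≈ 0.92
```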

Acknowledgements

This model was trained with the support of an A100 GPU cluster provided by the Gdansk University of Technology within the TASK center initiative.

Citation

```bibtex
@inproceedings{dadas2024pirb,
  title={PIRB: A Comprehensive Benchmark of Polish Dense and Hybrid Text Retrieval Methods},
  author={Dadas, Slawomir and Pere{\l}kiewicz, Micha{\l} and Po{\'s}wiata, Rafa{\l}},
  booktitle={Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)},
  pages={12761--12774},
  year={2024}
}
```

⚠️ Incomplete Data

Some information about this model is not available. Use with caution and verify details against the original source before relying on this data.

📝 Limitations & Considerations

  • Benchmark scores may vary based on evaluation methodology and hardware configuration.
  • VRAM requirements are estimates; actual usage depends on quantization and batch size.
  • FNI scores are relative rankings and may change as new models are added.
  • License: repository metadata lists Apache-2.0; verify the licensing terms at the source before commercial use.

Social Proof

HuggingFace Hub: 356 downloads
🔄 Daily sync (03:00 UTC)

AI Summary: Based on Hugging Face metadata. Not a recommendation.


🛡️ Model Transparency Report

Technical metadata sourced from upstream repositories.

Open Metadata

🆔 Identity & Source

id: hf-model--sdadas--mmlw-retrieval-e5-large
slug: sdadas--mmlw-retrieval-e5-large
source: huggingface
author: sdadas
license: Apache-2.0
tags: sentence-transformers, pytorch, safetensors, xlm-roberta, feature-extraction, sentence-similarity, transformers, information-retrieval, pl, license:apache-2.0, text-embeddings-inference, endpoints_compatible, deploy:azure, region:us

⚙️ Technical Specs

architecture: not specified
params (billions): not specified
context length: not specified
pipeline tag: sentence-similarity

📊 Engagement & Metrics

downloads: 356
stars: 0
forks: 0

Data indexed from public sources. Updated daily.