Gemma 3 4b Cebuano Ilokano Tagalog
| Entity Passport | |
| --- | --- |
| Registry ID | hf-model--nielle003--gemma_3_4b_cebuano_ilokano_tagalog |
| Provider | huggingface |
Compute Threshold
~4.3 GB VRAM
* Static estimate assuming 4-bit quantization.
Cite this model
Academic & Research Attribution
@misc{hf_model__nielle003__gemma_3_4b_cebuano_ilokano_tagalog,
author = {nielle003},
title = {Gemma 3 4b Cebuano Ilokano Tagalog Model},
year = {2026},
howpublished = {\url{https://huggingface.co/nielle003/gemma_3_4b_cebuano_ilokano_tagalog}},
note = {Accessed via Free2AITools Knowledge Fortress}
}
Quick Commands
ollama run gemma_3_4b_cebuano_ilokano_tagalog
huggingface-cli download nielle003/gemma_3_4b_cebuano_ilokano_tagalog
Nexus Index V2.0
Index Insight
FNI V2.0 for Gemma 3 4b Cebuano Ilokano Tagalog: Semantic (S:50), Authority (A:0), Popularity (P:16), Recency (R:97), Quality (Q:50).
Technical Deep Dive
Gemma 3 4B - Cebuano, Ilocano, Tagalog Fine-tuning
A specialized fine-tuned version of Google's Gemma 3 4B optimized for low-resource Philippine languages: Cebuano, Ilocano, and Tagalog (Filipino).
Model Details
Model ID: nielle003/Gemma_3_4B_cebuano_ilocano_tagalog
Base Model: Google Gemma 3 4B
License: Gemma
Language: Cebuano, Ilocano, Tagalog (Filipino)
Task: Instruction following, question answering, conversation
Training Data
Dataset Statistics
- Total Samples (across all splits): 30,000
- Training Set: 18,000 samples (7,500 real + 10,500 synthetic)
- Validation Set: 9,000 samples (4,500 real + 4,500 synthetic)
- Test Set: 3,000 samples (all real; 1,000 per language)
Data Composition
Training Set (18,000 rows)
- Real Data: 7,500 samples
- Cebuano: 2,500
- Ilocano: 2,500
- Tagalog: 2,500
- Synthetic Data: 10,500 samples
- Synthetic Anchor (from curated sources): 3,298 samples
- Synthetic Not-Anchor: 7,202 samples
- Distribution: 3,500 per language
Validation Set (9,000 rows)
- Real Data: 4,500 samples
- Cebuano: 1,500
- Ilocano: 1,500
- Tagalog: 1,500
- Synthetic Data: 4,500 samples
- Only from synthetic not-anchor sources (no anchor data)
- Distribution: 1,500 per language
Test Set (3,000 rows)
- All Real Data: 3,000 samples
- Cebuano: 1,000
- Ilocano: 1,000
- Tagalog: 1,000
Data Sources
Real Dataset Folder
- Cebuano: cebuano_sharegpt_5k_random_real.jsonl (5,000 samples)
- Ilocano: ilocano_v2_sharegpt_5k_random.jsonl (5,000 samples)
- Tagalog: Tagalog_real_and_augmented_modified.jsonl (5,241 samples)
Real and Augmented Tagalog Folder
- Real: filipino_aya_sharegpt_tagalog_real.jsonl (1,241 samples)
- Synthetic: filipino_augmented_sharegpt_for_training.jsonl (4,000 samples)
Synthetic Anchors Folder
- Cebuano: cebuano_sharegpt_curated.jsonl (1,098 samples)
- Ilocano: ilocano_sharegpt_for_training.jsonl (1,100 samples)
- Tagalog: filipino_curated_sharegpt_for_training_modified.jsonl (1,100 samples)
Synthetic Not Anchor Folder
- All datasets: all_datasets_changed_sharegpt_shuffled.jsonl (12,015 samples)
Data Splitting Strategy
The dataset was carefully split with the following constraints (a code sketch follows below):
- Test Set: Balanced across all 3 languages with only real data (1,000 per language)
- Validation Set: Balanced across languages with synthetic data from non-anchor sources only (no synthetic anchor data)
- Training Set: Includes ALL synthetic anchor data to maximize training capacity while maintaining language balance
This strategy ensures:
- Fair evaluation across all languages
- Validation on both real and synthetic data (but not anchor data)
- Maximum use of high-quality synthetic anchor data for training
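The exact pipeline is not published; the sketch below reconstructs the splitting logic from the constraints above, assuming each record is a dict with hypothetical `language` and `source` fields, where `source` is one of `real`, `synthetic_anchor`, or `synthetic_not_anchor`:

import random
from collections import defaultdict

def split_dataset(records, seed=42):
    """Split records per the card's constraints:
    test = real only, 1,000 per language;
    validation = real + non-anchor synthetic (no anchor data);
    train = everything left, including ALL synthetic anchor rows."""
    rng = random.Random(seed)
    by_key = defaultdict(list)
    for r in records:  # bucket by (language, source)
        by_key[(r["language"], r["source"])].append(r)

    test, val, train = [], [], []
    for lang in ("cebuano", "ilocano", "tagalog"):
        real = by_key[(lang, "real")]
        rng.shuffle(real)
        test.extend(real[:1000])       # real-only test set
        val.extend(real[1000:2500])    # 1,500 real rows per language
        train.extend(real[2500:])      # remaining real rows

        not_anchor = by_key[(lang, "synthetic_not_anchor")]
        rng.shuffle(not_anchor)
        val.extend(not_anchor[:1500])  # synthetic val rows, non-anchor only
        train.extend(not_anchor[1500:])

        # All anchor data goes to training, never to validation or test
        train.extend(by_key[(lang, "synthetic_anchor")])
    return train, val, test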
Model Performance
Evaluation Metrics
- Test Set (3,000 real samples):
- Balanced evaluation across Cebuano, Ilocano, Tagalog
Supported Tasks
- Instruction following
- Question answering
- Conversational AI
- Text generation in Philippine languages
Usage
Installation
pip install transformers torch
Quick Start
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "nielle003/Gemma_3_4B_cebuano_ilocano_tagalog"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Example input in Tagalog: "What is the best way to learn programming?"
prompt = "Ano ang pinakamahusay na paraan para matuto ng programming?"
inputs = tokenizer(prompt, return_tensors="pt")
# max_new_tokens bounds the generated continuation rather than prompt + output
outputs = model.generate(**inputs, max_new_tokens=128)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
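Gemma 3 checkpoints are instruction-tuned around a chat format, so wrapping prompts with the tokenizer's chat template usually behaves better than a raw string; a sketch, assuming this fine-tune kept the base tokenizer's template:

from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "nielle003/Gemma_3_4B_cebuano_ilocano_tagalog"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# "How do you make delicious adobo?" (Tagalog)
messages = [{"role": "user", "content": "Paano gumawa ng masarap na adobo?"}]
# Builds the chat-formatted prompt and appends the assistant turn marker
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)
outputs = model.generate(input_ids, max_new_tokens=150)
# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))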
With Hugging Face Pipeline
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="nielle003/Gemma_3_4B_cebuano_ilocano_tagalog",
    device=0,  # 0 for GPU, -1 for CPU
)

# "How do you make delicious adobo?" (Tagalog)
result = generator("Paano gumawa ng masarap na adobo?", max_new_tokens=150)
print(result[0]["generated_text"])
Technical Details
Model Architecture
- Base: Google Gemma 3 4B
- Parameters: 4 billion
- Context Length: 8,192 tokens
- Precision: bfloat16 (recommended), float16, float32
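For example, loading in the recommended bfloat16 precision roughly halves weight memory versus float32 (a sketch; `device_map="auto"` additionally requires the accelerate package):

import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "nielle003/Gemma_3_4B_cebuano_ilocano_tagalog",
    torch_dtype=torch.bfloat16,  # recommended precision per the card
    device_map="auto",           # place weights on available devices
)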
Training Configuration
- Framework: Hugging Face Transformers
- Training Approach: Supervised Fine-tuning (SFT)
- Data Format: JSONL (JSON Lines)
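The card does not show a sample record, but the source filenames point to ShareGPT-style JSONL, which stores one conversation per line; the field names below follow that common convention and are an assumption, not a confirmed schema:

import json

# Hypothetical ShareGPT-style record (Cebuano example:
# "What is the capital of the Philippines?" /
# "The capital of the Philippines is Manila.")
record = {
    "conversations": [
        {"from": "human", "value": "Unsa ang kapital sa Pilipinas?"},
        {"from": "gpt", "value": "Ang kapital sa Pilipinas mao ang Manila."},
    ]
}

# JSONL: one JSON object per line
with open("train.jsonl", "a", encoding="utf-8") as f:
    f.write(json.dumps(record, ensure_ascii=False) + "\n")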
Limitations
- Model is optimized for Cebuano, Ilocano, and Tagalog
- May have limited performance on other languages
- Generated content should be reviewed for accuracy
- Not suitable for production use without additional validation
License
This model is licensed under the MIT License. See the LICENSE file for details.
Citation
If you use this model, please cite:
@misc{gemma_3_4b_philippine_languages,
author = "Barcelona, Nielle E. and Crisologo Aaron",
title = {A Lightweight SLM-based Specific Question Answer in Cebuano and ilocano for Edge Devices},
year = {2026},
publisher = {Hugging Face},
howpublished = {\url{https://huggingface.co/nielle003/Gemma_3_4B_cebuano_ilocano_tagalog}}
}
Dataset Attribution
- Cebuano Data: ShareGPT format datasets
- Ilocano Data: ShareGPT format datasets
- Tagalog Data: ShareGPT format datasets + Aya dataset
Acknowledgments
- Google for the Gemma model
- Hugging Face for the model hosting platform
- All data contributors and annotators
Contact
For questions or issues, please reach out through Hugging Face Model Hub.
Last Updated: March 2024
Model Version: 1.0
Incomplete Data
Some information about this model is not available. Use with caution and verify details from the original source before relying on this data.
Limitations & Considerations
- Benchmark scores may vary based on evaluation methodology and hardware configuration.
- VRAM requirements are estimates; actual usage depends on quantization and batch size.
- FNI scores are relative rankings and may change as new models are added.
- License Unknown: Verify licensing terms before commercial use.
Social Proof
AI Summary: Based on Hugging Face metadata. Not a recommendation.
Model Transparency Report
Technical metadata sourced from upstream repositories.
Identity & Source
- id: hf-model--nielle003--gemma_3_4b_cebuano_ilokano_tagalog
- slug: nielle003--gemma_3_4b_cebuano_ilokano_tagalog
- source: huggingface
- author: nielle003
- license: (not specified)
- tags: safetensors, gguf, question-answering, endpoints_compatible, region:us, conversational
Technical Specs
- architecture: null
- params (billions): 4
- context length: 4,096
- pipeline tag: question-answering
- vram (GB): 4.3
- vram is estimated: true
- vram formula: VRAM ≈ (params × 0.75) + 0.8 GB (KV cache) + 0.5 GB (OS overhead)
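Applying the index's static formula to this 4-billion-parameter model reproduces the ~4.3 GB figure above; a quick check in Python:

def estimated_vram_gb(params_billions: float) -> float:
    """Static VRAM estimate using the index's published formula:
    ~0.75 GB of weights per billion parameters, plus 0.8 GB for the
    KV cache and 0.5 GB of OS/runtime overhead."""
    return params_billions * 0.75 + 0.8 + 0.5

print(estimated_vram_gb(4))  # -> 4.3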
Engagement & Metrics
- downloads: 287
- stars: 0
- forks: 0
Data indexed from public sources. Updated daily.