CV Parser NER — bert-base-uncased (v2)
Token-classification (NER) model that extracts job titles, skills, and education entries from resumes/CVs, labelled with the BIO (Begin/Inside/Outside) tagging scheme.
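To illustrate the BIO scheme, the sketch below turns a word-level tag sequence into typed entity spans. It assumes the model's subword predictions have already been mapped back to one tag per word (for example by keeping each word's first-subword tag). `bio_to_spans` is a hypothetical helper, not part of this repository. A stray `I-X` tag that does not continue an open `X` span is treated as the start of a new span, which is one lenient convention among several.

```python
from typing import List, Optional, Tuple

Span = Tuple[str, int, int]  # (entity_type, start_word, end_word), end exclusive


def bio_to_spans(tags: List[str]) -> List[Span]:
    """Convert word-level BIO tags into (type, start, end) spans (hypothetical helper)."""
    spans: List[Span] = []
    current_type: Optional[str] = None
    start = 0
    for i, tag in enumerate(tags):
        # B-X always opens a span; I-X opens one if it does not continue an open X span.
        if tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != current_type):
            if current_type is not None:
                spans.append((current_type, start, i))
            current_type, start = tag[2:], i
        elif tag == "O":
            # An O tag closes any open span.
            if current_type is not None:
                spans.append((current_type, start, i))
            current_type = None
        # Otherwise: I-X continuing the open X span, nothing to do.
    if current_type is not None:
        spans.append((current_type, start, len(tags)))
    return spans


if __name__ == "__main__":
    words = ["senior", "data", "engineer", "skilled", "in", "python", "and", "sql"]
    tags = ["B-JOB_TITLE", "I-JOB_TITLE", "I-JOB_TITLE", "O", "O", "B-SKILL", "O", "B-SKILL"]
    for etype, s, e in bio_to_spans(tags):
        print(etype, " ".join(words[s:e]))
```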
Provenance
- Fine-tuned directly from the base model on dataset 4 (`resume_bio_annotated_full.csv`, 2,483 resumes: 1,739 train / 372 val / 372 test), the team's finalized AI-Studio/Vertex-relabelled dataset.
- Reproduced end-to-end with the project notebooks/scripts (`retokenize.py` + `train_bert_run.py`).
- Base model: `bert-base-uncased` · epochs: 4 · learning rate: 2e-5 · max_length: 512 · stride: 128 · seed: 42.
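The `max_length`/`stride` pair matters because many resumes exceed BERT's 512-token limit. Assuming the usual Hugging Face tokenizer semantics (`stride` is the number of tokens shared by consecutive windows, and `[CLS]` and `[SEP]` take two of the 512 slots), the windows over a long resume can be computed as below. `window_spans` is a hypothetical helper for illustration; the project's `retokenize.py` may chunk differently.

```python
from typing import List, Tuple


def window_spans(n_tokens: int, max_length: int = 512, stride: int = 128,
                 n_special: int = 2) -> List[Tuple[int, int]]:
    """Return (start, end) content-token ranges (end exclusive) covering n_tokens.

    Each window holds at most max_length - n_special content tokens, and
    consecutive windows overlap by `stride` tokens (hypothetical helper).
    """
    width = max_length - n_special  # 510 content tokens per window by default
    if stride >= width:
        raise ValueError("stride must be smaller than the window width")
    step = width - stride  # how far each new window advances
    spans: List[Tuple[int, int]] = []
    start = 0
    while True:
        end = min(start + width, n_tokens)
        spans.append((start, end))
        if end >= n_tokens:
            break
        start += step
    return spans


if __name__ == "__main__":
    print(window_spans(1200))
```

With these settings each window advances 382 tokens, so a 1,200-token resume yields three windows. Tokens inside an overlap receive two predictions, which would need to be merged back into a single sequence per resume before scoring.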
Resume-level performance (dataset-4 splits)
| split | precision | recall | F1 |
|---|---|---|---|
| validation | — | — | 0.5540 |
| test | — | — | 0.5852 |
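The card reports only F1 and does not spell out the metric. A common NER convention, assumed here, is exact-match, micro-averaged entity-level scoring: a predicted span counts as correct only if its type, start, and end all match a gold span. The sketch below (hypothetical `micro_prf` helper) derives precision, recall, and F1 from the same counts.

```python
from typing import Iterable, Set, Tuple

Span = Tuple[str, int, int]  # (entity_type, start, end), end exclusive


def micro_prf(gold_docs: Iterable[Set[Span]],
              pred_docs: Iterable[Set[Span]]) -> Tuple[float, float, float]:
    """Exact-match, micro-averaged entity-level precision/recall/F1 (hypothetical helper)."""
    gold_list, pred_list = list(gold_docs), list(pred_docs)
    if len(gold_list) != len(pred_list):
        raise ValueError("gold and predicted document counts differ")
    tp = fp = fn = 0
    for gold, pred in zip(gold_list, pred_list):
        tp += len(gold & pred)  # spans matching exactly on type and boundaries
        fp += len(pred - gold)  # predicted spans with no exact gold match
        fn += len(gold - pred)  # gold spans the model missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    gold = [{("JOB_TITLE", 0, 3), ("SKILL", 5, 6)}, {("EDUCATION", 10, 14)}]
    pred = [{("JOB_TITLE", 0, 2), ("SKILL", 5, 6)}, set()]
    print(micro_prf(gold, pred))
```

Under this convention a boundary error (here `JOB_TITLE` 0..2 instead of 0..3) counts as both a false positive and a false negative.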
Labels
O, B-JOB_TITLE, I-JOB_TITLE, B-SKILL, I-SKILL, B-EDUCATION, I-EDUCATION
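For loading or post-processing, the seven labels map to integer ids. The sketch assumes the ids follow the order listed above; the authoritative mapping is the `id2label` field in the model's `config.json`, so check it before relying on this order. `invalid_transitions` is a hypothetical helper that flags `I-X` tags not preceded by `B-X` or `I-X`, which a per-token classifier can emit because each token is labelled independently.

```python
from typing import Dict, List

# Assumed order: matches the label list in this card, not verified against config.json.
LABELS: List[str] = ["O", "B-JOB_TITLE", "I-JOB_TITLE", "B-SKILL", "I-SKILL",
                     "B-EDUCATION", "I-EDUCATION"]
id2label: Dict[int, str] = {i: label for i, label in enumerate(LABELS)}
label2id: Dict[str, int] = {label: i for i, label in enumerate(LABELS)}


def invalid_transitions(tags: List[str]) -> List[int]:
    """Return indices of I-X tags that do not follow B-X or I-X (hypothetical helper)."""
    bad: List[int] = []
    prev = "O"
    for i, tag in enumerate(tags):
        # prev[2:] is "" for "O", otherwise the entity type of the previous tag.
        if tag.startswith("I-") and prev[2:] != tag[2:]:
            bad.append(i)
        prev = tag
    return bad


if __name__ == "__main__":
    predicted_ids = [0, 3, 4, 0, 2]
    tags = [id2label[i] for i in predicted_ids]
    print(tags, invalid_transitions(tags))
```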
Model tree for Zeqhx/cv-parser-bert-v2
- Base model: google-bert/bert-base-uncased