ruBERT-ruLaw-NER-judicial

Named Entity Recognition model for Russian judicial decisions. Fine-tuned on a manually annotated corpus of 1335 sentence fragments drawn from administrative, civil, and criminal cases, covering six entity classes: DATE, LAW, ORG, PENALTY, PERSON, and PROVISION.

Base model: TryDotAtwo/ruBERT-ruLaw, a ruBERT model with domain-adaptive pre-training (DAPT) on the RusLawOD corpus of Russian normative legal acts.

This model was developed as part of a bachelor's thesis. Full code, dataset, and experiments: github.com/Bishop-Y/Bachelor_Thesis
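
A minimal inference sketch, assuming the checkpoint is published under the repository id B1KO1L/ruBERT-ruLaw-NER-judicial and loads with the standard Hugging Face `transformers` token-classification pipeline. The helper names `load_ner` and `extract_entities` are illustrative and do not come from the thesis code.

```python
MODEL_ID = "B1KO1L/ruBERT-ruLaw-NER-judicial"  # repository id as listed on this card


def load_ner(model_id: str = MODEL_ID):
    """Build a token-classification pipeline that merges word pieces into spans."""
    # Imported lazily so this module can be loaded without transformers installed.
    from transformers import pipeline

    return pipeline(
        "token-classification",
        model=model_id,
        aggregation_strategy="simple",  # group sub-word tokens into entity spans
    )


def extract_entities(ner, text: str):
    """Return (start, end, label, surface text) tuples for one input string."""
    return [
        (ent["start"], ent["end"], ent["entity_group"], text[ent["start"]:ent["end"]])
        for ent in ner(text)
    ]
```

Calling `extract_entities(load_ner(), text)` on a fragment of a court decision downloads the checkpoint on first use and yields one tuple per predicted entity. The input should keep its original casing, since the model is cased.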

Performance

The metrics below are for this single best checkpoint on the held-out test set (201 sentence fragments). The primary metric is strict span-level F1: a prediction counts as a true positive only on an exact (start, end, label) match, so any boundary shift or label mismatch is an error.
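
The strict matching rule can be sketched as follows. This illustrates the definition above and is not necessarily the evaluation script used in the thesis; the function name `strict_span_f1` and the per-fragment set representation are assumptions.

```python
from typing import List, Optional, Set, Tuple

Span = Tuple[int, int, str]  # (start, end, label)


def strict_span_f1(
    gold: List[Set[Span]],
    pred: List[Set[Span]],
    label: Optional[str] = None,
) -> float:
    """Strict span-level F1 over parallel lists of per-fragment span sets.

    If `label` is given, only spans of that class are scored.
    """
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        if label is not None:
            g = {s for s in g if s[2] == label}
            p = {s for s in p if s[2] == label}
        tp += len(g & p)  # exact (start, end, label) matches
        fp += len(p - g)  # predicted spans with no exact gold match
        fn += len(g - p)  # gold spans that were missed or only partly matched
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# One fragment: PERSON matches exactly, LAW is off by one character at the end.
gold = [{(0, 5, "PERSON"), (10, 20, "LAW")}]
pred = [{(0, 5, "PERSON"), (10, 19, "LAW")}]
print(strict_span_f1(gold, pred))  # 0.5
```

The shifted LAW span counts twice, once as a false positive and once as a false negative, which is why strict scoring penalizes boundary errors so heavily.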

Class	F1
DATE	0.917
LAW	0.966
ORG	0.901
PENALTY	0.807
PERSON	0.977
PROVISION	0.950
Macro F1	0.920
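
As a quick consistency check, the macro F1 is the unweighted mean of the six per-class scores. The sketch below uses the rounded values from the table; the reported figure may have been computed from unrounded scores.

```python
per_class_f1 = {
    "DATE": 0.917,
    "LAW": 0.966,
    "ORG": 0.901,
    "PENALTY": 0.807,
    "PERSON": 0.977,
    "PROVISION": 0.950,
}

# Macro averaging gives every class equal weight, regardless of its frequency.
macro_f1 = sum(per_class_f1.values()) / len(per_class_f1)
print(f"{macro_f1:.3f}")  # 0.920
```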

Training data

The corpus was built on top of court-decision texts from the RuLegalNER dataset [Shaheen et al., 2023].

Splits (document-level, stratified by case type and rarest penalty subtype): 1000 / 134 / 201 fragments (train / val / test).
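
A document-level stratified split can be sketched as below. This is a generic illustration, not the thesis procedure: the function name, the stratum keys, and the 10% / 15% validation and test ratios are assumptions (the ratios only approximate the fragment counts above). The key property is that all fragments of a document land in the same split.

```python
import random
from collections import defaultdict
from typing import Dict, Hashable, List


def stratified_document_split(
    doc_strata: Dict[str, Hashable],
    val_ratio: float = 0.10,   # assumed, roughly 134 / 1335
    test_ratio: float = 0.15,  # assumed, roughly 201 / 1335
    seed: int = 0,
) -> Dict[str, List[str]]:
    """Assign whole documents to train/val/test, one stratum at a time."""
    by_stratum = defaultdict(list)
    for doc_id, stratum in doc_strata.items():
        by_stratum[stratum].append(doc_id)

    rng = random.Random(seed)
    splits: Dict[str, List[str]] = {"train": [], "val": [], "test": []}
    for stratum in sorted(by_stratum, key=repr):
        docs = sorted(by_stratum[stratum])  # fixed order before shuffling
        rng.shuffle(docs)
        n_val = round(len(docs) * val_ratio)
        n_test = round(len(docs) * test_ratio)
        splits["val"] += docs[:n_val]
        splits["test"] += docs[n_val:n_val + n_test]
        splits["train"] += docs[n_val + n_test:]  # train takes the remainder
    return splits


# Hypothetical document ids, each keyed by (case type, rarest penalty subtype).
docs = {f"doc{i}": ("civil" if i % 2 else "criminal", "fine") for i in range(40)}
splits = stratified_document_split(docs)
print({name: len(ids) for name, ids in splits.items()})
```

Fragments are then routed to a split by the document they came from, so no document contributes text to both training and test data.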

Limitations

Annotated by a single annotator.
Built on a single corpus of court decisions; generalization to other legal document types (contracts, claims, etc.) is not guaranteed.
Cased model: performance degrades under case perturbation of the input.
PENALTY is the weakest class (F1 0.807) and the most data-hungry, especially on rare subtypes.
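
To probe the case-sensitivity limitation on your own data, one option is to score the model on case-perturbed copies of each input. The helper below is a hypothetical sketch of generating such variants; it is not part of the thesis code.

```python
from typing import Dict


def case_variants(text: str) -> Dict[str, str]:
    """Return the original text alongside lowercased and uppercased copies."""
    return {
        "original": text,
        "lower": text.lower(),
        "upper": text.upper(),
    }


variants = case_variants("Иванов И.И.")
print(variants["lower"])  # иванов и.и.
print(variants["upper"])  # ИВАНОВ И.И.
```

Running the model on each variant and comparing the predicted spans against the same gold annotations gives a rough picture of how much accuracy is lost when casing is unreliable.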

Citation

The dataset is based on:

@article{shaheen2023rulegalner,
    author = {Zein Shaheen and Dmitry I. Mouromtsev and Ignat Postny},
    title = {RuLegalNER: A New Dataset for Russian Legal Named Entities Recognition},
    journal = {Scientific and Technical Journal of Information Technologies, Mechanics and Optics},
    volume = {23},
    number = {4},
    pages = {854--857},
    year = {2023}
}

Model size: 0.2B params (Safetensors, tensor type F32)

Model tree for B1KO1L/ruBERT-ruLaw-NER-judicial

DeepPavlov/rubert-base-cased (base model) -> TryDotAtwo/ruBERT-ruLaw (fine-tuned) -> this model (fine-tuned)