# Model Grading System
We grade our models on a simple 1.0 to 5.0 scale.
The point of the score is practical: how good is this model under normal use? Not whether it can produce one cherry-picked hero image, but whether it holds together across ordinary prompts, poses, and compositions.
## Allowed scores
We only use full points and half points:
1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0
We do not use quarter-points or arbitrary decimals.
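If you track these grades in tooling, the rule is easy to enforce mechanically. Here is a minimal Python sketch; the function name is hypothetical, not part of any tooling we ship:

```python
def is_valid_grade(score: float) -> bool:
    """Return True if score is a full or half point between 1.0 and 5.0."""
    # Work in half-point units so the comparison stays exact
    # (all half points are exactly representable as floats).
    half_points = score * 2
    return half_points == int(half_points) and 2 <= half_points <= 10

assert is_valid_grade(3.5)        # allowed half point
assert not is_valid_grade(3.25)   # quarter points are not used
assert not is_valid_grade(0.5)    # below the 1.0 floor
```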
## Release threshold
- Scores below 3.0 are usually not worth releasing
- Scores 3.0 and above can be released
- Scores 4.0+ are normal release territory for strong catalog entries
That threshold matters because some models can be interesting without being dependable. We do not grade for novelty alone. We grade for usable quality.
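Expressed as code, the policy is just a pair of thresholds. A hypothetical sketch (the tier labels are ours, for illustration):

```python
def release_tier(score: float) -> str:
    """Map a grade onto the release policy above (labels are illustrative)."""
    if score >= 4.0:
        return "normal release territory"
    if score >= 3.0:
        return "releasable"
    return "usually not worth releasing"

print(release_tier(4.5))  # -> normal release territory
print(release_tier(2.5))  # -> usually not worth releasing
```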
## The scale
### 5.0 — Excellent
This is the top tier.
- excellent detail
- flexible across prompts, scenes, and poses
- strong prompt adherence
- low artifact rate
These are the models we can trust broadly. They do not need prompt gymnastics to stay on target.
### 4.0 to 4.5 — Good
These models are good, but more rigid than a true 5.
- solid detail
- generally clean generations
- some narrower behavior or prompting friction
- still dependable enough for normal release use
4.5 means the model is close to excellent, but not quite as flexible or robust as a flagship.
### 3.0 to 3.5 — Average
These models are usable, but clearly limited.
- more rigid
- sometimes underdetailed
- less dependable across prompt types
- quality issues show up often enough to matter
3.0 is the bottom of the releasable range. 3.5 is better than average, but not fully in the strong tier.
### 2.0 to 2.5 — Poor
These models are usually not worth releasing.
- weak prompt adherence
- frequent visual issues
- lightly overtrained behavior is common
- narrow usefulness even with extra prompting effort
2.5 is a caution score, not a quiet endorsement.
### 1.0 to 1.5 — Bad
This is the bottom of the scale.
- badly overtrained or unstable
- major artifacts
- poor or nonexistent prompt adherence
- not a usable release candidate
1.0 is the minimum score we assign.
## What we are actually grading
When we score a model, we are looking at:
- detail quality
- prompt adherence
- flexibility across prompt types
- anatomy reliability
- pose reliability
- artifact frequency
- signs of overtraining
The score is based on repeated normal-use testing, not a single lucky render.
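One way to keep repeated test passes comparable is to record each dimension explicitly. A hypothetical structure in Python (not tooling we ship):

```python
from dataclasses import dataclass

@dataclass
class EvaluationNotes:
    """Notes from one round of normal-use testing (hypothetical structure)."""
    detail_quality: str
    prompt_adherence: str
    flexibility: str           # behavior across prompt types
    anatomy_reliability: str
    pose_reliability: str
    artifact_frequency: str
    overtraining_signs: str
```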
## Short version
- 5.0: excellent, flexible
- 4.0: good, more rigid
- 3.0: average, rigid or underdetailed
- 2.0: poor, weak adherence or lightly overtrained
- 1.0: bad, badly overtrained, major artifacts
Half points give us extra resolution without turning the rubric into fake precision.
## Why we publish the score
The rating is there to set expectations. Some models are broad tools. Some are narrower. Some are interesting experiments that do not hold together well enough to recommend widely.
The number is a quick read on that difference.