
# Model Grading System


We grade our models on a simple 1.0 to 5.0 scale.

The point of the score is practical: how good is this model under normal use? Not whether it can produce one cherry-picked hero image, but whether it holds together across ordinary prompts, poses, and compositions.

## Allowed scores

We only use full points and half points:

  • 1.0
  • 1.5
  • 2.0
  • 2.5
  • 3.0
  • 3.5
  • 4.0
  • 4.5
  • 5.0

We do not use quarter-points or arbitrary decimals.
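The increment rule above can be sketched as a small validity check. This is an illustrative helper, not part of any published tooling; the function name is hypothetical.

```python
def is_valid_score(score: float) -> bool:
    """Hypothetical check: full and half points only, from 1.0 through 5.0."""
    in_range = 1.0 <= score <= 5.0
    # Multiplying by 2 turns every allowed score into a whole number,
    # which rejects quarter-points and arbitrary decimals.
    half_point = float(score * 2).is_integer()
    return in_range and half_point
```

For example, `is_valid_score(3.5)` passes, while `is_valid_score(3.25)` and `is_valid_score(0.5)` do not.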

## Release threshold

  • Scores below 3.0 are usually not worth releasing
  • Scores 3.0 and above can be released
  • Scores 4.0+ are normal release territory for strong catalog entries

That threshold matters because some models can be interesting without being dependable. We do not grade for novelty alone. We grade for usable quality.
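The threshold bullets above reduce to a simple banding rule. This is a minimal sketch under the stated cutoffs; the function name and labels are illustrative only.

```python
def release_band(score: float) -> str:
    """Hypothetical mapping from a grade to its release band."""
    if score < 3.0:
        return "usually not worth releasing"
    if score < 4.0:
        return "releasable"
    return "normal release territory"
```

So a 2.5 falls below the line, a 3.0 or 3.5 is releasable, and 4.0+ is normal release territory for strong catalog entries.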

## The scale

### 5.0 — Excellent

This is the top tier.

  • excellent detail
  • flexible across prompts, scenes, and poses
  • strong prompt adherence
  • low artifact rate

These are the models we can trust broadly. They do not need prompt gymnastics to stay on target.

### 4.0 to 4.5 — Good

These models are good, but more rigid than a true 5.

  • solid detail
  • generally clean generations
  • some narrower behavior or prompting friction
  • still dependable enough for normal release use

4.5 means the model is close to excellent, but not quite as flexible or robust as a flagship.

### 3.0 to 3.5 — Average

These models are usable, but clearly limited.

  • more rigid
  • sometimes underdetailed
  • less dependable across prompt types
  • quality issues show up often enough to matter

3.0 is the bottom of the releasable range. 3.5 is better than average, but not fully in the strong tier.

### 2.0 to 2.5 — Poor

These models are usually not worth releasing.

  • weak prompt adherence
  • frequent visual issues
  • lightly overtrained behavior is common
  • narrow usefulness even with extra prompting effort

2.5 is a caution score, not a quiet endorsement.

### 1.0 to 1.5 — Bad

This is the bottom of the scale.

  • badly overtrained or unstable
  • major artifacts
  • poor or nonexistent prompt adherence
  • not a usable release candidate

1.0 is the minimum score we assign.

## What we are actually grading

When we score a model, we are looking at:

  • detail quality
  • prompt adherence
  • flexibility across prompt types
  • anatomy reliability
  • pose reliability
  • artifact frequency
  • signs of overtraining

The score is based on repeated normal-use testing, not a single lucky render.

## Short version

  • 5.0: excellent, flexible
  • 4.0: good, more rigid
  • 3.0: average, rigid or underdetailed
  • 2.0: poor, weak adherence or lightly overtrained
  • 1.0: bad, badly overtrained, major artifacts

Half scores add resolution without pretending to more precision than the rubric can support.

## Why we publish the score

The rating is there to set expectations. Some models are broad tools. Some are narrower. Some are interesting experiments that do not hold together well enough to recommend widely.

The number is a quick read on that difference.