# Prompt Craft: Getting the Most from Flux LoRA Models

## The Language of Flux

If you’ve been prompting for SDXL or Pony and you’re moving to Flux, the first thing you need to understand is that Flux speaks a different language.

SDXL and its derivatives use a dual text encoder system (CLIP ViT-L and OpenCLIP ViT-bigG) built on CLIP-style encoders that were trained on short image captions, and many popular fine-tunes were trained further on tag-style captions. These encoders respond well to comma-separated keywords: 1girl, long hair, dark fantasy, dramatic lighting, masterpiece. The model learned to associate those fragments with visual concepts during training.

Flux pairs a CLIP ViT-L encoder with Google’s T5-XXL text encoder, a language model whose encoder has roughly 4.7 billion parameters and was trained on natural language. T5 carries most of the prompt’s detail, and it doesn’t want tags. It wants sentences. It wants you to describe what you see the way you’d describe it to someone standing next to you.

This is the single most important difference, and it changes everything about how you prompt.

## Natural Language Prompting

The old way:

1girl, witch queen, dark fantasy, long black hair, glowing eyes, 
ornate crown, dark robes, dramatic lighting, masterpiece, best quality

The Flux way:

A dark-haired queen stands at the threshold of her obsidian throne room. 
Her eyes burn with an amber glow beneath a crown of twisted iron and 
black gemstones. Dark robes pool around her feet like spilled ink. 
The only light comes from the embers in her eyes and the faint 
phosphorescence of runes carved into the walls behind her.

Both describe the same scene. The Flux prompt will produce a more coherent, intentional result because the T5 encoder understands the relationships between elements — the queen is at the threshold, her eyes burn beneath the crown, the robes pool around her feet. The tag-style prompt gives the model a bag of concepts and hopes it assembles them correctly.

### What T5 Understands Well

- Spatial relationships: “standing behind,” “reflected in,” “emerging from,” “draped across”
- Materials and textures: “hammered bronze,” “translucent resin,” “wet obsidian,” “cracked porcelain”
- Lighting descriptions: “lit from below by amber light,” “silhouetted against a green glow,” “harsh fluorescent overhead”
- Atmosphere and mood: “an oppressive silence,” “the air thick with heat,” “a sense of something watching”
- Actions and poses: “reaching toward the light,” “turning away from the viewer,” “caught mid-stride”

### What T5 Handles Poorly

- Precise counting: “exactly three candles” will sometimes give you two or five
- Text and writing: “a sign that reads DANGER” is a gamble, because text generation in images remains unreliable
- Negation: “a room with no windows” may still produce windows. Describe what is there instead (“a sealed room with unbroken stone walls”), or use a negative prompt if your workflow supports one
- Complex multi-subject scenes: Scenes with more than two subjects, each with distinct attributes, become unreliable

## Working with LoRA Trigger Words

Most LoRAs are trained with trigger words: specific tokens that activate the learned concept. When you see a model card that says “use trigger word crystal_plague,” this means the model was trained with that token consistently appearing in its captions. The LoRA has learned to associate that token with the visual concept.

### Where to Place Trigger Words

Put trigger words early in the prompt, integrated naturally into the sentence:

**Good:** A crystal_plague infected woman stands in an abandoned subway station, crystalline growths spreading across her shoulders and down her arms, the formations catching fluorescent light

**Acceptable:** A woman standing in a subway station, crystal_plague, crystalline growths on her body, fluorescent lighting

**Weak:** detailed photo, masterpiece, best quality, crystal_plague, woman, subway, crystals

The trigger word activates the LoRA’s learned concept. Everything else in the prompt shapes how that concept manifests — the setting, the pose, the lighting, the mood.
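One way to sanity-check placement is to measure how far into the prompt the trigger first appears. The helper below is a minimal sketch; `trigger_offset` is a hypothetical name, not part of any Flux tooling, and it counts whitespace-separated words rather than real encoder tokens.

```python
def trigger_offset(prompt: str, trigger: str) -> float:
    """Return how far into the prompt the trigger first appears.

    0.0 means the very first word, 1.0 means the very last word.
    Positions are measured in whitespace-separated words, which is only
    a rough proxy for encoder tokens.
    """
    words = prompt.split()
    for index, word in enumerate(words):
        # Strip surrounding punctuation so "crystal_plague," still matches.
        if word.strip(".,;:!?\"'") == trigger:
            return index / max(len(words) - 1, 1)
    raise ValueError(f"trigger {trigger!r} not found in prompt")


good = (
    "A crystal_plague infected woman stands in an abandoned subway station, "
    "crystalline growths spreading across her shoulders"
)
weak = "detailed photo, masterpiece, best quality, crystal_plague, woman, subway, crystals"

print(round(trigger_offset(good, "crystal_plague"), 2))
print(round(trigger_offset(weak, "crystal_plague"), 2))
```

A lower value means the trigger sits closer to the front of the prompt. The number says nothing about whether the trigger reads naturally in the sentence, so treat it as a quick check rather than a quality score.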

### Multiple LoRAs

When stacking multiple LoRAs, include all trigger words and be explicit about how the concepts interact:

A resin_wax_clay figure of a witch_queen, sculpted from dark malachite 
with veins of gold running through the stone. She sits on a throne 
carved from a single block of green-black mineral, her crown fused 
to her head as part of the sculpture.

Here resin_wax_clay activates the material transformation LoRA while witch_queen activates the character concept. The rest of the prompt tells the model how to combine them.

LoRA strength matters. If one concept is overpowering the other, reduce its strength (typically 0.5–0.8 for the secondary LoRA). Each LoRA adds its own weight changes to the same layers of the base model, so two LoRAs at full strength pull those layers in different directions and compete with each other.
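If your workflow is built on Hugging Face diffusers (an assumption, since node-based tools expose the same idea through strength sliders), per-LoRA strengths can be set roughly as sketched below. `load_lora_weights` and `set_adapters` are diffusers pipeline methods, though exact behavior depends on your installed version. The `LoraSpec` class, the `apply_loras` helper, the file names, and the choice of which LoRA is secondary are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class LoraSpec:
    path: str        # hypothetical file name, replace with your own
    name: str        # adapter name, chosen by you
    strength: float  # 1.0 means full strength


def apply_loras(pipe, specs):
    """Load each LoRA into the pipeline, then set all strengths in one call.

    `pipe` is assumed to be a diffusers pipeline with LoRA support.
    """
    for spec in specs:
        pipe.load_lora_weights(spec.path, adapter_name=spec.name)
    pipe.set_adapters(
        [spec.name for spec in specs],
        adapter_weights=[spec.strength for spec in specs],
    )


# Primary concept at full strength, secondary LoRA reduced to 0.7.
stack = [
    LoraSpec("witch_queen.safetensors", "witch_queen", 1.0),
    LoraSpec("resin_wax_clay.safetensors", "resin_wax_clay", 0.7),
]
```

Calling `apply_loras(pipe, stack)` on a loaded pipeline would apply both adapters. If the material LoRA still swamps the character, lowering its strength further within the 0.5–0.8 range is the first thing to try.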

## Composition and Camera

Flux responds well to photographic language. Think like a cinematographer:

| Concept | Prompt Language |
| --- | --- |
| Close-up portrait | “tight framing on her face, shallow depth of field” |
| Full body | “full body shot, standing in the center of the frame” |
| Environment emphasis | “wide establishing shot, the figure small against the vast interior” |
| Dutch angle | “tilted camera angle, creating unease” |
| Low angle | “camera looking up from below, emphasizing height and power” |
| Over-the-shoulder | “shot from behind and to the side, looking at what she sees” |

### Aspect Ratio

Flux can generate across a wide range of resolutions, but the aspect ratio shapes composition. These common presets approximate the named ratios:

- 1:1 (1024×1024): Portraits, centered subjects, icons
- 3:4 (896×1152): Standing figures, vertical compositions
- 4:3 (1152×896): Landscapes, environments, group scenes
- 16:9 (1344×768): Cinematic, panoramic, establishing shots
- 9:16 (768×1344): Full-body vertical, phone-format content

Match your aspect ratio to your subject. A full-body portrait in 16:9 wastes most of the frame. A sweeping landscape in 1:1 loses the horizon.
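If you pick resolutions in a script, a small lookup can map any requested ratio to the nearest preset from the list above. This is a sketch only; `PRESETS` and `closest_preset` are hypothetical names, and the presets are simply the five resolutions already listed.

```python
import math

# (width, height) presets taken from the aspect ratio list above.
PRESETS = [(1024, 1024), (896, 1152), (1152, 896), (1344, 768), (768, 1344)]


def closest_preset(ratio_w: float, ratio_h: float) -> tuple[int, int]:
    """Pick the preset whose aspect ratio is nearest to ratio_w:ratio_h."""
    target = math.log(ratio_w / ratio_h)
    # Compare in log space so wide and tall ratios are treated symmetrically.
    return min(PRESETS, key=lambda wh: abs(math.log(wh[0] / wh[1]) - target))


print(closest_preset(16, 9))
print(closest_preset(3, 4))
```

Ratios that fall exactly between two presets resolve to whichever preset comes first in the list, so reorder `PRESETS` if you prefer a different tie-break.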

## Style Direction

Flux can produce a wide range of styles. Specify what you want rather than relying on defaults:

A photorealistic portrait with the quality of medium-format film photography, 
soft natural window light from the left, slight film grain, shallow depth of field

A semi-anime illustration with painterly rendering, luminous skin, 
soft glow effects, and large expressive eyes in the style of modern 
digital fantasy illustration

A dark oil painting with heavy impasto brushwork, deep shadows, 
and the warm candlelight palette of a Baroque master

The more specific your style direction, the more consistent your results. “Masterpiece, best quality” is SDXL-era thinking — it’s a prayer, not an instruction. Tell the model what a masterpiece looks like to you.

## Common Mistakes

### The Tag Dump

1girl, solo, long hair, black hair, large breasts, fantasy, dark, 
dramatic, beautiful, detailed, masterpiece, 8k, best quality

Flux will try to make something from this, but T5 processes it as a run-on list without relationships. The result is generic and uncontrolled.
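A rough way to catch a tag dump before it reaches the model is to look at how short the comma-separated fragments are. The check below is a heuristic sketch: `looks_like_tag_dump` and its thresholds are assumptions chosen for illustration, not values derived from Flux itself.

```python
def looks_like_tag_dump(prompt: str, max_avg_words: float = 2.5) -> bool:
    """Heuristic: many comma-separated fragments of only a word or two each."""
    fragments = [part.strip() for part in prompt.split(",") if part.strip()]
    if len(fragments) < 4:
        # Too few commas to call it a tag list.
        return False
    avg_words = sum(len(part.split()) for part in fragments) / len(fragments)
    return avg_words <= max_avg_words


tags = "1girl, solo, long hair, black hair, fantasy, dark, dramatic, masterpiece"
sentence = (
    "A dark-haired queen stands at the threshold of her obsidian throne room. "
    "Her eyes burn with an amber glow beneath a crown of twisted iron."
)

print(looks_like_tag_dump(tags))
print(looks_like_tag_dump(sentence))
```

A prompt that trips this check is a candidate for rewriting into sentences that state how the elements relate to each other.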

### The Novel

In the far reaches of the northern kingdom, where the ancient forests 
give way to barren mountain peaks shrouded in perpetual mist, there 
exists a queen whose power is whispered of in taverns from coast to 
coast. She was born during the eclipse of the twin moons...

Both encoders have a token limit: 77 tokens for CLIP, and typically 256 or 512 for T5 depending on the model variant and implementation. Past that limit, tokens are truncated. Front-load the visual description. Save the lore for the model card.
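To see what survives truncation, you can trim a prompt to whole sentences that fit a budget. The sketch below uses whitespace word counts as a stand-in for tokens, which is an assumption and will undercount real T5 tokens; pass your own `count_tokens` function backed by a real tokenizer for accurate numbers. `front_load` and `word_count` are hypothetical helper names.

```python
import re


def word_count(text: str) -> int:
    """Rough stand-in for a tokenizer: counts whitespace-separated words."""
    return len(text.split())


def front_load(prompt: str, budget: int, count_tokens=word_count) -> str:
    """Keep whole sentences, in order, until the next one would exceed the budget."""
    sentences = re.split(r"(?<=[.!?])\s+", prompt.strip())
    kept, used = [], 0
    for sentence in sentences:
        cost = count_tokens(sentence)
        if used + cost > budget:
            break
        kept.append(sentence)
        used += cost
    return " ".join(kept)


prompt = (
    "A queen stands in her throne room. Her eyes glow amber. "
    "She was born during an eclipse long ago."
)
print(front_load(prompt, budget=12))
```

Because sentences are kept in order, anything visual has to come before the backstory to survive. If the very first sentence is already over budget, the function returns an empty string, which is a signal to shorten that sentence.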

### Ignoring Negative Prompts

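Some Flux workflows support negative prompts, others don’t. The FLUX.1 dev and schnell checkpoints are distilled models that are normally run without true classifier-free guidance, so there is no negative conditioning unless the workflow adds it. When negative prompts are available, use them for things the model consistently gets wrong: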

Negative: extra fingers, deformed hands, blurry, watermark, text

When negative prompts aren’t available, compensate with positive specificity: “perfectly formed hands with five fingers” works better than nothing.

### Fighting the LoRA

If a LoRA produces green crystals, don’t prompt for “red crystals” at LoRA strength 1.0 and expect clean results. Work with the LoRA’s trained distribution. Lower the strength if you want more variation, or use a second LoRA to shift the color palette.

## A Complete Example

Let’s build a prompt step by step for the Crystal Plague LoRA:

Subject: A woman affected by the crystal plague

Setting: An abandoned hospital corridor

Mood: Clinical horror — fluorescent lights, institutional decay

Composition: Medium shot, slightly low angle

Assembled:

crystal_plague, a woman standing in an abandoned hospital corridor, 
crystalline formations growing from her left shoulder and spreading 
down her arm, catching the harsh fluorescent light overhead. 
She wears a torn hospital gown. The walls are institutional green 
tile, cracked and water-stained. Her expression is calm, almost 
serene, despite the crystals consuming her body. Medium shot from 
a slightly low angle. Photorealistic, clinical lighting, 
shallow depth of field focused on the crystal growths.

This prompt gives the model:

- The LoRA trigger (crystal_plague)
- A clear subject with specific crystal placement
- A defined setting with material details
- Emotional direction (“calm, almost serene”)
- Camera instructions (medium shot, low angle)
- Style direction (photorealistic, clinical)
- Depth-of-field guidance for the renderer

The result will be far more controlled than “crystal_plague, woman, hospital, crystals, dramatic.”
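The same step-by-step build can be scripted if you generate many variations. The class below is a minimal sketch: `PromptSpec` and its field names are hypothetical, and the assembled text is a shortened version of the example above rather than a reproduction of it.

```python
from dataclasses import dataclass


@dataclass
class PromptSpec:
    trigger: str
    subject: str
    setting: str
    mood: str
    composition: str
    style: str

    def assemble(self) -> str:
        """Join the parts into sentences, trigger first and style last."""
        opening = f"{self.trigger}, {self.subject} {self.setting}."
        rest = [self.mood, self.composition, self.style]
        # Normalize each remaining part so it ends with exactly one period.
        return " ".join([opening] + [part.rstrip(".") + "." for part in rest])


spec = PromptSpec(
    trigger="crystal_plague",
    subject="a woman standing",
    setting="in an abandoned hospital corridor",
    mood="Her expression is calm, almost serene",
    composition="Medium shot from a slightly low angle",
    style="Photorealistic, clinical lighting",
)
print(spec.assemble())
```

Keeping each part in its own field makes it easy to swap the setting or the camera angle while the trigger stays at the front of the prompt.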

## Final Thoughts

The shift from SDXL to Flux isn’t just a model upgrade — it’s a change in how you communicate with the machine. You’re no longer filling out a tag form. You’re describing a vision. The more precisely you can articulate what you see in your mind’s eye, the more faithfully the model will render it.

The people of Xuthal synthesized food from primal elements and light from radium gems. They understood that creation requires precision — the right elements, in the right proportions, at the right moment. Prompting is no different. Every word is an element. Every sentence is a formula. Learn the language, and the visions follow.

The lotus-eaters type tags and hope for the best. We craft prompts and know what we’ll get.