Quick Checklists — Pipelines, Attention, Generation
Pipeline Setup Checklist
- Pick task and model ID; pin a revision for reproducibility.
- Run on a GPU if available; consider 8-bit/4-bit quantization when memory is tight.
- Batch inputs and enable fast tokenizers.
- Configure truncation, padding, and max length to match the task.
- Log the model name, revision, seeds, and decoding parameters for audits (a setup sketch follows this checklist).
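The sketch below pulls these points together with the `transformers` pipeline API. The model ID, revision label, and inputs are placeholders for illustration, not recommendations.

```python
# Minimal, hedged sketch of a reproducible pipeline setup.
# MODEL_ID and REVISION are placeholders; pin an exact commit hash for real audits.
import torch
from transformers import pipeline

MODEL_ID = "distilbert-base-uncased-finetuned-sst-2-english"
REVISION = "main"  # prefer a specific commit hash over a branch name

device = 0 if torch.cuda.is_available() else -1  # GPU if available, else CPU

clf = pipeline(
    "sentiment-analysis",
    model=MODEL_ID,
    revision=REVISION,   # pinned weights/config for reproducibility
    device=device,
    use_fast=True,       # fast (Rust) tokenizer; already the default
    # For tight memory, a quantization config can be passed through model_kwargs
    # (requires bitsandbytes and a supported GPU).
)

texts = ["Great build quality.", "Battery life is disappointing."]
# Batched call; truncation keeps each input within the model's max length.
results = clf(texts, batch_size=8, truncation=True)

# Log everything an audit would need to reproduce this run.
print(f"model={MODEL_ID} revision={REVISION} device={device}")
print(results)
```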
Attention Sanity Checks
- Confirm correct masks (padding vs causal) for your use case.
- Check sequence lengths; attention cost grows quadratically with length, so avoid unnecessarily long inputs.
- Prefer optimized attention kernels (e.g., FlashAttention where supported); a sketch of these checks follows this list.
- Verify positional-embedding settings (e.g., RoPE scaling) for long-context workflows.
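A short sketch of two of these checks, assuming a decoder-only model; the `gpt2` ID is only a placeholder, and SDPA/FlashAttention availability depends on the architecture, transformers version, and hardware.

```python
# Hedged sketch: inspect padding masks and lengths, then request an optimized
# attention kernel at load time. "gpt2" is a placeholder model.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "gpt2"
tok = AutoTokenizer.from_pretrained(MODEL_ID)
tok.pad_token = tok.pad_token or tok.eos_token  # GPT-2 ships without a pad token

batch = tok(
    ["short prompt", "a noticeably longer prompt that forces padding"],
    padding=True,
    return_tensors="pt",
)
# Padding mask: 1 = real token, 0 = padding. The causal mask is applied
# internally by decoder-only models; this mask only hides padding.
print(batch["attention_mask"])
print("sequence lengths:", batch["attention_mask"].sum(dim=1).tolist())

model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    attn_implementation="sdpa",  # or "flash_attention_2" where supported;
                                 # fall back to "eager" on older setups
)
```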
Text Generation Defaults
- Deterministic baseline: greedy decoding or beam search with 2–4 beams.
- Balanced creativity: top-p 0.9–0.95 with temperature 0.7–0.9.
- Reduce repetition: a small repetition penalty or a no-repeat n-gram size of 2–3.
- Use stop sequences and a max-token limit appropriate to the task (both presets are sketched after this list).
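Both presets map directly onto `generate()` arguments. The sketch below uses `gpt2`, a toy prompt, and values picked from the ranges above purely as placeholders.

```python
# Hedged sketch of the two decoding presets; model, prompt, and exact values
# are illustrative, not tuned recommendations.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "gpt2"
tok = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

inputs = tok("Write one sentence about checklists:", return_tensors="pt")

# Deterministic baseline: beam search with a few beams (num_beams=1 is greedy).
deterministic = model.generate(
    **inputs,
    do_sample=False,
    num_beams=4,
    max_new_tokens=64,        # task-appropriate output budget
    no_repeat_ngram_size=3,   # blocks verbatim 3-gram loops
    pad_token_id=tok.eos_token_id,
)

# Balanced creativity: nucleus sampling with mild repetition control.
creative = model.generate(
    **inputs,
    do_sample=True,
    top_p=0.92,
    temperature=0.8,
    repetition_penalty=1.1,   # keep it small; large values hurt fluency
    max_new_tokens=64,
    pad_token_id=tok.eos_token_id,  # EOS acts as the stop sequence here
)

print(tok.decode(deterministic[0], skip_special_tokens=True))
print(tok.decode(creative[0], skip_special_tokens=True))
```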