Steve Kinney

Quick Checklists — Pipelines, Attention, Generation

Pipeline Setup Checklist

  • Pick task and model ID; pin a revision for reproducibility.
  • Move the model to a GPU if available; consider 8-bit/4-bit quantization when memory is tight.
  • Batch inputs and enable fast tokenizers.
  • Configure truncation, padding, and max length to match the task.
  • Log model name, revision, seeds, and decoding parameters for audits (see the sketch after this list).
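
A minimal sketch of this checklist, assuming the Transformers and PyTorch libraries. The model ID is an illustrative example, the revision should be a pinned commit hash rather than "main", and the 8-bit/4-bit option (loaded through bitsandbytes model kwargs) is omitted here. Pipelines load the fast tokenizer by default when one is available.

    import torch
    from transformers import pipeline, set_seed

    MODEL_ID = "distilbert-base-uncased-finetuned-sst-2-english"  # example model
    REVISION = "main"  # pin a specific commit hash for true reproducibility

    set_seed(42)  # record the seed alongside the model and revision for audits

    clf = pipeline(
        "text-classification",
        model=MODEL_ID,
        revision=REVISION,
        device=0 if torch.cuda.is_available() else -1,  # GPU when available
    )

    texts = ["Great documentation.", "The install failed twice."]
    results = clf(
        texts,
        batch_size=8,      # process inputs in batches rather than one by one
        truncation=True,   # truncate long inputs to the model's max length
        padding=True,      # pad shorter inputs so the batch is rectangular
    )

    # Log everything that determines the output.
    print({"model": MODEL_ID, "revision": REVISION, "seed": 42, "results": results})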

Attention Sanity Checks

  • Confirm correct masks (padding vs causal) for your use case.
  • Inspect sequence lengths; attention cost grows quadratically, so avoid unnecessarily long inputs.
  • Prefer optimized attention kernels (e.g., FlashAttention where supported).
  • Verify positional-encoding settings for long-context workflows (see the sketch after this list).
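
A brief sketch of these checks, assuming Transformers and PyTorch. The model ID bert-base-uncased is an illustrative encoder; the attn_implementation argument needs a recent Transformers release, and "flash_attention_2" additionally needs the flash-attn package and supported hardware. Long-context positional settings (for example, RoPE scaling on models that use rotary embeddings) are model-specific and not shown.

    from transformers import AutoModel, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("bert-base-uncased")
    batch = tok(
        ["a short sentence", "a noticeably longer sentence that sets the padded length"],
        padding=True,
        truncation=True,
        return_tensors="pt",
    )

    # Padding mask: 1 marks real tokens, 0 marks padding. Decoder-only models
    # additionally apply a causal mask internally during generation.
    print(batch["attention_mask"])
    print("padded length:", batch["input_ids"].shape[1])  # drives the quadratic cost

    # Opt into an optimized attention kernel where the stack supports it.
    model = AutoModel.from_pretrained(
        "bert-base-uncased",
        attn_implementation="sdpa",  # or "flash_attention_2" where installed
    )
    outputs = model(**batch)  # the mask keeps padding out of the attention scores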

Text Generation Defaults

  • Deterministic baseline: greedy decoding or low-temperature beam search (2–4 beams).
  • Balanced creativity: top-p 0.9–0.95, temperature 0.7–0.9.
  • Reduce loops: a small repetition penalty or a no-repeat n-gram size of 2–3.
  • Use stop sequences and a max-token limit appropriate to the task (see the sketch after this list).
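
A minimal sketch of these defaults, assuming Transformers and PyTorch, with gpt2 as a small illustrative model. The specific numbers are examples drawn from the ranges above; stop sequences can also be passed via the stop_strings argument in recent Transformers releases, while here only max_new_tokens and the EOS token bound the output.

    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")
    inputs = tok("Checklist for shipping a model:", return_tensors="pt")

    # Deterministic baseline: greedy decoding or a small beam search.
    baseline = model.generate(
        **inputs,
        do_sample=False,
        num_beams=3,
        max_new_tokens=60,              # cap output length for the task
        pad_token_id=tok.eos_token_id,  # GPT-2 has no pad token; reuse EOS
    )

    # Balanced creativity: nucleus sampling with a mild temperature,
    # plus light anti-repetition settings.
    creative = model.generate(
        **inputs,
        do_sample=True,
        top_p=0.92,
        temperature=0.8,
        no_repeat_ngram_size=3,
        repetition_penalty=1.1,
        max_new_tokens=60,
        pad_token_id=tok.eos_token_id,
    )

    print(tok.decode(baseline[0], skip_special_tokens=True))
    print(tok.decode(creative[0], skip_special_tokens=True))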
