Meta AI Llama 4 Scout can take in a long story in one go: up to about 128,000 tokens in our catalog. That is like a very big book, but you still pay more the more text you put into each request.
RAG (retrieval-augmented generation) is a fancy way to say "search my files first, then ask the model with only the best bits." That is cheaper than dumping a whole library into one prompt, and it often gives better answers too.
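Here is a tiny sketch of the "search first, then ask" idea. It is an illustration only: it scores passages by simple word overlap, while real setups usually use embeddings or a search index. The names top_matches and build_prompt are made up for this example, and no model is actually called.

```python
import re

def tokenize(text):
    # Lowercase the text and keep only runs of letters and digits, as a set of words.
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def top_matches(question, passages, k=2):
    # Score each passage by how many words it shares with the question.
    q = tokenize(question)
    scored = [(len(q & tokenize(p)), i, p) for i, p in enumerate(passages)]
    # Drop passages with no shared words, then sort best first
    # (ties keep their original order).
    scored = [s for s in scored if s[0] > 0]
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [p for _, _, p in scored[:k]]

def build_prompt(question, passages, k=2):
    # Only the best-matching passages go into the prompt (hypothetical prompt wording).
    context = "\n\n".join(top_matches(question, passages, k))
    return f"Use only this context:\n{context}\n\nQuestion: {question}"

passages = [
    "Refunds are issued within 14 days of purchase.",
    "Our office is closed on public holidays.",
    "To request a refund, email support with your order number.",
]
prompt = build_prompt("refund order number", passages)
```

The prompt ends up holding one short passage instead of the whole file set, which is where the savings come from.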
Our catalog caps one combined message at around 128K tokens for this model. That is still huge, but not infinite. Split giant PDFs into chunks, paste only the top matches, and cap how long each chunk can be.
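The chunk-and-cap advice above could look something like this. It is a rough sketch under one big assumption: it counts whitespace-separated words as a stand-in for tokens, which is not how the model's tokenizer counts. The function names and the numbers (200 words per chunk, a 420-word budget) are invented for the example.

```python
def split_into_chunks(text, max_words=200):
    # Cut the text into pieces of at most max_words words each.
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]

def fit_to_budget(ranked_chunks, budget_words):
    # Keep chunks in ranked order until the next one would exceed the budget.
    kept, used = [], 0
    for chunk in ranked_chunks:
        size = len(chunk.split())
        if used + size > budget_words:
            break
        kept.append(chunk)
        used += size
    return kept

# A fake 450-word document, just to exercise the functions.
document = " ".join(f"w{i}" for i in range(450))
chunks = split_into_chunks(document, max_words=200)
kept = fit_to_budget(chunks, budget_words=420)
```

In practice you would rank the chunks by relevance before calling fit_to_budget, and swap the word count for a real token count from the tokenizer you use.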
Files & docs hint: Typically text in, via your ingestion pipeline. Size what you send to fit the context limit.