Amazing read by M Waleed Kadous from Anyscale: https://lnkd.in/e77fUTiv

“At Google, there was a document put together by Jeff Dean, the legendary engineer, called Numbers every Engineer should know. It’s really useful to have a similar set of numbers for LLM developers to know that are useful for back-of-the-envelope calculations.”

Some of my favorites:

“6:1 – Cost Ratio of OpenAI fine-tuned vs base model queries

1:1 – Cost Ratio of Self-Hosted base vs fine-tuned model queries

1.3:1 – Average tokens per word

2x number of parameters: Typical GPU memory requirements of an LLM for serving

~1 MB: GPU Memory required for 1 token of output with a 13B parameter model”
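
To see how these rules of thumb combine in practice, here is a minimal back-of-the-envelope sketch in Python. The helper names and the example figures (a 13B model, a 512-token response, a 750-word prompt) are my own illustrative assumptions, not from the post.

```python
# Back-of-the-envelope sketch using the rules of thumb quoted above.
# All inputs below are illustrative assumptions, not measurements.

def serving_memory_gb(num_params_billion: float) -> float:
    """~2 bytes per parameter (fp16) => roughly 2x the parameter count, in GB."""
    return 2 * num_params_billion

def generation_memory_gb(output_tokens: int, mb_per_token: float = 1.0) -> float:
    """~1 MB of GPU memory per generated token (13B-class model)."""
    return output_tokens * mb_per_token / 1024

def tokens_from_words(words: int, tokens_per_word: float = 1.3) -> float:
    """Average of ~1.3 tokens per English word."""
    return words * tokens_per_word

if __name__ == "__main__":
    print(f"13B model weights: ~{serving_memory_gb(13):.0f} GB of GPU memory")
    print(f"512-token response: ~{generation_memory_gb(512):.2f} GB extra during generation")
    print(f"750-word prompt: ~{tokens_from_words(750):.0f} tokens")
```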

Cheat sheet by Huaiwei Sun.