Fraser Sim - ASO Company of the Year. Apple Search Ads. on LinkedIn: Fancy breakfast next week with fellow app professionals at Gordon Ramsay's… (2024)

Fraser Sim - ASO Company of the Year. Apple Search Ads.

Apple Search Ads Partner.

Fancy breakfast next week with fellow app professionals at Gordon Ramsay's Bread St Kitchen? Thursday 25th from 8am: roll up for a help-yourself-style buffet, hosted by Redbox Mobile, Batch, Branch.

More Relevant Posts

  • Yasir Altaf

    Data Science/AI Consultant | Big Data Solutions | Cellular Radio & IoT

    Having worked on various commercial RAG products over the past year, I've found that costs become prohibitive unless you optimize both the embedding and the retrieval models.

    First, embedding models are fairly cheap: even commercial ones like OpenAI's text-embedding-3-large cost $0.13 per 1M tokens. You don't always need a commercial embedding model, though. Depending on your task you may get away with a TF-IDF vectorizer, or with a BERT (SBERT)-based sentence transformer running locally on your machine; models like E5, BGE, MPNet, MiniLM, RoBERTa and LaBSE provide excellent performance, and you can even host the embeddings on Hugging Face for free. Also avoid using one generic large model for every task: fine-tuned open-source embedding models are available for specific use cases (semantic similarity, search, information retrieval and so on), so it is recommended to choose the smaller (often distilled) model that fits your use case, which will also keep infra costs down. The MTEB leaderboard on Hugging Face is the best place to simplify model selection. (A minimal local-embedding sketch and a small prompt-cache sketch follow this post.)

    Another area that contributes significantly to cost is poor prompting. Completions are billed on the input text as well as the output they generate, so if you have a habit of shoving everything into your prompt, your per-token costs are going to skyrocket. Techniques such as Chain-of-Thought prompting and in-context learning add hundreds or thousands of tokens. This verbosity can be handled by prompt-compression techniques, i.e. an "information entropy" checker, which is basically a sentence-ranking algorithm that assesses how informative the various blocks of text in the prompt are and discards the unnecessary segments. It can also be handled by passing the prompt through a summarization model such as BART, T5 or even GPT-2; Microsoft recently announced the LLMLingua library to do just this.

    Another method is prompt caching: rather than calling an LLM for each query, first check a local cache to see whether a response already exists. Build the cache ahead of time; we use ChatGPT or a local model to generate question-answer pairs, and coupled with HyDE this has done wonders for retrieval performance.

    The majority of the cost, however, comes not from embedding models but from the LLM choice at the retrieval phase. A wrong choice here leads to a significant cost difference: GPT-4 will cost you $30/$60 (input/output) per million tokens and GPT-4 32K will cost $60/$120, while GPT-3.5 Turbo costs only $0.5/$1.5. I always start with GPT-3.5 Turbo, and it has never failed me unless my use case needs complex, multi-hop reasoning (in which case GPT-4 is king). Besides the commercial APIs, you should consider open-source LLMs such as Mistral, Zephyr, Mixtral, Llama 2 and OpenHermes; you'll need reasonably robust infra, but their evaluation scores on information retrieval are not far off the commercial models.
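
    As a rough illustration of the "run a small embedding model locally" point above, here is a minimal sketch using the sentence-transformers library; the model name all-MiniLM-L6-v2 is an assumed choice for the example, and any of the MTEB leaders mentioned would slot in the same way.

```python
# Minimal sketch: local embeddings with sentence-transformers instead of a paid API.
# Assumes `pip install sentence-transformers`; the model choice is illustrative.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # small, distilled, runs fine on CPU

docs = [
    "Embedding models are cheap compared to generation.",
    "Most RAG cost comes from the LLM used at answer time.",
]
query = "Where does most of the cost in a RAG pipeline come from?"

doc_emb = model.encode(docs, normalize_embeddings=True)
query_emb = model.encode(query, normalize_embeddings=True)

# On normalized vectors, cosine similarity is just a dot product.
scores = util.cos_sim(query_emb, doc_emb)
print(docs[scores.argmax().item()])
```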

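    In the same spirit, here is a hedged sketch of the prompt-caching idea: consult a locally stored set of pre-generated question-answer pairs before calling the LLM at all. The similarity threshold and the reuse of the same local embedding model are illustrative assumptions, not the author's exact setup.

```python
# Sketch of a semantic prompt cache: answer near-duplicate questions locally
# before spending tokens on an LLM call. The threshold value is an assumption.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

# Cache of (question, answer) pairs built ahead of time, e.g. generated offline.
cache = [
    ("What drives most RAG cost?", "Mostly the generation-time LLM, not embeddings."),
    ("How cheap are embedding APIs?", "On the order of cents per million tokens."),
]
cache_emb = model.encode([q for q, _ in cache], normalize_embeddings=True)

def call_llm(query: str) -> str:
    # Placeholder for the actual GPT-3.5 / GPT-4 / open-source model call.
    raise NotImplementedError

def answer(query: str, threshold: float = 0.85) -> str:
    q_emb = model.encode(query, normalize_embeddings=True)
    scores = util.cos_sim(q_emb, cache_emb)[0]
    best = int(scores.argmax())
    if float(scores[best]) >= threshold:
        return cache[best][1]   # cache hit: no LLM call, no token cost
    return call_llm(query)      # cache miss: fall back to the paid model
```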

  • Vant App (AWS Build '23)

    36 followers

    You asked, we listened! Now you can earn points just by spreading the word about Vant! Refer your friends and family to Vant and start earning points today! 💰 Don't miss out – download the Vant app now from the Google Play and Apple App Stores! ✨ #ReferAndEarn #SpreadTheWord #vantcares

  • Cassiano Surek

    Chief Technology Officer @ Beyond | AI Engineering Chair @ Next15

    Cost planning for cloud-based solutions is never easy. New architectures introduced by generative AI, such as RAG, kick that up a notch. Magdalena Kuhn and Joanna Stoffregen help you avoid the shock, or at least prepare for the AI cloud provider bill generated by your #generativeai explorations!

  • Amer Saleh. BSc. MSc.

    Technology and Business Consultant

    Make sure that you are aware of your ROI before investing heavily in AI tools!

  • Brandon Rich

    Assoc. Director of Enterprise Data & Integration Services at University of Notre Dame

    A good explainer on RAG costs and the tradeoff between cost and quality as you go to cheaper models. However, look out for a mistake in the graphic that shows "11 input tokens." The other slides express this point, but any time you are introducing additional data to the context -- relevant document snippets for RAG, conversation history for memory, or pre-prompting for behavior influence -- that equals more tokens, which equals more cost.
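
    To make the "more context means more tokens means more cost" point concrete, here is a small counting sketch; it assumes the tiktoken library and the cl100k_base encoding used by the GPT-3.5/GPT-4 family, and the snippet contents are placeholders.

```python
# Sketch: count how many billable input tokens a RAG prompt actually contains.
# Assumes `pip install tiktoken`; cl100k_base is the GPT-3.5/GPT-4 encoding.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

question = "What is our refund policy?"
retrieved = ["...retrieved document snippet..."] * 5          # RAG context
history = ["user: hi", "assistant: hello, how can I help?"]   # conversation memory
system_prompt = "You are a helpful support assistant."

full_prompt = "\n".join([system_prompt, *history, *retrieved, question])

print("question only:", len(enc.encode(question)), "tokens")
print("full prompt:  ", len(enc.encode(full_prompt)), "tokens")
```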

  • Yohanes Nuwara

    Consultant in AI & Computer Vision | Developer of Open-source Programs for Oil and Gas | M.Sc Business Analytics and Big Data at Politecnico di Milano

    Very useful for estimating the cost of building LLM apps, and how to minimize it.

  • Hugh Frost

    Many organizations struggle with how and when to leverage Gen AI. This post and presentation from Magdalena Kuhn and Joanna Stoffregen provide a clear illustration of the costs of operating a RAG-based LLM. In their example the costs are staggering. They also provide a clear explanation of what is driving the costs, the cost differential between GPT-3.5 and GPT-4, along with the trade-offs.

  • Man and Van

    29 followers

    📣 Exciting news! Check out Man and Van App's blog post on five tips for disassembling a bed when moving house. Learn step-by-step instructions, essential tools, and expert advice for hassle-free bed disassembly and reassembly. Save time, effort, and headaches! Read more: https://lnkd.in/eKc4uXkA Don't forget to like and share. Let's make moving a breeze together! 🌟 #MovingTips #ManAndVan #BedDisassembly #SmoothRelocation #ExpertAdvice

  • Jean Bernard Yung Hing Hin

    Principal Director of Engineering | Canadian Lead at Nuvalence

    When you plan the cost of your infrastructure, it is critical to account for the cost of operating your RAG-based LLM apps (especially if you're planning to deploy them at scale). The main cost drivers are usually:

    1. Vector database cost
    2. Embedding creation/manipulation cost
    3. Inference cost

    Per Magdalena Kuhn's post, using a smaller (and cheaper) LLM significantly lowers the cost per user, thanks to a large reduction in inference cost. You could also consider cheaper alternatives that use open-source LLMs, or look at different strategies for creating and manipulating your embeddings: select a "good enough" (and cheaper) embedding model, or use utility libraries/tools to create and manipulate the embeddings yourself. Finally, pick the vector DB vendor best suited to your needs; each product has its own characteristics and associated cost. A rough per-query estimate along these lines is sketched below.
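
    For a back-of-the-envelope view of these drivers, a rough per-query estimate might look like the sketch below. The per-million-token prices are the figures quoted earlier in this thread; the token counts and query volume are made-up assumptions, and fixed vector-database hosting fees are left out.

```python
# Rough per-query RAG cost estimate (illustrative numbers only).
# Prices per 1M tokens as quoted above; token counts/volumes are assumptions.
EMBED_PRICE = 0.13              # text-embedding-3-large, $ per 1M tokens
PRICES = {                      # $ per 1M input / output tokens
    "gpt-3.5-turbo": (0.5, 1.5),
    "gpt-4": (30.0, 60.0),
}

def cost_per_query(model, prompt_tokens=3000, output_tokens=300, embed_tokens=50):
    in_price, out_price = PRICES[model]
    embed = embed_tokens * EMBED_PRICE / 1e6                   # embed the user query
    infer = (prompt_tokens * in_price + output_tokens * out_price) / 1e6
    return embed + infer

for m in PRICES:
    c = cost_per_query(m)
    print(f"{m}: ${c:.4f} per query, ${c * 100_000:,.0f} per 100k queries")
```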

  • Dispense

    3,789 followers

    Day 5 of 5 Dispense Product Tips 💡: Keep your menu note short and sweet 🌱. Most customers don't read the entire note, or may even dismiss it before reading. If you want to highlight any deals you're running, leave them out of the note and use the dedicated deals section of the app - customers will look there first!
