Kuldeep Singh Sidhu (singhsidhukuldeep)

Is GPT-4o everything you expected? 🤔

@OpenAI has gone omni: GPT-4 "o" is a multimodal LLM that accepts as input any combination of text, audio, and image, and generates any combination of text, audio, and image outputs.

1๏ธโƒฃ Based on the examples seen:
Inputs possible are Text โœ๏ธ, Text + Image ๐Ÿ“๐Ÿ–ผ๏ธ, Text + Audio ๐Ÿ“๐ŸŽง, Text + Video ๐Ÿ“๐ŸŽฅ, Audio ๐ŸŽง
and outputs possible are Image ๐Ÿ–ผ๏ธ, Image + Text ๐Ÿ–ผ๏ธ๐Ÿ“, Text ๐Ÿ“, Audio ๐ŸŽง

2๏ธโƒฃ 88.7% on MMLU ๐Ÿ†; 90.2% on HumanEval (best in class) ๐Ÿฅ‡

3๏ธโƒฃ Up to 50% cheaper ๐Ÿ’ธ and 2x faster โšก than GPT-4 Turbo

4๏ธโƒฃ GPT-4o will be available in the free tier of ChatGPT ๐ŸŽ‰

5๏ธโƒฃ Near real-time audio with 320ms on average, similar to human conversation ๐Ÿ—ฃ๏ธ**

6๏ธโƒฃ New tokenizer with a 200k token vocabulary ๐Ÿ“š (previously 100k vocabulary) leading to 1.1x - 4.4x fewer tokens needed across 20 languages ๐ŸŒ

7๏ธโƒฃ Tokenizer compression and more efficient across non-English languages (3-5 times fewer tokens for major Indian languages ๐Ÿ‡ฎ๐Ÿ‡ณ)

๐Ÿ‘Open questions:
- What is the context length? โ“
- Why does GPT-4 still exist, if GPT-4o is better, faster, and cheaper? ๐Ÿคจ

Blog: https://openai.com/index/hello-gpt-4o/
Available today: https://chatgpt.com/

I just wanted it to be cheaper and more accessible!

Still not open source, but a price reduction is welcome!

Also, something fun happened: for the first 10-15 minutes, all search engines were correcting GPT-4o to GPT-4.

Also, also: GPT-4o is the model that was powering the gpt2-chatbot in the LMSYS arena (Elo 1310 vs. 1253 for GPT-4 Turbo).

You are all happy that @meta-llama released Llama 3.

Then you are sad that it only has a context length of 8k.

Then you are happy that you can scale Llama 3 via PoSE to 96k without training, only needing to modify max_position_embeddings and rope_theta.
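Concretely, the "without training" part amounts to editing two fields in the model's config.json; the weights are untouched. A sketch with illustrative values only (Llama 3's stock values are 8192 and 500000.0; the right rope_theta for a given target length comes from the PoSE scaling recipe, not from this example):

```json
{
  "max_position_embeddings": 98304,
  "rope_theta": 8000000.0
}
```

Everything else in the config stays as shipped, which is why this kind of scaling costs no GPU hours at all.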

But then you are sad that this only improves the model's long-context retrieval performance (i.e., finding needles) while hardly improving its long-context utilization capability (doing QA and summarization).
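The retrieval-vs-utilization distinction is easy to see with a toy needle-in-a-haystack harness: plant one fact in filler text and check whether the model's answer reproduces it. A self-contained sketch (`ask_model` is a hypothetical stand-in for whatever model call you use):

```python
import random

# Toy needle-in-a-haystack check. Passing it shows retrieval only;
# QA and summarization over the full context are separate skills
# that this harness does not measure.
def build_haystack(needle: str, filler: str, n_chunks: int, seed: int = 0) -> str:
    """Insert `needle` at a random position among filler chunks."""
    rng = random.Random(seed)
    chunks = [filler] * n_chunks
    chunks.insert(rng.randrange(n_chunks + 1), needle)
    return "\n".join(chunks)

def retrieved(answer: str, secret: str) -> bool:
    """Did the answer surface the planted fact?"""
    return secret in answer

haystack = build_haystack(
    needle="The magic number is 7481.",
    filler="The grass is green and the sky is blue.",
    n_chunks=1000,
)
# answer = ask_model(haystack + "\nWhat is the magic number?")  # hypothetical model call
print(retrieved("It is 7481.", "7481"))  # True
```

Sweeping the needle's depth and the haystack length gives the familiar retrieval heatmaps; a model can ace every cell and still summarize the haystack poorly.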

But then you are happy that the @GradientsTechnologies community has released Llama-3-8B-Instruct-262K with long context (262k-1M+).

Now we have another paper, "Extending Llama-3's Context Ten-Fold Overnight".

The context length of Llama-3-8B-Instruct is extended from 8K to 80K using QLoRA fine-tuning.

The training cycle is highly efficient, taking "only" 8 hours on a single machine with 8x A800 (80 GB) GPUs.

The model also preserves its original capability over short contexts.

The dramatic context extension is attributed mainly to just 3.5K synthetic training samples generated by GPT-4.
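The recipe is easy to picture: each synthetic sample pairs a long context with a GPT-4-written question and answer about it. A sketch of what one chat-format training record might look like — the field names and structure are illustrative assumptions, not the paper's actual data schema:

```python
# Sketch of one synthetic long-context training sample in chat format.
# Field names are illustrative assumptions, not the paper's schema.
def make_sample(long_context: str, question: str, answer: str) -> dict:
    """Package a long document plus a QA pair as one training record."""
    return {
        "messages": [
            {"role": "user", "content": f"{long_context}\n\n{question}"},
            {"role": "assistant", "content": answer},
        ]
    }

sample = make_sample("<~80K tokens of source text>", "Who wrote section 3?", "Alice.")
print(len(sample["messages"]))  # 2
```

At 3.5K records like this, the dataset is tiny by pre-training standards, which is what makes the 8-hour training run plausible.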

The paper suggests that the context length could be extended far beyond 80K with more computation resources (alas, GPU-poor).

The team plans to publicly release all resources, including data, model, data-generation pipeline, and training code, to facilitate future research from the community.

Paper: https://arxiv.org/abs/2404.19553

This is where we are... until next time...

