Kuldeep Singh Sidhu (singhsidhukuldeep)

Is GPT-4o everything you expected? 🤔

@OpenAI has gone omni: GPT-4 "o" is a multimodal LLM that accepts as input any combination of text, audio, and image, and generates any combination of text, audio, and image outputs.

1๏ธโƒฃ Based on the examples seen:
Inputs possible are Text โœ๏ธ, Text + Image ๐Ÿ“๐Ÿ–ผ๏ธ, Text + Audio ๐Ÿ“๐ŸŽง, Text + Video ๐Ÿ“๐ŸŽฅ, Audio ๐ŸŽง
and outputs possible are Image ๐Ÿ–ผ๏ธ, Image + Text ๐Ÿ–ผ๏ธ๐Ÿ“, Text ๐Ÿ“, Audio ๐ŸŽง

2๏ธโƒฃ 88.7% on MMLU ๐Ÿ†; 90.2% on HumanEval (best in class) ๐Ÿฅ‡

3๏ธโƒฃ Up to 50% cheaper ๐Ÿ’ธ and 2x faster โšก than GPT-4 Turbo

4๏ธโƒฃ GPT-4o will be available in the free tier of ChatGPT ๐ŸŽ‰

5๏ธโƒฃ Near real-time audio with 320ms on average, similar to human conversation ๐Ÿ—ฃ๏ธ**

6๏ธโƒฃ New tokenizer with a 200k token vocabulary ๐Ÿ“š (previously 100k vocabulary) leading to 1.1x - 4.4x fewer tokens needed across 20 languages ๐ŸŒ

7๏ธโƒฃ Tokenizer compression and more efficient across non-English languages (3-5 times fewer tokens for major Indian languages ๐Ÿ‡ฎ๐Ÿ‡ณ)

๐Ÿ‘Open questions:
- What is the context length? โ“
- Why does GPT-4 still exist, if GPT-4o is better, faster, and cheaper? ๐Ÿคจ

Blog: https://openai.com/index/hello-gpt-4o/
Available today: https://chatgpt.com/

I just wanted it to be cheaper and more accessible!

Still not open source, but a price reduction is welcome!

Also, something fun happened: for the first 10-15 minutes, all search engines were correcting GPT-4o to GPT-4.

Also, also: GPT-4o is the model that was powering the gpt2-chatbot in the LMSYS arena (Elo 1310 vs. 1253 for GPT-4 Turbo).

You are all happy that @meta-llama released Llama 3.

Then you are sad that it only has a context length of 8k.

Then you are happy that you can scale Llama 3 via PoSE to 96k without training, only needing to modify max_position_embeddings and rope_theta.
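Concretely, the "without training" part amounts to editing two fields in the model's config.json; the weights are untouched. A sketch with illustrative values only (Llama 3's stock values are 8192 and 500000.0; the right rope_theta for a given target length comes from the PoSE scaling recipe, not from this example):

```json
{
  "max_position_embeddings": 98304,
  "rope_theta": 8000000.0
}
```

Everything else in the config stays as shipped, which is why this kind of scaling costs no GPU hours at all.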

But then you are sad that this only improves the model's long-context retrieval performance (i.e., finding needles) while hardly improving its long-context utilization capability (doing QA and summarization).
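The retrieval-vs-utilization distinction is easy to see with a toy needle-in-a-haystack harness: plant one fact in filler text and check whether the model's answer reproduces it. A self-contained sketch (`ask_model` is a hypothetical stand-in for whatever model call you use):

```python
import random

# Toy needle-in-a-haystack check. Passing it shows retrieval only;
# QA and summarization over the full context are separate skills
# that this harness does not measure.
def build_haystack(needle: str, filler: str, n_chunks: int, seed: int = 0) -> str:
    """Insert `needle` at a random position among filler chunks."""
    rng = random.Random(seed)
    chunks = [filler] * n_chunks
    chunks.insert(rng.randrange(n_chunks + 1), needle)
    return "\n".join(chunks)

def retrieved(answer: str, secret: str) -> bool:
    """Did the answer surface the planted fact?"""
    return secret in answer

haystack = build_haystack(
    needle="The magic number is 7481.",
    filler="The grass is green and the sky is blue.",
    n_chunks=1000,
)
# answer = ask_model(haystack + "\nWhat is the magic number?")  # hypothetical model call
print(retrieved("It is 7481.", "7481"))  # True
```

Sweeping the needle's depth and the haystack length gives the familiar retrieval heatmaps; a model can ace every cell and still summarize the haystack poorly.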

But then you are happy that the @GradientsTechnologies community has released Llama-3-8B-Instruct-262K with long context (262k-1M+).

Now we have another paper, "Extending Llama-3's Context Ten-Fold Overnight".

The context length of Llama-3-8B-Instruct is extended from 8K to 80K using QLoRA fine-tuning.

The training cycle is highly efficient, taking "only" 8 hours on a single machine with 8x A800 (80 GB) GPUs.

The model also preserves its original capability over short contexts.

The dramatic context extension is attributed mainly to just 3.5K synthetic training samples generated by GPT-4.
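The recipe is easy to picture: each synthetic sample pairs a long context with a GPT-4-written question and answer about it. A sketch of what one chat-format training record might look like — the field names and structure are illustrative assumptions, not the paper's actual data schema:

```python
# Sketch of one synthetic long-context training sample in chat format.
# Field names are illustrative assumptions, not the paper's schema.
def make_sample(long_context: str, question: str, answer: str) -> dict:
    """Package a long document plus a QA pair as one training record."""
    return {
        "messages": [
            {"role": "user", "content": f"{long_context}\n\n{question}"},
            {"role": "assistant", "content": answer},
        ]
    }

sample = make_sample("<~80K tokens of source text>", "Who wrote section 3?", "Alice.")
print(len(sample["messages"]))  # 2
```

At 3.5K records like this, the dataset is tiny by pre-training standards, which is what makes the 8-hour training run plausible.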

The paper suggests that the context length could be extended far beyond 80K with more computation resources (alas, GPU-poor).

The team plans to publicly release all resources, including data, model, data-generation pipeline, and training code, to facilitate future research from the community.

Paper: https://arxiv.org/abs/2404.19553

This is where we are... until next time...

