Daily Papers
1. LLM in a flash: Efficient Large Language Model Inference with Limited Memory (paper)
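This paper presents methods for running large language models (LLMs) that exceed the available DRAM capacity by keeping the model parameters in flash memory and loading them on demand. It builds an inference cost model that reflects how flash memory behaves and uses it to optimize two things: reducing the volume of data transferred from flash and reading data in larger, more contiguous chunks. Two techniques follow from this. 'Windowing' reuses the neurons activated for recent tokens so that only newly needed weights are loaded, and 'row-column bundling' stores the weights belonging to the same neuron together so they can be read in one larger sequential chunk. Together they make it possible to run models up to twice the size of the available DRAM, with a 4-5x (CPU) and 20-25x (GPU) inference speedup over naive loading approaches.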
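The two techniques can be illustrated with a minimal Python sketch. It is not the paper's implementation: the class and function names are hypothetical, the set of active neurons per token is assumed to be given (how it is predicted is not modeled), and the weight layout (up-projection of shape d_model x d_ff, down-projection of shape d_ff x d_model, as plain nested lists) is an assumption made for illustration.

```python
from collections import deque


class NeuronWindowCache:
    """Hypothetical sketch of 'windowing' for FFN neurons held in DRAM.

    DRAM keeps the union of neurons that were active for the last `window`
    tokens. For each new token, only neurons not already resident need to be
    loaded from flash; neurons that fall out of the window are evicted.
    """

    def __init__(self, window: int):
        self.window = window
        self.history = deque()  # active-neuron sets for the most recent tokens
        self.resident = set()   # neuron indices currently held in DRAM

    def step(self, active: set[int]) -> tuple[set[int], set[int]]:
        # Neurons to fetch from flash: needed now but not already in DRAM.
        to_load = active - self.resident
        self.history.append(set(active))
        if len(self.history) > self.window:
            self.history.popleft()
        # Resident set becomes the union over the window; the rest is evicted.
        new_resident = set().union(*self.history)
        evicted = self.resident - new_resident
        self.resident = new_resident
        return to_load, evicted


def bundle_rows_columns(up_proj, down_proj):
    """Hypothetical sketch of 'row-column bundling'.

    Assumed layout: up_proj is d_model x d_ff, down_proj is d_ff x d_model.
    Neuron i owns column i of up_proj and row i of down_proj; storing them
    side by side means loading neuron i is one contiguous read, not two.
    """
    d_ff = len(down_proj)
    return [[row[i] for row in up_proj] + list(down_proj[i]) for i in range(d_ff)]


# Usage: with a window of 2 tokens, only the delta is loaded each step.
cache = NeuronWindowCache(window=2)
print(cache.step({1, 2, 3}))  # ({1, 2, 3}, set())
print(cache.step({2, 3, 4}))  # ({4}, set())
print(cache.step({3, 4, 5}))  # ({5}, {1})
print(bundle_rows_columns([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[1, 3, 5, 6], [2, 4, 7, 8]]
```

The window trades DRAM for bandwidth: a larger window keeps more neurons resident, so fewer weights are reloaded from flash on each token.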
2. Gemini: A Family of Highly Capable Multimodal Models (paper)
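The report introduces Gemini, a family of multimodal models with strong capabilities across image, audio, video, and text understanding. The family comprises Ultra, Pro, and Nano models, covering use cases that range from complex reasoning tasks to on-device applications with limited memory. The most capable model, Gemini Ultra, advances the state of the art on 30 of the 32 benchmarks evaluated, is the first model reported to reach human-expert performance on the MMLU exam benchmark, and improves the state of the art on all 20 multimodal benchmarks examined. The authors argue that Gemini's cross-modal reasoning and language abilities enable a broad range of applications, and they describe their approach to deploying the models responsibly.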
3. Tracking Any Object Amodally (paper)
The paper highlights the importance of amodal perception, the ability to reason about an object's full extent even when only part of it is visible, which is crucial in settings such as autonomous driving. Current detection and tracking algorithms often neglect this ability, largely because existing datasets lack amodal annotations. To address this, the authors introduce the TAO-Amodal benchmark, which covers 880 diverse categories across thousands of video sequences and provides both amodal and modal bounding boxes, including for objects that are occluded or partially out of frame. They also propose the 'amodal expander', a lightweight plug-in module that turns standard (modal) trackers into amodal ones and improves the detection and tracking of occluded objects on TAO-Amodal. The gains are largest for people tracking, where the method doubles performance relative to state-of-the-art modal baselines.
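The summary does not describe the expander's architecture. As a hypothetical illustration only, the sketch below assumes the expander outputs standard (dx, dy, dw, dh) box-regression deltas that grow a modal (visible-region) box into an amodal (full-extent) box, left unclipped so it can extend past the image border. The network that would predict the deltas is not shown, and `Box`, `expand_to_amodal`, and `visibility` are illustrative names, not the paper's code.

```python
import math
from dataclasses import dataclass


@dataclass
class Box:
    x1: float
    y1: float
    x2: float
    y2: float

    @property
    def width(self) -> float:
        return self.x2 - self.x1

    @property
    def height(self) -> float:
        return self.y2 - self.y1

    def area(self) -> float:
        return max(0.0, self.width) * max(0.0, self.height)


def expand_to_amodal(modal: Box, deltas: tuple[float, float, float, float]) -> Box:
    """Hypothetical: apply (dx, dy, dw, dh) deltas to a modal box.

    Uses the common R-CNN-style parameterization: shift the center by a
    fraction of the box size and scale width/height by exp(dw), exp(dh).
    The result is intentionally NOT clipped to the image bounds.
    """
    dx, dy, dw, dh = deltas
    cx = modal.x1 + 0.5 * modal.width + dx * modal.width
    cy = modal.y1 + 0.5 * modal.height + dy * modal.height
    w = modal.width * math.exp(dw)
    h = modal.height * math.exp(dh)
    return Box(cx - 0.5 * w, cy - 0.5 * h, cx + 0.5 * w, cy + 0.5 * h)


def visibility(modal: Box, amodal: Box) -> float:
    """Fraction of the amodal extent that is visible (modal area / amodal area)."""
    a = amodal.area()
    return modal.area() / a if a > 0 else 0.0


# Usage: a person whose lower half is hidden; the expander doubles the height
# downward and upward around a shifted center, so y1 becomes negative (out of frame).
modal = Box(0.0, 0.0, 10.0, 20.0)
amodal = expand_to_amodal(modal, (0.0, 0.25, 0.0, math.log(2.0)))
print(amodal)                     # roughly Box(0, -5, 10, 35)
print(visibility(modal, amodal))  # roughly 0.5
```

Not clipping the output is the key design point: an amodal box is allowed to describe parts of an object that lie outside the frame.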
4. Social Learning: Towards Collaborative Learning with Large Language Models (paper)
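The paper introduces "social learning" for large language models (LLMs), a framework in which models share knowledge with one another through natural language in a privacy-aware manner. It evaluates two knowledge-transfer approaches: having a teacher model generate abstract prompts that teach the task, and having it generate synthetic examples. Using memorization of the original data as a proxy for privacy loss, the authors find that both approaches transfer knowledge effectively with low memorization, reaching performance comparable to using the original labels and prompts. They present this as a promising direction for collaborative learning among LLMs.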
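A minimal sketch of the synthetic-example route, under stated assumptions: the teacher's generation step (an LLM call) is omitted and its outputs are passed in directly, the prompt format is made up, and the n-gram overlap check is a naive stand-in for leakage screening, not the memorization metric used in the paper. All function names are hypothetical.

```python
def build_student_prompt(instruction: str,
                         synthetic_examples: list[tuple[str, str]],
                         query: str) -> str:
    """Hypothetical: assemble a few-shot prompt from teacher-generated examples.

    Only synthetic (input, label) pairs reach the student; the teacher's
    private examples never appear in the prompt.
    """
    shots = "\n\n".join(f"Input: {x}\nLabel: {y}" for x, y in synthetic_examples)
    return f"{instruction}\n\n{shots}\n\nInput: {query}\nLabel:"


def ngrams(text: str, n: int) -> set[tuple[str, ...]]:
    # Lowercased whitespace tokens, grouped into contiguous n-grams.
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def max_ngram_overlap(shared: list[str], private: list[str], n: int = 5) -> float:
    """Naive leakage check (NOT the paper's metric): the highest fraction of any
    shared text's n-grams that appear verbatim in a single private text."""
    private_grams = [ngrams(p, n) for p in private]
    worst = 0.0
    for s in shared:
        grams = ngrams(s, n)
        if not grams:
            continue  # text shorter than n tokens: nothing to compare
        for pg in private_grams:
            worst = max(worst, len(grams & pg) / len(grams))
    return worst


# Usage
prompt = build_student_prompt("Classify the sentiment.",
                              [("a warm, funny film", "positive")],
                              "the plot drags badly")
print(prompt)
print(max_ngram_overlap(["a warm, funny film"], ["the cast is great"], n=2))  # 0.0
```

A check like this only catches verbatim copying; paraphrased leakage would need a stronger measure.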
AI News
1. Google VideoPoet: A large language model for zero-shot video generation (google research | sites)
2. Purple Llama: An approach to open trust and safety in the era of generative AI (meta ai)
3. Turn your ideas into songs with Suno on Microsoft Copilot (link)
AI Repos
1. fffiloni/SDXL-Auto-FaceSwap (huggingface space)
2. Osprey: Pixel Understanding with Visual Instruction Tuning (repo)
3. PowerInfer: High-speed Large Language Model Serving on PCs with Consumer-grade GPUs (repo)
4. Large MultiModal Model Hallucination (repo)
5. AnyDoor: Zero-shot Object-level Image Customization (repo)