Daily Papers
1. LLM360: Towards Fully Transparent Open-Source LLMs (paper | webpage)
The LLM360 initiative aims to make AI research more transparent and collaborative by fully open-sourcing every aspect of Large Language Models, including training code, data, and checkpoints. By contrast, most existing LLMs are released with only a limited set of artifacts. LLM360's first release includes two 7B-parameter models, Amber and CrystalCoder, with complete training details, so that others can reproduce and build on the work.
2. A Task is Worth One Word: Learning with Task Prompts for High-Quality Versatile Image Inpainting (paper | repo | webpage)
PowerPaint is a high-quality, versatile image inpainting model that handles both context-aware and text-guided tasks. It uses learnable task prompts and fine-tuning strategies to target each inpainting task, achieving state-of-the-art results on various inpainting benchmarks. Through prompt interpolation, the model also handles object removal and shape-guided inpainting.
3. A Survey of Large Language Models in Medicine: Principles, Applications, and Challenges (paper | repo)
This survey explores the integration of Large Language Models (LLMs) like ChatGPT in medicine, examining their construction, performance, real-world applications, challenges, and improvement strategies. It aims to provide a comprehensive understanding of the potential and obstacles of medical LLMs, offering valuable insights for developing effective and practical LLMs in healthcare.
4. Context Tuning for Retrieval Augmented Generation (paper)
Context Tuning extends Retrieval Augmented Generation (RAG) for large language models (LLMs) by adding a context retrieval system that improves tool retrieval and plan generation. The system uses numerical, categorical, and habitual signals to retrieve context, overcoming the limitations of semantic search when queries are incomplete or lack context. Empirical results show significant improvements in context and tool retrieval, as well as an increase in LLM-based planner accuracy. The lightweight retrieval model, which combines Reciprocal Rank Fusion with LambdaMART, surpasses GPT-4-based retrieval, and context augmentation reduces hallucinations in plan generation.
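For readers unfamiliar with Reciprocal Rank Fusion (RRF), here is a minimal sketch of the standard technique: each item scores 1 / (k + rank) in every ranked list it appears in, and the scores are summed across lists. This is not the paper's implementation and it omits the LambdaMART stage entirely. The function name, the default k = 60 (a commonly used constant), and the alphabetical tie-break are assumptions made for illustration.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists into one list using standard RRF scoring.

    rankings: list of lists, each ordered from best to worst.
    k: smoothing constant (60 is a common default, assumed here).
    """
    scores = defaultdict(float)
    for ranking in rankings:
        # Ranks are 1-based: the top item contributes 1 / (k + 1).
        for rank, item in enumerate(ranking, start=1):
            scores[item] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken alphabetically (an assumption).
    return sorted(scores, key=lambda item: (-scores[item], item))
```

An item that ranks near the top of several lists ends up ahead of one that ranks first in a single list only, which is why RRF is a convenient way to merge signals of different types without calibrating their raw scores.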
5. Efficient Quantization Strategies for Latent Diffusion Models (paper)
This study focuses on efficiently quantizing Latent Diffusion Models (LDMs), a step that is crucial for deploying large generative models on edge devices. LDMs are effective in applications like text-to-image generation, but their temporal and structural complexities make Post-Training Quantization (PTQ) difficult. The proposed strategy uses the Signal-to-Quantization-Noise Ratio (SQNR) as its evaluation metric, treating the quantization discrepancy as relative noise. It combines global and local approaches: the global strategy assigns higher precision to sensitive parts of the model, and the local strategies address specific problems in quantization-sensitive areas. Together, they yield a more efficient and effective PTQ for LDMs.
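As a rough illustration of the metric, SQNR in decibels is conventionally 10 * log10(signal power / quantization noise power), where the noise is the difference between the full-precision values and their quantized counterparts. The sketch below applies that textbook definition to a plain list of numbers. The symmetric uniform quantizer and both function names are hypothetical stand-ins, and where in the model the paper measures SQNR is not specified here.

```python
import math


def quantize_uniform(values, num_bits):
    """Hypothetical symmetric uniform quantizer, used only to produce noise."""
    max_abs = max(abs(v) for v in values)
    if max_abs == 0:
        return list(values)
    levels = 2 ** (num_bits - 1) - 1  # e.g. 127 for 8 bits
    scale = max_abs / levels
    # Snap each value to the nearest grid point, then map back to float.
    return [round(v / scale) * scale for v in values]


def sqnr_db(reference, quantized):
    """Textbook SQNR in dB: signal power over quantization noise power."""
    signal = sum(r * r for r in reference)
    noise = sum((r - q) ** 2 for r, q in zip(reference, quantized))
    if noise == 0:
        return math.inf  # identical tensors: no quantization noise
    return 10 * math.log10(signal / noise)
```

A higher SQNR means the quantized values track the full-precision ones more closely, so comparing SQNR across bit widths or across model components gives a simple way to spot the parts that are most sensitive to quantization.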
6. Photorealistic Video Generation with Diffusion Models (paper | webpage)
W.A.L.T is a transformer-based method for creating photorealistic videos using diffusion modeling, excelling in both video and image generation benchmarks. It employs a causal encoder for unified latent space compression of images and videos, and a window attention architecture for efficient spatial and spatiotemporal modeling. This approach achieves top performance on UCF-101, Kinetics-600, and ImageNet benchmarks without classifier guidance. Additionally, W.A.L.T includes a text-to-video generation cascade with a latent video diffusion model and two super-resolution models, generating high-resolution videos at 8 fps.
AI News
1. Audiobox: Where anyone can make a sound with an idea (webpage | meta blog | paper)
2. Introducing General World Models (runwayml research)
3. Mixture of Experts Explained (huggingface blog)
AI Repo
1. llama-api: An OpenAI-like LLaMA inference API (repo)
2. stable-diffusion-reference-only: Anime Character Remix. Line Art Automatic Coloring. Style Transfer. (repo)
3. BricksLLM: Simplifying LLM ops in production (repo)