Daily Papers
1. MagicAnimate: Temporally Consistent Human Image Animation using Diffusion Model (paper | webpage | code)
This study introduces MagicAnimate, a diffusion-based framework enhancing temporal consistency and animation fidelity in human image animation, outperforming existing techniques by over 38% in video fidelity, particularly on the challenging TikTok dancing dataset.
2. Fast View Synthesis of Casual Videos (paper | webpage)
This paper presents a method for efficient synthesis of high-quality novel views from monocular videos, using a hybrid video representation that combines extended plane-based scene models with per-frame point clouds, achieving 100x faster training and real-time rendering compared to state-of-the-art methods.
3. Mamba: Linear-Time Sequence Modeling with Selective State Spaces (paper)
This paper presents Mamba, a simplified neural network architecture that surpasses Transformers in efficiency while matching or exceeding their performance. It addresses the long-sequence handling and computational inefficiency of foundation models by integrating improved structured state space models (SSMs) with input-dependent parameters and a hardware-aware parallel algorithm. Mamba offers 5× faster inference, linear scaling in sequence length, and state-of-the-art results across modalities such as language, audio, and genomics, outperforming or matching larger Transformers.
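The input-dependent ("selective") SSM recurrence behind Mamba can be sketched in a few lines. This is a toy, single-channel version with a diagonal state matrix; the parameter names (`W_delta`, `W_B`, `W_C`) and the scalar parameterization are hypothetical simplifications, not the paper's actual design, and the sequential loop stands in for the hardware-aware parallel scan.

```python
# Toy sketch of a selective state-space recurrence (assumed simplification,
# not Mamba's real implementation): the step size delta, input matrix B, and
# output matrix C all depend on the current input x_t, which is what makes
# the model "selective" rather than time-invariant.
import numpy as np

def selective_ssm_scan(x, A, W_delta, W_B, W_C):
    """Run h_t = exp(delta_t * A) * h_{t-1} + delta_t * B_t * x_t,
    y_t = C_t . h_t over a 1-D input sequence x.

    A       : (n,) diagonal of the state matrix (negative for stability)
    W_delta : scalar controlling the input-dependent step size
    W_B, W_C: (n,) hypothetical projections producing B_t and C_t from x_t
    """
    n = A.shape[0]
    h = np.zeros(n)                  # hidden state
    ys = []
    for x_t in x:
        # softplus keeps the discretization step positive
        delta_t = np.log1p(np.exp(W_delta * x_t))
        B_t = W_B * x_t              # input-dependent B (selective)
        C_t = W_C * x_t              # input-dependent C (selective)
        h = np.exp(delta_t * A) * h + delta_t * B_t * x_t
        ys.append(float(C_t @ h))
    return np.array(ys)
```

Because the state `h` has fixed size, each step costs O(n) regardless of sequence length, giving the linear scaling the summary mentions (in contrast to a Transformer's quadratic attention).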
4. Human Motion Generation: A Survey (paper)
Human motion generation, crucial for creating natural human pose sequences, is gaining traction thanks to advances in motion data collection and generation methods. This survey presents a first-of-its-kind comprehensive review of the field, covering the background of human motion and generative models. It examines methods for text-conditioned, audio-conditioned, and scene-conditioned human motion generation, provides an overview of common datasets and evaluation metrics, and discusses open problems and future research directions, aiming to offer a detailed perspective on this evolving field and inspire new solutions to existing challenges.
5. VideoSwap: Customized Video Subject Swapping with Interactive Semantic Point Correspondence (paper | webpage)
This work introduces VideoSwap, a new framework for video editing focused on subject swapping: replacing the main subject in a video with one of a different identity and shape. Unlike previous methods that rely on dense correspondences, VideoSwap uses semantic point correspondences, enabling effective shape changes. It also supports user interactions such as point removal and dragging for fine-grained control. Extensive experiments demonstrate state-of-the-art performance on real-world videos.
AI News
1. How to Build LLM Apps that can See, Hear, and Speak (link)
2. LlamaHub (link)
3. OpenAI COO thinks AI for business is overhyped (link)
4. AssemblyAI lands $50M to build and serve AI speech models (link)
5. OpenAI Bug Bounty (link)
AI Repos
1. awesome-assistants (repo)
2. TaskWeaver: A code-first agent framework for seamlessly planning and executing data analytics tasks (repo)