Daily Papers
1.DiffusionLight: Light Probes for Free by Painting a Chrome Ball ( paper |webpage )
We developed a technique using diffusion models to estimate lighting from a single image, overcoming the limitations of current HDR-based methods and showing superior performance in diverse, real-world settings.
2.FineControlNet: Fine-level Text Control for Image Generation with Spatially Aligned Text Control Injection ( paper | webpage )
FineControlNet takes as input a text prompt and a spatial conditioning image. The text prompt is then parsed to assign each conditioning instance it's own individual prompt. Each grouping of prompt, skeleton, mask, and noise map is then passed to our model. By spatially injecting the individual prompt information into different areas of the image, we can produce high quality images that adhere to the input prompt.
3.VideoLCM: Video Latent Consistency Model ( papaer )
We introduce VideoLCM, applying consistency models to video generation for efficient, high-quality synthesis. This framework, building on latent video diffusion models, uses consistency distillation for training. VideoLCM excels in computational efficiency, fidelity, and temporal consistency, achieving smooth video synthesis in just four steps. Aimed as a baseline for future research, its source code and models will be shared publicly.
4.DreamTalk: When Expressive Talking Head Generation Meets Diffusion Probabilistic Models ( paper | webpage | code )
DreamTalk introduces a framework for expressive talking head generation using diffusion models. It comprises a denoising network, style-aware lip expert, and style predictor, achieving photo-realistic talking faces with diverse styles and accurate lip motions, surpassing existing methods.
5.ReST meets ReAct: Self-Improvement for Multi-Step Reasoning LLM Agent ( paper )
A ReAct-style LLM agent is introduced to answer complex questions using external knowledge. It undergoes iterative training through a ReST-like method, continuously improving its performance. After two iterations, a fine-tuned smaller model matches the performance of larger models with significantly fewer parameters in compositional question-answering benchmarks.
6.Extending Context Window of Large Language Models via Semantic Compression ( paper )
7.Perspectives on the State and Future of Deep Learning -- 2023 ( paper )
8.Amphion: An Open-Source Audio, Music and Speech Generation Toolkit ( paper | code )
AI News
1.Turn ideas into models, Lightning fast ( link )
2.Introducing Resemble Enhance: Open Source Speech Super Resolution AI Model( link )
3.turboART ( link )
AI Repo
1.pytorch-frame:Tabular Deep Learning Library for PyTorch ( repo )
2.jupyter-tldraw ( repo )
3.Awesome-LLM-3D: a curated list of Multi-modal Large Language Model in 3D world Resources ( repo )
4.upstage/SOLAR-10.7B-v1.0 ( huggingface)