Date: 2023.5.15-2023.5.21
This week's highlights
1. ChatGPT iOS app launched
Coverage:
2. Stability AI open-sources StableStudio
github: github.com
3. DragGAN
Drag Your GAN: Interactive Point-based Manipulation on the Generative Image Manifold
More:
Unofficial implementation: github.com
Coverage: mp.weixin.qq.com
Latest research:
Dr. LLaMA: Improving Small Language Models in Domain-Specific QA via Generative Data Augmentation
Paper: arxiv.org
Masked Audio Text Encoders are Effective Multi-Modal Rescorers
Paper: arxiv.org
RL4F: Generating Natural Language Feedback with Reinforcement Learning for Repairing Model Outputs
Paper: arxiv.org
GPT-Sentinel: Distinguishing Human and ChatGPT Generated Content
Paper: arxiv.org
Small Models are Valuable Plug-ins for Large Language Models
Paper: arxiv.org
TinyStories: How Small Can Language Models Be and Still Speak Coherent English?
Paper: arxiv.org
Make-A-Protagonist: Generic Video Editing with An Ensemble of Experts
Paper: arxiv.org
Understanding 3D Object Interaction from a Single Image
Demo: huggingface.co
Towards Expert-Level Medical Question Answering with Large Language Models
Paper: arxiv.org
Dual-Alignment Pre-training for Cross-lingual Sentence Embedding
Paper: arxiv.org
SoundStorm: Efficient Parallel Audio Generation
Paper: arxiv.org
AR-Diffusion: Auto-Regressive Diffusion Model for Text Generation
Paper: arxiv.org
Common Diffusion Noise Schedules and Sample Steps are Flawed
Paper: arxiv.org
Make-An-Animation: Large-Scale Text-conditional 3D Human Motion Generation
Paper: arxiv.org
What You See is What You Read? Improving Text-Image Alignment Evaluation
Paper: arxiv.org
Smart Word Suggestions for Writing Assistance
Paper: arxiv.org
github: github.com
DoReMi: Optimizing Data Mixtures Speeds Up Language Model Pretraining
Paper: arxiv.org
Explaining black box text modules in natural language with language models
Paper: arxiv.org
A Video Is Worth 4096 Tokens: Verbalize Story Videos To Understand Them In Zero Shot
Paper: arxiv.org
FastComposer: Tuning-Free Multi-Subject Image Generation with Localized Attention
Paper: arxiv.org
github: github.com
SpeechGPT: Empowering Large Language Models with Intrinsic Cross-Modal Conversational Abilities
Paper: arxiv.org
GETMusic: Generating Any Music Tracks with a Unified Representation and Diffusion Framework
LDM3D: Latent Diffusion Model for 3D
Paper: arxiv.org
Going Denser with Open-Vocabulary Part Segmentation
Paper: arxiv.org
Code: github.com
Comparing Software Developers with ChatGPT: An Empirical Investigation
Paper: arxiv.org
Controlling the Extraction of Memorized Data from Large Language Models via Prompt-Tuning
Paper: arxiv.org
Pengi: An Audio Language Model for Audio Tasks
Paper: arxiv.org
Cross-Lingual Supervision improves Large Language Models Pre-training
Paper: arxiv.org
WebCPM: the first open-source model for web-search-enabled Chinese question answering
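Business:
1. Replacement or symbiosis? Opportunities and evolution for software practitioners in the LLM era
2. Debates about and limitations of large language models
3. Industrializing AIGC in the game industry: every company should have a controllable production line!
4. Midjourney China edition opens closed beta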
Showcases:
1. Multimodal VisualGLM-6B, requiring as little as 8.7 GB of GPU memory
2. Skybox AI
3. Chat with NeRF
4. drawit
5. taesiri/ClaudeReadsArxiv