📺 Demo
Mobile-Agent-v3 (Note: the video is not sped up)
YouTube
Bilibili
Mobile-Agent-v2
https://github.com/X-PLUG/MobileAgent/assets/127390760/d907795d-b5b9-48bf-b1db-70cf3f45d155
Mobile-Agent
https://github.com/X-PLUG/MobileAgent/assets/127390760/26c48fb0-67ed-4df6-97b2-aa0c18386d31
📢 News
- 🔥🔥[7.29] Mobile-Agent won the Best Demo Award at the 23rd China National Conference on Computational Linguistics (CCL 2024). At CCL 2024 we presented the upcoming Mobile-Agent-v3, which features a smaller memory footprint (8 GB), faster inference (10–15 seconds per operation), and relies entirely on open-source models. See the 📺 Demo section for a video demonstration.
- 🔥[6.27] We released demos on Hugging Face and ModelScope where you can upload phone screenshots to try Mobile-Agent-v2, with no model or device configuration required.
- [6.4] Modelscope-Agent now supports Mobile-Agent-v2, based on the Android ADB environment; see the application.
- [6.4] We released Mobile-Agent-v2, a mobile device operation assistant that achieves effective navigation via multi-agent collaboration.
- [3.10] Mobile-Agent was accepted by the ICLR 2024 Workshop on Large Language Model (LLM) Agents.
📱 Versions
- Mobile-Agent-v3
- Mobile-Agent-v2 - Mobile device operation assistant with effective navigation via multi-agent collaboration
- Mobile-Agent - Autonomous multi-modal mobile device agent with visual perception
⭐ Star History
📑 Citation
If you find Mobile-Agent helpful for your research and applications, please cite it using the following BibTeX:
@article{wang2024mobile2,
  title={Mobile-Agent-v2: Mobile Device Operation Assistant with Effective Navigation via Multi-Agent Collaboration},
  author={Wang, Junyang and Xu, Haiyang and Jia, Haitao and Zhang, Xi and Yan, Ming and Shen, Weizhou and Zhang, Ji and Huang, Fei and Sang, Jitao},
  journal={arXiv preprint arXiv:2406.01014},
  year={2024}
}

@article{wang2024mobile,
  title={Mobile-Agent: Autonomous Multi-Modal Mobile Device Agent with Visual Perception},
  author={Wang, Junyang and Xu, Haiyang and Ye, Jiabo and Yan, Ming and Shen, Weizhou and Zhang, Ji and Huang, Fei and Sang, Jitao},
  journal={arXiv preprint arXiv:2401.16158},
  year={2024}
}
📦 Related Projects
- AppAgent: Multimodal Agents as Smartphone Users
- mPLUG-Owl & mPLUG-Owl2: Modularized Multimodal Large Language Model
- Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond
- GroundingDINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection
- CLIP: Contrastive Language-Image Pretraining