Research
InfiGUI-G1: Advancing GUI Grounding with Adaptive Exploration Policy Optimization
We introduce InfiGUI-G1, a multimodal GUI agent that employs Adaptive Exploration Policy Optimization (AEPO) to improve semantic alignment in GUI grounding, achieving up to 8.3% relative improvement …
InfiGFusion: Graph-on-Logits Distillation via Efficient Gromov-Wasserstein for Model Fusion
InfiGFusion is the first structure-aware fusion framework for large language models that models semantic dependencies among logits using feature-level graphs. We introduce a novel Graph-on-Logits …
InfiFPO: Implicit Model Fusion via Preference Optimization in Large Language Models
We propose InfiFPO, a principled and efficient framework for performing model fusion during the preference alignment phase. Our key insight is that the reference model in preference optimization …
InfiGUI-R1: Advancing Multimodal GUI Agents from Reactive Actors to Deliberative Reasoners
We present InfiGUI-R1, a novel GUI agent that combines spatial reasoning with reinforcement learning to achieve superior performance in GUI automation tasks across desktop, mobile, and web platforms.
InfiFusion: A Unified Framework for Enhanced Cross-Model Reasoning via LLM Fusion
InfiFusion is the first fusion framework for large language models that fuses up to 4 models with 14B to 24B parameters. We introduce a unified framework that can fuse many heterogeneous models in one …
InfiGUIAgent: A Multimodal Generalist GUI Agent with Native Reasoning and Reflection
A GUI agent based on a multimodal large language model that improves task automation on computing devices through hierarchical reasoning and expectation-reflection reasoning.