Research

Explore our publications, models, and datasets advancing AI research.

Dec 11, 2025

From Generalist to Medical Specialist: Building Domain-Specific Multimodal LLMs via CPT, SFT and RLVR

Domain Adaptation · Medical AI

A systematic overview of a three-stage pipeline for building medical multimodal LLMs from generalist backbones: Continual Pretraining (CPT), Supervised Fine-Tuning (SFT), and Reinforcement Learning with Verifiable Rewards (RLVR). InfiMed2-4B achieves 60.7% average accuracy across medical benchmarks, a 7.0-point improvement over the baseline.

Oct 3, 2025

InfiAgent: A self-evolving pyramid agent framework for infinite scenarios

With the rapid development of artificial intelligence, Large Language Model (LLM) agents have demonstrated remarkable capabilities in organizing and executing complex tasks. However, their development heavily relies on carefully designed workflows, repeatedly debugged prompts, and deep domain expertise. This highly manual approach significantly hinders the large-scale adoption and cost-effectiveness of agent-based technology across industries.

Oct 3, 2025

Model Merging Scaling Laws: A New Way to Predict and Plan LLM Composition

We study empirical scaling laws for language model merging measured by cross-entropy. Despite wide practical use, merging has lacked a quantitative rule predicting returns as experts are added or model size scales. We identify a compact power law coupling model size and expert count: a size-dependent loss floor that decreases with capacity and an inverse-k merging tail with clear diminishing returns. The law holds in-domain and cross-domain, tightly fits curves across architectures and methods (Average, TA, TIES, DARE), and explains two regularities: most gains arrive early and variability shrinks as more experts are included. A simple theory accounts for the ~1/k gain pattern and links floor and tail to base model properties and cross-domain diversity. This enables predictive planning: estimate experts needed for a target loss, decide when to stop, and trade off scaling the base model versus adding experts under a fixed budget—turning merging from heuristic practice into a computationally efficient, plannable alternative to multitask training. It suggests a scaling principle for distributed generative AI: predictable gains via composing specialists, offering a complementary path toward AGI-level systems.
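
The exact coefficients and functional form are given in the paper; as a rough illustration of the planning workflow the law enables, here is a minimal sketch that fits an assumed floor-plus-inverse-k form, L(k) = L_floor + A/k, to hypothetical merge-loss measurements and solves for the expert count needed to reach a target loss (all numbers illustrative, not from the paper):

```python
# Minimal sketch of scaling-law-based merge planning. The functional form
# L(k) = L_floor + A / k is an assumption matching the abstract's
# description; the data points below are hypothetical.
import numpy as np

k = np.array([1, 2, 4, 8, 16], dtype=float)      # number of merged experts
loss = np.array([2.10, 1.85, 1.72, 1.66, 1.63])  # hypothetical measured CE

# Linear least squares in the basis [1, 1/k]: loss ~ L_floor + A * (1/k).
X = np.stack([np.ones_like(k), 1.0 / k], axis=1)
(l_floor, a), *_ = np.linalg.lstsq(X, loss, rcond=None)
print(f"fitted floor={l_floor:.3f}, tail coefficient A={a:.3f}")

# Predictive planning: how many experts are needed for a target loss?
target = 1.65
if target > l_floor:
    k_needed = a / (target - l_floor)
    print(f"need ~{np.ceil(k_needed):.0f} experts for CE <= {target}")
else:
    print("target below the fitted floor; scale the base model instead")
```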

Sep 26, 2025

InfiMed: Low-Resource Medical MLLMs with Advancing Understanding and Reasoning

We introduce the InfiMed series, InfiMed-SFT-3B and InfiMed-RL-3B: medical-focused Multimodal Large Language Models (MLLMs) developed by the InfiX-AI team. InfiMed-RL-3B achieves an average accuracy of 59.2% across seven authoritative medical benchmarks (including MMMU Health & Medicine, OmniMedVQA, and PMC-VQA), significantly outperforming all comparable-scale models such as MedGemma-4B-IT (54.8%) and even surpassing the larger InternVL3-8B (57.3%).

Sep 26, 2025

InfiR2: A Comprehensive FP8 Training Recipe for Reasoning-Enhanced Language Models

Low-Precision Training · Post-Training

The immense computational cost of training Large Language Models (LLMs) presents a major barrier to innovation. While FP8 training offers a promising solution with significant theoretical efficiency gains, its widespread adoption has been hindered by the lack of a comprehensive, open-source training recipe. To bridge this gap, we introduce an end-to-end FP8 training recipe that seamlessly integrates continual pre-training and supervised fine-tuning. Our methodology employs a fine-grained, hybrid-granularity quantization strategy to maintain numerical fidelity while maximizing computational efficiency. Through extensive experiments, including the continual pre-training of models on a 160B-token corpus, we demonstrate that our recipe is not only remarkably stable but also essentially lossless, achieving performance on par with the BF16 baseline across a suite of reasoning benchmarks. Crucially, this is achieved with substantial efficiency improvements, including up to a 22% reduction in training time, a 14% decrease in peak memory usage, and a 19% increase in throughput. Our results establish FP8 as a practical and robust alternative to BF16, and we will release the accompanying code to further democratize large-scale model training.
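
The recipe's exact quantization scheme will ship with the code release; as a minimal sketch of what fine-grained, block-wise FP8 quantization looks like in practice, the following assumes a recent PyTorch build with float8 dtypes (block size and layout are illustrative, not the paper's configuration):

```python
# Minimal sketch of fine-grained FP8 quantization with per-block scaling,
# in the spirit of the hybrid-granularity strategy described above. Not the
# paper's implementation; requires PyTorch >= 2.1 for torch.float8_e4m3fn.
import torch

FP8_MAX = torch.finfo(torch.float8_e4m3fn).max  # 448.0 for E4M3

def quantize_fp8_blockwise(x: torch.Tensor, block: int = 128):
    """Quantize a 2-D tensor to FP8 with one scale per `block` columns."""
    rows, cols = x.shape
    x_blocks = x.view(rows, cols // block, block)
    # One scale per block: map the block's max magnitude onto the FP8 range.
    amax = x_blocks.abs().amax(dim=-1, keepdim=True).clamp(min=1e-12)
    scale = FP8_MAX / amax
    x_fp8 = (x_blocks * scale).to(torch.float8_e4m3fn)  # quantize
    return x_fp8, scale

def dequantize(x_fp8: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    # Undo the per-block scale and restore the original 2-D layout.
    return (x_fp8.to(torch.float32) / scale).flatten(1)

w = torch.randn(4, 256)
w_fp8, s = quantize_fp8_blockwise(w)
err = (w - dequantize(w_fp8, s)).abs().max()
print(f"max abs reconstruction error: {err:.5f}")
```

Finer blocks mean more scales to store but tighter fidelity per block; a hybrid-granularity scheme varies this trade-off across tensor types.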

Sep 15, 2025

InfiR-FP8

Reasoning-Enhanced (FP8) · Size not specified

A smaller reasoning-enhanced model trained from scratch using FP8 precision, achieving successful convergence.

Aug 13, 2025

InfiAlign-Qwen-7B-DPO

Preference-Optimized Reasoning Pro · 7B

DPO-enhanced version delivering notable gains in mathematical reasoning, built upon our high-quality SFT foundation.

Aug 12, 2025

InfiAlign-Qwen-7B-SFT

Supervised Fine-Tuned Reasoning Specialist · 7B

Our data-efficient SFT model achieves DeepSeek-R1 parity using only 1/8 of the training data, built on a training corpus curated along multiple quality dimensions.

Aug 12, 2025

InfiAlign: A Scalable and Sample-Efficient Framework for Enhancing LLM Reasoning

LLM Alignment · Reasoning

InfiAlign is a novel framework that combines SFT and DPO with an advanced data selection pipeline to efficiently enhance LLM reasoning capabilities.

Aug 12, 2025

InfiGUI-G1-3B

GUI Agent · 3B

A novel policy optimization framework for multimodal large language models that addresses semantic alignment challenges in GUI grounding.

Aug 12, 2025

InfiGUI-G1-7B

GUI Agent · 7B

A novel policy optimization framework for multimodal large language models that addresses semantic alignment challenges in GUI grounding.

Aug 5, 2025

InfiGUI-G1: Advancing GUI Grounding with Adaptive Exploration Policy Optimization

We introduce InfiGUI-G1, a multimodal GUI agent that employs Adaptive Exploration Policy Optimization (AEPO) to improve semantic alignment in GUI grounding, achieving up to 8.3% relative improvement over baseline methods.

Jul 31, 2025

android_control_test

Android Control

Test dataset for evaluating Android control models, complementing the training set with unseen scenarios to assess generalization and performance of automation agents on Android interfaces.

Jul 31, 2025

android_control_train

Android Control

Training dataset for Android control tasks, likely containing interaction data, command sequences, or GUI operation logs to support the development of Android-based automation agents.

Jul 31, 2025

InfiGUIAgent-Data

GUI Agent

Specialized dataset for training InfiGUIAgent models, containing multimodal data (e.g., GUI screenshots, user actions, task descriptions) to enable robust GUI task automation and reasoning.

Jul 31, 2025

s1K-1.1-850

General

An updated version (1.1) of a small dataset, likely containing 850 samples (inferred from '850'). May extend or refine the content of 's1K-QwQ' for more targeted model training.

Jul 31, 2025

s1K-QwQ

General

A dataset with 's1K' (likely 1,000 samples) in its name, potentially focused on question-answer pairs, dialogue, or reasoning tasks, supporting model training in conversational or logical reasoning abilities.

Jul 31, 2025

Infi-MMR-3B

Multimodal Reasoning · 3B

A multimodal model developed via the Infi-MMR three-phase curriculum framework, enhancing multimodal reasoning capabilities in small language models.

Jul 31, 2025

InfiFPO-14B

Preference Alignment Fusion · 14B

A lightweight fusion method applied during the preference alignment phase, injecting the behavior of fused source models into preference learning.

Jul 31, 2025

InfiFusion-14B

Model Fusion · 14B

A logit-level fusion pipeline based on Universal Logit Distillation, enhanced with Top-K filtering and logits standardization. Supports both pairwise and unified fusion strategies to balance performance and efficiency.
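
The exact ULD variant is specified in the InfiFusion paper; as a rough sketch of the ingredients named here, the following compares standardized top-K logits via an L1 distance over sorted probabilities (the value of K and the choice of distance are illustrative assumptions):

```python
# Hedged sketch of logit-level fusion with standardization and Top-K
# filtering, ULD-style: because source and target vocabularies may differ,
# the loss compares sorted top-K probability mass rather than aligning
# token ids. K and the L1 distance are assumptions, not the paper's exact
# recipe.
import torch

def uld_topk_loss(student_logits, teacher_logits, k=50):
    def standardize(z):  # zero-mean, unit-variance over the vocab dim
        return (z - z.mean(-1, keepdim=True)) / z.std(-1, keepdim=True)

    s = standardize(student_logits)
    t = standardize(teacher_logits)
    # topk returns values sorted descending, so these are sorted already.
    s_top = s.topk(k, dim=-1).values.softmax(-1)
    t_top = t.topk(k, dim=-1).values.softmax(-1)
    return (s_top - t_top).abs().sum(-1).mean()  # L1 between sorted dists

student = torch.randn(2, 8, 32000)   # (batch, seq, student vocab)
teacher = torch.randn(2, 8, 100000)  # (batch, seq, teacher vocab)
print(uld_topk_loss(student, teacher))
```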

Jul 31, 2025

InfiGFusion-14B

Structure-Aware Model Fusion · 14B

A structure-aware extension that builds co-activation graphs from logits and aligns them via an efficient Gromov-Wasserstein loss approximation.

Jul 31, 2025

InfiGUI-R1-3B

GUI Agent · 3B

A GUI agent developed via the Actor2Reasoner framework, evolving a reactive model into a deliberative reasoner through spatial reasoning distillation and reinforcement learning.

Jul 31, 2025

InfiGUIAgent-2B-Stage1

GUI Agent · 2B

A multimodal generalist GUI agent with native hierarchical and expectation-reflection reasoning through a unique two-stage supervised pipeline.

Jul 31, 2025

InfiR-1B-Base

Reasoning-Enhanced · 1B

Part of the InfiR reasoning-enhanced low-resource training pipeline, crafted to be an effective small language model with improved reasoning.

Jul 31, 2025

InfiR-1B-Instruct

Instructed Reasoning-Enhanced · 1B

An instructed version of the InfiR small language model, part of the reasoning-enhanced low-resource training pipeline.

May 21, 2025

InfiGFusion: Graph-on-Logits Distillation for Scalable Model Fusion

To appear at NeurIPS 2025!

Recent advances in large language models (LLMs) have intensified efforts to fuse heterogeneous open-source models into a unified system that inherits their complementary strengths. Existing logit-based fusion methods maintain inference efficiency but treat vocabulary dimensions independently, overlooking semantic dependencies encoded by cross-dimension interactions. These dependencies reflect how token types interact under a model's internal reasoning and are essential for aligning models with diverse generation behaviors. To explicitly model these dependencies, we propose InfiGFusion, the first structure-aware fusion framework with a novel Graph-on-Logits Distillation (GLD) loss. Specifically, we retain the top-k logits per output and aggregate their outer products across sequence positions to form a global co-activation graph, where nodes represent vocabulary channels and edges quantify their joint activations. To ensure scalability and efficiency, we design a sorting-based closed-form approximation that reduces the original O(n^4) cost of Gromov-Wasserstein distance to O(n log n), with provable approximation guarantees. Experiments across multiple fusion settings show that GLD consistently improves fusion quality and stability. InfiGFusion outperforms SOTA models and fusion baselines across 11 benchmarks spanning reasoning, coding, and mathematics. It shows particular strength in complex reasoning tasks, with +35.6 improvement on Multistep Arithmetic and +37.06 on Causal Judgement over SFT, demonstrating superior multi-step and relational inference.
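
To make the co-activation graph step concrete, here is a minimal sketch under simplifying assumptions: the graph is built over per-position top-k slots rather than true vocabulary ids, and a sorted-edge comparison stands in for the paper's closed-form Gromov-Wasserstein approximation.

```python
# Hedged sketch of the Graph-on-Logits construction: keep the top-k logits
# per position and sum their outer products across the sequence to get a
# global co-activation graph. This is an illustration, not the paper's
# implementation.
import torch

def co_activation_graph(logits: torch.Tensor, k: int = 16) -> torch.Tensor:
    """logits: (seq_len, vocab). For simplicity the graph is over the k
    per-position top-k slots rather than actual vocabulary channels."""
    vals, _ = logits.topk(k, dim=-1)  # (seq, k) retained activations
    probs = vals.softmax(-1)
    # Aggregate outer products across positions: edges = joint activations.
    return torch.einsum("si,sj->ij", probs, probs) / logits.shape[0]

teacher = torch.randn(128, 32000)
student = torch.randn(128, 32000)
g_t, g_s = co_activation_graph(teacher), co_activation_graph(student)
# Crude stand-in for the sorting-based closed-form GW approximation:
# compare sorted edge weights, O(n log n) in the number of edges.
gld = (g_t.flatten().sort().values - g_s.flatten().sort().values).abs().mean()
print(gld)
```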

May 20, 2025

InfiFPO: Implicit Model Fusion via Preference Optimization in Large Language Models

To appear at NeurIPS 2025 (Spotlight)!

Model Fusion · Preference Optimization

Model fusion combines multiple Large Language Models (LLMs) with different strengths into a more powerful, integrated model through lightweight training methods. Existing work on model fusion focuses primarily on supervised fine-tuning (SFT), leaving preference alignment (PA), a critical phase for enhancing LLM performance, largely unexplored. The few existing fusion methods for the PA phase, such as WRPO, simplify the process by utilizing only response outputs from source models while discarding their probability information. To address this limitation, we propose InfiFPO, a preference optimization method for implicit model fusion. InfiFPO replaces the reference model in Direct Preference Optimization (DPO) with a fused source model that synthesizes multi-source probabilities at the sequence level, circumventing the complex vocabulary-alignment challenges of previous works while preserving probability information. By introducing probability clipping and max-margin fusion strategies, InfiFPO enables the pivot model to align with human preferences while effectively distilling knowledge from source models. Comprehensive experiments on 11 widely-used benchmarks demonstrate that InfiFPO consistently outperforms existing model fusion and preference optimization methods. When using Phi-4 as the pivot model, InfiFPO improves its average performance from 79.95 to 83.33 on the 11 benchmarks, significantly strengthening its capabilities in mathematics, coding, and reasoning tasks.
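
As a rough illustration of the idea, the sketch below writes a DPO-style loss whose reference log-probability comes from a fusion of source-model sequence-level log-probs; the clipping threshold and the max-based fusion rule are assumptions standing in for the paper's exact formulation.

```python
# Hedged sketch of InfiFPO's core move: a DPO-style loss in which the
# reference model's log-probability is replaced by a fusion of several
# source models' sequence-level log-probs. Clipping range and max-based
# fusion below are illustrative assumptions.
import torch
import torch.nn.functional as F

def fused_reference_logprob(source_logps: torch.Tensor,
                            clip_min: float = -50.0) -> torch.Tensor:
    """source_logps: (num_sources, batch) sequence log-probs.
    Clipping keeps degenerate sources from dominating; taking the max
    keeps the strongest source per sequence (a max-margin-style rule)."""
    return source_logps.clamp(min=clip_min).max(dim=0).values

def infifpo_style_loss(pi_logp_w, pi_logp_l, src_logps_w, src_logps_l,
                       beta: float = 0.1) -> torch.Tensor:
    ref_w = fused_reference_logprob(src_logps_w)  # chosen responses
    ref_l = fused_reference_logprob(src_logps_l)  # rejected responses
    margin = (pi_logp_w - ref_w) - (pi_logp_l - ref_l)
    return -F.logsigmoid(beta * margin).mean()

# Toy shapes: 3 source models, batch of 4 preference pairs.
loss = infifpo_style_loss(torch.randn(4), torch.randn(4),
                          torch.randn(3, 4), torch.randn(3, 4))
print(loss)
```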

Apr 19, 2025

InfiGUI-R1: Advancing Multimodal GUI Agents from Reactive Actors to Deliberative Reasoners

InfiGUI-R1 is a multimodal large language model-based GUI agent trained using reinforcement learning to enhance planning and error recovery skills for GUI tasks, achieving state-of-the-art performance on multiple benchmarks.

Feb 17, 2025

InfiFusion: A Unified Framework for Enhanced Cross-Model Reasoning via LLM Fusion

Unified Fusion · Knowledge Distillation

Jan 9, 2025

InfiGUIAgent: A Multimodal Generalist GUI Agent with Native Reasoning and Reflection

GUI Agent · Multimodal

InfiGUIAgent is a multimodal generalist GUI agent trained through a two-stage supervised fine-tuning approach, focusing on fundamental GUI understanding skills and advanced reasoning capabilities for native GUI interactions.