Click titles below to view article summaries. Generated using a custom pipeline with OpenAI mini models (with automatic fallbacks).
Sink-Aware Pruning for Diffusion Language Models
Objective
Diffusion Language Models (DLMs) incur high inference cost due to iterative denoising, motivating efficient pruning
Method
Existing pruning heuristics, largely inherited from autoregressive (AR) LLMs, typically preserve attention-sink tokens because AR sinks serve as stable global anchors
Results
We show that this assumption does not hold for DLMs: the attention-sink position exhibits substantially higher variance over the full generation trajectory (measured by how the dominant sink locations shift across timesteps), indicating that sinks are often transient and less structurally essential than in AR models
Significance
Based on this observation, we propose Sink-Aware Pruning, which automatically identifies and prunes unstable sinks in DLMs (whereas prior studies of AR LLMs usually keep sinks intact)
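A minimal sketch of what sink-aware pruning could look like, assuming per-timestep attention maps are available: track the dominant sink position across the denoising trajectory and only grant sinks the usual "always keep" status when that position is stable. The `attn_maps` interface, the variance threshold, and the keep ratio are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def sink_positions_per_step(attn_maps):
    """Dominant attention-sink position at each denoising timestep.

    attn_maps: list of [num_heads, seq_len, seq_len] attention matrices,
    one per timestep (a hypothetical interface, not the paper's API).
    """
    positions = []
    for attn in attn_maps:
        # Attention mass received by each key position, averaged over heads and queries.
        col_mass = attn.mean(axis=0).mean(axis=0)        # [seq_len]
        positions.append(int(np.argmax(col_mass)))
    return positions

def sink_aware_keep_set(attn_maps, keep_ratio=0.9, var_threshold=4.0):
    """Toy sink-aware pruning: only protect sink tokens from pruning if the
    dominant sink position is stable across the generation trajectory."""
    positions = sink_positions_per_step(attn_maps)
    seq_len = attn_maps[0].shape[-1]
    # Generic importance score: mean attention mass over the whole trajectory.
    importance = np.mean([a.mean(axis=0).mean(axis=0) for a in attn_maps], axis=0)
    kept = set(np.argsort(-importance)[: int(seq_len * keep_ratio)].tolist())
    if np.var(positions) <= var_threshold:
        kept.update(positions)   # stable, AR-like sink: protect from pruning
    return sorted(kept)
```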
CLEF HIPE-2026: Evaluating Accurate and Efficient Person-Place Relation Extraction from Multilingual Historical Texts
Objective
HIPE-2026 is a CLEF evaluation lab dedicated to person-place relation extraction from noisy, multilingual historical texts
Method
Building on the HIPE-2020 and HIPE-2022 campaigns, it extends the series toward semantic relation extraction by targeting the task of identifying person-place associations in multiple languages and time periods
Results
Systems are asked to classify relations of two types: at ("Has the person ever been at this place?") and isAt ("Is the person located at this place around publication time?"), which requires reasoning over temporal and geographical cues (see the toy example below)
Significance
The lab introduces a three-fold evaluation profile that jointly assesses accuracy, computational efficiency, and domain generalization
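A toy illustration of how the two relation types differ for one person mention; the snippet, field names, and structure are invented for clarity, not the official HIPE-2026 data format.

```python
# Hypothetical illustration of the 'at' vs. 'isAt' relation types; the snippet,
# field names, and format are invented, not the official HIPE-2026 schema.
example = {
    "text": "M. Dupont, né à Lyon, séjourne actuellement à Genève.",
    "relations": [
        # 'at': the person was born in Lyon, so he has been there at some point,
        # but he is not located there around publication time.
        {"person": "M. Dupont", "place": "Lyon",   "at": True, "isAt": False},
        # 'isAt': he is currently staying in Genève, so both relations hold.
        {"person": "M. Dupont", "place": "Genève", "at": True, "isAt": True},
    ],
}
```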
MARS: Margin-Aware Reward-Modeling with Self-Refinement
Objective
Reward modeling is a core component of modern alignment pipelines such as RLHF and RLAIF, underpinning policy optimization methods including PPO and TRPO
Method
However, training reliable reward models relies heavily on human-labeled preference data, which is costly and limited, motivating the use of data augmentation
Results
Existing augmentation approaches typically operate at the representation or semantic level and remain agnostic to the reward model's estimation difficulty
Significance
In this paper, we propose MARS, an adaptive, margin-aware augmentation and sampling strategy that explicitly targets ambiguous cases and failure modes of the reward model
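One way such margin-aware sampling could be instantiated is to weight preference pairs by the reward model's current margin, so that ambiguous (near-zero margin) and misordered (negative margin) pairs are drawn more often for augmentation. The `reward_model` callable and the pair format below are assumptions, not the MARS implementation.

```python
import torch

def margin_aware_weights(reward_model, pairs, temperature=1.0):
    """Weight each preference pair by how hard it currently is for the reward
    model: small or negative margins get the largest weights.

    Assumes `reward_model(texts) -> tensor of scores`; this interface and the
    pair format are illustrative, not the paper's API."""
    chosen = [p["chosen"] for p in pairs]
    rejected = [p["rejected"] for p in pairs]
    with torch.no_grad():
        margins = reward_model(chosen) - reward_model(rejected)   # [num_pairs]
    # Confidently separated pairs get small weights; hard pairs dominate.
    return torch.softmax(-margins / temperature, dim=0)

# Usage sketch: draw augmentation targets proportionally to their weight.
# idx = torch.multinomial(margin_aware_weights(rm, pairs), 256, replacement=True)
```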
Mine and Refine: Optimizing Graded Relevance in E-commerce Search Retrieval
Objective
We propose a two-stage "Mine and Refine" contrastive training framework for semantic text embeddings to enhance multi-category e-commerce search retrieval
Method
Large-scale e-commerce search demands embeddings that generalize to long-tail, noisy queries while relying on scalable supervision compatible with product and policy constraints
Results
A practical challenge is that relevance is often graded: users accept substitutes or complements beyond exact matches, and production systems benefit from clear separation of similarity scores across these relevance strata for stable hybrid blending and thresholding
Significance
To obtain scalable, policy-consistent supervision, we fine-tune a lightweight LLM on human annotations under a three-level relevance guideline and further reduce residual noise via engagement-driven auditing
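A minimal sketch of one way to encourage the separated similarity strata described above: penalize query-product cosine similarities that fall outside a band assigned to their relevance grade. The three grades mirror a three-level guideline, but the band values and loss form are illustrative assumptions, not the paper's training objective.

```python
import torch
import torch.nn.functional as F

# Target similarity bands per relevance grade (illustrative values only).
BANDS = {
    2: (0.8, 1.0),   # exact match
    1: (0.4, 0.7),   # substitute / complement
    0: (0.0, 0.2),   # irrelevant
}

def graded_band_loss(query_emb, product_emb, grades):
    """Penalize query-product cosine similarities that fall outside the band
    of their relevance grade, encouraging clearly separated score strata."""
    sims = F.cosine_similarity(query_emb, product_emb, dim=-1)          # [batch]
    lo = torch.tensor([BANDS[int(g)][0] for g in grades], device=sims.device)
    hi = torch.tensor([BANDS[int(g)][1] for g in grades], device=sims.device)
    return (F.relu(lo - sims) + F.relu(sims - hi)).mean()
```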
Multi-Round Human-AI Collaboration with User-Specified Requirements
Objective
As humans increasingly rely on multi-round conversational AI for high-stakes decisions, principled frameworks are needed to ensure such interactions reliably improve decision quality
Method
We adopt a human-centric view governed by two principles: counterfactual harm, ensuring the AI does not undermine human strengths, and complementarity, ensuring it adds value where the human is prone to err
Results
We formalize these concepts via user-defined rules, allowing users to specify exactly what harm and complementarity mean for their specific task
Significance
We then introduce an online, distribution-free algorithm with finite-sample guarantees that enforces the user-specified constraints over the collaboration dynamics
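A minimal sketch of what user-specified harm and complementarity rules might look like as predicates over interaction outcomes; the interface, rule definitions, and rate thresholds are hypothetical, and the paper's online, distribution-free calibration is not reproduced here.

```python
from dataclasses import dataclass

@dataclass
class Interaction:
    human_only_correct: bool   # would the human alone have decided correctly?
    final_correct: bool        # was the human+AI decision correct?

def harm(e: Interaction) -> bool:
    """Counterfactual harm: the human alone was right, but the collaboration erred."""
    return e.human_only_correct and not e.final_correct

def complement(e: Interaction) -> bool:
    """Complementarity: the human alone would have erred, and the AI fixed it."""
    return (not e.human_only_correct) and e.final_correct

def rules_satisfied(history, max_harm_rate=0.05, min_complement_rate=0.10):
    """Empirical check of user-specified rates over the interactions so far;
    the finite-sample guarantees of the paper's algorithm are not implemented here."""
    n = max(len(history), 1)
    harm_rate = sum(harm(e) for e in history) / n
    comp_rate = sum(complement(e) for e in history) / n
    return harm_rate <= max_harm_rate and comp_rate >= min_complement_rate
```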