The four-volume set, LNAI 16597-16600, constitutes the proceedings of the 30th Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 2026, held in Hong Kong, China, during June 9–12, 2026.
The 184 full papers presented in this book were carefully reviewed and selected from 728 submissions.
The program featured three tracks: the Main Track, the Survey Track, and the Special Track on LLMs for Data Science.
The Main Track continued its tradition of being the premier forum for the presentation of research results and experience reports on knowledge discovery, data science, and machine learning.
The Survey Track, first introduced in 2025, aims to promote the dissemination of insightful survey papers. Such papers are intended to provide a structured synthesis of a particular topic in data mining, including but not limited to the theoretical foundations of mining, inference, and learning; big data technologies; and security, privacy, and integrity. They are written for junior researchers and for experts from other research fields.
Rapid advancements in Large Language Models (LLMs) have opened new avenues for innovation and research across various domains, particularly in data science. We therefore introduced the Special Track on LLMs for Data Science, which aimed to explore the transformative potential of LLMs in this field by bringing together researchers, practitioners, and industry experts to discuss the latest developments, challenges, and opportunities in this rapidly growing area.
Table of Contents:
.- Recommendation Systems Features: A Comprehensive Review.
.- A Survey on Advances of Foundation Models in Federated Learning.
.- Beyond Single AI: The Rise of Multi-Agent Orchestration - A Survey on Bias, Privacy, Robustness, and Interpretability.
.- Evaluation Metrics for Data Valuation Method.
.- FMs-energized Anomaly Detection and Outlier Detection: A Survey.
.- Opportunities for AutoML in the Era of Agents and LLMs.
.- A Comprehensive Survey on Enterprise Financial Risk Analysis from Big Data and LLMs Perspective.
.- Learning Bloom Filters: A Review.
.- ICL-KB: Charting the Failure Frontier of Few-Shot LLMs.
.- Memory in LLM-based Multi-agent Systems: Mechanisms, Challenges, and Collective Intelligence.
.- Clustering Ensembles: A Data Perspective Survey.
.- Federated Large Language Models: Current Progress and Future Directions.
.- DIALEVAL: Automated Type-Theoretic Evaluation of LLM Instruction Following.
.- H^2RAG: A Hierarchical Knowledge and Hypergraph Reasoning Framework for Retrieval-Augmented Generation.
.- Adapter-Only Bridging of Frozen Speech Encoder and Frozen LLM for ASR.
.- Improving Uncertainty Quantification and Knowledge-Intensive Routing via Query Understanding in Large Language Models.
.- DependencyRAG: Dependency-guided RAG for Multi-hop Question Answering.
.- Task Complexity Matters: An Empirical Study of Reasoning in LLMs for Sentiment Analysis.
.- GPU Kernel Optimization Beyond Full Builds: An LLM Framework with Minimal Executable Programs.
.- On Reasoning Behind Next Occupation Recommendation.
.- Enhancing LLMs for Manufacturing Information Extraction.
.- RED: Rule Guided Prompt Engineering for Graph Data Imputation.
.- Mixture-of-Adapters with Routed Distillation: Unsupervised Expert Routing for Efficient Multi-Task LoRA.
.- Unified Multimodal Retrieval Framework for Multimodal RAG.
.- Enhancing Financial Reasoning via Program-of-Thought Learning.
.- Uncertainty-aware Language Guidance for Concept Bottleneck Models.
.- Whose Politics Do LLMs Represent? Uncovering Political Bias in LLMs' Latent Space.
.- Selective Forgetting for Large Reasoning Models.
.- SELAUR: Self Evolving LLM Agent via Uncertainty-aware Rewards.
.- GLLM-KT: A Graph-Incorporated Ultra-Small Large Language Model for Knowledge Tracing.
.- Graph-based learning for taxonomy optimization.
.- Dodgersort: Uncertainty-Aware VLM-Guided Human-in-the-Loop Pairwise Ranking.
.- The Imitation Game: Evaluating Persona-Driven LLM Response Behavior in Web Surveys.
.- Using Text-Based Life Trajectories from Swedish Register Data to Predict Residential Mobility with Pretrained Transformers.
.- AISSISTANT: Human-AI Collaborative Review and Perspective Research Workflows in Data Science.
.- Towards Unveiling Vulnerabilities of Large Reasoning Models in the Context of the Right to Be Forgotten.
.- RLKGC: Reinforcement Learning Retrieval with Large Language Models for Knowledge Graph Completion.
.- Smart Trial: Evaluating LLMs for Recruiting Clinical Trial Participants on Social Media.
.- MACA: A Framework for Distilling Trustworthy LLMs into Efficient Retrievers.
.- Online Domain-aware LLM Decoding for Continual Domain Evolution.
.- ASTRA: Adversarial Stealthy Trigger Reasoning Attacks for Black-Box LLMs.
.- Disentangling Style and Semantics for Calendar-Driven Text Generation: A Knowledge Graph-Guided Activation Steering Approach.
.- Adaptive Beam Search with Shannon Entropy for Data-centric Reasoning in LLMs.
.- Evaluating Social Bias in RAG Systems: When External Context Helps and Reasoning Hurts.
.- ReForest: Interpretability Meets Reasoning in Isolation Forest.