Overview of methods for bilinear modeling of batch data, including theory, methodologies and examples for experienced professionals in the biotech, pharmaceutical and petrochemical industries.
Process Analytical Technologies (PAT) have become increasingly important with the establishment of the quality-by-design paradigm in industrial processes, particularly where batch operation is standard. PAT plays an instrumental role in advancing process understanding and operational efficiency, while strengthening safety and reliability to ensure consistent on-spec product quality and minimize environmental impact. Empirical methods based on latent variables, often referred to as chemometric methods, are a main component of PAT. When used alongside Batch Multivariate Statistical Process Control (BMSPC), these methods enable the timely detection and diagnosis of process upsets. Furthermore, Latent Variable Models (LVMs), such as Principal Component Analysis (PCA) and Partial Least Squares (PLS), can improve process understanding. This is particularly relevant in batch processes, where the inherent complexity of the model results in a high degree of uncertainty in the operation.
Data Science for Batch Processes: Statistical Learning, Monitoring and Understanding provides a comprehensive and rigorous examination of the bilinear modeling and monitoring of batch processes, comprising data alignment, pre-processing, three-way-to-two-way data transformation, data analysis and design of monitoring systems, including practical challenges and considerations when analyzing multi-dimensional batch data. Case studies and hands-on MATLAB examples using the MVBatch toolbox bridge theory and practice, illustrating how these methods can be applied.
Data Science for Batch Processes: Statistical Learning, Monitoring and Understanding is an essential guide for professionals and academics who seek both foundational knowledge and advanced techniques in batch processes and data analysis.
Table of Contents:
Foreword
Prologue: Challenges for the Third Millennium
1 Introduction
1.1 Industrial Batch Processes
1.2 Types of Sensors
1.3 Batch Process Modeling
1.3.1 Knowledge-based Models
1.3.2 Data-driven Models
1.3.3 Hybrid Models
1.4 Bilinear Modeling Cycle for Batch Process Monitoring
2 Data-driven Models Based on Latent Variables
2.1 Compression
2.2 Principal Component Analysis
2.2.1 Data Preprocessing
2.2.2 Selection of the Number of Principal Components
2.2.3 Parameters Stability
2.3 Regression
2.4 Regression Models Based on Latent Variables
2.4.1 Principal Component Regression
2.4.2 Partial Least Squares
2.4.3 Data Preprocessing
2.4.4 Selection of the Number of Latent Variables
2.4.5 PLS Versus Other Regression Models
2.5 Multivariate Exploratory Data Analysis
2.6 Missing Data
2.6.1 Model Exploitation
2.6.2 Model Building
2.6.3 Final Reflections About Missing Data Imputation and MSPC
3 Batch Data Equalization
3.1 Introduction
3.2 Challenges in Batch Equalization
3.3 Equalization of Variables Within a Batch
3.3.1 Discarding Intermediate Values
3.3.2 Estimating Missing Values
3.3.2.1 Comparison of Equalization Methods Based on Latent Variable Models
3.3.3 Rearranging Data
3.4 Multirate System
4 Batch Synchronization
4.1 Introduction
4.2 Synchronization Approaches
4.2.1 Indicator Variable
4.2.2 Time Linear Expanding/Compressing
4.2.2.1 Observation (OWU) Level and TLEC Synchronization Approach
4.2.3 Dynamic Time Warping
4.2.3.1 Warping Function Constraints
4.2.3.2 The DTW Algorithm
4.2.3.3 Optimization Problem
4.2.3.4 End-of-batch DTW Synchronization for Batch Process Monitoring
4.2.3.5 On the Use of Warping Information
4.2.4 Relaxed Greedy Time Warping
4.2.4.1 Enhanced Global Constraints
4.2.4.2 Cross-validation for the Estimation of the RGTW Parameters
4.2.5 Multisynchro
4.2.5.1 Asynchronism Detection
4.2.5.2 Specific Batch Synchronization
4.2.5.3 Iterative Batch Synchronization and Anomaly Detection Procedure
4.3 Effects of Synchronization on the Correlation Structure
5 Batch Data Preprocessing
5.1 Batch Preprocessing Operations
5.2 Mean Centering
5.3 Scaling
6 Three-way to Two-way Transformation
6.1 Introduction
6.2 Single-model Approach
6.2.1 Batch-wise Unfolding
6.2.2 Variable-wise Unfolding
6.2.3 Batch Dynamic Unfolding
6.3 K-models Approach
6.3.1 Hierarchical-model Approach
6.4 Multiphase Approach
6.4.1 Phases in Batch-wise Data
6.4.2 Phases in Variable-wise Data
6.4.3 Phases in Batch Dynamic Data
6.5 Conclusion
7 Batch Process Data Analysis and Statistical Monitoring
7.1 Introduction
7.2 Historical Batch Data Analysis
7.3 Batch Multivariate Statistical Process Control
7.3.1 Phase I
7.3.2 Phase II
7.3.2.1 Post-batch Process Monitoring
7.3.2.2 Real-time Process Monitoring
7.4 Practical Issues
List of Acronyms
Bibliography
Index
About the Authors:
José M. González-Martínez is Manager of the Department of Chemometrics and Digital Chemistry at Shell in the Netherlands, overseeing worldwide operations and leading key consultancy efforts, new technology developments and R&D business initiatives. He specializes in Chemometrics and Statistics for Chemicals, Catalysis, Integrated Gas, CO2 Abatement, and Low Carbon Fuel and Gas solutions. He has published multiple scientific articles and patents, and has been awarded several academic and industry prizes.
José Camacho is a Full Professor at the Department of Signal Theory, Telematics and Communication and leader of the Computational Data Science Laboratory (CoDaS Lab) at the University of Granada, Spain. He specializes in extracting knowledge from data and the design of new data science algorithms and software in domains like precision medicine, industrial processes, cybersecurity or ecology. He is Scientific Advisor at Datharsis.
Joan Borràs-Ferrís is a researcher and specialist in chemical engineering, applied statistics, and process modeling in digitalized industrial environments. He holds a PhD in Statistics and Optimization from the Universitat Politècnica de València, Spain. He is currently Chief Technology Officer at Kensight Solutions. He has received the ENBIS Young Statistician Award for his work introducing innovative methods that promote the use of statistics in daily practice.
Alberto Ferrer is a Full Professor of Statistics at the Universitat Politècnica de València, Spain, head of the Multivariate Statistical Engineering Group, Chief Scientific Officer at Kenko Imalytics, Scientific Advisor at Kensight Solutions, and elected member of the International Statistical Institute. His research focuses on the development and integration of machine learning and multivariate statistics to address the digitalization challenges in industry, healthcare, and technology. He is the recipient of the ENBIS Box Medal Award 2025.