Multiblock Data Fusion in Statistics and Machine Learning
Multiblock Data Fusion in Statistics and Machine Learning: Applications in the Natural and Life Sciences


Available


About the Book

Explore the advantages and shortcomings of various forms of multiblock analysis, and the relationships between them, with this expert guide.

Arising out of fusion problems that exist in a variety of fields in the natural and life sciences, the methods available to fuse multiple data sets have expanded dramatically in recent years. Older methods, rooted in psychometrics and chemometrics, also exist.

Multiblock Data Fusion in Statistics and Machine Learning: Applications in the Natural and Life Sciences is a detailed overview of all relevant multiblock data analysis methods for fusing multiple data sets. It focuses on methods based on components and latent variables, including both well-known and lesser-known methods with potential applications in different types of problems. Many of the included methods are illustrated by practical examples and are accompanied by a freely available R-package.

The distinguished authors have created an accessible and useful guide to help readers fuse data, develop new data fusion models, discover how the involved algorithms and models work, and understand the advantages and shortcomings of various approaches.

This book includes:
  • A thorough introduction to the different options available for the fusion of multiple data sets, including methods originating in psychometrics and chemometrics
  • Practical discussions of well-known and lesser-known methods with applications in a wide variety of data problems
  • Included, functional R-code for the application of many of the discussed methods

Perfect for graduate students studying data analysis in the context of the natural and life sciences, including bioinformatics, sensometrics, and chemometrics, Multiblock Data Fusion in Statistics and Machine Learning: Applications in the Natural and Life Sciences is also an indispensable resource for developers and users of the results of multiblock methods.

Table of Contents:
Foreword
Preface
List of Figures
List of Tables

Part I Introductory Concepts and Theory
  1 Introduction
    1.1 Scope of the Book
    1.2 Potential Audience
    1.3 Types of Data and Analyses
      1.3.1 Supervised and Unsupervised Analyses
      1.3.2 High-, Mid- and Low-level Fusion
      1.3.3 Dimension Reduction
      1.3.4 Indirect Versus Direct Data
      1.3.5 Heterogeneous Fusion
    1.4 Examples
      1.4.1 Metabolomics
      1.4.2 Genomics
      1.4.3 Systems Biology
      1.4.4 Chemistry
      1.4.5 Sensory Science
    1.5 Goals of Analyses
    1.6 Some History
    1.7 Fundamental Choices
    1.8 Common and Distinct Components
    1.9 Overview and Links
    1.10 Notation and Terminology
    1.11 Abbreviations
  2 Basic Theory and Concepts
    2.i General Introduction
    2.1 Component Models
      2.1.1 General Idea of Component Models
      2.1.2 Principal Component Analysis
      2.1.3 Sparse PCA
      2.1.4 Principal Component Regression
      2.1.5 Partial Least Squares
      2.1.6 Sparse PLS
      2.1.7 Principal Covariates Regression
      2.1.8 Redundancy Analysis
      2.1.9 Comparing PLS, PCovR and RDA
      2.1.10 Generalised Canonical Correlation Analysis
      2.1.11 Simultaneous Component Analysis
    2.2 Properties of Data
      2.2.1 Data Theory
      2.2.2 Scale-types
    2.3 Estimation Methods
      2.3.1 Least-squares Estimation
      2.3.2 Maximum-likelihood Estimation
      2.3.3 Eigenvalue Decomposition-based Methods
      2.3.4 Covariance or Correlation-based Estimation Methods
      2.3.5 Sequential Versus Simultaneous Methods
      2.3.6 Homogeneous Versus Heterogeneous Fusion
    2.4 Within- and Between-block Variation
      2.4.1 Definition and Example
      2.4.2 MAXBET Solution
      2.4.3 MAXNEAR Solution
      2.4.4 PLS2 Solution
      2.4.5 CCA Solution
      2.4.6 Comparing the Solutions
      2.4.7 PLS, RDA and CCA Revisited
    2.5 Framework for Common and Distinct Components
    2.6 Preprocessing
    2.7 Validation
      2.7.1 Outliers
        2.7.1.1 Residuals
        2.7.1.2 Leverage
      2.7.2 Model Fit
      2.7.3 Bias-variance Trade-off
      2.7.4 Test Set Validation
      2.7.5 Cross-validation
      2.7.6 Permutation Testing
      2.7.7 Jackknife and Bootstrap
      2.7.8 Hyper-parameters and Penalties
    2.8 Appendix
  3 Structure of Multiblock Data
    3.i General Introduction
    3.1 Taxonomy
    3.2 Skeleton of a Multiblock Data Set
      3.2.1 Shared Sample Mode
      3.2.2 Shared Variable Mode
      3.2.3 Shared Variable or Sample Mode
      3.2.4 Shared Variable and Sample Mode
    3.3 Topology of a Multiblock Data Set
      3.3.1 Unsupervised Analysis
      3.3.2 Supervised Analysis
    3.4 Linking Structures
      3.4.1 Linking Structure for Unsupervised Analysis
      3.4.2 Linking Structures for Supervised Analysis
    3.5 Summary
  4 Matrix Correlations
    4.i General Introduction
    4.1 Definition
    4.2 Most Used Matrix Correlations
      4.2.1 Inner Product Correlation
      4.2.2 GCD coefficient
      4.2.3 RV-coefficient
      4.2.4 SMI-coefficient
    4.3 Generic Framework of Matrix Correlations
    4.4 Generalised Matrix Correlations
      4.4.1 Generalised RV-coefficient
      4.4.2 Generalised Association Coefficient
    4.5 Partial Matrix Correlations
    4.6 Conclusions and Recommendations
    4.7 Open Issues

Part II Selected Methods for Unsupervised and Supervised Topologies
  5 Unsupervised Methods
    5.i General Introduction
    5.ii Relations to the General Framework
    5.1 Shared Variable Mode
      5.1.1 Only Common Variation
        5.1.1.1 Simultaneous Component Analysis
        5.1.1.2 Clustering and SCA
        5.1.1.3 Multigroup Data Analysis
      5.1.2 Common, Local, and Distinct Variation
        5.1.2.1 Distinct and Common Components
        5.1.2.2 Multivariate Curve Resolution
    5.2 Shared Sample Mode
      5.2.1 Only Common Variation
        5.2.1.1 SUM-PCA
        5.2.1.2 Multiple Factor Analysis and STATIS
        5.2.1.3 Generalised Canonical Analysis
        5.2.1.4 Regularised Generalised Canonical Correlation Analysis
        5.2.1.5 Exponential Family SCA
        5.2.1.6 Optimal-scaling
      5.2.2 Common, Local, and Distinct Variation
        5.2.2.1 Joint and Individual Variation Explained
        5.2.2.2 Distinct and Common Components
        5.2.2.3 PCA-GCA
        5.2.2.4 Advanced Coupled Matrix and Tensor Factorisation
        5.2.2.5 Penalised-ESCA
        5.2.2.6 Multivariate Curve Resolution
    5.3 Generic Framework
      5.3.1 Framework for Simultaneous Unsupervised Methods
        5.3.1.1 Description of the Framework
        5.3.1.2 Framework Applied to Simultaneous Unsupervised Data Analysis Methods
        5.3.1.3 Framework of Common/Distinct Applied to Simultaneous Unsupervised Multiblock Data Analysis Methods
    5.4 Conclusions and Recommendations
    5.5 Open Issues
  6 ASCA and Extensions
    6.i General Introduction
    6.ii Relations to the General Framework
    6.1 ANOVA-Simultaneous Component Analysis
      6.1.1 The ASCA Method
      6.1.2 Validation of ASCA
        6.1.2.1 Permutation Testing
        6.1.2.2 Back-projection
        6.1.2.3 Confidence Ellipsoids
      6.1.3 The ASCA+ and LiMM-PCA Methods
    6.2 Multilevel-SCA
    6.3 Penalised-ASCA
    6.4 Conclusions and Recommendations
    6.5 Open Issues
  7 Supervised Methods
    7.i General Introduction
    7.ii Relations to the General Framework
    7.1 Multiblock Regression: General Perspectives
      7.1.1 Model and Assumptions
      7.1.2 Different Challenges and Aims
    7.2 Multiblock PLS Regression
      7.2.1 Standard Multiblock PLS Regression
      7.2.2 MB-PLS Used for Classification
      7.2.3 Sparse Multiblock PLS Regression (sMB-PLS)
    7.3 The Family of SO-PLS Regression Methods (Sequential and Orthogonalised PLS Regression)
      7.3.1 The SO-PLS Method
      7.3.2 Order of Blocks
      7.3.3 Interpretation Tools
      7.3.4 Restricted PLS Components and their Application in SO-PLS
      7.3.5 Validation and Component Selection
      7.3.6 Relations to ANOVA
      7.3.7 Extensions of SO-PLS to Handle Interactions Between Blocks
      7.3.8 Further Applications of SO-PLS
      7.3.9 Relations Between SO-PLS and ASCA
    7.4 Parallel and Orthogonalised PLS (PO-PLS) Regression
    7.5 Response Oriented Sequential Alternation
      7.5.1 The ROSA Method
      7.5.2 Validation
      7.5.3 Interpretation
    7.6 Conclusions and Recommendations
    7.7 Open Issues

Part III Methods for Complex Multiblock Structures
  8 Complex Block Structures; with Focus on L-Shape Relations
    8.i General Introduction
    8.ii Relations to the General Framework
    8.1 Analysis of L-shape Data: General Perspectives
    8.2 Sequential Procedures for L-shape Data Based on PLS/PCR and ANOVA
      8.2.1 Interpretation of X1, Quantitative X2-data, Horizontal Axis First
      8.2.2 Interpretation of X1, Categorical X2-data, Horizontal Axis First
      8.2.3 Analysis of Segments/Clusters of X1 Data
    8.3 The L-PLS Method for Joint Estimation of Blocks in L-shape Data
      8.3.1 The Original L-PLS Method, Endo-L-PLS
      8.3.2 Exo- Versus Endo-L-PLS
    8.4 Modifications of the Original L-PLS Idea
      8.4.1 Weighting Information from X3 and X1 in L-PLS Using a Parameter α
      8.4.2 Three-blocks Bifocal PLS
    8.5 Alternative L-shape Data Analysis Methods
      8.5.1 Principal Component Analysis with External Information
      8.5.2 A Simple PCA Based Procedure for Using Unlabelled Data in Calibration
      8.5.3 Multivariate Curve Resolution for Incomplete Data
      8.5.4 An Alternative Approach in Consumer Science Based on Correlations Between X3 and X1
    8.6 Domino PLS and More Complex Data Structures
    8.7 Conclusions and Recommendations
    8.8 Open Issues

Part IV Alternative Methods for Unsupervised and Supervised Topologies
  9 Alternative Unsupervised Methods
    9.i General Introduction
    9.ii Relationship to the General Framework
    9.1 Shared Variable Mode
    9.2 Shared Sample Mode
      9.2.1 Only Common Variation
        9.2.1.1 DIABLO
        9.2.1.2 Generalised Coupled Tensor Factorisation
        9.2.1.3 Representation Matrices
        9.2.1.4 Extended PCA
      9.2.2 Common, Local, and Distinct Variation
        9.2.2.1 Generalised SVD
        9.2.2.2 Structural Learning and Integrative Decomposition
        9.2.2.3 Bayesian Inter-battery Factor Analysis
        9.2.2.4 Group Factor Analysis
        9.2.2.5 OnPLS
        9.2.2.6 Generalised Association Study
        9.2.2.7 Multi-Omics Factor Analysis
    9.3 Two Shared Modes and Only Common Variation
      9.3.1 Generalised Procrustes Analysis
      9.3.2 Three-way Methods
    9.4 Conclusions and Recommendations
      9.4.1 Open Issues
  10 Alternative Supervised Methods
    10.i General Introduction
    10.ii Relations to the General Framework
    10.1 Model and Focus
    10.2 Extension of PCovR
      10.2.1 Sparse Multiblock Principal Covariates Regression, Sparse PCovR
      10.2.2 Multiway Multiblock Covariates Regression
    10.3 Multiblock Redundancy Analysis
      10.3.1 Standard Multiblock Redundancy Analysis
      10.3.2 Sparse Multiblock Redundancy Analysis
    10.4 Miscellaneous Multiblock Regression Methods
      10.4.1 Multiblock Variance Partitioning
      10.4.2 Network Induced Supervised Learning
      10.4.3 Common Dimensions for Multiblock Regression
    10.5 Modifications and Extensions of the SO-PLS Method
      10.5.1 Extensions of SO-PLS to Three-Way Data
      10.5.2 Variable Selection for SO-PLS
      10.5.3 More Complicated Error Structure for SO-PLS
      10.5.4 SO-PLS Used for Path Modelling
    10.6 Methods for Data Sets Split Along the Sample Mode, Multigroup Methods
      10.6.1 Multigroup PLS Regression
      10.6.2 Clustering of Observations in Multiblock Regression
      10.6.3 Domain-Invariant PLS, DI-PLS
    10.7 Conclusions and Recommendations
    10.8 Open Issues

Part V Software
  11 Algorithms and Software
    11.1 Multiblock Software
    11.2 R package multiblock
    11.3 Installing and Starting the Package
    11.4 Data Handling
      11.4.1 Read From File
      11.4.2 Data Pre-processing
      11.4.3 Re-coding Categorical Data
      11.4.4 Data Structures for Multiblock Analysis
        11.4.4.1 Create List of Blocks
        11.4.4.2 Create data.frame of Blocks
    11.5 Basic Methods
      11.5.1 Prepare Data
      11.5.2 Modelling
      11.5.3 Common Output Elements Across Methods
      11.5.4 Scores and Loadings
    11.6 Unsupervised Methods
      11.6.1 Formatting Data for Unsupervised Data Analysis
      11.6.2 Method Interfaces
      11.6.3 Shared Sample Mode Analyses
      11.6.4 Shared Variable Mode
      11.6.5 Common Output Elements Across Methods
      11.6.6 Scores and Loadings
      11.6.7 Plot From Imported Package
    11.7 ANOVA Simultaneous Component Analysis
      11.7.1 Formula Interface
      11.7.2 Simulated Data
      11.7.3 ASCA Modelling
      11.7.4 ASCA Scores
      11.7.5 ASCA Loadings
    11.8 Supervised Methods
      11.8.1 Formatting Data for Supervised Analyses
      11.8.2 Multiblock Partial Least Squares
        11.8.2.1 MB-PLS Modelling
        11.8.2.2 MB-PLS Summaries and Plotting
      11.8.3 Sparse Multiblock Partial Least Squares
        11.8.3.1 Sparse MB-PLS Modelling
        11.8.3.2 Sparse MB-PLS Plotting
      11.8.4 Sequential and Orthogonalised Partial Least Squares
        11.8.4.1 SO-PLS Modelling
        11.8.4.2 Måge Plot
        11.8.4.3 SO-PLS Loadings
        11.8.4.4 SO-PLS Scores
        11.8.4.5 SO-PLS Prediction
        11.8.4.6 SO-PLS Validation
        11.8.4.7 Principal Components of Predictions
        11.8.4.8 CVANOVA
      11.8.5 Parallel and Orthogonalised Partial Least Squares
        11.8.5.1 PO-PLS Modelling
        11.8.5.2 PO-PLS Scores and Loadings
      11.8.6 Response Optimal Sequential Alternation
        11.8.6.1 ROSA Modelling
        11.8.6.2 ROSA Loadings
        11.8.6.3 ROSA Scores
        11.8.6.4 ROSA Prediction
        11.8.6.5 ROSA Validation
        11.8.6.6 ROSA Image Plots
      11.8.7 Multiblock Redundancy Analysis
        11.8.7.1 MB-RDA Modelling
        11.8.7.2 MB-RDA Loadings and Scores
    11.9 Complex Data Structures
      11.9.1 L-PLS
        11.9.1.1 Simulated L-shaped Data
        11.9.1.2 Exo-L-PLS
        11.9.1.3 Endo-L-PLS
        11.9.1.4 L-PLS Cross-validation
      11.9.2 SO-PLS-PM
        11.9.2.1 Single SO-PLS-PM Model
        11.9.2.2 Multiple Paths in an SO-PLS-PM Model
    11.10 Software Packages
      11.10.1 R Packages
      11.10.2 MATLAB Toolboxes
      11.10.3 Python
      11.10.4 Commercial Software

References
Index




Product Details
  • ISBN-13: 9781119600961
  • Publisher: John Wiley & Sons Inc
  • Publisher Imprint: John Wiley & Sons Inc
  • Height: 244 mm
  • No of Pages: 416
  • Returnable: N
  • Sub Title: Applications in the Natural and Life Sciences
  • Width: 170 mm
  • ISBN-10: 1119600960
  • Publisher Date: 28 Apr 2022
  • Binding: Hardback
  • Language: English
  • Spine Width: 26 mm
  • Weight: 1098 g

