Steganography is the art of communicating a secret message while hiding the very existence of that message. This book is an introduction to steganalysis as part of the wider trend of multimedia forensics, as well as a practical tutorial on machine learning in this context. It surveys a wide range of feature vectors proposed for steganalysis, with performance tests and comparisons. Python programs and algorithms are provided so that readers can modify and reproduce the outcomes discussed in the book.
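As a taste of the kind of Python material the book works with, the following is a minimal, illustrative sketch of LSB (least-significant-bit) embedding, the technique introduced in Section 2.2.4. This is not code from the book itself, just a NumPy-based toy example showing why LSB changes are so hard to see: each pixel value changes by at most one grey level.

```python
import numpy as np

def lsb_embed(pixels, bits):
    """Embed a bit sequence into the least significant bits of pixel values."""
    out = pixels.copy()
    out[:len(bits)] = (out[:len(bits)] & 0xFE) | bits  # clear LSB, write message bit
    return out

def lsb_extract(pixels, n):
    """Recover the first n embedded bits."""
    return pixels[:n] & 1

# A toy 'cover': random 8-bit grey-level pixel values.
rng = np.random.default_rng(0)
cover = rng.integers(0, 256, size=64, dtype=np.uint8)

message = np.array([1, 0, 1, 1, 0, 0, 1, 0], dtype=np.uint8)
stego = lsb_embed(cover, message)

# The message survives a round trip ...
assert np.array_equal(lsb_extract(stego, len(message)), message)
# ... while no pixel changes by more than one grey level.
assert int(np.max(np.abs(stego.astype(int) - cover.astype(int)))) <= 1
```

Steganalysis, the book's subject, is the other side of this game: detecting the subtle statistical traces (e.g. in histograms of pairs of values, Section 2.3.4) that even such tiny changes leave behind.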
Table of Contents:
Preface xi
Part I Overview
1 Introduction 3
1.1 Real Threat or Hype? 3
1.2 Artificial Intelligence and Learning 4
1.3 How to Read this Book 5
2 Steganography and Steganalysis 7
2.1 Cryptography versus Steganography 7
2.2 Steganography 8
2.2.1 The Prisoners’ Problem 9
2.2.2 Covers – Synthesis and Modification 10
2.2.3 Keys and Kerckhoffs’ Principle 12
2.2.4 LSB Embedding 13
2.2.5 Steganography and Watermarking 15
2.2.6 Different Media Types 16
2.3 Steganalysis 17
2.3.1 The Objective of Steganalysis 17
2.3.2 Blind and Targeted Steganalysis 18
2.3.3 Main Approaches to Steganalysis 19
2.3.4 Example: Pairs of Values 22
2.4 Summary and Notes 23
3 Getting Started with a Classifier 25
3.1 Classification 25
3.1.1 Learning Classifiers 26
3.1.2 Accuracy 27
3.2 Estimation and Confidence 28
3.3 Using libSVM 30
3.3.1 Training and Testing 30
3.3.2 Grid Search and Cross-validation 31
3.4 Using Python 33
3.4.1 Why we use Python 33
3.4.2 Getting Started with Python 34
3.4.3 Scientific Computing 35
3.4.4 Python Imaging Library 36
3.4.5 An Example: Image Histogram 37
3.5 Images for Testing 38
3.6 Further Reading 39
Part II Features
4 Histogram Analysis 43
4.1 Early Histogram Analysis 43
4.2 Notation 44
4.3 Additive Independent Noise 44
4.3.1 The Effect of Noise 45
4.3.2 The Histogram Characteristic Function 47
4.3.3 Moments of the Characteristic Function 48
4.3.4 Amplitude of Local Extrema 51
4.4 Multi-dimensional Histograms 54
4.4.1 HCF Features for Colour Images 55
4.4.2 The Co-occurrence Matrix 57
4.5 Experiment and Comparison 63
5 Bit-plane Analysis 65
5.1 Visual Steganalysis 65
5.2 Autocorrelation Features 67
5.3 Binary Similarity Measures 69
5.4 Evaluation and Comparison 72
6 More Spatial Domain Features 75
6.1 The Difference Matrix 75
6.1.1 The EM Features of Chen et al. 76
6.1.2 Markov Models and the SPAM Features 79
6.1.3 Higher-order Differences 81
6.1.4 Run-length Analysis 81
6.2 Image Quality Measures 82
6.3 Colour Images 86
6.4 Experiment and Comparison 86
7 The Wavelets Domain 89
7.1 A Visual View 89
7.2 The Wavelet Domain 90
7.2.1 The Fast Wavelet Transform 91
7.2.2 Example: The Haar Wavelet 92
7.2.3 The Wavelet Transform in Python 93
7.2.4 Other Wavelet Transforms 94
7.3 Farid’s Features 96
7.3.1 The Image Statistics 96
7.3.2 The Linear Predictor 96
7.3.3 Notes 98
7.4 HCF in the Wavelet Domain 98
7.4.1 Notes and Further Reading 100
7.5 Denoising and the WAM Features 101
7.5.1 The Denoising Algorithm 101
7.5.2 Locally Adaptive LAW-ML 103
7.5.3 Wavelet Absolute Moments 104
7.6 Experiment and Comparison 106
8 Steganalysis in the JPEG Domain 107
8.1 JPEG Compression 107
8.1.1 The Compression 108
8.1.2 Programming JPEG Steganography 110
8.1.3 Embedding in JPEG 111
8.2 Histogram Analysis 114
8.2.1 The JPEG Histogram 114
8.2.2 First-order Features 118
8.2.3 Second-order Features 119
8.2.4 Histogram Characteristic Function 121
8.3 Blockiness 122
8.4 Markov Model-based Features 124
8.5 Conditional Probabilities 126
8.6 Experiment and Comparison 128
9 Calibration Techniques 131
9.1 Calibrated Features 131
9.2 JPEG Calibration 133
9.2.1 The FRI-23 Feature Set 133
9.2.2 The Pevný Features and Cartesian Calibration 135
9.3 Calibration by Downsampling 137
9.3.1 Downsampling as Calibration 137
9.3.2 Calibrated HCF-COM 138
9.3.3 The Sum and Difference Images 139
9.3.4 Features for Colour Images 143
9.3.5 Pixel Selection 143
9.3.6 Other Features Based on Downsampling 145
9.3.7 Evaluation and Notes 146
9.4 Calibration in General 146
9.5 Progressive Randomisation 148
Part III Classifiers
10 Simulation and Evaluation 153
10.1 Estimation and Simulation 153
10.1.1 The Binomial Distribution 153
10.1.2 Probabilities and Sampling 155
10.1.3 Monte Carlo Simulations 156
10.1.4 Confidence Intervals 157
10.2 Scalar Measures 158
10.2.1 Two Error Types 158
10.2.2 Common Scalar Measures 160
10.3 The Receiver Operating Curve 161
10.3.1 The libSVM API for Python 161
10.3.2 The ROC Curve 164
10.3.3 Choosing a Point on the ROC Curve 167
10.3.4 Confidence and Variance 168
10.3.5 The Area Under the Curve 169
10.4 Experimental Methodology 170
10.4.1 Feature Storage 171
10.4.2 Parallel Computation 171
10.4.3 The Dangers of Large-scale Experiments 173
10.5 Comparison and Hypothesis Testing 173
10.5.1 The Hypothesis Test 174
10.5.2 Comparing Two Binomial Proportions 174
10.6 Summary 176
11 Support Vector Machines 179
11.1 Linear Classifiers 179
11.1.1 Linearly Separable Problems 180
11.1.2 Non-separable Problems 183
11.2 The Kernel Function 186
11.2.1 Example: The XOR Function 187
11.2.2 The SVM Algorithm 187
11.3 ν-SVM 189
11.4 Multi-class Methods 191
11.5 One-class Methods 192
11.5.1 The One-class SVM Solution 193
11.5.2 Practical Problems 194
11.5.3 Multiple Hyperspheres 195
11.6 Summary 196
12 Other Classification Algorithms 197
12.1 Bayesian Classifiers 198
12.1.1 Classification Regions and Errors 199
12.1.2 Misclassification Risk 200
12.1.3 The Naïve Bayes Classifier 201
12.1.4 A Security Criterion 202
12.2 Estimating Probability Distributions 203
12.2.1 The Histogram 204
12.2.2 The Kernel Density Estimator 204
12.3 Multivariate Regression Analysis 209
12.3.1 Linear Regression 209
12.3.2 Support Vector Regression 211
12.4 Unsupervised Learning 212
12.4.1 K-means Clustering 213
12.5 Summary 215
13 Feature Selection and Evaluation 217
13.1 Overfitting and Underfitting 217
13.1.1 Feature Selection and Feature Extraction 219
13.2 Scalar Feature Selection 220
13.2.1 Analysis of Variance 220
13.3 Feature Subset Selection 222
13.3.1 Subset Evaluation 223
13.3.2 Search Algorithms 224
13.4 Selection Using Information Theory 225
13.4.1 Entropy 225
13.4.2 Mutual Information 227
13.4.3 Multivariate Information 229
13.4.4 Information Theory with Continuous Sets 232
13.4.5 Estimation of Entropy and Information 233
13.4.6 Ranking Features 234
13.5 Boosting Feature Selection 238
13.6 Applications in Steganalysis 239
13.6.1 Correlation Coefficient 240
13.6.2 Optimised Feature Vectors for JPEG 241
14 The Steganalysis Problem 245
14.1 Different Use Cases 245
14.1.1 Who are Alice and Bob? 245
14.1.2 Wendy’s Role 247
14.1.3 Pooled Steganalysis 248
14.1.4 Quantitative Steganalysis 249
14.2 Images and Training Sets 250
14.2.1 Choosing the Cover Source 250
14.2.2 The Training Scenario 253
14.2.3 The Steganalytic Game 257
14.3 Composite Classifier Systems 258
14.3.1 Fusion 258
14.3.2 A Multi-layer Classifier for JPEG 260
14.3.3 Benefits of Composite Classifiers 261
14.4 Summary 262
15 Future of the Field 263
15.1 Image Forensics 263
15.2 Conclusions and Notes 265
Bibliography 267
Index 279
About the Author:
Hans Georg Schaathun, Department of Computing, University of Surrey, UK
Dr Schaathun was previously a lecturer in coding and cryptography at the University of Bergen. Since February 2006 he has been a lecturer at the University of Surrey, UK, where he belongs to the research group in Digital Watermarking and Multimedia Security. His main research areas are applications of coding theory in information hiding and machine learning techniques in steganalysis. He teaches Computer Security and Steganography at MSc level, and Functional Programming Techniques at undergraduate level. Dr Schaathun has published more than 35 international peer-reviewed articles and is an associate editor of the EURASIP Journal on Information Security.