Methodological Developments in Data Linkage
Methodological Developments in Data Linkage: (Wiley Series in Probability and Statistics)




About the Book

A comprehensive compilation of new developments in data linkage methodology.

The increasing availability of large administrative databases has led to a dramatic rise in the use of data linkage, yet the standard texts on linkage are still those which describe the seminal work from the 1950s and 1960s, with some updates. Linkage and analysis of data across sources remain problematic due to a lack of discriminatory and accurate identifiers, missing data and regulatory issues. Recent developments in data linkage methodology have concentrated on bias and analysis of linked data, novel approaches to organising relationships between databases and privacy-preserving linkage.

Methodological Developments in Data Linkage brings together a collection of contributions from members of the international data linkage community, covering cutting-edge methodology in this field. It presents opportunities and challenges provided by linkage of large and often complex datasets, including analysis problems, legal and security aspects, models for data access and the development of novel research areas. New methods for handling uncertainty in analysis of linked data, solutions for anonymised linkage and alternative models for data collection are also discussed.

Key Features:
  • Presents cutting-edge methods for a topic of increasing importance to a wide range of research areas, with applications to data linkage systems internationally
  • Covers the essential issues associated with data linkage today
  • Includes examples based on real data linkage systems, highlighting the opportunities, successes and challenges that the increasing availability of linkage data provides
  • Takes a novel approach that incorporates technical aspects of linkage, management and analysis of linked data

This book will be of core interest to academics, government employees, data holders, data managers, analysts and statisticians who use administrative data. It will also appeal to researchers in a variety of areas, including epidemiology, biostatistics, social statistics, informatics, policy and public health.

Table of Contents:
Foreword
Contributors

1 Introduction
Katie Harron, Harvey Goldstein and Chris Dibben
1.1 Introduction: data linkage as it exists
1.2 Background and issues
1.3 Data linkage methods
1.3.1 Deterministic linkage
1.3.2 Probabilistic linkage
1.3.3 Data preparation
1.4 Linkage error
1.5 Impact of linkage error on analysis of linked data
1.6 Data linkage: the future

2 Probabilistic linkage
William E. Winkler
2.1 Introduction
2.2 Overview of methods
2.2.1 The Fellegi–Sunter model of record linkage
2.2.2 Learning parameters
2.2.3 Additional methods for matching
2.2.4 An empirical example
2.3 Data preparation
2.3.1 Description of a matching project
2.3.2 Initial file preparation
2.3.3 Name standardisation and parsing
2.3.4 Address standardisation and parsing
2.3.5 Summarising comments on preprocessing
2.4 Advanced methods
2.4.1 Estimating false-match rates without training data
2.4.2 Adjusting analyses for linkage error
2.5 Concluding comments

3 The data linkage environment
Chris Dibben, Mark Elliot, Heather Gowans, Darren Lightfoot and Data Linkage Centres
3.1 Introduction
3.2 The data linkage context
3.2.1 Administrative or routine data
3.2.2 The law and the use of administrative (personal) data for research
3.2.3 The identifiability problem in data linkage
3.3 The tools used in the production of functional anonymity through a data linkage environment
3.3.1 Governance, rules and the researcher
3.3.2 Application process, ethics scrutiny and peer review
3.3.3 Shaping ‘safe’ behaviour: training, sanctions, contracts and licences
3.3.4 ‘Safe’ data analysis environments
3.3.5 Fragmentation: separation of linkage process and temporary linked data
3.4 Models for data access and data linkage
3.4.1 Single centre
3.4.2 Separation of functions: firewalls within single centre
3.4.3 Separation of functions: TTP linkage
3.4.4 Secure multiparty computation
3.5 Four case study data linkage centres
3.5.1 Population Data BC
3.5.2 The Secure Anonymised Information Linkage Databank, United Kingdom
3.5.3 Centre for Data Linkage (Population Health Research Network), Australia
3.5.4 The Centre for Health Record Linkage, Australia
3.6 Conclusion

4 Bias in data linkage studies
Megan Bohensky
4.1 Background
4.2 Description of types of linkage error
4.2.1 Missed matches from missing linkage variables
4.2.2 Missed matches from inconsistent case ascertainment
4.2.3 False matches: Description of cases incorrectly matched
4.3 How linkage error impacts research findings
4.3.1 Results
4.3.2 Assessment of linkage bias
4.4 Discussion
4.4.1 Potential biases in the review process
4.4.2 Recommendations and implications for practice

5 Secondary analysis of linked data
Raymond Chambers and Gunky Kim
5.1 Introduction
5.2 Measurement error issues arising from linkage
5.2.1 Correct links, incorrect links and non-links
5.2.2 Characterising linkage errors
5.2.3 Characterising errors from non-linkage
5.3 Models for different types of linking errors
5.3.1 Linkage errors under binary linking
5.3.2 Linkage errors under multi-linking
5.3.3 Incomplete linking
5.3.4 Modelling the linkage error
5.4 Regression analysis using complete binary-linked data
5.4.1 Linear regression
5.4.2 Logistic regression
5.5 Regression analysis using incomplete binary-linked data
5.5.1 Linear regression using incomplete sample to register linked data
5.6 Regression analysis with multi-linked data
5.6.1 Uncorrelated multi-linking: Complete linkage
5.6.2 Uncorrelated multi-linking: Sample to register linkage
5.6.3 Correlated multi-linkage
5.6.4 Incorporating auxiliary population information
5.7 Conclusion and discussion

6 Record linkage: A missing data problem
Harvey Goldstein and Katie Harron
6.1 Introduction
6.2 Probabilistic Record Linkage (PRL)
6.3 Multiple Imputation (MI)
6.4 Prior-Informed Imputation (PII)
6.4.1 Estimating matching probabilities
6.5 Example 1: Linking electronic healthcare data to estimate trends in bloodstream infection
6.5.1 Methods
6.5.2 Results
6.5.3 Conclusions
6.6 Example 2: Simulated data including non-random linkage error
6.6.1 Methods
6.6.2 Results
6.7 Discussion
6.7.1 Non-random linkage error
6.7.2 Strengths and limitations: Handling linkage error
6.7.3 Implications for data linkers and data users

7 Using graph databases to manage linked data
James M. Farrow
7.1 Summary
7.2 Introduction
7.2.1 Flat approach
7.2.2 Oops, your legacy is showing
7.2.3 Shortcomings
7.3 Graph approach
7.3.1 Overview of graph concepts
7.3.2 Graph queries versus relational queries
7.3.3 Comparison of data in flat database versus graph database
7.3.4 Relaxing the notion of ‘truth’
7.3.5 Not a linkage approach per se but a management approach which enables novel linkage approaches
7.3.6 Linkage engine independent
7.3.7 Separates out linkage from cluster identification phase (and clerical review)
7.4 Methodologies
7.4.1 Overview of storage and extraction approach
7.4.2 Overall management of data as collections
7.4.3 Data loading
7.4.4 Identification of equivalence sets and deterministic linkage
7.4.5 Probabilistic linkage
7.4.6 Clerical review
7.4.7 Determining cut-off thresholds
7.4.8 Final cluster extraction
7.4.9 Graph partitioning
7.4.10 Data management/curation
7.4.11 User interface challenges
7.4.12 Final cluster extraction
7.4.13 A typical end-to-end workflow
7.5 Algorithm implementation
7.5.1 Graph traversal
7.5.2 Cluster identification
7.5.3 Partitioning visitor
7.5.4 Encapsulating edge following policies
7.5.5 Graph partitioning
7.5.6 Insertion of review links
7.5.7 How to migrate while preserving current clusters
7.6 New approaches facilitated by graph storage approach
7.6.1 Multiple threshold extraction
7.6.2 Possibility of returning graph to end users
7.6.3 Optimised cluster analysis
7.6.4 Other link types
7.7 Conclusion

8 Large-scale linkage for total populations in official statistics
Owen Abbott, Peter Jones and Martin Ralphs
8.1 Introduction
8.2 Current practice in record linkage for population censuses
8.2.1 Introduction
8.2.2 Case study: the 2011 England and Wales Census assessment of coverage
8.3 Population-level linkage in countries that operate a population register: register-based censuses
8.3.1 Introduction
8.3.2 Case study 1: Finland
8.3.3 Case study 2: The Netherlands Virtual Census
8.3.4 Case study 3: Poland
8.3.5 Case study 4: Germany
8.3.6 Summary
8.4 New challenges in record linkage: the Beyond 2011 Programme
8.4.1 Introduction
8.4.2 Beyond 2011 linking methodology
8.4.3 The anonymisation process in Beyond 2011
8.4.4 Beyond 2011 linkage strategy using pseudonymised data
8.4.5 Linkage quality
8.4.6 Next steps
8.4.7 Conclusion
8.5 Summary

9 Privacy-preserving record linkage
Rainer Schnell
9.1 Introduction
9.2 Chapter outline
9.3 Linking with and without personal identification numbers
9.3.1 Linking using a trusted third party
9.3.2 Linking with encrypted PIDs
9.3.3 Linking with encrypted quasi-identifiers
9.3.4 PPRL in decentralised organisations
9.4 PPRL approaches
9.4.1 Phonetic codes
9.4.2 High-dimensional embeddings
9.4.3 Reference tables
9.4.4 Secure multiparty computations for PPRL
9.4.5 Bloom filter-based PPRL
9.5 PPRL for very large databases: blocking
9.5.1 Blocking for PPRL with Bloom filters
9.5.2 Blocking Bloom filters with MBT
9.5.3 Empirical comparison of blocking techniques for Bloom filters
9.5.4 Current recommendations for linking very large datasets with Bloom filters
9.6 Privacy considerations
9.6.1 Probability of attacks
9.6.2 Kind of attacks
9.6.3 Attacks on Bloom filters
9.7 Hardening Bloom filters
9.7.1 Randomly selected hash values
9.7.2 Random bits
9.7.3 Avoiding padding
9.7.4 Standardising the length of identifiers
9.7.5 Sampling bits for composite Bloom filters
9.7.6 Rehashing
9.7.7 Salting keys with record-specific data
9.7.8 Fake injections
9.7.9 Evaluation of Bloom filter hardening procedures
9.8 Future research
9.9 PPRL research and implementation with national databases

10 Summary
Katie Harron, Chris Dibben and Harvey Goldstein
10.1 Introduction
10.2 Part 1: Data linkage as it exists today
10.3 Part 2: Analysis of linked data
10.3.1 Quality of identifiers
10.3.2 Quality of linkage methods
10.3.3 Quality of evaluation
10.4 Part 3: Data linkage in practice: new developments
10.5 Concluding remarks

References
Index

About the Author:
Editors:
  • Katie Harron, London School of Hygiene and Tropical Medicine, UK
  • Harvey Goldstein, University of Bristol and University College London, UK
  • Chris Dibben, University of Edinburgh, UK




Product Details
  • ISBN-13: 9781118745878
  • Publisher: John Wiley & Sons Inc
  • Publisher Imprint: John Wiley & Sons Inc
  • Height: 252 mm
  • No of Pages: 288
  • Returnable: N
  • Spine Width: 20 mm
  • Width: 178 mm
  • ISBN-10: 1118745876
  • Publisher Date: 11 Dec 2015
  • Binding: Hardback
  • Language: English
  • Series Title: Wiley Series in Probability and Statistics
  • Weight: 590 g


