Unlock the potential of the world's biggest database.
This practical book shows you how to build portals, search engines, and other knowledge-based applications that mine the information you need from the Web.
* Written by a developer for developers
* A practical, hands-on approach
* Illustrates how Java and associated technologies (XML, HTML) can be combined with database technology to display and manipulate Web-derived information more effectively
* Demonstrates how to build a structure browser, a portal, and a meta-search engine, and how to make 'Talking Pages'
Table of Contents:
Preface
About the Author
Acknowledgements
1 Surveying the Scene
2 Language of the Web
3 HTML and XML Parsing
4 Data Filters and Structured Queries
5 Building a Portal with Java
6 Building a Search Engine with Java
7 Mail Mining With Java
8 Introduction to Text Mining
9 Introduction to Data Mining
10 Loose Ends and Looking Ahead
Appendix A: Software Installation and Configuration
Appendix B: Javadoc Extracts
Appendix C: Earlier Versions of JAXP
Appendix D: License and Copyright Statements
Appendix E: Census 1891 Data XML
Appendix F: Share Price Cluster Data
Appendix G: Glossary of Acronyms
References
Further Reading
Index
About the Author:
Tony Loton, LOTONtech Ltd, Middlewich, UK
Tony Loton launched LOTONtech as a vehicle for researching and developing innovative software solutions. He developed the WebDataKit: a Java 2 solution comprising an API and a Structured Query Language designed specifically for the automatic extraction of HTML and XML from web sources. Tony's early Java web mining ideas have been featured previously as a case study contribution to "Professional Java Data programming" (Wrox Press). This book takes the ideas much further, with brand new material.
Review:
"When I got this book, I couldn't put it down. A lot of computer books sit on the shelf or send me to sleep, but not this one. Not only is it both topical and useful, but it hits a just-about-ideal balance between code and food for thought. The author has a real knack for useful solutions to complex problems." (Java Ranch, 17 May 2002)