Mining of Massive Datasets
      The book has now been published by Cambridge University Press. The publisher is offering a 20% discount to anyone who buys the hardcopy here. By agreement with the publisher, you can still download it free from this page. Cambridge University Press does, however, retain copyright on the work, and we expect that you will obtain their permission and acknowledge our authorship if you republish parts or all of it. We are sorry to have to mention this point, but we have evidence that other items we have published on the Web have been appropriated and republished under other names. It is easy to detect such misuse, by the way, as you will learn in Chapter 3.
      --- Anand Rajaraman (@anand_raj) and Jeff Ullman

      Download Version 1.0

      The following materials are equivalent to the published book, with errata corrected to July 4, 2012. This version has been frozen while we revise the book. The evolving book can be downloaded as "Version 1.1" below.

      Download the Complete Book (340 pages, approximately 2MB)

      Download chapters of the book:

      Preface and Table of Contents
      Chapter 1 Data Mining
      Chapter 2 Large-Scale File Systems and Map-Reduce
      Chapter 3 Finding Similar Items
      Chapter 4 Mining Data Streams
      Chapter 5 Link Analysis
      Chapter 6 Frequent Itemsets
      Chapter 7 Clustering
      Chapter 8 Advertising on the Web
      Chapter 9 Recommendation Systems
      Index

      Download Version 1.1

      Below is a draft, evolving version of the MMDS book. We have added Jure Leskovec as a coauthor, and at this point have added only one new chapter, on mining large graphs. However, we will be making available new chapters on large-scale machine-learning algorithms and dimensionality reduction, as well as expanding Chapter 2 on map-reduce algorithm design.

      Download the Complete Book (395 pages, approximately 2.4MB)

      Download chapters of the book:

      Preface and Table of Contents
      Chapter 1 Data Mining
      Chapter 2 Large-Scale File Systems and Map-Reduce
      Chapter 3 Finding Similar Items
      Chapter 4 Mining Data Streams
      Chapter 5 Link Analysis
      Chapter 6 Frequent Itemsets
      Chapter 7 Clustering
      Chapter 8 Advertising on the Web
      Chapter 9 Recommendation Systems
      Chapter 10 Mining Social-Network Graphs
      Index

      Gradiance Support

      If you are an instructor interested in using the Gradiance Automated Homework System with this book, start by creating an account for yourself at www.gradiance.com/services. Then, email your chosen login and a request to become an instructor for the MMDS book to [email protected]. You will then be able to create a class using these materials. Manuals explaining the use of the system are at www.gradiance.com/info.html.

      Students who want to use the Gradiance system for self-study can register at www.gradiance.com/services. Then, use the class token 1EDD8A1D to join the "omnibus class" for the MMDS book. See The Student Guide for more information.

      Other Stuff

      Free book from http://infolab.stanford.edu/~ullman/mmds.html