Practical Machine Learning 
Chapter 4 Machine Learning Tools, Libraries, and Frameworks

This chapter opens with an outline of the current landscape of Machine Learning tools. This is followed by an overview of five of the more common tools, namely: Apache Mahout, R, Julia, Python, and Apache Spark. For each tool, details are provided on how to install and configure it, how it integrates with Hadoop, its basic syntax, example usage, and its specific advantages.

This chapter provides a useful overview of the current state of Machine Learning tools, and helpful instructions are provided to get you up and running with each tool. I do wonder why we are looking at five different tools; there is not enough detail provided to become proficient with any of them.

Chapter 5 Decision Tree based learning

The first four chapters provided general background information. This chapter, through to Chapter 10, looks at implementing the specific Machine Learning algorithms discussed previously (e.g. classification), using the previously discussed tools (e.g. Julia).

The chapter opens with an overview of decision trees (definition, terminology, purpose, etc.). Next, a simple decision tree is built and its limitations are noted, measures of uncertainty are described, and means of pruning the tree to reduce overfitting are discussed. Various decision tree algorithms are discussed (e.g. CART, C5.0) with the aid of helpful diagrams. Some specialized trees are briefly described (e.g. Random Forest), again with helpful diagrams. The chapter ends rather abruptly, with a page about implementing decision trees, which contains links to decision tree example code in each of the five Machine Learning tools covered in Chapter 4 (i.e. Mahout, R, Spark, Python, Julia).

This chapter provides a useful overview of what decision trees are, their terminology, advantages, problems and solutions, and types.
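To give a flavour of the uncertainty measures the chapter describes, here is my own minimal Python sketch of the Gini impurity and the search for a single best split on a numeric feature. This is purely illustrative; the function names and data are mine, not the book's:

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(values, labels):
    """Find the threshold on one numeric feature that minimises the
    weighted Gini impurity of the two resulting partitions."""
    best = (None, float("inf"))
    for t in sorted(set(values)):
        left = [l for v, l in zip(values, labels) if v <= t]
        right = [l for v, l in zip(values, labels) if v > t]
        if not left or not right:
            continue  # skip degenerate splits
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best[1]:
            best = (t, score)
    return best

# A perfectly separable toy feature: the split at 3 has zero impurity
values = [1, 2, 3, 10, 11, 12]
labels = ["a", "a", "a", "b", "b", "b"]
print(best_split(values, labels))  # (3, 0.0)
```

A full decision tree learner would apply this split search recursively, which is essentially what CART does with the Gini criterion.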
It would have been much more useful to have provided a step-by-step walkthrough of at least some of the code examples, rather than just a link.

Chapter 6 Instance and Kernel Methods Based Learning

This chapter opens with a look at Instance-based Learning, which stores training data that is then used subsequently for prediction. Both lazy and eager learning are described. There's a brief look at some algorithms (e.g. Nearest Neighbor, Radial basis functions), before looking at a real-world use case solved using the KNN (k-Nearest Neighbor) algorithm.

The chapter next looks at Kernel methods-based learning algorithms; these take two inputs and return details of their similarity. There is a brief look at various algorithms (e.g. Support Vector Machines [SVM]), before looking at a real-world use case solved using the SVM algorithm.

I'm not sure why the two approaches were included in the same chapter. In both detailed use cases, a step-by-step code walkthrough would have been useful instead of a link to code. Although the chapter contains plenty of math formulae, they are not discussed in any detail.

Chapter 7 Association Rules based learning

Association rules based learning is concerned with discovering associations that can be used for classification, and subsequent prediction. The chapter opens by briefly defining association rules, and then looks at the Apriori algorithm, illustrated with a step-by-step example, before highlighting the disadvantages of the Apriori algorithm. Next, the more efficient FP-growth algorithm is discussed, again with a step-by-step example. The chapter ends with links to code that implements the Apriori and FP-growth algorithms using each of the five tools.
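The k-Nearest Neighbor idea from Chapter 6 is simple enough to sketch in a few lines of Python. This is my own illustration of lazy, instance-based classification, not the book's code; the toy data and names are assumptions:

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among the k nearest
    training points, using Euclidean distance.
    `train` is a list of (feature_vector, label) pairs."""
    neighbours = sorted(train, key=lambda pair: math.dist(pair[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

# Two well-separated clusters of labelled points
train = [((1, 1), "red"), ((1, 2), "red"), ((2, 1), "red"),
         ((8, 8), "blue"), ((8, 9), "blue"), ((9, 8), "blue")]
print(knn_predict(train, (2, 2)))  # red
print(knn_predict(train, (8, 7)))  # blue
```

Note that "training" stores the data and does nothing else; all the work happens at prediction time, which is exactly what the chapter means by lazy learning.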
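To complement Chapter 7's step-by-step description of Apriori, here is a bare-bones Python sketch of its level-wise candidate generation and support counting. Again, this is my own illustration under toy data, not the book's implementation:

```python
from itertools import combinations

def apriori(transactions, min_support=2):
    """Return every itemset appearing in at least `min_support`
    transactions, generating candidates level by level (Apriori)."""
    transactions = [frozenset(t) for t in transactions]
    items = {i for t in transactions for i in t}
    frequent = {}
    k, current = 1, [frozenset([i]) for i in sorted(items)]
    while current:
        # Count support of each candidate; keep only the frequent ones
        counts = {c: sum(c <= t for t in transactions) for c in current}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        # Join frequent k-itemsets into candidate (k+1)-itemsets
        keys = list(level)
        current = list({a | b for a, b in combinations(keys, 2)
                        if len(a | b) == k + 1})
        k += 1
    return frequent

txns = [{"milk", "bread"}, {"milk", "bread", "butter"},
        {"bread", "butter"}, {"milk", "bread", "butter"}]
result = apriori(txns, min_support=3)
print(result[frozenset({"milk", "bread"})])  # 3
```

The inefficiency the chapter highlights is visible here: every level rescans all transactions, which is what FP-growth's tree structure avoids.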
Chapter 8 Clustering based learning

Clustering based learning is used to identify related groups (clusters) of data. The chapter opens with a look at the different types of clustering (e.g. Hierarchical, Partitional), before looking at the k-means clustering algorithm in detail, discussing its advantages and disadvantages. The importance of choosing the right number of clusters is discussed. The chapter ends with links to code that implements the k-means clustering algorithm using each of the five tools.

Chapter 9 Bayesian learning

Bayesian learning relates to the probability of data belonging to a given group. The chapter opens with a look at what Bayesian learning is, and a short statistics overview is provided. Bayes' theorem is then discussed, before providing a deeper look at the Naive Bayes algorithm and some of its variations (e.g. Bernoulli classifiers). The chapter ends with links to code that implements the Naive Bayes classifier algorithm using each of the five tools.

Chapter 10 Regression based learning

Regression based learning aims to discover the relationship between two or more variables. The chapter opens with a look at what regression analysis involves, with a further look into statistics (e.g. variance, covariance, correlation). Next, various regression methods are discussed (e.g. multiple, Poisson). The chapter ends with links to code that implements the linear regression algorithm using each of the five tools.

Generally, the various formulae are briefly explained, and some examples are provided; however, a math background would make the chapters easier to understand.
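Since Chapter 8, too, only links to its code, here is my own compact Python sketch of the k-means procedure it describes (Lloyd's algorithm): assign each point to its nearest centroid, then move each centroid to its cluster's mean. The data and names are illustrative assumptions:

```python
import math
import random

def kmeans(points, k, iters=20, seed=0):
    """Lloyd's algorithm: alternate between assigning points to the
    nearest centroid and moving each centroid to its cluster mean."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialise from the data
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Recompute each centroid; keep the old one if its cluster emptied
        centroids = [
            tuple(sum(c) / len(cl) for c in zip(*cl)) if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
    return centroids

points = [(1, 1), (1, 2), (2, 1), (9, 9), (9, 10), (10, 9)]
print(sorted(kmeans(points, 2)))  # two centroids, one per cluster
```

The chapter's point about choosing the right number of clusters matters here: k is an input, and a poor choice simply forces the data into the wrong number of groups.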
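The Naive Bayes classifier at the heart of Chapter 9 can also be sketched briefly. This is my own toy multinomial text classifier with Laplace smoothing, not the book's code; the spam/ham data is an assumption for illustration:

```python
import math
from collections import Counter, defaultdict

def train_nb(docs):
    """Fit a multinomial Naive Bayes text classifier.
    `docs` is a list of (list_of_words, label) pairs."""
    word_counts = defaultdict(Counter)
    class_counts = Counter()
    vocab = set()
    for words, label in docs:
        class_counts[label] += 1
        word_counts[label].update(words)
        vocab.update(words)
    return word_counts, class_counts, vocab

def predict_nb(model, words):
    """Pick the class maximising log P(class) + sum of log P(word|class)."""
    word_counts, class_counts, vocab = model
    total_docs = sum(class_counts.values())
    best, best_score = None, -math.inf
    for label in class_counts:
        total_words = sum(word_counts[label].values())
        score = math.log(class_counts[label] / total_docs)
        for w in words:
            # Laplace (add-one) smoothing avoids zero probabilities
            score += math.log((word_counts[label][w] + 1)
                              / (total_words + len(vocab)))
        if score > best_score:
            best, best_score = label, score
    return best

docs = [(["free", "money", "now"], "spam"),
        (["win", "money", "free"], "spam"),
        (["meeting", "tomorrow", "agenda"], "ham"),
        (["agenda", "for", "meeting"], "ham")]
model = train_nb(docs)
print(predict_nb(model, ["free", "money"]))      # spam
print(predict_nb(model, ["meeting", "agenda"]))  # ham
```

The "naive" independence assumption is what lets the per-word log-probabilities simply be summed.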
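Finally, the simple linear regression that Chapter 10 builds on connects directly to the statistics it reviews: the least-squares slope is the covariance of x and y divided by the variance of x. A minimal sketch of that relationship (my own illustration, with assumed toy data):

```python
def linear_fit(xs, ys):
    """Ordinary least squares for y = a + b*x:
    slope b = cov(x, y) / var(x), intercept a = mean(y) - b * mean(x)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    b = cov / var
    a = my - b * mx
    return a, b

xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]   # exactly y = 1 + 2x
print(linear_fit(xs, ys))  # (1.0, 2.0)
```

Multiple and Poisson regression, which the chapter also mentions, generalize this idea to several predictors and to count data respectively.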


Last Updated ( Saturday, 28 November 2020 ) 