
9 Splitting data by asking questions: Decision trees


This chapter covers:

  • What is a decision tree?
  • Recommending apps using the demographic information of the users
  • Asking a series of successive questions to build a good classifier
  • Accuracy, Gini index, and entropy, and their role in building decision trees (previewed briefly after this list)
  • Examples of decision trees in fields such as biology and genetics
  • Coding the decision tree algorithm in Python
  • Separating points of different colors using a line
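
As promised in the list above, here is a quick preview of the Gini index, which the chapter develops in section 9.3.2. It measures how mixed a set of labels is: 1 minus the sum of the squared proportions of each class, so a pure set scores 0 and a maximally mixed two-class set scores 0.5. Below is a minimal sketch in Python; the function name gini_impurity is mine, used only for illustration.

from collections import Counter

def gini_impurity(labels):
    # Gini impurity: 1 - sum of squared class proportions.
    counts = Counter(labels)
    total = len(labels)
    return 1.0 - sum((n / total) ** 2 for n in counts.values())

print(gini_impurity(["spam", "spam", "ham", "ham"]))    # 0.5: maximally mixed
print(gini_impurity(["spam", "spam", "spam", "spam"]))  # 0.0: completely pure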

In this chapter, I cover decision trees. Decision trees are powerful models that help us classify data and make predictions. Not only that, but they also give us a great deal of information about the data. Like the models in the previous chapters, decision trees are trained with labeled data, where the labels (or targets) we want to predict are given by a set of classes. The classes can be yes/no, spam/ham, dog/cat/bird, or anything we wish to predict based on the features. Decision trees are a very intuitive way to build classifiers, and one that closely resembles human reasoning.

In this chapter you will learn how to build decision trees that fit your data. The algorithm we use to build them is quite intuitive: in a nutshell, it repeatedly splits the data by asking the best possible question at each step.
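To make this concrete before we dive in, here is a minimal sketch using scikit-learn, the library the chapter uses for coding decision trees in section 9.7. The tiny dataset of users and app labels is hypothetical, made up purely for illustration; the chapter develops its own example from scratch.

from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy data: each row is a user described by [age, owns_iphone],
# and each label is the kind of app that user downloaded.
X = [[15, 1], [25, 1], [32, 0], [40, 0], [12, 1], [35, 0]]
y = ["game", "game", "work app", "work app", "game", "work app"]

# criterion="gini" tells sklearn to judge each question (split) by Gini impurity.
model = DecisionTreeClassifier(criterion="gini")
model.fit(X, y)

# Prediction walks the tree from the root, answering one question per node.
print(model.predict([[20, 1]]))  # a 20-year-old iPhone user -> likely "game"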

9.1   The problem: We need to recommend apps to users according to what they are likely to download

9.2   The solution: Building an app recommendation system

9.2.1   The remember-formulate-predict framework

9.2.2   First step to build the model: Asking the best question

9.2.3   Next and final step: Iterate by asking the best question every time

9.2.4   Using the model by making predictions

9.3   Building the tree: How to pick the right feature to split

9.3.1   How to pick the best feature to split our data: Accuracy

9.3.2   How to pick the best feature to split our data: Gini impurity

9.4   Back to recommending apps: Building our decision tree using the Gini index

9.5   When do we stop building the tree? Hyperparameters

9.6   Beyond questions like yes/no

9.6.1   Features with more categories, such as Dog/Cat/Bird

9.6.2   Continuous features, such as a number

9.7   Coding a decision tree with sklearn

9.8   A slightly larger example: Spam detection again!

9.9   Decision trees for regression
