9 Splitting data by asking questions: Decision trees
This chapter covers:
- What is a decision tree?
- Recommending apps using the demographic information of the users.
- Asking a series of successive questions to build a good classifier.
- Accuracy, Gini index, and entropy, and their role in building decision trees.
- Examples of decision trees in fields such as biology and genetics.
- Coding the decision tree algorithm in Python.
- Separating points of different colors using a line.
In this chapter, I cover decision trees. Decision trees are powerful models that help us classify data and make predictions. They also give us a great deal of information about the data. Like the models in the previous chapters, decision trees are trained with labelled data, where each label (or target) that we want to predict belongs to a set of classes. The classes can be yes/no, spam/ham, dog/cat/bird, or anything else that we wish to predict based on the features. Decision trees are an intuitive way to build classifiers, and one that closely resembles human reasoning.
In this chapter, you will learn how to build decision trees that fit your data. The algorithm that we use to build decision trees is quite intuitive and, in a nutshell, works in the following way: