Visualizing decision trees can be really simple using a combination of scikit-learn and matplotlib. However, there is a nice library called dtreeviz, which brings much more to the table and creates visualizations that are not only prettier but also convey more information about the decision process. In this article, I will first show the "old way" of plotting the decision trees and then introduce the improved approach using dtreeviz.

As always, we need to start by importing the required libraries. Then, we load the Iris data set from scikit-learn. We will also go over a regression example, but for that we will load the Boston housing data set later on.

The next step involves creating the training/test sets and fitting the decision tree classifier to the Iris data set. In this article, we focus purely on visualizing the decision trees, so we do not pay any attention to fitting the model or finding a good set of hyperparameters (there are a lot of articles on those topics). The only thing we will "tune" is the maximum depth of the tree - we constrain it to 3 so the trees can still fit in the image and remain readable.
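The code snippets did not survive in this copy of the article, so the blocks below are reconstructions. Here is a minimal sketch of the setup just described; the variable names (iris, clf, X_train, and so on) are my own choices, not necessarily the author's.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Load the Iris data set and create the training/test sets
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, random_state=42
)

# The only hyperparameter we "tune": max_depth=3 keeps the plots readable
clf = DecisionTreeClassifier(max_depth=3, random_state=42)
clf.fit(X_train, y_train)
```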
Now that we have a fitted decision tree model, we can proceed to visualize the tree. We start with the easiest approach - using the plot_tree function from scikit-learn. It works, but the default output is not very readable: for example, there are no feature names (only their column indices) or class labels. We can easily improve that by passing a few extra arguments, as in the following snippet.
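A sketch of both calls - the bare one and the improved one - assuming the clf and iris objects from the setup above (the figure size is an arbitrary choice):

```python
import matplotlib.pyplot as plt
from sklearn import tree

# Bare-bones version: nodes refer to features only by column index
plt.figure(figsize=(12, 8))
tree.plot_tree(clf)
plt.show()

# Improved version: readable feature names, class labels, color-filled nodes
plt.figure(figsize=(12, 8))
tree.plot_tree(
    clf,
    feature_names=iris.feature_names,
    class_names=list(iris.target_names),
    filled=True,
)
plt.show()
```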
Much better! Now we can quite easily interpret the decision tree. It is also possible to use the graphviz library for visualizing the decision trees; however, the outcome is very similar, with the same set of elements as the graph above. That is why we skip it here, but you can find the implementation in the Notebook on GitHub.

Having seen the old way of plotting the decision trees, let's jump right into the dtreeviz approach. The code snippet is pretty much self-explanatory, so we can move on to the outcome.
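The original snippet is missing, but based on the x_data and y_data argument names mentioned later in the article, it presumably used the dtreeviz 1.x API, roughly like this (the target_name string is my assumption):

```python
from dtreeviz.trees import dtreeviz

viz = dtreeviz(
    clf,
    x_data=X_train,
    y_data=y_train,
    target_name="class",  # label for the target; assumed, not from the article
    feature_names=iris.feature_names,
    class_names=list(iris.target_names),
)
viz  # renders inline in a notebook; viz.save("iris_tree.svg") writes a file
```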
First of all, let's take a moment to acknowledge how big of an improvement it is, especially given that the function call is very similar. At each node, we see a stacked histogram of the feature used for splitting the observations, colored by class. This way, we can see how the classes are segregated by each split. The small triangle with a value on the x-axis marks the splitting point. In the first histogram, we can clearly see that all observations of the setosa class have a petal length smaller than 2.45 cm. The right branches of the tree select values greater than or equal to the splitting value, while the left branches select values smaller than it. The leaf nodes are represented as pie charts, which show what fraction of the observations within the leaf belongs to which class. This way, we easily see which class is the majority one, and hence the model's prediction.

One thing we do not see in this plot is the value of the Gini coefficient at each node. In my opinion, the histogram provides more intuition about the split, and the value of the coefficient might not be that relevant in a presentation for stakeholders anyway.

Note: We can also create a similar visualization for the test set; we just need to replace the x_data and y_data arguments when calling the function.

If you are not a fan of the histograms and want to simplify the plot, you can specify fancy=False to receive a simplified version.

Another handy feature of dtreeviz which improves the model's interpretability is path highlighting for a particular observation. Using the snippet below, we highlight the path of the first observation of the test set. This way, we clearly see which features contributed to the class prediction.
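A sketch of the highlighting call, assuming the dtreeviz 1.x X parameter is how the observation is passed (the variable names follow the setup above):

```python
# Same call as before, plus X: the observation whose decision path to highlight
viz = dtreeviz(
    clf,
    x_data=X_train,
    y_data=y_train,
    target_name="class",
    feature_names=iris.feature_names,
    class_names=list(iris.target_names),
    X=X_test[0],        # first observation of the test set
    # fancy=False       # uncomment for the simplified, histogram-free plot
)
viz
```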
The plot is very similar to the previous one; however, the orange highlight clearly shows which path the observation followed. Additionally, we can see an orange triangle in each of the histograms. It represents the indicated observation's value of the given feature. At the very end, we see the values of all the features of this observation, with the ones used for the decision highlighted in orange. In this case, only two features were used to predict that the observation belongs to the versicolor class.

Tip: We can also change the orientation of the plots from top-to-bottom to left-to-right by setting orientation="LR". We do not show it in this article, as the charts will not scale that nicely on devices with narrower screens.

Lastly, we can print the decisions used for this observation's prediction in plain English.
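dtreeviz 1.x ships an explain_prediction_path helper for exactly this; the keyword names below are my best guess at that API, so treat this as a sketch rather than the author's exact code:

```python
from dtreeviz.trees import explain_prediction_path

# Print the decision rules behind the prediction for one observation
print(explain_prediction_path(
    clf,
    X_test[0],
    feature_names=iris.feature_names,
    explanation_type="plain_english",
))
```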