class: center, middle, inverse, title-slide

# Machine Learning for Social Scientists
## Loss functions and decision trees
### Jorge Cimentada
### 2022-02-18

---

layout: true

<!-- background-image: url(./figs/upf.png) -->
background-position: 100% 0%, 100% 0%, 50% 100%
background-size: 10%, 10%, 10%

---

# What are loss functions?

* Social Scientists use metrics such as the `\(R^2\)`, `\(AIC\)`, `\(Log\text{ }likelihood\)` or `\(BIC\)`.

* We almost always use these metrics to inform our modeling choices.

* In machine learning, metrics such as the `\(R^2\)` and the `\(AIC\)` are called 'loss functions'.

* There are two broad types of loss functions: continuous (for numeric outcomes) and binary (for categorical outcomes).

---

# Root Mean Square Error (RMSE)

For each respondent, subtract the actual `\(y_{i}\)` score from the predicted `\(\hat{y_{i}}\)`, square the difference, average across respondents and take the square root:

<img src="03_loss_trees_files/figure-html/unnamed-chunk-3-1.png" width="70%" style="display: block; margin: auto;" />

`$$RMSE = \sqrt{\sum_{i = 1}^N{\frac{(\hat{y_{i}} - y_{i})^2}{N}}}$$`

---

# Mean Absolute Error (MAE)

* This approach doesn't square the errors, so large errors aren't penalized extra; it just takes the absolute error of each prediction.

* Fundamentally simpler to interpret than the `\(RMSE\)` since it's just the average absolute error.

<img src="03_loss_trees_files/figure-html/unnamed-chunk-4-1.png" width="70%" style="display: block; margin: auto;" />

`$$MAE = \sum_{i = 1}^N{\frac{|\hat{y_{i}} - y_{i}|}{N}}$$`

---

# Confusion Matrices

* The city of Berlin is developing an 'early warning' system aimed at predicting whether a family is in need of childcare support.

* Families which received childcare support are flagged with a 1 and families which didn't receive childcare support are flagged with a 0:

<img src="../../../img/base_df_lossfunction.svg" width="15%" style="display: block; margin: auto;" />

---

# Confusion Matrices

* Suppose we fit a logistic regression that returns a predicted probability for each family:

<img src="../../../img/df_lossfunction_prob.svg" width="35%" style="display: block; margin: auto;" />

---

# Confusion Matrices

* We could assign a 1 to every respondent with a probability above `0.5` and a 0 to every respondent with a probability below `0.5`:

<img src="../../../img/df_lossfunction_class.svg" width="45%" style="display: block; margin: auto;" />

---

# Confusion Matrices

The accuracy is the number of correctly predicted rows divided by the total number of predictions:

<img src="../../../img/confusion_matrix_50_accuracy.svg" width="55%" style="display: block; margin: auto;" />

* Accuracy: `\((3 + 1) / (3 + 1 + 1 + 2) = 50\%\)`

---

# Confusion Matrices

* The **sensitivity** of a model is a fancy name for the **true positive rate**.

* Sensitivity measures how many of the actual `1`s were correctly predicted:

<img src="../../../img/confusion_matrix_50_sensitivity.svg" width="55%" style="display: block; margin: auto;" />

* Sensitivity: `\(3 / (3 + 1) = 75\%\)`

---

# Confusion Matrices

* The **specificity** of a model measures the **true negative rate**.

* Specificity measures how many of the actual `0`s were correctly predicted:

<img src="../../../img/confusion_matrix_50_specificity.svg" width="55%" style="display: block; margin: auto;" />

* Specificity: `\(1 / (1 + 2) = 33\%\)`
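---

# Confusion Matrices

These quantities are easy to compute by hand. Below is a minimal base R sketch using hypothetical `observed` and `predicted` vectors (made up for illustration, not the Berlin data):

```r
# Hypothetical observed and predicted classes (toy values, not the Berlin data)
observed  <- c(1, 1, 1, 1, 0, 0, 0)
predicted <- c(1, 1, 1, 0, 1, 1, 0)

tp <- sum(predicted == 1 & observed == 1)  # true positives
tn <- sum(predicted == 0 & observed == 0)  # true negatives
fp <- sum(predicted == 1 & observed == 0)  # false positives
fn <- sum(predicted == 0 & observed == 1)  # false negatives

accuracy    <- (tp + tn) / (tp + tn + fp + fn)  # share of correct predictions
sensitivity <- tp / (tp + fn)                   # true positive rate
specificity <- tn / (tn + fp)                   # true negative rate

c(accuracy = accuracy, sensitivity = sensitivity, specificity = specificity)
```

With these toy vectors, sensitivity is 0.75 and specificity is 0.33, matching the rates above.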
---

# ROC Curves and Area Under the Curve

* The ROC curve is just a fancy name for a representation of sensitivity and specificity across different cutoff points.

<br>
<br>
<br>

* In our previous example, we calculated the sensitivity and specificity assuming that a respondent is classified as a `1` when their predicted probability is above `0.5`.

<br>
<br>

> What if we tried different cutoff points?
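---

# ROC Curves and Area Under the Curve

As a sketch of what "trying a different cutoff" means in code, here is a toy example. The data are simulated and the object names (`dv`, `prob`) are chosen purely for illustration:

```r
set.seed(123)

# Toy data: a binary outcome and a predicted probability for each respondent
x    <- rnorm(300)
dv   <- rbinom(300, size = 1, prob = plogis(x))
prob <- plogis(x)  # stand-in for a model's predicted probability

# Helper functions for the true positive and true negative rates
sensitivity <- function(pred, obs) sum(pred == 1 & obs == 1) / sum(obs == 1)
specificity <- function(pred, obs) sum(pred == 0 & obs == 0) / sum(obs == 0)

# Re-threshold the same probabilities at two different cutoffs
pred_30 <- as.numeric(prob >= 0.3)
pred_70 <- as.numeric(prob >= 0.7)

c(sens_30 = sensitivity(pred_30, dv), spec_30 = specificity(pred_30, dv))
c(sens_70 = sensitivity(pred_70, dv), spec_70 = specificity(pred_70, dv))
```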
---

# ROC Curves and Area Under the Curve

* Assigning a 1 if the probability is above `0.7` is associated with a true positive rate (sensitivity) of `0.74`.

* Lowering the cutoff to `0.3` increases the true positive rate to `0.95`, quite an impressive benchmark.

* This gain in sensitivity comes at a cost: the true negative rate (specificity) decreases from `0.87` to `0.53`.

---

# ROC Curves and Area Under the Curve

* We want a cutoff that maximizes both the true positive rate and the true negative rate.

* Try all possible cutoff points, as in the sketch below:
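One way to do this is to loop over a grid of cutoffs and store the sensitivity and specificity of each one. A minimal sketch, re-using the toy `dv`, `prob`, `sensitivity()` and `specificity()` objects from the earlier sketch:

```r
# Sensitivity and specificity for every cutoff between 0.01 and 0.99
cutoffs <- seq(0.01, 0.99, by = 0.01)

res <- data.frame(
  cutoff      = cutoffs,
  sensitivity = sapply(cutoffs, function(ct) sensitivity(as.numeric(prob >= ct), dv)),
  specificity = sapply(cutoffs, function(ct) specificity(as.numeric(prob >= ct), dv))
)

head(res)
```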
---

# ROC Curves and Area Under the Curve

* This result contains the sensitivity and specificity for many different cutoff points. These results are easiest to understand by visualizing them.

* Cutoffs that improve the specificity do so at the expense of sensitivity.

<img src="03_loss_trees_files/figure-html/unnamed-chunk-13-1.png" width="90%" style="display: block; margin: auto;" />

---

# ROC Curves and Area Under the Curve

* Instead of visualizing the specificity directly as the true negative rate, let's plot `\(1 - specificity\)` so that the error increases as the `X` axis increases:

<img src="03_loss_trees_files/figure-html/unnamed-chunk-14-1.png" width="70%" height="70%" style="display: block; margin: auto;" />

* Ideal result: most points cluster in the top left quadrant.

* There, sensitivity (the true positive rate) is high and specificity is also high (because plotting `\(1 - specificity\)` places high-specificity cutoffs at the lower values of the `X` axis).

---

# ROC Curves and Area Under the Curve

* There is one thing we're missing: the actual cutoff points!

* Hover over the plot to see them.
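---

# ROC Curves and Area Under the Curve

A base R sketch of this plot, re-using the `res` data frame of cutoffs from the earlier sketch (a real analysis would more likely use `yardstick::roc_curve()`):

```r
# The ROC curve: sensitivity against 1 - specificity, one point per cutoff
plot(
  1 - res$specificity, res$sensitivity, type = "l",
  xlab = "1 - specificity", ylab = "Sensitivity"
)
abline(0, 1, lty = 2)  # the diagonal = a model no better than chance

# Label a few cutoffs so we can see which threshold each point corresponds to
keep <- round(res$cutoff, 2) %in% c(0.1, 0.3, 0.5, 0.7, 0.9)
text(
  1 - res$specificity[keep], res$sensitivity[keep],
  labels = res$cutoff[keep], pos = 4
)
```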
---

# ROC Curves and Area Under the Curve

* The last loss function we'll discuss is a small extension of the ROC curve: the **A**rea **U**nder the **C**urve or `\(AUC\)`.

* `\(AUC\)` is the percentage of the plot that is under the curve. For example:

<img src="03_loss_trees_files/figure-html/unnamed-chunk-16-1.png" width="70%" height="70%" style="display: block; margin: auto;" />

* The more points are located in the top left quadrant, the higher the overall accuracy of our model.

* Here, 90% of the plot's area is under the curve.

---

# Precision and recall

<img src="../../../img/precision_recall.png" width="55%" style="display: block; margin: auto;" />

---

# Precision and recall

<img src="../../../img/precision_recall_plot.png" width="55%" style="display: block; margin: auto;" />

---

# Precision and recall

- When to use ROC curves and precision-recall: <br>

1. ROC curves should be used when there are roughly equal numbers of observations for each class <br>

2. Precision-Recall curves should be used when there is a moderate to large class imbalance

---

<br>
<br>
<br>

.center[
# Decision trees
]

---

# Decision trees

* Decision trees are tree-like diagrams.

* They work by defining `yes-or-no` rules based on the data and assigning the most common value (or, for continuous outcomes, the average value) to the respondents in each final branch.

<img src="03_loss_trees_files/figure-html/unnamed-chunk-19-1.png" width="70%" height="70%" style="display: block; margin: auto;" />

---

# Decision trees

<img src="03_loss_trees_files/figure-html/unnamed-chunk-20-1.png" width="70%" height="70%" style="display: block; margin: auto;" />

---

# Decision trees

<img src="03_loss_trees_files/figure-html/unnamed-chunk-21-1.png" width="70%" height="70%" style="display: block; margin: auto;" />

---

# Decision trees

<img src="03_loss_trees_files/figure-html/unnamed-chunk-22-1.png" width="70%" height="70%" style="display: block; margin: auto;" />

---

# Decision trees

<img src="03_loss_trees_files/figure-html/unnamed-chunk-23-1.png" width="70%" height="70%" style="display: block; margin: auto;" />

---

# Decision trees

<img src="03_loss_trees_files/figure-html/unnamed-chunk-24-1.png" width="70%" height="70%" style="display: block; margin: auto;" />

---

# How do decision trees work?

<img src="../../../img/decision_trees_adv1.png" width="55%" style="display: block; margin: auto;" />

---

# How do decision trees work?

<img src="../../../img/decision_trees_adv2.png" width="55%" style="display: block; margin: auto;" />

---

# How do decision trees work?

<img src="../../../img/decision_trees_adv3.png" width="55%" style="display: block; margin: auto;" />

---

# How do decision trees work?

<img src="../../../img/decision_trees_adv4.png" width="55%" style="display: block; margin: auto;" />

---

# How do decision trees work?

<img src="../../../img/decision_trees_adv5.png" width="55%" style="display: block; margin: auto;" />

---

# Bad things about Decision trees

* They overfit a lot

<img src="03_loss_trees_files/figure-html/unnamed-chunk-31-1.png" width="70%" height="70%" style="display: block; margin: auto;" />
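---

# Decision trees in code

A minimal sketch of growing a single tree with the `rpart` package. The data are simulated and the variable names are purely illustrative, not the course data:

```r
library(rpart)

# Made-up data: an outcome loosely related to two predictors
set.seed(123)
toy <- data.frame(books_home = sample(0:200, 500, replace = TRUE),
                  hisei      = rnorm(500))
toy$math_score <- 400 + 0.3 * toy$books_home + 20 * toy$hisei + rnorm(500, sd = 30)

# rpart finds the yes-or-no splits for us
tree_fit <- rpart(math_score ~ books_home + hisei, data = toy)

# Each final branch predicts the average outcome of the respondents in it
print(tree_fit)
predict(tree_fit, newdata = toy[1:5, ])
```

Constraining the tree through `rpart.control()` (its `maxdepth` and `minsplit` arguments) is roughly what `tree_depth` and `min_n` control on the slides that follow.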
---

# Bad things about Decision trees

How can you address this?

* Not straightforward

* `min_n` and `tree_depth` are sometimes useful

* You need to tune these (see the code sketch at the end of these slides)

<img src="03_loss_trees_files/figure-html/unnamed-chunk-32-1.png" width="70%" height="70%" style="display: block; margin: auto;" />

---

# Tuning decision trees

* Model tuning can help select the best `min_n` and `tree_depth`

```
# A tibble: 5 x 7
  tree_depth min_n .metric .estimator  mean     n std_err
       <dbl> <dbl> <chr>   <chr>      <dbl> <int>   <dbl>
1          9    50 rmse    standard   0.459     5  0.0126
2          9   100 rmse    standard   0.459     5  0.0126
3          3    50 rmse    standard   0.518     5  0.0116
4          3   100 rmse    standard   0.518     5  0.0116
5          1    50 rmse    standard   0.649     5  0.0102
```

---

# Tuning decision trees

<img src="03_loss_trees_files/figure-html/unnamed-chunk-34-1.png" width="95%" height="95%" style="display: block; margin: auto;" />

---

# Best tuned decision tree

* As usual, once we have our model, we predict on our test set and compare:

.center[

```
Testing error Training error 
    0.4644939      0.4512248 
```

]

---

.center[
# Break
]
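---

# Tuning decision trees: a code sketch

A sketch of what the tuning workflow above can look like with `tidymodels`. The data are simulated stand-ins and all variable names are illustrative, not the course data:

```r
library(tidymodels)

# Made-up stand-in data
set.seed(123)
pisa <- data.frame(books_home = sample(0:200, 1000, replace = TRUE),
                   hisei      = rnorm(1000))
pisa$math_score <- 400 + 0.3 * pisa$books_home + 20 * pisa$hisei + rnorm(1000, sd = 30)

pisa_split <- initial_split(pisa)
pisa_train <- training(pisa_split)
folds      <- vfold_cv(pisa_train, v = 5)

# Mark tree_depth and min_n as parameters to be tuned
tree_spec <-
  decision_tree(tree_depth = tune(), min_n = tune()) %>%
  set_engine("rpart") %>%
  set_mode("regression")

# Evaluate a small grid of candidate values across the resamples
tree_grid <- grid_regular(tree_depth(), min_n(), levels = 3)

tree_res <- tune_grid(
  tree_spec,
  math_score ~ books_home + hisei,
  resamples = folds,
  grid      = tree_grid,
  metrics   = metric_set(rmse)
)

show_best(tree_res, metric = "rmse")
```

The best combination can then be plugged back into the model spec with `finalize_model()`, refit on the training set, and evaluated on the test set.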