Corr analysis with ggplot2

Daniel Tafmizi

Dr. Friedman

October 17, 2024

LIS 4317

Module 8

Github: daniel.R/Work.R/LIS4370Rprog/mod8.R at main · DanielDataGit/daniel.R (github.com)

I attempted to recreate a visualization from Few's book (p. 277). The goal of the graph was to break a large dataset down into visually approachable pieces while also showing correlation data. I initially ran into trouble because I tried facet_wrap, which is designed to wrap a one-dimensional sequence of panels, typically built from a single discrete variable. After reading some documentation (links in the GitHub repo), I realized I needed facet_grid, since I was laying the panels out along two dimensions, one of which was transmission type (automatic vs. manual). stat_cor (from the ggpubr package) added the correlation coefficient and p-value to the graph. Finally, I used theme elements to make the graph more visually appealing.
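As a rough illustration of the faceting approach described above, here is a minimal sketch in R. It is not the script from the linked repo: the choice of mpg, hp, and disp plotted against wt, the reshaping step with tidyr, the names cars_long and transmission, and the use of theme_minimal are all assumptions made for the example.

```r
# Hypothetical reconstruction; the author's actual script is in the linked repo.
library(ggplot2)
library(ggpubr)  # provides stat_cor()
library(tidyr)   # provides pivot_longer()

# Reshape so each row is one (car, measure) pair; wt stays as the shared x-axis.
cars_long <- pivot_longer(
  mtcars[, c("wt", "am", "mpg", "hp", "disp")],
  cols = c(mpg, hp, disp),
  names_to = "measure",
  values_to = "value"
)

# In mtcars, am = 0 is automatic and am = 1 is manual.
cars_long$transmission <- factor(
  cars_long$am,
  levels = c(0, 1),
  labels = c("automatic", "manual")
)

p <- ggplot(cars_long, aes(x = wt, y = value)) +
  geom_point() +
  # Rows: measure; columns: transmission. Free y scales because units differ.
  facet_grid(measure ~ transmission, scales = "free_y") +
  # Adds the correlation coefficient and p-value to each panel.
  stat_cor(method = "pearson") +
  labs(x = "Weight (1000 lbs)", y = NULL) +
  theme_minimal()

print(p)
```

facet_grid takes a rows ~ columns formula, which is what makes it a natural fit once the panels vary along two discrete variables at the same time.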

I agree with Few's recommendations. Correlation analysis on a large dataset can get confusing very quickly, so it is important to break the dataset down to make it more approachable. I think I accomplished this in the graph above, which displays most of the continuous data in the mtcars dataset. Furthermore, correlation analysis is a powerful tool for understanding how variables move together, and it is a useful first step toward investigating causality, although correlation alone does not establish it. For example, the graph above suggests that, for the tested cars, hp and disp generally have a positive relationship with weight, while mpg has a negative relationship with weight. Another interesting finding is the clustering. The manual cars are spread evenly across the graph, but the automatics are clustered around weights of 3.5 and 5.5 (the wt variable in mtcars is recorded in units of 1000 lbs).
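The direction of these relationships can also be checked numerically. The following is a small sketch in base R, assuming the same three measures (mpg, hp, disp) compared against wt; the names measures, overall, by_transmission, and cor_table are illustrative and not taken from the original script.

```r
# Pearson correlation of each measure with weight, over all cars.
measures <- c("mpg", "hp", "disp")
overall <- sapply(measures, function(m) cor(mtcars$wt, mtcars[[m]]))
print(round(overall, 2))

# The same correlations split by transmission (am = 0 automatic, 1 = manual).
by_transmission <- split(
  mtcars,
  factor(mtcars$am, levels = c(0, 1), labels = c("automatic", "manual"))
)
cor_table <- sapply(by_transmission, function(d) {
  sapply(measures, function(m) cor(d$wt, d[[m]]))
})
print(round(cor_table, 2))

# Weight range per transmission group, to compare against the clustering in the plot.
print(sapply(by_transmission, function(d) range(d$wt)))
```

A positive coefficient means the measure tends to rise with weight, and a negative one means it tends to fall. cor.test can be used in place of cor when the p-value is also needed.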
