## Data Wrangling R Cheat Sheet

## Background

While teaching an introductory Stata class, we realized that there were no good reference materials for the software. What started as “let’s make a quick cheat sheet for the basic functions” quickly evolved into a comprehensive set of six cheat sheets covering the common data wrangling and analysis functions in Stata.

## Solution

After cataloguing the most common functions, we organized them into six basic functional areas: basic data processing, data manipulation, data visualization, visualization customization, basic analysis, and basic programming. Then came the tricky part: how are all these functions related? What’s the underlying logical and organizational framework? After sketching out these relationships, we created the layouts in Adobe Illustrator, heavily inspired by RStudio’s amazing R cheat sheets.

#### Data Processing

• basic Stata syntax for all functions
• basic math and logic operations
• setting up working directories and log files
• importing data
  • use
  • import excel
• converting between data types
• exploring data files
  • codebook
  • summarize
• summarizing and collapsing data in tables
  • tabulate
  • collapse
• creating new variables
  • generate
  • egen
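
As a rough illustration of how several of the commands above fit together, here is a minimal sketch. It assumes the `auto` example dataset that ships with Stata; the new variable names (`price_k`, `mean_mpg`) and the commented-out file name are made up for this example and are not taken from the cheat sheets.

```stata
* Load the example dataset that ships with Stata
sysuse auto, clear

* Importing your own data would look something like this
* (hypothetical file name, shown only as a comment):
* import excel "mydata.xlsx", firstrow clear

* Explore the data
codebook mpg foreign
summarize price mpg

* Tabulate a categorical variable
tabulate foreign

* Create new variables
generate price_k = price / 1000           // price in thousands
egen mean_mpg = mean(mpg), by(foreign)    // group mean repeated on each row

* Collapse to one row per group (this replaces the data in memory)
collapse (mean) price mpg, by(foreign)
list
```

Note that `collapse` overwrites the dataset in memory, so it is usually run last or on a copy.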

#### Data Transformation

• subsetting data
  • drop
  • keep
• replacing data
  • rename
  • replace
  • recode
• using variable and value labels
  • label define
  • label list
• reshaping data (melting and casting)
  • reshape
• merging and appending
  • append
  • merge
  • fuzzy-matching
• string transformations
• saving and exporting data
  • save
  • export excel
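
The sketch below shows one way a few of these transformation commands might be combined. It again assumes the built-in `auto` dataset; the grouping of `rep78`, the label name `rep_lbl`, and the small income table used for `reshape` are invented for illustration. Merging, appending, and exporting are left out to keep it short.

```stata
* Built-in example data
sysuse auto, clear

* Subsetting: keep rows with a repair record, drop unneeded columns
keep if !missing(rep78)
drop headroom trunk

* Renaming and recoding
rename mpg miles_per_gallon
recode rep78 (1 2 = 1) (3 = 2) (4 5 = 3), generate(rep_group)

* Value labels
label define rep_lbl 1 "poor" 2 "average" 3 "good"
label values rep_group rep_lbl
label list rep_lbl

* Reshaping: a small hypothetical wide table, converted to long
clear
input id inc2019 inc2020
1 100 110
2 200 190
end
reshape long inc, i(id) j(year)
list
```

After `reshape long`, each `id` appears once per year, with the year taken from the suffix of the original `inc2019` and `inc2020` variable names.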

#### Data Visualization

• small multiples
• one variable visualizations
  • histogram
  • kdensity: smoothed histogram
  • graph bar: bar plot
  • graph dot: dot plot
  • graph hbox: box and whiskers
• two variable visualizations
  • tw scatter: scatter plot
  • tw connected: line plot
  • tw area: area plot
  • tw pcspike: parallel coordinates plot
  • tw pccapsym: slope/bump chart
• three variable visualizations
  • plotmatrix: heatmap
• plotting with summarization or fitting
  • binscatter: plot summary value
  • tw lfitci: linear fit
  • tw lowess: lowess smoothing
• plotting regression results
  • coefplot: regression coefficients
  • marginsplot: marginal effects

#### Visualization Customization

• changing marks
  • symbology
  • lines
  • text
• changing channels
  • size
  • color
  • shape
  • position
• using themes
• saving plots
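
To make the plotting topics above more concrete, here is a minimal sketch that uses only built-in graph commands and the `auto` dataset. The specific options chosen (marker symbol, color, scheme) and the output file name are illustrative assumptions, and user-written commands such as `binscatter`, `coefplot`, and `plotmatrix` are omitted because they require separate installation.

```stata
sysuse auto, clear

* One-variable plots
histogram mpg, frequency
kdensity mpg
graph hbox mpg, over(foreign)

* Two-variable plot, with small multiples via by()
twoway scatter price mpg, by(foreign)

* Linear fit with confidence band, drawn under the points
twoway (lfitci price mpg) (scatter price mpg)

* Customizing marks and channels, plus a theme (scheme)
twoway scatter price mpg, msymbol(triangle) mcolor(navy) msize(small) scheme(s1mono)

* Save the most recent graph (file name is a placeholder)
graph save "price_mpg.gph", replace
```

`tw` in the lists above is an abbreviation of `twoway`; the full name is used here for readability.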

#### Data Analysis

• declaring data as a special type
  • time series
  • survival analysis
  • longitudinal/panel
  • survey
• summarizing data, correlations, point estimates, etc.
  • summarize
  • pwcorr
• statistical tests
  • t-tests, ANOVAs, proportions, distributions, etc.
• estimating models
  • regress
  • logit
• declaring interactions within model
• evaluating models
• postestimation calculations (use model for something)
  • predict
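
A short sketch of how some of these analysis steps might look in practice, assuming the built-in `auto` dataset. The choice of variables and the name `price_hat` are illustrative only, and declaring special data types (time series, survival, panel, survey) is not covered here.

```stata
sysuse auto, clear

* Summaries and pairwise correlations with significance levels
summarize price mpg
pwcorr price mpg weight, sig

* Two-sample t-test of mpg by car origin
ttest mpg, by(foreign)

* Linear model with a categorical predictor
regress price mpg i.foreign

* Same model with an interaction declared via factor-variable notation
regress price c.mpg##i.foreign

* Postestimation: fitted values from the most recent model
predict price_hat

* Logistic regression on a binary outcome
logit foreign mpg weight
```

`predict` always refers to the most recently estimated model, which is why it is run before `logit` here.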

#### Programming

• fundamental data types
  • scalars
  • matrices
  • macros
• accessing stored results
  • return: r-class objects
  • ereturn: e-class objects
• loops
  • foreach
  • forvalues
• additional programming resources: using GitHub in Stata
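
The programming topics above can be sketched as follows, again assuming the built-in `auto` dataset. The names `cutoff`, `vars`, `A`, and `b` are arbitrary choices for this example.

```stata
sysuse auto, clear

* Scalars, macros, and matrices
scalar cutoff = 20
local vars price mpg weight
matrix A = (1, 2 \ 3, 4)
count if mpg > cutoff

* r-class stored results (left behind by general commands)
summarize mpg
return list
display "mean mpg = " r(mean)

* e-class stored results (left behind by estimation commands)
regress price mpg
ereturn list
matrix b = e(b)
matrix list b

* Loops
foreach v of local vars {
    summarize `v'
}
forvalues i = 1/3 {
    display `i'^2
}
```

Local macros such as `vars` exist only within the do-file or program that defines them, so this is meant to be run as a single do-file rather than line by line.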