Data Wrangling Cheat Sheet

Background

While teaching an introductory class on Stata, we realized that there were no good reference materials for it. What started off as a “let’s make a quick cheat sheet for the basic functions” project quickly evolved into a comprehensive set of six cheat sheets covering the common data wrangling and analysis functions in Stata.


Solution

After cataloguing the most common functions, we organized them into six basic functional areas: basic data processing, data manipulation, data visualization, visualization customization, basic analysis, and basic programming. Then came the tricky part: how are all these functions related, and what is the underlying logical and organizational framework? After sketching out these relationships, we created the layouts in Adobe Illustrator, heavily inspired by RStudio’s amazing R cheat sheets.


Data Processing

  • basic Stata syntax for all functions
  • basic math and logic operations
  • setting up working directories and log files
  • importing data
    • use
    • import excel
  • converting between data types
  • exploring data files
    • codebook
    • summarize
  • summarizing and collapsing data in tables
    • tabulate
    • collapse
  • creating new variables
    • generate
    • egen
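
As a rough illustration of how a few of these commands fit together, here is a minimal sketch. It assumes the auto example dataset that ships with Stata; the new variable names (price_k, mean_mpg) are made up for this example and do not come from the cheat sheets.

```stata
* load an example dataset that ships with Stata
sysuse auto, clear

* explore the data
codebook make price foreign
summarize price mpg

* one-way table of a categorical variable
tabulate foreign

* create new variables (names are hypothetical)
* price in thousands of dollars
generate price_k = price / 1000
* group mean of mpg, attached to every row in the group
egen mean_mpg = mean(mpg), by(foreign)

* collapse to one row per group, keeping the group means
collapse (mean) price mpg, by(foreign)
list
```

Note that collapse replaces the data in memory, so it is usually run last or on a preserved copy.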


Data Transformation

  • subsetting data
    • drop
    • keep
  • replacing data
    • rename
    • replace
    • recode
  • using variable and value labels
    • label define
    • label list
  • reshaping data (melting and casting)
    • reshape
  • merging and appending
    • append
    • merge
    • fuzzy-matching
  • string transformations
  • saving and exporting data
    • save
    • export excel
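
The sketch below strings several of these commands together on a tiny hand-typed dataset. All variable names, labels, and file names (score2019, regionlbl, scores_2020.dta, and so on) are invented for illustration, and it is meant to be run as a do-file, since it uses a tempfile.

```stata
clear
input id region score2019 score2020
1 1 50 55
2 2 60 65
3 1 70 75
end

* attach value labels to a numeric code
label define regionlbl 1 "North" 2 "South"
label values region regionlbl
label list regionlbl

* reshape from wide to long: one row per id and year
reshape long score, i(id) j(year)

* recode a score into a pass/fail indicator
recode score (min/59 = 0) (60/max = 1), generate(pass)

* store the long data so it can be merged later
tempfile scores
save `scores'

* a second, hand-typed dataset with one row per id
clear
input id str5 name
1 "Ana"
2 "Ben"
3 "Cy"
end

* each id in memory matches several rows in the saved file
merge 1:m id using `scores'
assert _merge == 3
drop _merge

* subset, then save and export
keep if year == 2020
save "scores_2020.dta", replace
export excel using "scores_2020.xlsx", firstrow(variables) replace
```

Checking _merge right after merge is a cheap way to catch unmatched records before they propagate.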


Data Visualization

  • small multiples
  • one variable visualizations
    • histogram
    • kdensity: smoothed histogram
    • graph bar: bar plot
    • graph dot: dot plot
    • graph hbox: box and whiskers
  • two variable visualizations
    • tw scatter: scatter plot
    • tw connected: line plot
    • tw area: area plot
    • tw pcspike: parallel coordinates plot
    • tw pccapsym: slope/bump chart
  • three variable visualizations
    • plotmatrix: heatmap
  • plotting with summarization or fitting
    • binscatter: plot summary value
    • tw lfitci: linear fit
    • tw lowess: lowess smoothing
  • plotting regression results
    • coefplot: regression coefficients
    • marginsplot: marginal effects
  • changing marks
    • symbology
    • lines
    • text
  • changing channels
    • size
    • color
    • shape
    • position
  • using themes
  • saving plots
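
To give a feel for the built-in plotting commands, here is a minimal sketch, again assuming the auto dataset that ships with Stata. The graph name and file name are made up, and user-written commands such as binscatter, coefplot, and plotmatrix are left out because they have to be installed separately.

```stata
sysuse auto, clear

* one-variable plots
histogram mpg, frequency
kdensity mpg
graph bar (mean) price, over(foreign)

* small multiples: one scatter panel per group
twoway scatter price mpg, by(foreign)

* two-variable plot with a linear fit and confidence band,
* customizing the marker symbol, color, and size, plus a scheme (theme)
twoway (lfitci price mpg) ///
       (scatter price mpg, msymbol(circle) mcolor(navy) msize(small)), ///
       title("Price vs. mileage") scheme(s1mono) name(price_mpg, replace)

* save the most recent graph in Stata's own format
graph save "price_mpg.gph", replace
* graph export "price_mpg.png", replace   (where the platform supports PNG)
```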


Data Analysis

  • declaring data as a special type
    • time series
    • survival analysis
    • longitudinal/panel
    • survey
  • summarizing data, correlations, point estimates, etc.
    • summarize
    • pwcorr
  • statistical tests
    • t-tests, ANOVAs, proportions, distributions, etc.
  • estimating models
    • regress
    • logit
    • declaring interactions within model
  • evaluating models
  • postestimation calculations (use model for something)
    • predict
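
A typical analysis pass might look something like the sketch below, which assumes the auto dataset that ships with Stata. The model specifications are arbitrary choices for illustration, and the predicted-variable names are hypothetical. Declaration commands such as tsset, xtset, stset, and svyset are omitted because this dataset has no time, panel, survival, or survey structure.

```stata
sysuse auto, clear

* summaries and pairwise correlations with significance levels
summarize price mpg weight
pwcorr price mpg weight, sig

* two-sample t-test of mpg by origin
ttest mpg, by(foreign)

* linear regression with a categorical predictor
regress price mpg i.foreign

* the same model with an interaction between mpg and origin
regress price c.mpg##i.foreign

* postestimation: fitted values from the last model
predict price_hat, xb

* logistic regression and predicted probabilities
logit foreign mpg weight
predict p_foreign, pr
```

Because predict always uses the most recently estimated model, it should be run immediately after the model it refers to.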


Programming

  • fundamental data types
    • scalars
    • matrices
    • macros
  • accessing stored results
    • return: r-class objects
    • ereturn: e-class objects
  • loops
    • foreach
    • forvalues
  • additional programming resources: using GitHub in Stata
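
The pieces above can be sketched in a short do-file. This is only an illustration under the assumption that the auto dataset is available; the scalar, matrix, and macro names are invented for the example.

```stata
* scalars and matrices
scalar rate = 0.05
matrix A = (1, 2 \ 3, 4)
matrix list A

* a local macro holding a list of variable names
local vars price mpg weight

sysuse auto, clear

* r-class results: left behind by general commands such as summarize
summarize price
return list
scalar mean_price = r(mean)

* e-class results: left behind by estimation commands such as regress
regress price mpg
ereturn list
matrix b = e(b)

* loop over the variables named in the local macro
foreach v of local vars {
    quietly summarize `v'
    display "`v': mean = " r(mean)
}

* loop over a range of numbers
forvalues i = 1/3 {
    display "iteration `i'"
}
```

Stored results are overwritten by the next command of the same class, so it is safest to copy anything needed later into a scalar, matrix, or macro right away.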