adding my frequently used code chunks by veeveetran · Pull Request #3 · NCEAS/data-processing

veeveetran · 2018-03-16T23:01:11Z

This .Rmd file contains some code chunks that I often refer to when cleaning data.

isteves · 2018-03-20T17:12:31Z

codeChunks.Rmd

+
+These are some code chunks that I frequently come back to when processing data for the Arctic Data Center. 
+
+#Reading in raw data


GitHub is finicky about spaces after # for the headers, so make sure to include them! RStudio will preview it just fine, but GitHub won't. (`#Reading` --> `# Reading`)

isteves · 2018-03-20T17:14:31Z

codeChunks.Rmd

+#Reading in raw data
+##Single data file
+```{r eval=FALSE}
+df <- read.table("path/to/data", 


Any reason you use read.table rather than read.csv or read_csv? I'm curious, but it might also be worth adding that those other options exist.
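For reference, here is a minimal sketch of how the two base functions relate. The temporary file and its contents are made up for illustration; `read_csv` from readr is the tidyverse counterpart and is not run here.

```r
# Write a tiny CSV to a temporary file so the example is self-contained.
# (The file name and contents are hypothetical.)
tmp <- tempfile(fileext = ".csv")
writeLines(c("site,temp", "A,1.5", "B,", "C,NA"), tmp)

# read.table() needs the header and separator spelled out for a CSV.
df_table <- read.table(tmp,
                       header = TRUE,
                       sep = ",",
                       na.strings = c("", "NA"))

# read.csv() wraps read.table() with CSV-friendly defaults
# (header = TRUE, sep = ",").
df_csv <- read.csv(tmp, na.strings = c("", "NA"))

# Both calls should produce the same data frame for this file.
all.equal(df_table, df_csv)
```

If the raw files are tab- or space-delimited, `read.table` may well be the right call, so this is more about mentioning the options than replacing anything.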

isteves · 2018-03-20T17:16:50Z

codeChunks.Rmd

+##Single data file
+```{r eval=FALSE}
+df <- read.table("path/to/data", 
+               header=T, 


There should be spaces around the = sign. Doesn't affect the code at all, but it makes it more readable (especially once your code gets long/complicated). This is our go-to reference for style: http://style.tidyverse.org/

isteves · 2018-03-20T17:20:45Z

codeChunks.Rmd

+```{r eval=FALSE}
+dataList <- vector("list", length(rawPaths)) # makes an empty list with same length as file paths vector
+i=0 
+for(i in 1:length(rawPaths)){


It's generally better practice to use seq_along(rawPaths), rather than 1:length(x) (which I also do all the time). It allows the code to fail more gracefully. See the discussion here: https://stackoverflow.com/questions/24917228/proper-way-to-loop-over-the-length-of-a-dataframe-in-r
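A small sketch of the failure mode, assuming `rawPaths` ends up empty (for example, because no files matched):

```r
rawPaths <- character(0)  # hypothetical: no files were found

# 1:length(x) evaluates to c(1, 0) when x is empty, so the body runs twice.
bad_iterations <- 0
for (i in 1:length(rawPaths)) {
  bad_iterations <- bad_iterations + 1
}

# seq_along(x) is integer(0) when x is empty, so the body never runs.
good_iterations <- 0
for (i in seq_along(rawPaths)) {
  good_iterations <- good_iterations + 1
}

bad_iterations   # 2
good_iterations  # 0
```

In the real loop, the `1:length()` version would try to read `rawPaths[1]` and `rawPaths[0]`, neither of which is a valid path.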

isteves · 2018-03-20T17:21:29Z

codeChunks.Rmd

+
+Read in data using a for loop. Remember to initialize all variables that you will be using outside of the for loop. 
+```{r eval=FALSE}
+dataList <- vector("list", length(rawPaths)) # makes an empty list with same length as file paths vector


Great job initializing a list! I always have to stop myself from growing vectors.

isteves · 2018-03-20T17:22:56Z

codeChunks.Rmd

+for(i in 1:length(rawPaths)){
+  dataList[[i]] <- read.table(rawPaths[i], 
+                               na.strings = c("", "NA"),
+                               header=T)  


It looks like the indentation is a little bit off here (though maybe it's GitHub, I'm not sure). A neat trick I learned from Bryce is to highlight your code and then use Cmd + I to fix the indentation!

isteves · 2018-03-20T17:25:04Z

codeChunks.Rmd

+                               header=T)  
+}
+```
+Note: list() creates an empty list of length 0. However, vector("list", length(rawPaths)) allocates a designated number of slots within the list up front, instead of the list being extended every time the for loop iterates. With a small number of iterations, the difference in run time is not noticeable. However, for a large number of iterations, not allocating space can cause the code to run very slowly. 


Perhaps this reference (or something similar) is worth including in here: https://paulvanderlaken.com/2017/10/13/functional-programming-and-why-you-should-not-grow-vectors-in-r/
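Something like the following could go alongside the note. It is only a sketch with a made-up workload (`i^2` stands in for reading a file), showing that the two approaches give the same result while only one of them grows the list:

```r
n <- 1000  # arbitrary number of iterations for illustration

# Growing: the list is extended by one element on every iteration.
grown <- list()
for (i in seq_len(n)) {
  grown[[length(grown) + 1]] <- i^2
}

# Preallocating: all n slots exist before the loop starts.
prealloc <- vector("list", n)
for (i in seq_len(n)) {
  prealloc[[i]] <- i^2
}

# Functional alternative: lapply() builds the list without an explicit loop.
applied <- lapply(seq_len(n), function(i) i^2)

identical(grown, prealloc)
identical(prealloc, applied)
```

Wrapping each version in `system.time()` with a larger `n` is one way to see the difference on your own machine.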

isteves · 2018-03-20T17:27:46Z

codeChunks.Rmd

+
+Iterate through all the rows in a data frame. 
+allRows is a vector containing "TRUE" and "FALSE". Each element corresponds to a row in dataFrame. 
+is.na(dataFrame[i,]) outputs "TRUE" if the row contains at least one blank cell, and "FALSE" otherwise. 


You can use ` to indicate code within sentences in R Markdown (like we do in Slack)
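On the row check described in this part of the diff: the full loop isn't visible here, so the following is a guess at its shape rather than the actual code. `is.na(dataFrame[i, ])` returns one value per cell, so it would need `any()` to collapse to a single TRUE/FALSE per row. `complete.cases()` is a vectorized base R alternative.

```r
# Hypothetical data frame with missing values in rows 2 and 3.
dataFrame <- data.frame(a = c(1, NA, 3),
                        b = c("x", "y", NA))

# Loop version: preallocate, then flag each row that has at least one NA.
allRows <- logical(nrow(dataFrame))
for (i in seq_len(nrow(dataFrame))) {
  allRows[i] <- any(is.na(dataFrame[i, ]))
}

# Vectorized version: complete.cases() is TRUE for rows with no NA,
# so negating it flags the rows that contain one.
vectorized <- !complete.cases(dataFrame)

allRows     # FALSE TRUE TRUE
vectorized  # FALSE TRUE TRUE
```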

isteves · 2018-03-20T17:29:52Z

codeChunks.Rmd

+
+#Searching Through Strings - Dates
+
+Use the grepl() function to search for a particular string. Since we often have to reformat dates in our data sets, searching for particular dates or times could be useful. 


Perhaps this would be a good place to introduce some helpful resources. I personally like this cheatsheet: https://www.rstudio.com/wp-content/uploads/2016/09/RegExCheatsheet.pdf
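A short sketch of why the regex details matter for dates. The `dates` vector is made up for illustration:

```r
# Hypothetical mix of date formats.
dates <- c("3/16/16", "3/16/17", "2016-03-17", "12/1/16")

# Unanchored pattern: also matches "3/16/17", where 16 is the day.
loose <- grepl("/16", dates)

# Anchored with $: only matches strings that end in "/16".
strict <- grepl("/16$", dates)

# Whole-string pattern for anything shaped like m/d/yy.
mdy_like <- grepl("^[0-9]{1,2}/[0-9]{1,2}/[0-9]{2}$", dates)

loose     # TRUE TRUE FALSE TRUE
strict    # TRUE FALSE FALSE TRUE
mdy_like  # TRUE TRUE FALSE TRUE
```

Anchors like `^` and `$` are covered in the cheatsheet and are probably the most useful thing to know when picking out dates by year.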

isteves · 2018-03-20T17:31:14Z

codeChunks.Rmd

+
+Run unique() to see what kind of formats there are. 
+```{r}
+unique(dates)


Discovered the get_dupes function yesterday. Could be interesting to add! (or at least link to) https://cran.r-project.org/web/packages/janitor/vignettes/introduction.html
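In case janitor isn't installed, here is a rough base R stand-in for the same idea. This is not `get_dupes` itself, just a sketch of pulling out every row whose key appears more than once, using a made-up data frame:

```r
# Hypothetical data frame where ids 2 and 3 are repeated.
df <- data.frame(id = c(1, 2, 2, 3, 3, 3),
                 value = letters[1:6])

# duplicated() only flags the second and later occurrences, so combine it
# with a pass from the end to flag every member of a duplicated group.
is_dupe <- duplicated(df$id) | duplicated(df$id, fromLast = TRUE)

dupes <- df[is_dupe, ]
nrow(dupes)  # 5
```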

isteves · 2018-03-20T17:58:34Z

codeChunks.Rmd

+
+```{r}
+indDates1 <- which(grepl("/16",dates))
+dates[indDates1] <- format(as.POSIXct(dates[indDates1], tz = "", format="%m/%d/%y"), format = "%Y-%m-%d")


I like to use the lubridate package to work with dates. If you haven't tried it, I'd definitely recommend checking it out! https://cran.r-project.org/web/packages/lubridate/vignettes/lubridate.html There are also some other date/time packages, but I'm not as familiar with them. tibbletime is another one that seems promising.
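A sketch of what that could look like, using a made-up `dates` vector. The base R part uses `as.Date()` in place of the `as.POSIXct()` call in the diff, since only the date is kept; the lubridate part only runs if the package is installed.

```r
# Hypothetical mix of m/d/yy and ISO dates.
dates <- c("3/16/16", "2016-03-17", "12/1/16")

# Find the m/d/yy entries (anchored so only the year is matched).
is_mdy <- grepl("/16$", dates)

# Base R: parse with an explicit format, then write back as YYYY-MM-DD.
dates[is_mdy] <- format(as.Date(dates[is_mdy], format = "%m/%d/%y"),
                        "%Y-%m-%d")
dates  # "2016-03-16" "2016-03-17" "2016-12-01"

# lubridate: mdy() infers the separators, so no format string is needed.
if (requireNamespace("lubridate", quietly = TRUE)) {
  lubridate_dates <- lubridate::mdy(c("3/16/16", "12/1/16"))
}
```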

adding my frequently used code chunks

b83cca7

isteves reviewed Mar 20, 2018

function that returns the system metadata of all versions of a PID

342bd82

veeveetran wants to merge 2 commits into NCEAS:master from veeveetran:master

3 participants

