Sadukie’s SensoRs AdventuRes

The Project

My husband has various temperature and humidity sensors scattered throughout the house, recording data points to a MySQL server. The data is stored in a table that looks like this:

   id    date                sensorname     sensorvalue
   <int> <chr>               <chr>                <dbl>
 1 31    2016-12-18 22:20:23 temp5              63.6116
 2 32    2016-12-18 22:20:23 finalDHTTempF2     68.0000
 3 33    2016-12-18 22:20:23 humidity2          36.0000
 4 34    2016-12-18 22:25:23 temp5              64.1750
 5 35    2016-12-18 22:25:23 finalDHTTempF2     68.0000
 6 36    2016-12-18 22:25:23 humidity2          36.0000
 7 37    2016-12-18 22:30:23 temp5              63.7250
 8 38    2016-12-18 22:30:23 finalDHTTempF2     69.8000
 9 39    2016-12-18 22:30:23 humidity2          35.0000
10 40    2016-12-18 22:35:23 temp5              63.3866

I wanted to use his dataset to test my adventures in applying R.

Our current dataset, stored in a data frame called data, has 198,164 rows.

The Problem

Looking at this data, my first thought was: untidy. There had to be a better way. When I think of tidy data, I think of the tidyr package, which helps make data tidy and easier to work with. Specifically, I thought of its spread() function, which turns the key-value pairs in sensorname and sensorvalue into one column per sensor. Once the data was spread into appropriate columns, I figured I could work with it more easily.
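Here is a minimal sketch of what spread() does, using a handful of rows from the sample above typed in by hand (`raw_sample` is a hypothetical name). Note that tidyr 1.0.0 and later mark spread() as superseded in favor of pivot_wider(), though spread() still works.

```r
library(tidyr)

# A few rows from the sample above, typed in by hand (hypothetical object name).
raw_sample <- data.frame(
  id          = 31:36,
  date        = rep(c("2016-12-18 22:20:23", "2016-12-18 22:25:23"), each = 3),
  sensorname  = rep(c("temp5", "finalDHTTempF2", "humidity2"), times = 2),
  sensorvalue = c(63.6116, 68, 36, 64.1750, 68, 36),
  stringsAsFactors = FALSE
)

# Drop the per-row id first: spread() treats every other column as an
# identifier, and a unique id would keep each reading on its own row.
wide <- spread(raw_sample[, c("date", "sensorname", "sensorvalue")],
               key = sensorname, value = sensorvalue)

print(wide)  # one row per timestamp, one column per sensor
```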

The Adventures so far…

As the date field shows, each value is logged with a full timestamp (every five minutes in the sample above), which is why we have so many data points. The first thing I wanted to do was group the values into daily means.

Cleaning up Dates

I am using lubridate to make some of my date management a bit easier, and dplyr for chaining steps together with %>%. I grouped my data by sensor, then by date parts (year, month, and day). After grouping, I summarized the data to get daily means. Once the data was summarized, I spread it out so that each sensor got its own column:

   year(date) month(date) day(date) finalDHTTempF1 finalDHTTempF2 finalDHTTempF3 humidity1
        <dbl>       <dbl>     <int>          <dbl>          <dbl>          <dbl>     <dbl>
 1       2016          12        18             NA       68.34286             NA        NA
 2       2016          12        19             NA       67.77578             NA        NA
 3       2016          12        20             NA       67.88750             NA        NA
 4       2016          12        21             NA       68.95625             NA        NA
 5       2016          12        22             NA       69.74375             NA        NA
 6       2016          12        23             NA       69.71875             NA        NA
 7       2016          12        24             NA       70.97500             NA        NA
 8       2016          12        25             NA       70.85625             NA        NA
 9       2016          12        26             NA       71.78750             NA        NA
10       2016          12        27             NA       71.08750             NA        NA

Remaining columns for the first six rows:

   humidity2 humidity3 temp4    temp5
       <dbl>     <dbl> <dbl>    <dbl>
 1  35.80952        NA    NA 63.08703
 2  35.55709        NA    NA 62.37841
 3  35.50347        NA    NA 62.41281
 4  35.46528        NA    NA 63.40109
 5  35.24306        NA    NA 64.36713
 6  35.25000        NA    NA 64.33000
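A minimal sketch of that pipeline, assuming the columns shown earlier (date stored as text, so it is parsed with ymd_hms() first). The `readings` data frame is a small hypothetical stand-in for data, and the date-part columns get plain names (year, month, day) rather than the `year(date)`-style names in the output above. The ungroup() call is there because summarise() leaves the result grouped by the remaining grouping variables.

```r
library(dplyr)
library(tidyr)
library(lubridate)

# Hypothetical stand-in for the `data` data frame described above:
# same columns, two sensors, readings over two days.
readings <- data.frame(
  id          = 1:6,
  date        = rep(c("2016-12-18 22:20:23", "2016-12-18 22:25:23",
                      "2016-12-19 08:00:00"), times = 2),
  sensorname  = rep(c("temp5", "humidity2"), each = 3),
  sensorvalue = c(63.6116, 64.1750, 62.0, 36, 36, 35),
  stringsAsFactors = FALSE
)

daily <- readings %>%
  mutate(date = ymd_hms(date)) %>%              # parse the character timestamps
  group_by(sensorname,
           year = year(date), month = month(date), day = day(date)) %>%
  summarise(mean_value = mean(sensorvalue)) %>% # one daily mean per sensor
  ungroup() %>%                                 # drop leftover grouping before reshaping
  spread(key = sensorname, value = mean_value)  # one column per sensor

print(daily)
```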

Cleaning up NAs

Some of the data now shows NA. If there’s one thing I’ve learned about data, it’s that NULL and NA can be problematic, depending on the data tool and the user operating said tool. In this case, the NAs just mark days when a sensor recorded nothing, so I can convert them to 0 without ruining what the data means:

finalDHTTempF1 finalDHTTempF2 finalDHTTempF3 humidity1 humidity2 humidity3 temp4    temp5
         <dbl>          <dbl>          <dbl>     <dbl>     <dbl>     <dbl> <dbl>    <dbl>
             0       68.34286              0         0  35.80952         0     0 63.08703
             0       67.77578              0         0  35.55709         0     0 62.37841
             0       67.88750              0         0  35.50347         0     0 62.41281
             0       68.95625              0         0  35.46528         0     0 63.40109
             0       69.74375              0         0  35.24306         0     0 64.36713
             0       69.71875              0         0  35.25000         0     0 64.33000
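A minimal base-R sketch of the NA-to-0 step, using a small hypothetical `daily` table shaped like the one above. tidyr’s spread() also accepts a fill argument, so spread(..., fill = 0) would produce the zeros directly. One caveat: a 0 looks like a real reading of 0 degrees or 0% humidity, so when plotting or averaging, leaving the NAs in place (ggplot2 drops missing values with a warning rather than drawing them) may be the safer choice.

```r
# Hypothetical daily-means table in the wide shape shown above.
daily <- data.frame(
  day            = 18:20,
  finalDHTTempF1 = c(NA_real_, NA_real_, NA_real_),
  finalDHTTempF2 = c(68.34286, 67.77578, 67.88750),
  humidity2      = c(35.80952, 35.55709, 35.50347)
)

# is.na() on a data frame returns a logical matrix; indexing with it
# replaces every NA cell with 0 in one assignment.
daily[is.na(daily)] <- 0

print(daily)
```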

Presentation

So now that I have daily averages in a format that I can work with, let’s do something meaningful with the data – let’s plot it! I am using ggplot2 for plotting.
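As a rough sketch of this kind of plot, assuming the wide table of daily means from above (the `daily` object below is a hypothetical subset typed in by hand), the code rebuilds a Date from the year/month/day columns, gathers the sensor columns back into long form, and draws one line per sensor.

```r
library(ggplot2)
library(tidyr)

# Hypothetical subset of the wide daily-means table (first six days above).
daily <- data.frame(
  year  = 2016,
  month = 12,
  day   = 18:23,
  finalDHTTempF2 = c(68.34286, 67.77578, 67.88750, 68.95625, 69.74375, 69.71875),
  temp5          = c(63.08703, 62.37841, 62.41281, 63.40109, 64.36713, 64.33000)
)

# Rebuild a single Date from the year/month/day parts for the x axis.
daily$day_date <- as.Date(paste(daily$year, daily$month, daily$day, sep = "-"))

# ggplot2 works best with long data: one row per (date, sensor) pair.
long <- gather(daily[, c("day_date", "finalDHTTempF2", "temp5")],
               key = "sensor", value = "daily_mean", -day_date)

p <- ggplot(long, aes(x = day_date, y = daily_mean, colour = sensor)) +
  geom_line() +
  labs(x = "Date", y = "Daily mean", colour = "Sensor")

print(p)
```

Since ggplot2 prefers long data, another option is to plot straight from the summarized table before the spread() step and skip the gather() round trip.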

Conclusion

So far, I’m having fun putting my skills to work, especially with this dataset. I’m at the tail end of the 2nd course of an R specialization on Coursera. Between CodeMash and Coursera, I’ve been enjoying my exploRation into R. Here’s to many adventures ahead!

Why does RTVS open Notepad?!?

For the past few weeks, I’ve been going through the Mastering Software Development in R specialization on Coursera.  After Matthew Renze mentioned R Tools for Visual Studio (RTVS) during his workshop at CodeMash, I had to see what this was about.

As I have been going through my courses – which use swirl() – I have been looking at how things work, comparing RStudio to RTVS. One of the things that was maddening for me was going through one of the courses in RTVS and having R files open in Notepad. Notepad?!? RStudio wasn’t doing this, so I was even more frustrated. I could also open R files with Visual Studio right from the file system, so the file association was already in place. This didn’t make sense. However… RTVS is an open source project, as is swirl(). So I spent tonight looking at code on GitHub.

After poking around swirl(), I found something that led me to try the following command:

getOption("editor")
[1] "notepad"

Wait… how?! Why?! Poking around some more, I realized that R has its own profile file, similar in concept to the PowerShell profile and the bash profile. I found this post on Customizing Startup (Quick-R), which pointed me in the right direction. With a bit of trial and error, plus this closed issue in the RTVS repo, I moved my .Rprofile file to Documents, and RTVS was happier.
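To see which .Rprofile R will actually read, the startup rules documented in ?Startup can be checked from the console. This is a small sketch; the helper name profile_candidates() is hypothetical. On Windows, R’s home directory (~) usually resolves to the Documents folder, which would explain why moving the file there worked.

```r
# Hypothetical helper: list the user-profile locations R's startup code
# considers, in order (see ?Startup), and whether each file exists.
profile_candidates <- function() {
  env_path <- Sys.getenv("R_PROFILE_USER")
  if (nzchar(env_path)) {
    # If R_PROFILE_USER is set, it names the profile file directly.
    paths <- path.expand(env_path)
  } else {
    # Otherwise: .Rprofile in the working directory first, then in the home directory.
    paths <- c(file.path(getwd(), ".Rprofile"),
               file.path(path.expand("~"), ".Rprofile"))
  }
  data.frame(path = paths, exists = file.exists(paths), stringsAsFactors = FALSE)
}

profile_candidates()
```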

Before changing the editor, I wanted to make sure that I could call it, so that after changing it, I could confirm the change took effect. This is the command I tried, with sampleTest.R in my working directory:

edit(file="sampleTest.R")

Sure enough, this loaded my sample file in Notepad.

Using the sample Rprofile.site file from the Quick-R site as a guide, I set my default editor to the full path of Notepad++. This looks like it could be the right direction.
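For reference, the setting involved is R’s editor option, which edit() uses by default. Below is a minimal sketch of the line to put in .Rprofile (or in Rprofile.site, the site-wide profile under R_HOME/etc). The Notepad++ install path is an assumption, so adjust it to wherever notepad++.exe lives on your machine. Forward slashes avoid having to escape backslashes in the R string.

```r
# Line for .Rprofile / Rprofile.site. The path below is an assumed
# install location for Notepad++, not a verified one.
options(editor = "C:/Program Files/Notepad++/notepad++.exe")

# In a fresh session, this should now print the Notepad++ path
# instead of "notepad":
getOption("editor")
```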

Calling the same command from above:

edit(file="sampleTest.R")

Now this loads in Notepad++, which means I have syntax highlighting. (I would have pointed it at Visual Studio Code, but I’m on the one laptop that doesn’t have it installed just yet.)

Next goal: figuring out how to tell R Interactive to open R files in the current instance of Visual Studio…

Adventures with R…

About a week and a half ago, I started going through the R specialization on Coursera.  These are some of my observations.

Reminders of my Past

As I work in RStudio and go through lessons on data tidying, querying for values, and creating functions, I am reminded of some of the courses I went through in my past.  I am calling functions – such as correlation – that I (vaguely) remember learning about in my statistics class.  A lot of my interactions with R remind me of the days of working on engineering homework in Matlab.  I’m also finding that the language makes a lot of sense to me because it has elements of object-oriented programming – akin to the C# and Java that I teach at The Software Guild – and functional programming – with concepts like pipelines and chaining functions, which I liken to some of my PowerShell adventures.  It’s been quite an adventure so far.

Preparedness Going In

I’ve been curious about data science for a while. Catching Matthew Renze’s Practical Data Science with R workshop at CodeMash drew my curiosity out even more. Between January and March, I dreamt of data science stuff and had ideas popping into my head – especially since NASA’s International Space Apps Challenge is coming up in April, and I’d love to show my NASA friends what I’ve been playing with, hopefully using some of their datasets. When it comes to querying data, I have a solid background in that too, having worked with multiple RDBMSes and worn the database administrator hat in my past. Finally, I realized that I was prepared enough – between my solid understanding of programming languages and paradigms and having been exposed to R in the workshop – that I had better follow my dreams and take a course to keep me on the right path.

Current Status

Tonight, I hit an achievement – I finished Course 1 of the R specialization. Yes, it’s a 4-week course. Yes, I went through it in a short period of time, but my preparedness really helped in this case. The only roadblock I hit in this first course came when I had to use statistical functions and couldn’t remember what they meant or represented. But after reading and plugging away at it for an hour or so, it all started coming together.

I signed up for Course 2, which starts on Monday.  I’m already through the Week 1 material there, and I’m having fun creating functions.  As I was writing some of my code, I laughed because I recognized R’s syntax and thought “ah… anonymous functions… much like my lambdas in C# and Java….”  It’s good to be adding another language to my toolbelt.
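A quick illustration of the anonymous functions mentioned above, passed to a few of base R’s functional helpers (sapply(), Filter(), Reduce()). Since R 4.1.0, \(x) x^2 is also accepted as shorthand for function(x) x^2.

```r
# function(x) ... creates a function without naming it, much like a lambda.
squares <- sapply(1:5, function(x) x^2)           # 1 4 9 16 25
evens   <- Filter(function(x) x %% 2 == 0, 1:10)  # 2 4 6 8 10
total   <- Reduce(function(acc, x) acc + x, 1:5)  # 15

print(squares)
print(evens)
print(total)
```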

Also, while I mentioned RStudio above, I also find myself yearning to get back into Visual Studio at times. So when I get tired of RStudio, I switch back to R Tools for Visual Studio 2015. The only downside I’ve run into with that is that Notepad is the editor that comes up when swirl() opens a temporary file for me. I need to eventually sit down, look at configuration, and find out if I can either set Visual Studio or Notepad++ as my R editor for swirl() when I run it in VS. (And no, I haven’t checked Visual Studio 2017 for the R tools yet…)

Overall, though, I am thrilled to be playing with data again, and R has captured my attention.