Sadukie’s SensoRs AdventuRes

The Project

My husband has various temperature and humidity sensors scattered throughout the house, recording data points to a MySQL server. The data is stored in a table that looks like this:

      id date                sensorname     sensorvalue
   <int> <chr>               <chr>                <dbl>
1     31 2016-12-18 22:20:23 temp5              63.6116
2     32 2016-12-18 22:20:23 finalDHTTempF2     68.0000
3     33 2016-12-18 22:20:23 humidity2          36.0000
4     34 2016-12-18 22:25:23 temp5              64.1750
5     35 2016-12-18 22:25:23 finalDHTTempF2     68.0000
6     36 2016-12-18 22:25:23 humidity2          36.0000
7     37 2016-12-18 22:30:23 temp5              63.7250
8     38 2016-12-18 22:30:23 finalDHTTempF2     69.8000
9     39 2016-12-18 22:30:23 humidity2          35.0000
10    40 2016-12-18 22:35:23 temp5              63.3866

I wanted to use his dataset for my own adventures in applying R.

Our current dataset is a data frame with 198,164 rows.

The Problem

Looking at this data, the first thing I thought was untidy. There has to be a better way. When I think of tidy data, I think of the tidyr package, which helps make data tidy and easier to work with. Specifically, I thought of the spread() function, which I could use to break the sensor names out into their own columns. Once the data was spread into appropriate columns, I figured I could operate on it a bit better.

The Adventures so far…

As seen in the date field, the values are logged with their times. This is why we have so many data points. The first thing I wanted to do was group the values into daily means.

Cleaning up Dates

I am using lubridate to make some of my date management a bit easier, and dplyr to do the chaining with %>%. I grouped my data by sensor, then by date parts – year, month, and day. After grouping the data, I summarized it to get daily means. Once the data was summarized, I spread it out to make it more meaningful:
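
The post doesn't show the code at this point, but the pipeline described above might look something like this – a sketch, assuming the raw table has been read into a data frame called sensordata (a name used here for illustration) with the id, date, sensorname, and sensorvalue columns shown earlier:

```r
library(dplyr)
library(tidyr)
library(lubridate)

# Parse the character dates, group by sensor and date parts,
# take daily means, then spread the sensors into columns.
daily_means <- sensordata %>%
  mutate(date = ymd_hms(date)) %>%
  group_by(sensorname, year(date), month(date), day(date)) %>%
  summarize(daily_mean = mean(sensorvalue)) %>%
  spread(sensorname, daily_mean)
```

Grouping by the date parts rather than the full timestamp is what collapses the every-five-minutes readings down to one row per day.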

   year(date) month(date) day(date) finalDHTTempF1 finalDHTTempF2 finalDHTTempF3 humidity1
        <dbl>       <dbl>     <int>          <dbl>          <dbl>          <dbl>     <dbl>
1        2016          12        18             NA       68.34286             NA        NA
2        2016          12        19             NA       67.77578             NA        NA
3        2016          12        20             NA       67.88750             NA        NA
4        2016          12        21             NA       68.95625             NA        NA
5        2016          12        22             NA       69.74375             NA        NA
6        2016          12        23             NA       69.71875             NA        NA
7        2016          12        24             NA       70.97500             NA        NA
8        2016          12        25             NA       70.85625             NA        NA
9        2016          12        26             NA       71.78750             NA        NA
10       2016          12        27             NA       71.08750             NA        NA

Scrolling right to the full set of sensor columns (first six rows shown):

finalDHTTempF1 finalDHTTempF2 finalDHTTempF3 humidity1 humidity2 humidity3 temp4    temp5
            NA       68.34286             NA        NA  35.80952        NA    NA 63.08703
            NA       67.77578             NA        NA  35.55709        NA    NA 62.37841
            NA       67.88750             NA        NA  35.50347        NA    NA 62.41281
            NA       68.95625             NA        NA  35.46528        NA    NA 63.40109
            NA       69.74375             NA        NA  35.24306        NA    NA 64.36713
            NA       69.71875             NA        NA  35.25000        NA    NA 64.33000

Cleaning up NAs

Now some of the data shows NA. If there's anything I've learned with data, it's that NULL and NA can be problematic, depending on the data tool and the user operating said tool. In this case, I can easily convert my NA values to 0 without changing the meaning of the data:
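
One way to do that conversion – a minimal sketch, assuming the spread daily means are in a data frame called daily_means (a name used here for illustration):

```r
# is.na() on a data frame returns a logical matrix marking every NA,
# so this one line replaces all NA values with 0 in place.
daily_means[is.na(daily_means)] <- 0
```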

finalDHTTempF1 finalDHTTempF2 finalDHTTempF3 humidity1 humidity2 humidity3 temp4    temp5
             0       68.34286              0         0  35.80952         0     0 63.08703
             0       67.77578              0         0  35.55709         0     0 62.37841
             0       67.88750              0         0  35.50347         0     0 62.41281
             0       68.95625              0         0  35.46528         0     0 63.40109
             0       69.74375              0         0  35.24306         0     0 64.36713
             0       69.71875              0         0  35.25000         0     0 64.33000

Presentation

So now that I have daily averages in a format that I can work with, let’s do something meaningful with the data – let’s plot it! I am using ggplot2 for plotting.
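
A plot along those lines might look something like this – a sketch, assuming a data frame called daily_means whose year(date)/month(date)/day(date) columns have been renamed to year, month, and day (names assumed here for illustration):

```r
library(dplyr)
library(lubridate)
library(ggplot2)

# Rebuild a proper Date from the grouped date parts, then plot
# one sensor's daily mean temperature over time.
daily_means %>%
  mutate(plot_date = make_date(year, month, day)) %>%
  ggplot(aes(x = plot_date, y = finalDHTTempF2)) +
  geom_line() +
  labs(x = "Date", y = "Daily mean (°F)",
       title = "Daily mean temperature: finalDHTTempF2")
```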

Conclusion

So far, I’m having fun putting my skills to work, especially with this dataset. I’m at the tail end of the 2nd course of an R specialization on Coursera. Between CodeMash and Coursera, I’ve been enjoying my exploRation into R. Here’s to many adventures ahead!

Why does RTVS open Notepad?!?

For the past few weeks, I’ve been going through the Mastering Software Development in R specialization on Coursera.  After Matthew Renze mentioned R Tools for Visual Studio (RTVS) during his workshop at CodeMash, I had to see what this was about.

As I have been going through my courses – which use swirl() – I have been looking at how things work, comparing RStudio to RTVS.  One of the things that was maddening for me was going through one of the courses in RTVS and having R files open in Notepad.  Notepad?!?  RStudio wasn’t doing this, so I was even more frustrated.  I could also open R files with Visual Studio right from the file system, so the file association was already in place.  This didn’t make sense.  However… RTVS is an open source project, as is swirl().  So I spent tonight looking at code in GitHub.

After poking around swirl(), I found something that led me to try the following command:

> getOption("editor")
[1] "notepad"

Wait… how?! Why?!  Poking around some more, I realized that R has its own profile file – a concept similar to the PowerShell profile and the bash profile.  I found this post on Customizing Startup (Quick-R) that pointed me in the right direction.  With a bit of trial and error and finding this closed issue in the RTVS repo, I moved my .Rprofile file to Documents, and RTVS was happier.

Before changing the editor, I wanted to make sure that I could call it – so that once I changed it, I could verify the change.  This is the command I tried, with sampleTest.R being in my working directory:

edit(file="sampleTest.R")

Sure enough, this loaded my sample file in Notepad.

Using the sample Rprofile.site file from the Quick-R site as a guide, I set my default editor to the full path of Notepad++.  This looks like it could be the right direction.
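
The relevant line in Rprofile.site is just an options() call – a sketch following the Quick-R example, where the Notepad++ path below is an assumption that depends on where it was installed:

```r
# Point R's default editor at Notepad++ instead of notepad.
# Adjust the path to match your own Notepad++ install location.
options(editor = "C:/Program Files (x86)/Notepad++/notepad++.exe")
```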

Calling the same command from above:

edit(file="sampleTest.R")

Now this loads in Notepad++, which means I have syntax highlighting.  (I would have pointed at Visual Studio Code, but I’m on the one laptop that didn’t have it installed just yet.)

Next goal: How to tell the R Interactive to open the R files in the current instance of Visual Studio….

Adventures with R…

About a week and a half ago, I started going through the R specialization on Coursera.  These are some of my observations.

Reminders of my Past

As I work in RStudio and go through lessons on data tidying, querying for values, and creating functions, I am reminded of some of the courses I went through in my past.  I am calling functions – such as correlation – that I (vaguely) remember learning about in my statistics class.  A lot of my interactions with R remind me of the days of working on engineering homework in Matlab.  I’m also finding that the language makes a lot of sense to me because it has elements of object-oriented programming – akin to the C# and Java that I teach at The Software Guild – and functional programming – with concepts like pipelines and chaining functions, which I liken to some of my PowerShell adventures.  It’s been quite an adventure so far.

Preparedness Going In

I’ve been curious about data science for a while.  Catching Matthew Renze’s Practical Data Science with R workshop at CodeMash drew my curiosity out even more.  Between January and March, I dreamt of data science stuff and had ideas popping into my head – especially since NASA’s International Space Apps Challenge is coming up in April, and I’d love to show my NASA friends what I’ve been playing with, hopefully using some of their datasets.  When it comes to querying data, I have a solid background in that too – having worked with multiple RDBMSes and worn the database administrator hat in my past.  Finally, I realized that I was prepared enough – between my solid understanding of programming languages and paradigms and having been exposed to R in the workshop – that I had better follow my dreams and take a course to keep me on the right path.

Current Status

Tonight, I hit an achievement – I finished Course 1 of the R specialization.  Yes, it’s a 4-week course.  Yes, I went through it in a short period of time – but my preparedness really helped in this case.  The only roadblock I had in this first course was when it came time to use statistical functions and not remembering what they meant or represented.  But after reading and plugging away at it for an hour or so, it all started coming together.

I signed up for Course 2, which starts on Monday.  I’m already through the Week 1 material there, and I’m having fun creating functions.  As I was writing some of my code, I laughed because I recognized R’s syntax and thought “ah… anonymous functions… much like my lambdas in C# and Java….”  It’s good to be adding another language to my toolbelt.

Also, while I mentioned RStudio above, I also find myself yearning to get back into Visual Studio at times.  So when I get tired of RStudio, I switch back to R Tools for Visual Studio 2015.  The only downfall I’ve run into with that is that Notepad is the editor that comes up when swirl() opens a temporary file for me.  I need to eventually sit down, look at configuration, and find out if I can either set Visual Studio or Notepad++ as my R editor for swirl() when I run it in VS.  (And no, I haven’t checked Visual Studio 2017 for the R tools yet…)

Overall, though, I am thrilled to be playing with data again, and R has captured my attention.

CodeMash 2017 Recap: Pre-Compiler Day 2

As I mentioned in yesterday’s post, one of my personal tech topics that I want to explore in 2017 is data science.  For as long as I can remember, I’ve loved data.  As a hobbyist in my teens, I was playing with Access and reporting on data.  I eventually migrated to Visual Basic talking to Access… which led to me taking an internship right out of high school where I was QAing data sheets and working with a contractor on an app that was migrating an Access database to a VB front end and SQL Server back end.  That contractor saw my curiosity and excitement around data, and he introduced me to the Oracle database administrator.  Fast forward into my career – lots of fun writing data reports in Crystal Reports and SQL Server Reporting Services and wearing the database administrator hat over many versions of SQL Server!  Moving right along, I ended up writing and supporting web applications that talk to SQL Server back ends.  Nowadays, I’m working at The Software Guild, writing database curriculum for both C# and Java cohorts and encouraging our apprentices to explore databases – amongst other topics.  I get to play with SQL Server and MySQL.

However, as much as I get to play with these tools and data, I’ve been more curious about the topic that is getting a lot of talk – data science.  One of my friends asked what we wanted to learn more about in 2017, and when I mentioned data science, another friend asked if I had met Matthew Renze yet.  While I hadn’t crossed paths with him at that point, I was curious.  He linked me to his courses, which gave me an idea of what to expect with the pre-compiler.  Most of all, I was looking forward to a day of data science at CodeMash, hoping to see what all the talk was about.

Pre-compiler – Practical Data Science with R

With a name like “practical data science”, I went into the pre-compiler expecting to learn how to work with R and put it into practice.  The name of the workshop set the expectations for me quite clearly, and reading the abstract and the pre-reqs spelled everything out enough for me to have reasonable expectations going into it.

R and RStudio

In this Practical Data Science with R workshop, we learned about the R language and used RStudio to run through labs on various topics in data science.  I really enjoyed Matthew’s storytelling, weaving a story around a fictitious guy’s ridiculous idea for a space western musical movie.  We played with a movies dataset for many of our labs, looking at the data and seeing why this guy’s musical idea was a bit ridiculous and unwise. For some other labs, we also played with iris data.

Looking at the R language, it made sense to me.  Everything being treated as a vector… I had seen that in other languages before, so it didn’t seem foreign.  The arrows of assignment reminded me of lambda syntax in Java and C#… oh arrows and lambdas and assignments… again, it seemed familiar enough.  The indexing with ranges reminded me of my adventures with Ruby Koans of CodeMashes past.  Even now, as I recap this, I am realizing that some of the familiarity is due to my past background – surviving engineering and math statistics courses using MATLAB and Maple.  In fact, during the workshop, I mentioned to my friend Victor that I wish I had this mentality back then, as my advanced math classes may have been more tolerable.  Playing with R reminded me of how much I love analyzing data and building out visualizations.

R in Visual Studio

In the workshop, Matthew Renze mentioned that you could also run these things in Visual Studio.  Of course, I couldn’t resist – running a new language for me in a tool I am quite familiar with!  I installed R Tools for Visual Studio and ran through the labs from today in Visual Studio.  I really like that the Ctrl-Enter to execute code in RStudio carried over into Visual Studio.  The visualizations were neat to see when I ran them in Visual Studio.

Inspiration to Play More

After sitting through the data science workshop today, I realized a lot about myself and my love of data.  I realize that my love of data really hasn’t changed in the past couple decades – I really do enjoy seeing what all is in a database, how the data relates, the various trends, cleaning it up, understanding why there are certain trends and what the outliers may indicate.  While I had a quick flashback to younger me not happy in my classes in college that introduced the concepts, I realized that I still like the visualizations and calculations, and with the right teachers, things aren’t as bad as they once seemed.  Playing with data makes me excited, and today’s workshop reaffirmed that.

This really confirmed it – 2017 will be my year to have fun with data science.