The Project
My husband has various temperature and humidity sensors scattered throughout the house, recording data points to a MySQL server. The data is stored in a table that looks like this:
| | id (int) | date (chr) | sensorname (chr) | sensorvalue (dbl) |
|---|---|---|---|---|
| 1 | 31 | 2016-12-18 22:20:23 | temp5 | 63.6116 |
| 2 | 32 | 2016-12-18 22:20:23 | finalDHTTempF2 | 68.0000 |
| 3 | 33 | 2016-12-18 22:20:23 | humidity2 | 36.0000 |
| 4 | 34 | 2016-12-18 22:25:23 | temp5 | 64.1750 |
| 5 | 35 | 2016-12-18 22:25:23 | finalDHTTempF2 | 68.0000 |
| 6 | 36 | 2016-12-18 22:25:23 | humidity2 | 36.0000 |
| 7 | 37 | 2016-12-18 22:30:23 | temp5 | 63.7250 |
| 8 | 38 | 2016-12-18 22:30:23 | finalDHTTempF2 | 69.8000 |
| 9 | 39 | 2016-12-18 22:30:23 | humidity2 | 35.0000 |
| 10 | 40 | 2016-12-18 22:35:23 | temp5 | 63.3866 |
I wanted to use his dataset to test my adventures in applying R.
Our current dataset, `data`, is a data frame with 198,164 rows.
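Pulling the table into R might look something like this. This is only a sketch: the post doesn't show the connection code, so the host, database, table, and credential values here are all placeholders.

```r
library(RMySQL)

# Connect to the MySQL server (all connection details are hypothetical)
con <- dbConnect(MySQL(),
                 host = "localhost",
                 dbname = "sensors",
                 user = "reader",
                 password = "secret")

# Read the whole sensor table into a data frame
data <- dbGetQuery(con, "SELECT id, date, sensorname, sensorvalue FROM sensordata")

dbDisconnect(con)
```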
The Problem
Looking at this data, the first thing I thought was *untidy*. There has to be a better way. When I think of tidy data, I think of the tidyr package, which helps make data tidy and easier to work with. Specifically, I thought of the `spread()` function, which I could use to break the sensor readings into their own columns. Once the data is spread into appropriate columns, I figure I can operate on it a bit better.
The Adventures so far…
As seen in the `date` field, the values are logged with their times. This is why we have so many data points. The first thing I wanted to do was group the values into daily means.
Cleaning up Dates
I am using lubridate to make some of my date management a bit easier, and dplyr to do the chaining with `%>%`. I grouped my data by sensor, then by date parts (year, month, and day). After grouping the data, I summarized it to get daily means. Once the data was summarized, I spread it out to make it more meaningful:
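The grouping, summarizing, and spreading steps might look roughly like this. A sketch only, since the post doesn't show the actual code; it assumes the data frame is named `data` and that `date` is stored as a character column:

```r
library(dplyr)
library(tidyr)
library(lubridate)

daily_data <- data %>%
  # parse the character timestamps into date-times
  mutate(date = ymd_hms(date)) %>%
  # group by sensor, then by date parts
  group_by(sensorname, year(date), month(date), day(date)) %>%
  # collapse each group to its daily mean
  summarize(daily_mean = mean(sensorvalue)) %>%
  ungroup() %>%
  # spread the sensors into one column each
  spread(sensorname, daily_mean)
```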
| | year(date) (dbl) | month(date) (dbl) | day(date) (int) | finalDHTTempF1 (dbl) | finalDHTTempF2 (dbl) | finalDHTTempF3 (dbl) | humidity1 (dbl) |
|---|---|---|---|---|---|---|---|
| 1 | 2016 | 12 | 18 | NA | 68.34286 | NA | NA |
| 2 | 2016 | 12 | 19 | NA | 67.77578 | NA | NA |
| 3 | 2016 | 12 | 20 | NA | 67.88750 | NA | NA |
| 4 | 2016 | 12 | 21 | NA | 68.95625 | NA | NA |
| 5 | 2016 | 12 | 22 | NA | 69.74375 | NA | NA |
| 6 | 2016 | 12 | 23 | NA | 69.71875 | NA | NA |
| 7 | 2016 | 12 | 24 | NA | 70.97500 | NA | NA |
| 8 | 2016 | 12 | 25 | NA | 70.85625 | NA | NA |
| 9 | 2016 | 12 | 26 | NA | 71.78750 | NA | NA |
| 10 | 2016 | 12 | 27 | NA | 71.08750 | NA | NA |

The remaining sensor columns of the same spread data frame (first six days shown):

| finalDHTTempF1 (dbl) | finalDHTTempF2 (dbl) | finalDHTTempF3 (dbl) | humidity1 (dbl) | humidity2 (dbl) | humidity3 (dbl) | temp4 (dbl) | temp5 (dbl) |
|---|---|---|---|---|---|---|---|
| NA | 68.34286 | NA | NA | 35.80952 | NA | NA | 63.08703 |
| NA | 67.77578 | NA | NA | 35.55709 | NA | NA | 62.37841 |
| NA | 67.88750 | NA | NA | 35.50347 | NA | NA | 62.41281 |
| NA | 68.95625 | NA | NA | 35.46528 | NA | NA | 63.40109 |
| NA | 69.74375 | NA | NA | 35.24306 | NA | NA | 64.36713 |
| NA | 69.71875 | NA | NA | 35.25000 | NA | NA | 64.33000 |
Cleaning up NAs
Now some of the data shows `NA`. If there's anything I've learned with data, it's that `NULL` and `NA` can be problematic, depending on the data tool and the user operating said tool. In this case, I can easily convert my `NA` values to `0` without ruining the meaning of the data:
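One quick way to do that replacement in base R, assuming the spread data frame from the previous step is named `daily_data` (a name I'm assuming, since the post doesn't show it):

```r
# Replace every NA cell in the data frame with 0
daily_data[is.na(daily_data)] <- 0
```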
| finalDHTTempF1 (dbl) | finalDHTTempF2 (dbl) | finalDHTTempF3 (dbl) | humidity1 (dbl) | humidity2 (dbl) | humidity3 (dbl) | temp4 (dbl) | temp5 (dbl) |
|---|---|---|---|---|---|---|---|
| 0 | 68.34286 | 0 | 0 | 35.80952 | 0 | 0 | 63.08703 |
| 0 | 67.77578 | 0 | 0 | 35.55709 | 0 | 0 | 62.37841 |
| 0 | 67.88750 | 0 | 0 | 35.50347 | 0 | 0 | 62.41281 |
| 0 | 68.95625 | 0 | 0 | 35.46528 | 0 | 0 | 63.40109 |
| 0 | 69.74375 | 0 | 0 | 35.24306 | 0 | 0 | 64.36713 |
| 0 | 69.71875 | 0 | 0 | 35.25000 | 0 | 0 | 64.33000 |
Presentation
So now that I have daily averages in a format that I can work with, let’s do something meaningful with the data – let’s plot it! I am using ggplot2 for plotting.
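A minimal plotting sketch, gathering the sensor columns back into long form so each sensor gets its own line. The data frame name and column names here are assumptions based on the tables above:

```r
library(dplyr)
library(tidyr)
library(lubridate)
library(ggplot2)

daily_data %>%
  # rebuild a proper Date from the grouped date parts
  mutate(day_date = make_date(`year(date)`, `month(date)`, `day(date)`)) %>%
  # long form: one row per sensor per day
  gather(sensor, daily_mean, finalDHTTempF1:temp5) %>%
  ggplot(aes(x = day_date, y = daily_mean, colour = sensor)) +
  geom_line() +
  labs(x = "Date", y = "Daily mean reading")
```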
Conclusion
So far, I’m having fun putting my skills to work, especially with this dataset. I’m at the tail end of the second course of an R specialization on Coursera. Between CodeMash and Coursera, I’ve been enjoying my exploRation into R. Here’s to many adventures ahead!