Posts

Thoughts on data science, time series, and R.

Time Series Introduction with R codes

1 What is a Time Series? A time series is a set of observed values ordered in time or, put another way, repeated measurements of something, usually taken at a fixed interval (hourly, weekly, monthly). It is a collection of observations made sequentially in time[1]. If the variable we are measuring is a count variable, we may have a Poisson time series (more on that later). Formally, a time series \(T \in \mathbb{R}^n\) is a sequence of real-valued numbers \(t_i \in \mathbb{R} : T=[t_1,t_2,\dots,t_n]\) where \(n\) is the length of \(T\).

Read More…

Presenting 'matrixprofiler' a fast Matrix Profile implementation in R

It took some time, as you can see in the previous post, but the matrixprofiler package is done! What does this mean? The UCR Matrix Profile is undoubtedly growing, and the tsmp package is getting almost 700 downloads per month. We decided that we needed to separate the core from the practical usage of Matrix Profile. So the matrixprofiler package was born, focused on holding the low-level code (C/C++) for speed and robustness.

Read More…

Using RStudio with Github Classroom

On March 12th, GitHub launched the GitHub Classroom platform. TL;DR: you can continue. For the long story, click here. Classroom: For those who want to know more about the capabilities of GitHub Classroom, I recommend you start here. Using RStudio: Why do we need this tutorial? Well, GitHub Classroom already allows auto-integration with Microsoft MakeCode and Repl.it, but we, as R developers, like RStudio, right? So how do we solve this?

Read More…

Using RStudio with Github Classroom - long version

On March 12th, GitHub launched the GitHub Classroom platform. TL;DR: click here. Disclaimer: Everything I say here is only my opinion, and a better solution may exist that I have not found yet. Here is the long story. Classroom: For those who want to know more about the capabilities of GitHub Classroom, I recommend you start here. As soon as I heard about it, I felt that I should use it.

Read More…

100 Time Series Data Mining Questions - Part 8

In the last post, we were able to identify when a regime change occurs. Today we will focus on speed (well, a trade-off). For the next question, we will still be using the datasets available at https://github.com/matrix-profile-foundation/mpf-datasets so you can try this at home. The original code (MATLAB) and data are here. Now let’s start: How do I quickly search this long dataset for patterns, if an approximate search is acceptable?

Read More…

100 Time Series Data Mining Questions - Part 7

In the last post, we were able to identify when a regime change occurs. Today we will focus on speed (well, a trade-off). For the next question, we will still be using the datasets available at https://github.com/matrix-profile-foundation/mpf-datasets so you can try this at home. The original code (MATLAB) and data are here. Now let’s start: How do I quickly search this long dataset for patterns, if an approximate search is acceptable?

Read More…

100 Time Series Data Mining Questions - Part 6

In the last post, we took a very long time series and summarized it. Now we will do something that seems related when we look at the regime bar: regime change detection. For the next question, we will still be using the datasets available at https://github.com/matrix-profile-foundation/mpf-datasets so you can try this at home. The original code (MATLAB) and data are here. Now let’s start: When does the regime change in this time series?

Read More…

100 Time Series Data Mining Questions - Part 5

In the last post we managed to find similar patterns between two time series. For the next question, we will still be using the datasets available at https://github.com/matrix-profile-foundation/mpf-datasets so you can try this at home. The original code (MATLAB) and data are here. Now let’s start: If you had to summarize this long time series with just two shorter examples, what would they be? This is a new kind of question.

Read More…

100 Time Series Data Mining Questions - Part 4

In the last post, we learned what Discords are and found them in our data. For the next question, we will still be using the datasets available at https://github.com/matrix-profile-foundation/mpf-datasets so you can try this at home. The original code (MATLAB) and data are here. Now let’s start: Is there any pattern that is common to these two time series? Now we will see one of the most interesting and fastest jobs that the Matrix Profile can do (there are more, for sure).

Read More…

100 Time Series Data Mining Questions - Part 3

In the last post, we started looking for repeated patterns in a time series, which we call Motifs. For the next question, we will still be using the datasets available at https://github.com/matrix-profile-foundation/mpf-datasets so you can try this at home. The original code (MATLAB) and data are here. Now let’s start: What are the three most unusual days in this three-month-long dataset? This time we don’t know what we are looking for, but we want to discover something.

Read More…