Title: | Time Series with Matrix Profile |
---|---|
Description: | A toolkit implementing the Matrix Profile concept that was created by CS-UCR <http://www.cs.ucr.edu/~eamonn/MatrixProfile.html>. |
Authors: | Francisco Bischoff [aut, cre] |
Maintainer: | Francisco Bischoff <[email protected]> |
License: | Apache License (>= 2.0) |
Version: | 0.4.15 |
Built: | 2025-02-05 04:57:09 UTC |
Source: | https://github.com/matrix-profile-foundation/tsmp |
The goal of this function is to compute all fundamental algorithms on the provided time series data. See details for more information.
analyze( ts, windows = NULL, query = NULL, sample_pct = 1, threshold = 0.98, n_jobs = 1L )
ts |
a |
windows |
an |
query |
a |
sample_pct |
a |
threshold |
a |
n_jobs |
an |
For now the following is computed:
- Matrix Profile: exact or approximate based on sample_pct, given that a single windows is provided. By default, the exact algorithm is used;
- Top 3 Motifs;
- Top 3 Discords;
- Plot Matrix Profile, Motifs and Discords.
When windows is not provided or more than a single window is provided, the Pan-Matrix Profile is computed:
- Compute the upper bound when a threshold is provided (it is, by default);
- Compute Pan-Matrix Profile for all windows provided, below the upper bound, or a default range when no windows is provided;
- Top Motifs;
- Top Discords;
- Plot Pan-Matrix Profile, motifs and discords.
Returns the appropriate Matrix Profile or Pan-Matrix Profile object and also plots the graphics.
Website: http://www.cs.ucr.edu/~eamonn/MatrixProfile.html
Other Main API: compute(), discords(), motifs(), visualize()
    # Matrix Profile
    result <- analyze(mp_toy_data$data[, 1], 80)
    # Pan Matrix Profile
    result <- analyze(mp_toy_data$data[, 1])
The base classes are MatrixProfile and MultiMatrixProfile, but as other functions are used, classes are pushed behind, since the last output normally is the most significant. If you want, for example, to plot the Matrix Profile from a Fluss object, you may use as.matrixprofile() to cast it back.
as.matrixprofile(.mp) as.multimatrixprofile(.mp) as.pmp(.mp) as.valmod(.mp) as.fluss(.mp) as.chain(.mp) as.discord(.mp) as.motif(.mp) as.multimotif(.mp) as.arccount(.mp) as.salient(.mp)
.mp |
a TSMP object. |
Returns the object with the new class, if possible.
- as.matrixprofile(): Cast an object changed by another function back to MatrixProfile.
- as.multimatrixprofile(): Cast an object changed by another function back to MultiMatrixProfile.
- as.pmp(): Cast an object changed by another function back to PMP.
- as.valmod(): Cast an object changed by another function back to MultiMatrixProfile.
- as.fluss(): Cast an object changed by another function back to Fluss.
- as.chain(): Cast an object changed by another function back to Chain.
- as.discord(): Cast an object changed by another function back to Discord.
- as.motif(): Cast an object changed by another function back to Motif.
- as.multimotif(): Cast an object changed by another function back to MultiMotif.
- as.arccount(): Cast an object changed by another function back to ArcCount.
- as.salient(): Cast an object changed by another function back to Salient.
    w <- 50
    data <- mp_gait_data
    mp <- tsmp(data, window_size = w, exclusion_zone = 1 / 4, verbose = 0)
    mp <- find_motif(mp)
    class(mp) # first class will be "Motif"
    plot(mp) # plots a motif plot
    plot(as.matrixprofile(mp)) # plots a matrix profile plot
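The casting behaviour of the as.*() functions can be pictured with plain S3 class vectors. The following is a minimal sketch in base R and not the package's implementation: cast_to() is a hypothetical helper that moves a class already present in the object's class vector to the front, so that S3 dispatch (for example in plot()) picks that class's method first.

```r
# Hypothetical helper (not part of tsmp): move an existing class to the front
# of the S3 class vector so that method dispatch uses it first.
cast_to <- function(obj, cls) {
  if (!inherits(obj, cls)) {
    stop("object does not carry class '", cls, "'")
  }
  class(obj) <- c(cls, setdiff(class(obj), cls))
  obj
}

# Toy object standing in for a profile that was later processed by a motif search.
obj <- structure(list(mp = c(1, 2, 3)), class = c("Motif", "MatrixProfile"))

class(obj)[1]                            # "Motif"
class(cast_to(obj, "MatrixProfile"))[1]  # "MatrixProfile"
```

Under this assumption, casting is only possible for classes the object already carries, which would match the "if possible" wording of the return value.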
This function overwrites the current Matrix Profile using the Annotation Vector. Use with caution.
av_apply(.mp)
.mp |
A Matrix Profile with an Annotation Vector. |
Returns the input .mp object corrected by the embedded annotation vector.
Dau HA, Keogh E. Matrix Profile V: A Generic Technique to Incorporate Domain Knowledge into Motif Discovery. In: Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining - KDD '17. New York, New York, USA: ACM Press; 2017. p. 125-34.
Other Annotation vectors: av_complexity(), av_hardlimit_artifact(), av_motion_artifact(), av_stop_word(), av_zerocrossing()
    data <- mp_test_data$train$data[1:1000]
    w <- 50
    mp <- tsmp(data, window_size = w, verbose = 0)
    mp <- av_complexity(mp)
    av <- av_apply(mp)
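As an illustration of what correcting a Matrix Profile with an annotation vector can mean, the cited paper (Dau and Keogh, 2017) defines the corrected profile as CMP = MP + (1 - AV) * max(MP), with annotation values in [0, 1]. The sketch below applies that formula to plain numeric vectors in base R. The name apply_av() is hypothetical, and whether av_apply() uses exactly this expression internally is an assumption.

```r
# Hypothetical helper (not part of tsmp): corrected profile as in the cited paper.
# av values close to 0 push the profile up, making that region less attractive
# for motif discovery; av values of 1 leave the profile unchanged.
apply_av <- function(mp, av) {
  stopifnot(length(mp) == length(av), all(av >= 0 & av <= 1))
  mp + (1 - av) * max(mp)
}

mp <- c(1, 2, 4)
av <- c(1, 0.5, 0)
cmp <- apply_av(mp, av)
cmp # 1 4 8
```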
Computes the annotation vector that favors complexity
av_complexity(.mp, data, dilution_factor = 0, apply = FALSE)
.mp |
a Matrix Profile object. |
data |
a |
dilution_factor |
a |
apply |
logical. (Default is |
Returns the input .mp object with an embedded annotation vector.
Dau HA, Keogh E. Matrix Profile V: A Generic Technique to Incorporate Domain Knowledge into Motif Discovery. In: Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining - KDD '17. New York, New York, USA: ACM Press; 2017. p. 125-34.
Other Annotation vectors: av_apply(), av_hardlimit_artifact(), av_motion_artifact(), av_stop_word(), av_zerocrossing()
    data <- mp_test_data$train$data[1:1000]
    w <- 50
    mp <- tsmp(data, window_size = w, verbose = 0)
    av <- av_complexity(mp, apply = TRUE)
Computes the annotation vector that suppresses hard-limited artifacts
av_hardlimit_artifact(.mp, data, apply = FALSE)
.mp |
a Matrix Profile object. |
data |
a |
apply |
logical. (Default is |
Returns the input .mp object with an embedded annotation vector.
Dau HA, Keogh E. Matrix Profile V: A Generic Technique to Incorporate Domain Knowledge into Motif Discovery. In: Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining - KDD '17. New York, New York, USA: ACM Press; 2017. p. 125-34.
Other Annotation vectors: av_apply(), av_complexity(), av_motion_artifact(), av_stop_word(), av_zerocrossing()
    data <- mp_test_data$train$data[1:1000]
    w <- 50
    mp <- tsmp(data, window_size = w, verbose = 0)
    av <- av_hardlimit_artifact(mp, apply = TRUE)
Computes the annotation vector that suppresses motion artifacts
av_motion_artifact(.mp, data, apply = FALSE)
.mp |
a Matrix Profile object. |
data |
a |
apply |
logical. (Default is |
Returns the input .mp object with an embedded annotation vector.
Dau HA, Keogh E. Matrix Profile V: A Generic Technique to Incorporate Domain Knowledge into Motif Discovery. In: Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining - KDD '17. New York, New York, USA: ACM Press; 2017. p. 125-34.
Other Annotation vectors: av_apply(), av_complexity(), av_hardlimit_artifact(), av_stop_word(), av_zerocrossing()
    data <- mp_test_data$train$data[1:1000]
    w <- 50
    mp <- tsmp(data, window_size = w, verbose = 0)
    av <- av_motion_artifact(mp, apply = TRUE)
Computes the annotation vector that suppresses stop-word motifs
av_stop_word( .mp, data, stop_word_loc, exclusion_zone = NULL, threshold = 0.1, apply = FALSE )
.mp |
a Matrix Profile object. |
data |
a |
stop_word_loc |
an |
exclusion_zone |
a |
threshold |
a |
apply |
logical. (Default is |
The function is intended to be generic. However, its parameters (stop_word_loc, exclusion_zone and threshold) are highly dataset dependent.
Returns the input .mp object with an embedded annotation vector.
Dau HA, Keogh E. Matrix Profile V: A Generic Technique to Incorporate Domain Knowledge into Motif Discovery. In: Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining - KDD '17. New York, New York, USA: ACM Press; 2017. p. 125-34.
Other Annotation vectors: av_apply(), av_complexity(), av_hardlimit_artifact(), av_motion_artifact(), av_zerocrossing()
    data <- mp_test_data$train$data[1:1000]
    w <- 50
    mp <- tsmp(data, window_size = w, verbose = 0)
    av <- av_stop_word(mp, stop_word_loc = 150, apply = TRUE)
Computes the annotation vector that favors the number of zero crossings
av_zerocrossing(.mp, data, apply = FALSE)
.mp |
a Matrix Profile object. |
data |
a |
apply |
logical. (Default is |
Returns the input .mp object with an embedded annotation vector.
Dau HA, Keogh E. Matrix Profile V: A Generic Technique to Incorporate Domain Knowledge into Motif Discovery. In: Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining - KDD '17. New York, New York, USA: ACM Press; 2017. p. 125-34.
Other Annotation vectors: av_apply(), av_complexity(), av_hardlimit_artifact(), av_motion_artifact(), av_stop_word()
    data <- mp_test_data$train$data[1:1000]
    w <- 50
    mp <- tsmp(data, window_size = w, verbose = 0)
    av <- av_zerocrossing(mp, apply = TRUE)
Main API Function
compute( ts, windows = NULL, query = NULL, sample_pct = 1, threshold = 0.98, n_jobs = 1L )
ts |
a |
windows |
an |
query |
a |
sample_pct |
a |
threshold |
a |
n_jobs |
an |
Computes the exact or approximate Matrix Profile based on the sample percent specified. Currently, MPX and SCRIMP++ are used for the exact and approximate algorithms respectively. See details for more information about the argument combinations.
When a single windows is given, the Matrix Profile is computed. If a query is provided, the AB-join is computed. Otherwise the self-join is computed.
When multiple windows or none are given, the Pan-Matrix Profile is computed. If a threshold is set (it is, by default), the upper bound is computed first, and then the profiles are computed for the given windows, or for a default range when no windows is given, that fall below the upper bound.
Returns the profile computed.
Website: http://www.cs.ucr.edu/~eamonn/MatrixProfile.html
Other Main API: analyze(), discords(), motifs(), visualize()
    # Matrix Profile
    result <- compute(mp_toy_data$data[, 1], 80)
    # Pan-Matrix Profile
    result <- compute(mp_toy_data$data[, 1])
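The argument combinations described for compute() can be summarized as a small decision function. This is only a sketch of the documented rules in base R: describe_computation() is a hypothetical helper that names the computation that would be selected without calling tsmp, and it assumes that sample_pct = 1 means exact while any smaller value means approximate.

```r
# Hypothetical helper (not part of tsmp): names the computation selected by
# the documented rules for windows, query and sample_pct.
describe_computation <- function(windows = NULL, query = NULL, sample_pct = 1) {
  if (length(windows) == 1) {
    # a single window: Matrix Profile, AB-join only when a query is supplied
    kind <- if (is.null(query)) "self-join" else "AB-join"
    algo <- if (sample_pct >= 1) "exact" else "approximate" # assumption
    paste(algo, "Matrix Profile,", kind)
  } else {
    # no window or several windows: Pan-Matrix Profile
    "Pan-Matrix Profile"
  }
}

describe_computation(80)
describe_computation(80, query = 1:10, sample_pct = 0.5)
describe_computation()
describe_computation(c(20, 30, 40))
```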
Search for Discord
discords( profile, exclusion_zone = profile$ez, k = 3L, neighbor_count = 10L, radius = 3 )
profile |
a |
exclusion_zone |
an |
k |
an |
neighbor_count |
an |
radius |
an |
Website: http://www.cs.ucr.edu/~eamonn/MatrixProfile.html
Other Main API: analyze(), compute(), motifs(), visualize()
Mueen's Algorithm for Similarity Search is The Fastest Similarity Search Algorithm for Time Series Subsequences under Euclidean Distance and Correlation Coefficient.
dist_profile( data, query, ..., window_size = NULL, method = "v3", index = 1, k = NULL, weight = NULL, paa = 1 )
data |
a |
query |
a |
... |
Precomputed values from the first iteration. If not supplied, these values will be computed. |
window_size |
an |
method |
method that will be used to calculate the distance profile. See details. |
index |
an |
k |
an |
weight |
a |
paa |
a |
This function has several ways to work:
Case 1: You have a small sized query and the data. In this case you only have to provide the first two parameters, data and query. Internally the window_size will be taken from the query length.
Case 2: You have one or two data vectors and want to compute the join or self-similarity. In this case you need to use the recursive solution. The parameters are data, query, window_size and index. The first iteration does not need the index unless you are starting somewhere else. The query will be the source of a query_window, starting on index, with length of window_size.
The method defines which MASS will be used. Currently supported values are: v2, v3, weighted.
Returns the distance_profile for the given query, the last_product for the STOMP algorithm, and the parameters for the recursive call. See details.
Abdullah Mueen, Yan Zhu, Michael Yeh, Kaveh Kamgar, Krishnamurthy Viswanathan, Chetan Kumar Gupta and Eamonn Keogh (2015), The Fastest Similarity Search Algorithm for Time Series Subsequences under Euclidean Distance
Website: https://www.cs.unm.edu/~mueen/FastestSimilaritySearch.html
    w <- mp_toy_data$sub_len
    ref_data <- mp_toy_data$data[, 1]
    # minimum example, data and query
    nn <- dist_profile(ref_data, ref_data[1:w])
    distance_profile <- sqrt(nn$distance_profile)
    # data and indexed query
    nn <- dist_profile(ref_data, ref_data, window_size = w, index = 10)
    distance_profile <- sqrt(nn$distance_profile)
    # recursive
    nn <- NULL
    for (i in seq_len(10)) {
      nn <- dist_profile(ref_data, ref_data, nn, window_size = w, index = i)
    }
    # weighted
    weight <- c(rep(1, w / 3), rep(0.5, w / 3), rep(0.8, w / 3)) # just an example
    nn <- dist_profile(ref_data, ref_data,
      window_size = w, index = 1, method = "weighted",
      weight = weight
    )
    distance_profile <- sqrt(nn$distance_profile)
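To make the notion of a distance profile concrete, the sketch below computes one the slow way in base R: every subsequence of the data is z-normalized and compared with the z-normalized query using the Euclidean distance. It is meant as a reference for understanding, not as a description of how dist_profile() works internally. The helper names are hypothetical, the use of the population standard deviation is an assumption, and the sketch returns plain distances (the examples above take sqrt() of the returned distance_profile, which suggests the package returns squared values).

```r
# Hypothetical helpers (not part of tsmp).
znorm <- function(x) {
  s <- sqrt(mean((x - mean(x))^2)) # population standard deviation (assumption)
  (x - mean(x)) / s
}

naive_dist_profile <- function(data, query) {
  w <- length(query)
  n <- length(data) - w + 1 # number of subsequences
  q <- znorm(query)
  vapply(
    seq_len(n),
    function(i) sqrt(sum((znorm(data[i:(i + w - 1)]) - q)^2)),
    numeric(1)
  )
}

set.seed(1)
data <- cumsum(rnorm(200)) # toy random walk
query <- data[51:80]       # query taken from the data itself
dp <- naive_dist_profile(data, query)
which.min(dp)              # position of the best match
```

Because the query is copied from position 51 of the data, the smallest distance is found there and equals zero. This naive version costs one pass over the window per subsequence, which is the cost MASS is designed to avoid.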
This function does not handle NA values
fast_avg_sd(data, window_size, rcpp = FALSE)
data |
a |
window_size |
moving sd window size |
rcpp |
a |
Returns a list with avg and sd vectors
This function does not handle NA values
fast_movavg(data, window_size)
data |
a |
window_size |
moving average window size |
Returns a vector with the moving average
    data_avg <- fast_movavg(mp_toy_data$data[, 1], mp_toy_data$sub_len)
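A moving average over all windows can be obtained in a single pass using cumulative sums. The sketch below shows this common technique in base R; movavg_cumsum() is a hypothetical name and the code is not claimed to be how fast_movavg() is implemented. Like the documented function, it does not handle NA values.

```r
# Hypothetical helper (not part of tsmp): moving average via cumulative sums.
movavg_cumsum <- function(data, window_size) {
  n <- length(data)
  stopifnot(window_size >= 1, window_size <= n)
  cs <- c(0, cumsum(data)) # cs[i + 1] holds sum(data[1:i])
  # window sum = cumulative sum at window end minus cumulative sum before window start
  (cs[(window_size + 1):(n + 1)] - cs[1:(n - window_size + 1)]) / window_size
}

x <- c(1, 2, 3, 4, 5)
movavg_cumsum(x, 3) # 2 3 4
```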
This function does not handle NA values
fast_movsd(data, window_size, rcpp = FALSE)
data |
a |
window_size |
moving sd window size |
rcpp |
a |
Returns a vector with the moving standard deviation
    data_sd <- fast_movsd(mp_toy_data$data[, 1], mp_toy_data$sub_len)
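A moving standard deviation can be derived from cumulative sums of the values and of their squares. The base R sketch below computes the population standard deviation (divisor equal to the window size); movsd_cumsum() is a hypothetical name, and whether fast_movsd() uses the population or the sample convention is not stated here, so treat that choice as an assumption.

```r
# Hypothetical helper (not part of tsmp): moving population standard deviation.
movsd_cumsum <- function(data, window_size) {
  n <- length(data)
  stopifnot(window_size >= 1, window_size <= n)
  cs <- c(0, cumsum(data))    # running sum of values
  cs2 <- c(0, cumsum(data^2)) # running sum of squared values
  hi <- (window_size + 1):(n + 1)
  lo <- 1:(n - window_size + 1)
  mu <- (cs[hi] - cs[lo]) / window_size    # E[x] per window
  ex2 <- (cs2[hi] - cs2[lo]) / window_size # E[x^2] per window
  # variance = E[x^2] - E[x]^2; clamp tiny negative values caused by rounding
  sqrt(pmax(ex2 - mu^2, 0))
}

x <- c(1, 2, 3, 4, 5, 7)
sd_fast <- movsd_cumsum(x, 3)

# Direct per-window computation for comparison
sd_direct <- sapply(1:4, function(i) {
  s <- x[i:(i + 2)]
  sqrt(mean((s - mean(s))^2))
})
all.equal(sd_fast, sd_direct)
```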
Time Series Chains is a new primitive for time series data mining.
find_chains(.mp)
.mp |
a |
Returns the input .mp object with a new name chain. It contains: chains, a list of chains found with more than 2 patterns, and best, with the best one.
Zhu Y, Imamura M, Nikovski D, Keogh E. Matrix Profile VII: Time Series Chains: A New Primitive for Time Series Data Mining. Knowl Inf Syst. 2018 Jun 2;1-27.
Website: https://sites.google.com/site/timeserieschain/
    w <- 50
    data <- mp_gait_data
    mp <- tsmp(data, window_size = w, exclusion_zone = 1 / 4, verbose = 0)
    mp <- find_chains(mp)
Search for Discord
    find_discord(.mp, ...)
    ## S3 method for class 'MatrixProfile'
    find_discord(.mp, data, n_discords = 1, n_neighbors = 3, radius = 3, exclusion_zone = NULL, ...)
    ## S3 method for class 'PMP'
    find_discord(.mp, data, n_discords = 1, n_neighbors = 3, radius = 3, exclusion_zone = NULL, ...)
.mp |
a |
... |
further arguments to be passed to class specific function. |
data |
the data used to build the Matrix Profile, if not embedded. |
n_discords |
an |
n_neighbors |
an |
radius |
an |
exclusion_zone |
if a |
For class MatrixProfile, returns the input .mp object with a new name discord. It contains: discord_idx, a vector of discords found.
For class PMP, returns the input .mp object with a new name discord. It contains: discord_idx, a vector of discords found.
    # Single dimension data
    w <- 50
    data <- mp_gait_data
    mp <- tsmp(data, window_size = w, exclusion_zone = 1 / 4, verbose = 0)
    mp <- find_discord(mp)
    pan <- tsmp(mp_gait_data, window_size = 20:30, mode = "pmp")
    mp <- find_discord(pan)
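For intuition, a discord is usually taken to be a subsequence whose Matrix Profile value is largest, and an exclusion zone keeps nearby positions from being reported as separate discords. The base R sketch below selects the top discords from a plain numeric profile under that definition. The helper top_discords() and its ez argument (an exclusion radius in positions) are hypothetical; neighbor search and the radius parameter of find_discord() are not modelled.

```r
# Hypothetical helper (not part of tsmp): pick the k largest profile values,
# masking ez positions on each side of every pick.
top_discords <- function(mp, k = 3, ez = 2) {
  mp <- replace(mp, !is.finite(mp), -Inf) # ignore NA/Inf entries
  found <- integer(0)
  for (j in seq_len(k)) {
    idx <- which.max(mp)
    if (!is.finite(mp[idx])) break # nothing left to pick
    found <- c(found, idx)
    lo <- max(1, idx - ez)
    hi <- min(length(mp), idx + ez)
    mp[lo:hi] <- -Inf # apply the exclusion zone
  }
  found
}

mp <- c(1, 5, 4.9, 1, 1, 3, 1, 1, 2)
top_discords(mp, k = 3, ez = 1) # 2 6 9
```

Position 3 holds the second largest value but is skipped because it lies inside the exclusion zone of the first discord at position 2.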
Search for Motifs
    find_motif(.mp, ...)
    ## S3 method for class 'MatrixProfile'
    find_motif(.mp, data, n_motifs = 3, n_neighbors = 10, radius = 3, exclusion_zone = NULL, ...)
    ## S3 method for class 'MultiMatrixProfile'
    find_motif(.mp, data, n_motifs = 3, mode = c("guided", "unconstrained"), n_bit = 4, exclusion_zone = NULL, n_dim = NULL, ...)
    ## S3 method for class 'PMP'
    find_motif(.mp, data, n_motifs = 3, n_neighbors = 10, radius = 3, exclusion_zone = NULL, ...)
.mp |
a |
... |
further arguments to be passed to class specific function. |
data |
the data used to build the Matrix Profile, if not embedded. |
n_motifs |
an |
n_neighbors |
an |
radius |
an |
exclusion_zone |
if a |
mode |
a |
n_bit |
an |
n_dim |
an |
For class MatrixProfile, returns the input .mp object with a new name motif. It contains: motif_idx, a list of motif pairs found, and motif_neighbor, a list with the respective motif's neighbors.
For class MultiMatrixProfile, returns the input .mp object with a new name motif. It contains: motif_idx, a vector of motifs found, and motif_dim, a list with the dimensions where the motifs were found.
For class PMP, returns the input .mp object with a new name motif. It contains: motif_idx, a list of motif pairs found, and motif_neighbor, a list with the respective motif's neighbors.
    # Single dimension data
    w <- 50
    data <- mp_gait_data
    mp <- tsmp(data, window_size = w, exclusion_zone = 1 / 4, verbose = 0)
    mp <- find_motif(mp)
    # Multidimension data
    w <- mp_toy_data$sub_len
    data <- mp_toy_data$data[1:200, ]
    mp <- tsmp(data, window_size = w, mode = "mstomp", verbose = 0)
    mp <- find_motif(mp)
    pan <- tsmp(mp_gait_data, window_size = 20:30, mode = "pmp")
    mp <- find_motif(pan)
Time Series Snippets mainly addresses the common summarization problem: "Show me some representative/typical data". As stated in the original paper, potential uses of snippets include integrating summarizations of files directly into an operating system, producing automatically generated reports (for example, summarizing a sleep study), and supporting a host of higher-level tasks, including the comparison of massive data collections.
find_snippet(data, s_size, n_snippets = 2L, window_size = s_size/2L)
data |
a |
s_size |
an int. Size of snippet. |
n_snippets |
an |
window_size |
an |
Motifs vs. snippets: While motifs reward fidelity of conservation, snippets also reward coverage. Informally, coverage is some measure of how much of the data is explained or represented by a given snippet.
Shapelets vs. snippets: shapelets are defined as subsequences that are maximally representative of a class. Shapelets are supervised, snippets are unsupervised. Shapelets are generally biased to be as short as possible. In contrast, we want snippets to be longer, to intuitively capture the "flavor" of the time series.
Returns:
- snippet: a list of n_snippets snippets
- fraction: fraction of each snippet
- snippetidx: the location of each snippet within time series
Imani S, Madrid F, Ding W, Crouter S, Keogh E. Matrix Profile XIII: Time Series Snippets: A New Primitive for Time Series Data Mining. In: 2018 IEEE International Conference on Data Mining (ICDM). 2018.
Gharghabi S, Imani S, Bagnall A, Darvishzadeh A, Keogh E. Matrix Profile XII: MPdist: A Novel Time Series Distance Measure to Allow Data Mining in More Challenging Scenarios. In: 2018 IEEE International Conference on Data Mining (ICDM). 2018.
Website: https://sites.google.com/site/snippetfinder/
    snippets <- find_snippet(mp_fluss_data$walkjogrun$data[1:300], 40, n_snippets = 2)
    snippets <- find_snippet(mp_fluss_data$walkjogrun$data, 120, n_snippets = 3)
    plot(snippets)
Fast Low-cost Online Semantic Segmentation (FLOSS)
floss( .mp, new_data, data_window, threshold = 1, exclusion_zone = NULL, chunk_size = NULL, keep_cac = TRUE )
.mp |
a |
new_data |
a |
data_window |
an |
threshold |
a |
exclusion_zone |
if a |
chunk_size |
an |
keep_cac |
a |
Returns the input .mp object with new names: cac, the corrected arc count; cac_final, the combination of cac after repeated calls of floss(); floss, with the location of semantic changes; and floss_vals, with the normalized arc count value of the semantic change positions.
Gharghabi S, Ding Y, Yeh C-CM, Kamgar K, Ulanova L, Keogh E. Matrix Profile VIII: Domain Agnostic Online Semantic Segmentation at Superhuman Performance Levels. In: 2017 IEEE International Conference on Data Mining (ICDM). IEEE; 2017. p. 117-26.
Website: https://sites.google.com/site/onlinesemanticsegmentation/
Website: http://www.cs.ucr.edu/~eamonn/MatrixProfile.html
Other Semantic Segmentations: floss_cac(), floss_extract(), fluss_cac(), fluss_extract(), fluss_score(), fluss()
    data <- mp_fluss_data$tilt_abp$data[1:1000]
    new_data <- mp_fluss_data$tilt_abp$data[1001:1010]
    new_data2 <- mp_fluss_data$tilt_abp$data[1011:1020]
    w <- 80
    mp <- tsmp(data, window_size = w, verbose = 0)
    data_window <- 1000
    mp <- floss(mp, new_data, data_window)
    mp <- floss(mp, new_data2, data_window)
Computes the arc count with edge and 'online' correction (CAC).
floss_cac(.mp, data_window, exclusion_zone = NULL)
.mp |
a |
data_window |
an |
exclusion_zone |
if a |
The original paper suggests using the classic statistical-process-control heuristic to set a threshold where a semantic change may occur in the CAC. This may be useful in a real-time implementation, as the number of domain changes to look for is not known in advance. Please check the original paper (1).
Returns the input .mp object with a new name cac, holding the corrected arc count, and cac_final, the combination of cac after repeated calls of floss().
Gharghabi S, Ding Y, Yeh C-CM, Kamgar K, Ulanova L, Keogh E. Matrix Profile VIII: Domain Agnostic Online Semantic Segmentation at Superhuman Performance Levels. In: 2017 IEEE International Conference on Data Mining (ICDM). IEEE; 2017. p. 117-26.
Website: https://sites.google.com/site/onlinesemanticsegmentation/
Website: http://www.cs.ucr.edu/~eamonn/MatrixProfile.html
Other Semantic Segmentations: floss_extract(), floss(), fluss_cac(), fluss_extract(), fluss_score(), fluss()
    data <- mp_fluss_data$tilt_abp$data[1:1000]
    new_data <- mp_fluss_data$tilt_abp$data[1001:1010]
    w <- 10
    mp <- tsmp(data, window_size = w, verbose = 0)
    data_window <- 1000
    mp <- stompi_update(mp, new_data, data_window)
    mp <- floss_cac(mp, data_window)
Extract candidate points of semantic changes.
floss_extract(.mpac, threshold = 1, exclusion_zone = NULL)
.mpac |
a TSMP object of class |
threshold |
a |
exclusion_zone |
if a |
Returns the input .mp object with a new name floss, holding the location of semantic changes, and floss_vals, with the normalized arc count value of the semantic change positions.
Gharghabi S, Ding Y, Yeh C-CM, Kamgar K, Ulanova L, Keogh E. Matrix Profile VIII: Domain Agnostic Online Semantic Segmentation at Superhuman Performance Levels. In: 2017 IEEE International Conference on Data Mining (ICDM). IEEE; 2017. p. 117-26.
Website: https://sites.google.com/site/onlinesemanticsegmentation/
Website: http://www.cs.ucr.edu/~eamonn/MatrixProfile.html
Other Semantic Segmentations: floss_cac(), floss(), fluss_cac(), fluss_extract(), fluss_score(), fluss()
    data <- mp_fluss_data$tilt_abp$data[1:1000]
    w <- 10
    mp <- tsmp(data, window_size = w, verbose = 0)
    mp <- fluss_cac(mp)
    mp <- fluss_extract(mp, 2)
FLUSS is a Domain Agnostic Online Semantic Segmentation that uses the assumption that when few arcs cross a given index point, there is a high probability of a semantic change at that point. This function is a wrapper around fluss_cac() and fluss_extract().
fluss(.mp, num_segments = 1, exclusion_zone = NULL)
.mp |
a |
num_segments |
an |
exclusion_zone |
if a |
Returns the input .mp object with new names: cac, the corrected arc count, and fluss, with the location of semantic changes.
Gharghabi S, Ding Y, Yeh C-CM, Kamgar K, Ulanova L, Keogh E. Matrix Profile VIII: Domain Agnostic Online Semantic Segmentation at Superhuman Performance Levels. In: 2017 IEEE International Conference on Data Mining (ICDM). IEEE; 2017. p. 117-26.
Website: https://sites.google.com/site/onlinesemanticsegmentation/
Website: http://www.cs.ucr.edu/~eamonn/MatrixProfile.html
Other Semantic Segmentations: floss_cac(), floss_extract(), floss(), fluss_cac(), fluss_extract(), fluss_score()
    data <- mp_fluss_data$tilt_abp$data[1:1000]
    w <- 10
    mp <- tsmp(data, window_size = w, verbose = 0)
    mp <- fluss(mp, 2)
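The arc-crossing idea behind FLUSS can be illustrated with a toy profile index. In the base R sketch below, each position i has an arc to its nearest neighbor profile_index[i], and the arc count at a position is the number of arcs passing over it. The helper arc_count() is hypothetical, the convention of counting an arc at its start but not at its end is a choice made for this sketch, and the edge correction that turns the raw count into the CAC is deliberately left out.

```r
# Hypothetical helper (not part of tsmp): raw arc count from a profile index.
arc_count <- function(profile_index) {
  n <- length(profile_index)
  marks <- numeric(n)
  for (i in seq_len(n)) {
    j <- profile_index[i]
    lo <- min(i, j)
    hi <- max(i, j)
    marks[lo] <- marks[lo] + 1 # an arc opens at its left end
    marks[hi] <- marks[hi] - 1 # and closes at its right end
  }
  cumsum(marks) # arcs currently open at each position
}

# Toy index with two regimes: positions 1-4 only point inside 1-4,
# positions 5-8 only point inside 5-8.
pi_toy <- c(3, 4, 1, 2, 7, 8, 5, 6)
arc_count(pi_toy) # 2 4 2 0 2 4 2 0
```

No arc crosses position 4, the boundary between the two regimes, which is the signal FLUSS looks for. The count also drops to zero at the end of the series, which illustrates why an edge correction is needed before low values can be read as semantic changes.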
Computes the arc count with edge correction (CAC).
fluss_cac(.mp, exclusion_zone = NULL)
.mp |
a |
exclusion_zone |
if a |
The original paper suggests using the classic statistical-process-control heuristic to set a threshold where a semantic change may occur in the CAC. This may be useful in a real-time implementation, as the number of domain changes to look for is not known in advance. Please check the original paper (1).
Returns the input .mp object with a new name cac, holding the corrected arc count.
Gharghabi S, Ding Y, Yeh C-CM, Kamgar K, Ulanova L, Keogh E. Matrix Profile VIII: Domain Agnostic Online Semantic Segmentation at Superhuman Performance Levels. In: 2017 IEEE International Conference on Data Mining (ICDM). IEEE; 2017. p. 117-26.
Website: https://sites.google.com/site/onlinesemanticsegmentation/
Website: http://www.cs.ucr.edu/~eamonn/MatrixProfile.html
Other Semantic Segmentations: floss_cac(), floss_extract(), floss(), fluss_extract(), fluss_score(), fluss()
    data <- mp_fluss_data$tilt_abp$data[1:1000]
    w <- 10
    mp <- tsmp(data, window_size = w, verbose = 0)
    mp <- fluss_cac(mp)
Extract candidate points of semantic changes.
fluss_extract(.mpac, num_segments = 1, exclusion_zone = NULL)
.mpac |
a TSMP object of class |
num_segments |
an |
exclusion_zone |
if a |
Returns the input .mp object with a new name fluss, holding the location of semantic changes.
Gharghabi S, Ding Y, Yeh C-CM, Kamgar K, Ulanova L, Keogh E. Matrix Profile VIII: Domain Agnostic Online Semantic Segmentation at Superhuman Performance Levels. In: 2017 IEEE International Conference on Data Mining (ICDM). IEEE; 2017. p. 117-26.
Website: https://sites.google.com/site/onlinesemanticsegmentation/
Website: http://www.cs.ucr.edu/~eamonn/MatrixProfile.html
Other Semantic Segmentations: floss_cac(), floss_extract(), floss(), fluss_cac(), fluss_score(), fluss()
    data <- mp_fluss_data$tilt_abp$data[1:1000]
    w <- 10
    mp <- tsmp(data, window_size = w, verbose = 0)
    mp <- fluss_cac(mp)
    mp <- fluss_extract(mp, 2)
FLUSS - Prediction score calculation
fluss_score(gtruth, extracted, data_size)
gtruth |
an |
extracted |
an |
data_size |
an |
Returns the score of predicted semantic transitions compared with the ground truth. Zero is the best, One is the worst.
Gharghabi S, Ding Y, Yeh C-CM, Kamgar K, Ulanova L, Keogh E. Matrix Profile VIII: Domain Agnostic Online Semantic Segmentation at Superhuman Performance Levels. In: 2017 IEEE International Conference on Data Mining (ICDM). IEEE; 2017. p. 117-26.
Website: https://sites.google.com/site/onlinesemanticsegmentation/
Website: http://www.cs.ucr.edu/~eamonn/MatrixProfile.html
Other Semantic Segmentations:
floss_cac()
,
floss_extract()
,
floss()
,
fluss_cac()
,
fluss_extract()
,
fluss()
data <- mp_fluss_data$tilt_abp$data[1:1000]
w <- 10
truth <- c(945, 875)
mp <- tsmp(data, window_size = w, verbose = 0)
mp <- fluss_cac(mp)
mp <- fluss_extract(mp, 2)
score <- fluss_score(truth, mp$fluss, length(data))
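To see why zero is the best score, it may help to look at one plausible way of scoring predicted transitions. The sketch below is an assumption for illustration, not necessarily the formula used by fluss_score(): the hypothetical transition_score() sums, for each ground-truth transition, the distance to the closest extracted transition and divides by the data size.

```r
# Hypothetical scoring helper: for each ground-truth transition, take the
# distance to the closest extracted transition, then normalise the sum by
# the data size so that 0 means a perfect match.
transition_score <- function(gtruth, extracted, data_size) {
  nearest <- vapply(gtruth, function(g) min(abs(extracted - g)), numeric(1))
  sum(nearest) / data_size
}

truth <- c(945, 875)
perfect <- transition_score(truth, c(875, 945), 1000)
off_by_ten <- transition_score(truth, c(885, 955), 1000)
print(c(perfect = perfect, off_by_ten = off_by_ten))
```

Under this assumed definition, the order of the indexes does not matter, and the score grows with the total displacement of the predictions.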
Get the data included in a TSMP object, if any.
get_data(.mp)
.mp |
a TSMP object. |
Returns the data as a matrix. If there is more than one series, returns a list.
mp <- tsmp(mp_toy_data$data[1:200, 1], window_size = 30, verbose = 0)
get_data(mp)
Mueen's Algorithm for Similarity Search is The Fastest Similarity Search Algorithm for Time Series Subsequences under Euclidean Distance and Correlation Coefficient.
mass_v3( query_window, data, window_size, data_size, data_mean, data_sd, query_mean, query_sd, k = NULL, ... )
query_window |
a |
data |
a |
window_size |
an |
data_size |
an |
data_mean |
precomputed data moving average. |
data_sd |
precomputed data moving standard deviation. |
query_mean |
precomputed query average. |
query_sd |
precomputed query standard deviation. |
k |
an |
... |
just a placeholder to catch unused parameters. |
This is a piecewise version of MASS that performs better when the size of the pieces is well aligned with the hardware.
Returns the distance_profile for the given query and the last_product for the STOMP algorithm.
Abdullah Mueen, Yan Zhu, Michael Yeh, Kaveh Kamgar, Krishnamurthy Viswanathan, Chetan Kumar Gupta and Eamonn Keogh (2015), The Fastest Similarity Search Algorithm for Time Series Subsequences under Euclidean Distance
Website: https://www.cs.unm.edu/~mueen/FastestSimilaritySearch.html
mass_pre() for precomputation of input values.
w <- mp_toy_data$sub_len
ref_data <- mp_toy_data$data[, 1]
query_data <- mp_toy_data$data[, 1]
d_size <- length(ref_data)
q_size <- length(query_data)
pre <- tsmp:::mass_pre(ref_data, query_data, w)
dp <- list()
for (i in 1:(d_size - w + 1)) {
  dp[[i]] <- tsmp:::mass_v3(
    query_data[i:(i - 1 + w)], ref_data, pre$window_size, pre$data_size,
    pre$data_mean, pre$data_sd, pre$query_mean[i], pre$query_sd[i]
  )
}
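The core idea behind MASS, computing a z-normalized Euclidean distance profile from sliding dot products obtained with the FFT, can be sketched in base R. This is a simplified, non-piecewise illustration and not the code of mass_v3(); the function names dist_profile_naive() and dist_profile_fft() are hypothetical, and a naive version is included only to check the result.

```r
# Population standard deviation (divides by n, not n - 1)
pop_sd <- function(x) sqrt(mean((x - mean(x))^2))

# Naive reference: z-normalise every window and compare it with the query
dist_profile_naive <- function(query, data) {
  w <- length(query)
  zq <- (query - mean(query)) / pop_sd(query)
  vapply(seq_len(length(data) - w + 1), function(i) {
    s <- data[i:(i + w - 1)]
    sqrt(sum((zq - (s - mean(s)) / pop_sd(s))^2))
  }, numeric(1))
}

# FFT-based version: sliding dot products via circular convolution
dist_profile_fft <- function(query, data) {
  w <- length(query)
  n <- length(data)
  # moving mean and standard deviation of the data through cumulative sums
  cs <- c(0, cumsum(data))
  cs2 <- c(0, cumsum(data^2))
  idx <- seq_len(n - w + 1)
  mu <- (cs[idx + w] - cs[idx]) / w
  sig <- sqrt(pmax((cs2[idx + w] - cs2[idx]) / w - mu^2, 0))
  # convolve the data with the reversed, zero-padded query
  q_rev <- c(rev(query), rep(0, n - w))
  conv <- Re(fft(fft(data) * fft(q_rev), inverse = TRUE)) / n
  qt <- conv[w:n] # dot product of the query with each window
  d2 <- 2 * w * (1 - (qt - w * mean(query) * mu) / (w * pop_sd(query) * sig))
  sqrt(pmax(d2, 0)) # clamp tiny negative values caused by rounding
}

set.seed(1)
series <- cumsum(rnorm(300))
query <- series[50:79]
dp_naive <- dist_profile_naive(query, series)
dp_fft <- dist_profile_fft(query, series)
print(which.min(dp_fft))
```

The FFT version replaces one dot product per window with a single convolution, which is where the speed-up comes from on long series.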
Get index of the minimum value from a matrix profile and its nearest neighbor
min_mp_idx(.mp, n_dim = NULL, valid = TRUE)
.mp |
a |
n_dim |
number of dimensions of the matrix profile |
valid |
check for valid numbers |
Returns a matrix with two columns: the minimum and the nearest neighbor.
w <- 50
data <- mp_gait_data
mp <- tsmp(data, window_size = w, exclusion_zone = 1 / 4, verbose = 0)
min_val <- min_mp_idx(mp)
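Conceptually, the result pairs the position of the smallest matrix profile value with the position its profile index points to. A minimal base-R sketch with made-up values follows; the vectors profile and profile_index are hypothetical stand-ins for the mp and pi elements of a real object.

```r
# Toy matrix profile and profile index (hypothetical values)
profile <- c(2.1, 0.4, 1.7, 3.0, 0.4, 2.5)
profile_index <- c(3L, 5L, 1L, 6L, 2L, 4L)

min_idx <- which.min(profile)    # position of the smallest distance
nn_idx <- profile_index[min_idx] # where its nearest neighbour starts
pair <- matrix(c(min_idx, nn_idx), ncol = 2)
print(pair)
```

Note that which.min() returns the first minimum when there are ties, as in this toy profile.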
Search for Motifs
motifs( profile, exclusion_zone = profile$ez, k = 3L, neighbor_count = 10L, radius = 3 )
profile |
a |
exclusion_zone |
an |
k |
an |
neighbor_count |
an |
radius |
an |
Website: http://www.cs.ucr.edu/~eamonn/MatrixProfile.html
Other Main API: analyze(), compute(), discords(), visualize()
Just a synthetic dataset for testing
motifs_discords_small
A vector with 875 observations
Contains two datasets used in the FLUSS paper (1): the first is TiltABP from (2), and the second is WalkJogRun from the PAMAP dataset (3).
mp_fluss_data
A list containing:
one column matrix with the dataset's data
a vector with the ground truth of semantic change according to provided dataset
window size used in original paper
https://sites.google.com/site/onlinesemanticsegmentation/
http://www.cs.ucr.edu/~eamonn/time_series_data/
Gharghabi S, Ding Y, Yeh C-CM, Kamgar K, Ulanova L, Keogh E. Matrix Profile VIII: Domain Agnostic Online Semantic Segmentation at Superhuman Performance Levels. In: 2017 IEEE International Conference on Data Mining (ICDM). IEEE; 2017. p. 117-26.
Heldt, T., Oefinger, M.B., Hoshiyama, M. and Mark, R.G., 2003, September. Circulatory response to passive and active changes in posture. In IEEE Computers in Cardiology, 2003 (pp. 263-266).
Reiss, A. and Stricker, D., 2012. Introducing a new benchmarked dataset for activity monitoring. In 16th International Symposium on Wearable Computers (ISWC), 2012, pages 108-109. IEEE, 2012.
Original data used in the Time Series Chain demo
mp_gait_data
A matrix with 904 rows and 1 column with the Y data from an accelerometer
https://sites.google.com/site/timeserieschain/
Zhu Y, Imamura M, Nikovski D, Keogh E. Matrix Profile VII: Time Series Chains: A New Primitive for Time Series Data Mining. Knowl Inf Syst. 2018 Jun 2;1-27.
This is the Meat dataset from UCR Archive modified for Salient discovery. The original data is mixed with Random Walks and the algorithm must pick only the originals.
mp_meat_data
original is the original dataset with 60+60 observations mixed with 120 random walks:
240 time series with length of 448 each.
label of each time series, -666 means a random walk.
size of sliding window.
sub is the original dataset embedded in random walks:
One time series with length of 107520.
label of each original data.
starting point where the original data was placed.
size of sliding window.
http://www.cs.ucr.edu/~eamonn/time_series_data/
Yeh CCM, Van Herle H, Keogh E. Matrix profile III: The matrix profile allows visualization of salient subsequences in massive time series. Proc - IEEE Int Conf Data Mining, ICDM. 2017;579-88.
Hu B, Rakthanmanon T, Hao Y, Evans S, Lonardi S, Keogh E. Discovering the Intrinsic Cardinality and Dimensionality of Time Series Using MDL. In: 2011 IEEE 11th International Conference on Data Mining. IEEE; 2011. p. 1086-91.
Website: https://sites.google.com/site/salientsubs/
A synthetic dataset based on the TRACE dataset and used as a stress test for the SDTS algorithm. The TRACE dataset used here is originally from (1), and the version distributed here is from (2).
mp_test_data
A list of matrices with 215010 rows and 1 dimension:
training data
label for training data
test data
label for test data
https://sites.google.com/view/weaklylabeled
http://www.cs.ucr.edu/~eamonn/time_series_data/
Roverso, D., Multivariate temporal classification by windowed wavelet decomposition and recurrent neural networks, in 3rd ANS Int'l Topical Meeting on Nuclear Plant Instrumentation, Control and Human-Machine Interface, vol. 20, Washington, DC, USA, 2000.
Yeh C-CM, Kavantzas N, Keogh E. Matrix profile IV: Using Weakly Labeled Time Series to Predict Outcomes. Proc VLDB Endow. 2017 Aug 1;10(12):1802-12.
A synthetic dataset with embedded MOTIFs for multidimensional discovery
mp_toy_data
A list with a matrix with 550 rows and 3 dimensions and an int:
data with embedded MOTIFs
size of sliding window
https://sites.google.com/view/mstamp/
Yeh CM, Kavantzas N, Keogh E. Matrix Profile VI : Meaningful Multidimensional Motif Discovery.
Website: http://www.cs.ucr.edu/~eamonn/MatrixProfile.html
MPdist is a recently introduced distance measure which considers two time series to be similar if they share many similar subsequences, regardless of the order of matching subsequences. It has been demonstrated that MPdist is robust to spikes, warping, linear trends, dropouts, wandering baseline and missing values, issues that are common outside of benchmark datasets.
mpdist( ref_data, query_data, window_size, type = c("simple", "vector"), thr = 0.05 )
ref_data |
a |
query_data |
a |
window_size |
an int. Size of the sliding window. |
type |
the type of result. (Default is |
thr |
threshold for MPdist. (Default is |
MPdist returns the distance of two time series or a vector containing the distance between all sliding windows. If argument type is set to vector, the vector is returned.
Returns the distance of two time series or a vector containing the distance between all sliding windows.
Gharghabi S, Imani S, Bagnall A, Darvishzadeh A, Keogh E. Matrix Profile XII: MPdist: A Novel Time Series Distance Measure to Allow Data Mining in More Challenging Scenarios. In: 2018 IEEE International Conference on Data Mining (ICDM). 2018.
Website: https://sites.google.com/site/mpdistinfo/
ref_data <- mp_toy_data$data[, 1]
qe_data <- mp_toy_data$data[, 2]
qd_data <- mp_toy_data$data[150:200, 1]
w <- mp_toy_data$sub_len
# distance between data of same size
deq <- mpdist(ref_data, qe_data, w)
# distance between data of different sizes
ddiff <- mpdist(ref_data, qd_data, w)
# distance vector between data of different sizes
ddvect <- mpdist(ref_data, qd_data, w, type = "vector")
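The idea of "sharing many similar subsequences" can be made concrete with a naive base-R sketch. It assumes the commonly described formulation in which the two join profiles (A against B and B against A) are concatenated and the k-th smallest value is returned, with k taken as thr times the combined length of the two series. This is an illustration only: mpdist_sketch() and its helpers are hypothetical, the brute-force search is quadratic, and the package's mpdist() may differ in detail.

```r
# z-normalise a window using the population standard deviation
znorm <- function(x) (x - mean(x)) / sqrt(mean((x - mean(x))^2))

# All z-normalised windows of length w, one per column (w rows)
windows <- function(x, w) {
  vapply(seq_len(length(x) - w + 1),
         function(i) znorm(x[i:(i + w - 1)]), numeric(w))
}

# Naive join profile: for each window of `from`, distance to the closest
# window of `to`
join_profile <- function(from, to) {
  apply(from, 2, function(q) min(sqrt(colSums((to - q)^2))))
}

mpdist_sketch <- function(a, b, w, thr = 0.05) {
  za <- windows(a, w)
  zb <- windows(b, w)
  p_abba <- sort(c(join_profile(za, zb), join_profile(zb, za)))
  k <- ceiling(thr * (length(a) + length(b)))
  p_abba[min(k, length(p_abba))] # k-th smallest nearest-neighbour distance
}

set.seed(7)
a <- sin(seq(0, 20, length.out = 200)) + rnorm(200, sd = 0.1)
b <- cos(seq(0, 15, length.out = 200)) + rnorm(200, sd = 0.1)
d_self <- mpdist_sketch(a, a, 20)
d_ab <- mpdist_sketch(a, b, 20)
d_ba <- mpdist_sketch(b, a, 20)
print(c(d_self = d_self, d_ab = d_ab))
```

Because only the k-th smallest value is kept, a few badly matching windows (spikes, dropouts) do not dominate the result, which is the robustness property described above.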
Fast implementation of MP and MPI for internal purposes, without FFT
mpx( data, window_size, query = NULL, idx = TRUE, dist = c("euclidean", "pearson"), n_workers = 1 )
data |
a |
window_size |
window size |
query |
query |
idx |
compute the profile indexes? |
dist |
distance measure, Euclidean or Pearson? |
n_workers |
threads for multi-threading |
Returns MP and MPI
mp <- mpx(mp_toy_data$data[1:200, 1], window_size = 30)
Computes the Matrix Profile and Profile Index for Multivariate Time Series.
mstomp_par( data, window_size, exclusion_zone = getOption("tsmp.exclusion_zone", 1/2), verbose = getOption("tsmp.verbose", 2), must_dim = NULL, exc_dim = NULL, n_workers = 2 )
mstomp( data, window_size, exclusion_zone = getOption("tsmp.exclusion_zone", 1/2), verbose = getOption("tsmp.verbose", 2), must_dim = NULL, exc_dim = NULL )
data |
a |
window_size |
an |
exclusion_zone |
a |
verbose |
an |
must_dim |
an |
exc_dim |
an |
n_workers |
an |
The Matrix Profile has the potential to revolutionize time series data mining because of its generality, versatility, simplicity and scalability. In particular it has implications for time series motif discovery, time series joins, shapelet discovery (classification), density estimation, semantic segmentation, visualization, rule discovery, clustering, etc. MSTOMP computes the Matrix Profile and Profile Index for Multivariate Time Series in a way that is meaningful for multidimensional MOTIF discovery. It uses the STOMP algorithm, which is faster than STAMP but lacks its anytime property.
Although this function handles Multivariate Time Series, it can also be used to handle Univariate Time Series. verbose changes how much information is printed by this function; 0 means nothing, 1 means text, 2 adds the progress bar, 3 adds the finish sound.
Returns a MultiMatrixProfile object, a list with the matrix profile mp, profile index pi, left and right matrix profile lmp, rmp and profile index lpi, rpi, window size w, number of dimensions n_dim, exclusion zone ez, must dimensions must and excluded dimensions exc.
If the input has only one dimension, returns the same as stomp().
mstomp_par(): Parallel version.
mstomp(): Single thread version.
Yeh CM, Kavantzas N, Keogh E. Matrix Profile VI : Meaningful Multidimensional Motif Discovery.
Zhu Y, Imamura M, Nikovski D, Keogh E. Matrix Profile VII: Time Series Chains: A New Primitive for Time Series Data Mining. Knowl Inf Syst. 2018 Jun 2;1-27.
Website: https://sites.google.com/view/mstamp/
Website: http://www.cs.ucr.edu/~eamonn/MatrixProfile.html
Other matrix profile computations: scrimp(), stamp_par(), stomp_par(), tsmp(), valmod()
# using all dimensions
mp <- mstomp(mp_toy_data$data[1:150, ], 30, verbose = 0)
# using threads
mp <- mstomp_par(mp_toy_data$data[1:150, ], 30, verbose = 0)
# force using dimensions 1 and 2
mp <- mstomp(mp_toy_data$data[1:200, ], 30, must_dim = c(1, 2))
# exclude dimensions 2 and 3
mp2 <- mstomp(mp_toy_data$data[1:200, ], 30, exc_dim = c(2, 3))
Plot a TSMP object
## S3 method for class 'ArcCount'
plot( x, data, type = c("data", "matrix"), exclusion_zone = NULL, edge_limit = NULL, threshold = stats::quantile(x$cac, 0.1), main = "Arcs Discover", xlab = "index", ylab = "", ... )
## S3 method for class 'Valmod'
plot( x, ylab = "distance", xlab = "index", main = "Valmod Matrix Profile", data = FALSE, ... )
## S3 method for class 'MatrixProfile'
plot( x, ylab = "distance", xlab = "index", main = "Unidimensional Matrix Profile", data = FALSE, ... )
## S3 method for class 'MultiMatrixProfile'
plot( x, ylab = "distance", xlab = "index", main = "Multidimensional Matrix Profile", ... )
## S3 method for class 'SimpleMatrixProfile'
plot( x, ylab = "distance", xlab = "index", main = "SiMPle Matrix Profile", data = FALSE, ... )
## S3 method for class 'Fluss'
plot( x, data, type = c("data", "matrix"), main = "Fast Low-cost Unipotent Semantic Segmentation", xlab = "index", ylab = "", ... )
## S3 method for class 'Floss'
plot( x, data, type = c("data", "matrix"), main = "Fast Low-cost Online Semantic Segmentation", xlab = "index", ylab = "", ... )
## S3 method for class 'Chain'
plot( x, data, type = c("data", "matrix"), main = "Chain Discover", xlab = "index", ylab = "", ... )
## S3 method for class 'Discord'
plot( x, data, type = c("data", "matrix"), ncol = 3, main = "Discord Discover", xlab = "index", ylab = "", ... )
## S3 method for class 'Snippet'
plot( x, data, ncol = 3, main = "Snippet Finder", xlab = "index", ylab = "", ... )
## S3 method for class 'Motif'
plot( x, data, type = c("data", "matrix"), ncol = 3, main = "MOTIF Discover", xlab = "index", ylab = "", ... )
## S3 method for class 'MultiMotif'
plot( x, data, type = c("data", "matrix"), ncol = 3, main = "Multidimensional MOTIF Discover", xlab = "index", ylab = "", ... )
## S3 method for class 'Salient'
plot(x, data, main = "Salient Subsections", xlab = "index", ylab = "", ...)
## S3 method for class 'PMP'
plot( x, ylab = "distance", xlab = "index", main = "Unidimensional Matrix Profile", data = FALSE, ... )
x |
a Matrix Profile |
data |
the data used to build the Matrix Profile, if not embedded to it. |
type |
"data" or "matrix". Choose what will be plotted. |
exclusion_zone |
if a |
edge_limit |
if a |
threshold |
the maximum value to be used to plot. |
main |
a |
xlab |
a |
ylab |
a |
... |
|
ncol |
an |
None
mp <- tsmp(mp_toy_data$data[1:200, 1], window_size = 30, verbose = 0)
plot(mp)
Sometimes it may be useful to see graphically where the nearest neighbor is. This is the reasoning behind, for example, FLUSS, which uses the arc count to infer a semantic change, and SiMPle, which infers that arcs connect similar segments of a piece of music. See details for a deeper explanation of how to use this function.
plot_arcs( pairs, alpha = NULL, quality = 30, lwd = 15, col = c("blue", "orange"), main = "Arc Plot", ylab = "", xlab = "Profile Index", xmin = NULL, xmax = NULL, ... )
pairs |
a |
alpha |
a |
quality |
an |
lwd |
an |
col |
a |
main |
a |
ylab |
a |
xlab |
a |
xmin |
an |
xmax |
an |
... |
There are two ways to use this function. First, you can provide just the data, and the function will try its best to retrieve the pairs for plotting. Second, you can skip the first parameters and just provide the pairs, which is a matrix with two columns: the first is the starting index, the second is the end index. Two colors are used so that you can identify the direction of the arc. If you use the rpi or lpi as input, you will see that these profile indexes have just one direction.
exclusion_zone is used to filter out small arcs that may be useless (e.g. you may be interested in similarities that are far away). edge_limit is used to filter out spurious arcs that connect the beginning and the end of the profile (e.g. silent audio). threshold is used to filter out indexes that have a distant nearest neighbor (e.g. to retrieve only the best motifs).
None
plot_arcs(pairs = matrix(c(5, 10, 1, 10, 20, 5), ncol = 2, byrow = TRUE))
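The pairs matrix and the effect of an exclusion zone can be illustrated without the package. The base-R sketch below uses a made-up pairs matrix; it only shows the kind of filtering described above (dropping short arcs and telling forward from backward arcs) and makes no claim about how plot_arcs() implements it internally.

```r
# Hypothetical pairs: column 1 is the starting index, column 2 the end index
pairs <- matrix(c(5, 60,
                  12, 14,
                  70, 8,
                  33, 35), ncol = 2, byrow = TRUE)

# Keep only arcs longer than the exclusion zone (here: 10 points)
exclusion_zone <- 10
arc_length <- abs(pairs[, 2] - pairs[, 1])
long_arcs <- pairs[arc_length > exclusion_zone, , drop = FALSE]

# Direction of each remaining arc; the two colors distinguish these cases
forward <- long_arcs[, 2] > long_arcs[, 1]
print(long_arcs)
print(forward)
```

The filtered matrix could then be passed as the pairs argument, for example plot_arcs(pairs = long_arcs).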
Computes the Pan-Matrix Profile (PMP) for the given time series.
pmp( data, window_sizes = seq.int(from = 10, to = length(data)/2, length.out = 20), plot = FALSE, pmp_obj = NULL, n_workers = 1, verbose = getOption("tsmp.verbose", 2) )
data |
a |
window_sizes |
a |
plot |
a |
pmp_obj |
a |
n_workers |
an |
verbose |
an |
The work closest in spirit to ours is VALMOD. The idea of VALMOD is to compute the MP for the shortest length of interest, then use the information gleaned from it to guide a search through longer subsequence lengths, exploiting lower bounds to prune off some calculations. This idea works well for the first few of the longer subsequence lengths, but the lower bounds progressively weaken, making the pruning ineffective. Thus, in the five case studies they presented, the mean value of U/L was just 1.24. In contrast, consider that our termite example in Fig. 15 has a U/L ratio of 240, more than two orders of magnitude larger. Thus, VALMOD is perhaps best seen as finding motifs with some tolerance for a slightly (~25%) too short user-specified query length, rather than a true "motif-of-all-lengths" algorithm. Also note that apart from the shortest length, VALMOD only gives some information for the other lengths, unlike pmp, which contains exact distances for all subsequences of all lengths.
When just the data is provided, the exploration will be done using the default window_sizes, which is a sequence of 20 values between 10 and half the data size, and the resulting object will have an upper_bound equal to Inf.
If an object is provided by the argument pmp_obj, this function will add more information to the resulting object, never changing the values already computed.
verbose changes how much information is printed by this function; 0 means nothing, 1 means text, 2 adds the progress bar, 3 adds the finish sound.
Talk about upper bound and window sizes:
1. upper_window will be set to Inf on new objects
1.1. upper_window will also be used for plot, and for discovery, it must not remove any existing data from the object
2. window_sizes is used for plot, it must not remove any mp inside the object
2.1. window_sizes tells the function what mp are stored, it may be updated with as.numeric(names(pmp))
3. the functions must be capable to handle the data without need to sort by window_size, but sort may be useful later(?)
Returns a PMP object.
# Just compute
pan <- pmp(mp_gait_data)
# Compute the upper bound, then add new profiles
pan <- pmp_upper_bound(mp_gait_data)
pan <- pmp(mp_gait_data, pmp_obj = pan)
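The default window_sizes expression from the usage can be evaluated on its own to see which windows would be explored. The sketch below assumes a series of 904 points (the stated size of mp_gait_data); the rounding step is an assumption added for illustration, since the raw sequence is generally not integer valued and the package's own handling is not described here.

```r
n <- 904 # assumed series length (mp_gait_data has 904 rows)

# Same expression as the default argument in the usage section
window_sizes <- seq.int(from = 10, to = n / 2, length.out = 20)

# Hypothetical post-processing: whole-number window sizes without repeats
window_sizes_int <- unique(round(window_sizes))
print(window_sizes_int)
```

Passing an explicit integer vector as window_sizes avoids depending on how fractional values are treated.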
Finds the upper bound for Pan Matrix Profile calculation.
pmp_upper_bound( data, threshold = getOption("tsmp.pmp_ub", 0.95), refine_stepsize = getOption("tsmp.pmp_refine", 0.25), return_pmp = TRUE, n_workers = 1, verbose = getOption("tsmp.verbose", 2) )
data |
a |
threshold |
a |
refine_stepsize |
a |
return_pmp |
a |
n_workers |
an |
verbose |
verbose an |
The Pan Matrix Profile may not give any further information beyond a certain window size. This function starts computing the matrix profile for the window size of 8 and doubles it until the minimum correlation value found is less than the threshold. After that, it begins to refine the upper bound using the refine_stepsize values, until the threshold value is hit.
verbose changes how much information is printed by this function; 0 means nothing, 1 means text, 2 adds the progress bar, 3 adds the finish sound.
Returns a PMP object with computed data, or just the upper bound value if return_pmp is set to FALSE.
Yet to be announced
Website: http://www.cs.ucr.edu/~eamonn/MatrixProfile.html
# return the object
pan_matrix <- pmp_upper_bound(mp_gait_data)
# just the upper bound
pan_ub <- pmp_upper_bound(mp_gait_data, return_pmp = FALSE)
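The "double, then refine" search described in the details can be sketched as a generic pattern. Everything in this block is hypothetical: score_fn() stands in for "the minimum correlation value found at a given window size", the way the refinement step is derived from refine_stepsize is an assumption, and the real function may proceed differently.

```r
# Hypothetical search: double the window until the score drops below the
# threshold, then walk back in smaller steps while it stays below it.
find_upper_bound <- function(score_fn, threshold = 0.95, refine_stepsize = 0.25,
                             start = 8, max_window = 1024) {
  w <- start
  # Phase 1: doubling
  while (w < max_window && score_fn(w) >= threshold) w <- w * 2
  # Phase 2: refinement between the last two doubling points
  lower <- w / 2
  step <- max(1, round(refine_stepsize * lower))
  cand <- w
  while (cand - step > lower && score_fn(cand - step) < threshold) {
    cand <- cand - step
  }
  cand
}

# Stand-in score that decreases smoothly with the window size
toy_score <- function(w) exp(-w / 200)
ub <- find_upper_bound(toy_score)
print(ub)
```

With this toy score the doubling phase stops at 16 and the refinement walks back to 12, the smallest visited window whose score is still below the threshold.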
Read TSMP object from JSON file.
read(x, ...)
x |
a |
... |
other arguments to be passed forward. |
result <- compute(mp_toy_data$data[, 1], 80)
tempfile <- file.path(tempdir(), "output.json")
write(result, file = tempfile)
result <- read(tempfile)
Remove a TSMP class from an object
remove_class(x, class)
x |
a |
class |
|
the object without the class
w <- 50
data <- mp_gait_data
mp <- tsmp(data, window_size = w, exclusion_zone = 1 / 4, verbose = 0)
mp <- find_chains(mp)
# Remove the "Chain" class information
mp <- remove_class(mp, "Chain")
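Since TSMP objects carry several S3 classes at once, removing one of them amounts to editing the class vector. The base-R sketch below shows the idea on a made-up object; strip_class() is a hypothetical helper, and the package function may additionally clean up data associated with the removed class.

```r
# Made-up object with two S3 classes, mimicking a profile with chain info
obj <- structure(list(mp = numeric(0)), class = c("Chain", "MatrixProfile"))

# Hypothetical helper: drop one class name, keep the others in order
strip_class <- function(x, cls) {
  class(x) <- setdiff(class(x), cls)
  x
}

obj2 <- strip_class(obj, "Chain")
print(class(obj2))
```

After the call, S3 dispatch (for example plot()) falls through to the remaining classes.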
Convert salient sequences into MDS space
salient_mds(.mp, data, bit_idx = 1)
.mp |
a Matrix Profile object. |
data |
the data used to build the Matrix Profile, if not embedded. |
bit_idx |
an |
Returns X,Y values for plotting
Yeh CCM, Van Herle H, Keogh E. Matrix profile III: The matrix profile allows visualization of salient subsequences in massive time series. Proc - IEEE Int Conf Data Mining, ICDM. 2017;579-88.
Hu B, Rakthanmanon T, Hao Y, Evans S, Lonardi S, Keogh E. Discovering the Intrinsic Cardinality and Dimensionality of Time Series Using MDL. In: 2011 IEEE 11th International Conference on Data Mining. IEEE; 2011. p. 1086-91.
Website: https://sites.google.com/site/salientsubs/
# toy example
data <- mp_toy_data$data[, 1]
mp <- tsmp(data, window_size = 30, verbose = 0)
mps <- salient_subsequences(mp, verbose = 0)
mds_data <- salient_mds(mps)
plot(mds_data, main = "Multi dimensional scale")
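The projection to X,Y values can be illustrated with classical multidimensional scaling from the stats package. The sketch below is an assumption about the general approach, not the package's code: the series, the starting positions in starts, and the choice of Euclidean distance between z-normalized subsequences are all made up for illustration.

```r
set.seed(42)
# Toy series: a sine wave followed by a random walk
series <- c(sin(seq(0, 12 * pi, length.out = 300)), cumsum(rnorm(300)))
w <- 30
starts <- c(1, 51, 101, 320, 400, 500) # hypothetical salient positions

znorm <- function(x) (x - mean(x)) / sd(x)

# One z-normalised subsequence per row
subs <- t(vapply(starts, function(s) znorm(series[s:(s + w - 1)]), numeric(w)))

# Classical MDS on the pairwise Euclidean distances, reduced to 2 dimensions
coords <- cmdscale(dist(subs), k = 2)
print(dim(coords))
```

Calling plot(coords) on the result would place similar subsequences (here the three sine windows) close together.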
This score function is useful for testing several values of n_bits for MDL discretization and checking against a known set of indexes. This increases the probability of better results on relevant subsequence extraction.
salient_score(.mp, gtruth, verbose = getOption("tsmp.verbose", 2))
.mp |
a Matrix Profile object. |
gtruth |
a |
verbose |
an |
Returns a list with f_score, precision, recall and bits used in the algorithm.
Yeh CCM, Van Herle H, Keogh E. Matrix profile III: The matrix profile allows visualization of salient subsequences in massive time series. Proc - IEEE Int Conf Data Mining, ICDM. 2017;579-88.
Hu B, Rakthanmanon T, Hao Y, Evans S, Lonardi S, Keogh E. Discovering the Intrinsic Cardinality and Dimensionality of Time Series Using MDL. In: 2011 IEEE 11th International Conference on Data Mining. IEEE; 2011. p. 1086-91.
Website: https://sites.google.com/site/salientsubs/
# toy example
data <- mp_toy_data$data[, 1]
mp <- tsmp(data, window_size = 30, verbose = 0)
mps <- salient_subsequences(mp, n_bits = c(4, 6, 8), verbose = 0)
label_idx <- seq(2, 500, by = 110) # fake data
salient_score(mps, label_idx, verbose = 0)
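Precision, recall and F-score against a known set of indexes can be computed in a few lines. The base-R sketch below is a generic illustration rather than the package's scoring rule: score_indexes() is hypothetical, and the tolerance tol used to decide whether a found index matches a ground-truth index is an assumption.

```r
# Hypothetical matcher: a detection counts as a hit when it lies within
# `tol` points of some ground-truth index.
score_indexes <- function(found, gtruth, tol = 5) {
  hit_found <- vapply(found, function(f) any(abs(gtruth - f) <= tol), logical(1))
  hit_truth <- vapply(gtruth, function(g) any(abs(found - g) <= tol), logical(1))
  precision <- mean(hit_found) # share of detections that are correct
  recall <- mean(hit_truth)    # share of ground-truth indexes recovered
  f_score <- if (precision + recall == 0) 0 else
    2 * precision * recall / (precision + recall)
  list(f_score = f_score, precision = precision, recall = recall)
}

res <- score_indexes(found = c(3, 110, 400),
                     gtruth = c(2, 112, 222, 332, 442), tol = 5)
print(unlist(res))
```

Repeating such a score for each candidate n_bits value is one way to choose among discretization settings.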
In order to allow a meaningful visualization in Multi-Dimensional Space (MDS), this function retrieves the most relevant subsequences using the Minimum Description Length (MDL) framework.
salient_subsequences( .mp, data, n_bits = 8, n_cand = 10, exclusion_zone = NULL, verbose = getOption("tsmp.verbose", 2) )
.mp |
a TSMP object of class |
data |
the data used to build the Matrix Profile, if not embedded. |
n_bits |
an |
n_cand |
an |
exclusion_zone |
if a |
verbose |
an |
verbose changes how much information is printed by this function; 0 means nothing, 1 means text, 2 adds the progress bar, 3 adds the finish sound.
Returns the input .mp object with a new name salient. It contains: indexes, a vector with the starting position of each subsequence, idx_bit_size, a vector with the associated bitsize for each iteration and bits the value used as input on n_bits.
Yeh CCM, Van Herle H, Keogh E. Matrix profile III: The matrix profile allows visualization of salient subsequences in massive time series. Proc - IEEE Int Conf Data Mining, ICDM. 2017;579-88.
Hu B, Rakthanmanon T, Hao Y, Evans S, Lonardi S, Keogh E. Discovering the Intrinsic Cardinality and Dimensionality of Time Series Using MDL. In: 2011 IEEE 11th International Conference on Data Mining. IEEE; 2011. p. 1086-91.
Website: https://sites.google.com/site/salientsubs/
# toy example
data <- mp_toy_data$data[, 1]
mp <- tsmp(data, window_size = 30, verbose = 0)
mps <- salient_subsequences(mp, data, verbose = 0)
# full example
data <- mp_meat_data$sub$data
w <- mp_meat_data$sub$sub_len
mp <- tsmp(data, window_size = w, verbose = 2, n_workers = 2)
mps <- salient_subsequences(mp, data, n_bits = c(4, 6, 8), verbose = 2)
Computes the best-so-far Matrix Profile and Profile Index for Univariate Time Series. DISCLAIMER: This algorithm is still in development by its authors. Join similarity, RMP and LMP are not implemented yet.
scrimp( ..., window_size, exclusion_zone = getOption("tsmp.exclusion_zone", 1/2), verbose = getOption("tsmp.verbose", 2), s_size = Inf, pre_scrimp = 1/4, pre_only = FALSE )
... |
a |
window_size |
an |
exclusion_zone |
a |
verbose |
an |
s_size |
a |
pre_scrimp |
a |
pre_only |
a |
The Matrix Profile has the potential to revolutionize time series data mining because of its generality, versatility, simplicity and scalability. In particular it has implications for time series motif discovery, time series joins, shapelet discovery (classification), density estimation, semantic segmentation, visualization, rule discovery, clustering, etc. The anytime SCRIMP computes the Matrix Profile and Profile Index in such a manner that it can be stopped before its complete calculation and return the best-so-far results, allowing ultra-fast approximate solutions. verbose changes how much information is printed by this function; 0 means nothing, 1 means text, 2 adds the progress bar, 3 adds the finish sound. exclusion_zone is used to avoid trivial matches.
Returns a MatrixProfile object, a list with the matrix profile mp, profile index pi, left and right matrix profile lmp, rmp and profile index lpi, rpi, window size w and exclusion zone ez.
Website: http://www.cs.ucr.edu/~eamonn/MatrixProfile.html
Other matrix profile computations: mstomp_par(), stamp_par(), stomp_par(), tsmp(), valmod()
mp <- scrimp(mp_toy_data$data[1:200, 1], window_size = 30, verbose = 0)
ref_data <- mp_toy_data$data[, 1]
query_data <- mp_toy_data$data[, 2]
# self similarity (length() is used because a single column is a plain vector)
mp <- scrimp(ref_data, window_size = 30, s_size = round(length(ref_data) * 0.1))
# join similarity
mp <- scrimp(ref_data, query_data, window_size = 30, s_size = round(length(query_data) * 0.1))
This function applies a trained model that uses a dictionary to predict state changes. Unlike
fluss(), it does not look for semantic changes (of which there may be several), but for binary
states like "on" or "off". Imagine, for example, a human annotator who presses a switch whenever
they think the recorded data is relevant and releases it when they think the data is noise. The
algorithm learns these switching points (possibly more precisely than the annotator) and tries to
predict them on new data.
sdts_predict(model, data, window_size)
model |
a model created by SDTS training function |
data |
a |
window_size |
an |
Returns a logical vector with the predicted annotations.
Yeh C-CM, Kavantzas N, Keogh E. Matrix profile IV: Using Weakly Labeled Time Series to Predict Outcomes. Proc VLDB Endow. 2017 Aug 1;10(12):1802-12.
Website: https://sites.google.com/view/weaklylabeled
Other Scalable Dictionaries: sdts_score(), sdts_train()
# This is a fast toy example and the results are useless. For a complete result,
# run the code in the 'Not run' section below.
w <- c(110, 220)
subs <- 11000:20000
tr_data <- mp_test_data$train$data[subs]
tr_label <- mp_test_data$train$label[subs]
te_data <- mp_test_data$test$data[subs]
te_label <- mp_test_data$test$label[subs]
model <- sdts_train(tr_data, tr_label, w, verbose = 0)
predict <- sdts_predict(model, te_data, round(mean(w)))
sdts_score(predict, te_label, 1)

## Not run:
windows <- c(110, 220, 330)
model <- sdts_train(mp_test_data$train$data, mp_test_data$train$label, windows, verbose = 0)
predict <- sdts_predict(model, mp_test_data$test$data, round(mean(windows)))
sdts_score(predict, mp_test_data$test$label, 1)
## End(Not run)
Computes the F-Score of a SDTS prediction.
sdts_score(pred, gtruth, beta = 1)
pred |
a |
gtruth |
a |
beta |
a |
beta is used to balance the F-score towards recall (>1) or precision (<1).
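To make the role of beta concrete, here is a minimal base R sketch of the standard pointwise F-beta score, F = (1 + beta^2) * P * R / (beta^2 * P + R). The function name f_beta_score is hypothetical, and the way sdts_score() actually counts hits on annotated ranges may differ; only the weighting by beta is being illustrated.

```r
# Hypothetical helper (not part of tsmp): pointwise F-beta on logical vectors.
f_beta_score <- function(pred, gtruth, beta = 1) {
  pred <- as.logical(pred)
  gtruth <- as.logical(gtruth)
  tp <- sum(pred & gtruth)  # true positives
  fp <- sum(pred & !gtruth) # false positives
  fn <- sum(!pred & gtruth) # false negatives
  precision <- if (tp + fp > 0) tp / (tp + fp) else 0
  recall <- if (tp + fn > 0) tp / (tp + fn) else 0
  denom <- beta^2 * precision + recall
  f_score <- if (denom > 0) (1 + beta^2) * precision * recall / denom else 0
  list(f_score = f_score, precision = precision, recall = recall)
}

pred <- c(TRUE, TRUE, TRUE, FALSE, FALSE)
gtruth <- c(TRUE, FALSE, FALSE, TRUE, FALSE)

f1 <- f_beta_score(pred, gtruth, beta = 1)       # balanced
f2 <- f_beta_score(pred, gtruth, beta = 2)       # pulled towards recall
f_half <- f_beta_score(pred, gtruth, beta = 0.5) # pulled towards precision
```

In this toy case recall (1/2) is higher than precision (1/3), so increasing beta raises the score and decreasing it lowers the score.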
Returns a list with f_score, precision and recall.
Yeh C-CM, Kavantzas N, Keogh E. Matrix profile IV: Using Weakly Labeled Time Series to Predict Outcomes. Proc VLDB Endow. 2017 Aug 1;10(12):1802-12.
Website: https://sites.google.com/view/weaklylabeled
Other Scalable Dictionaries: sdts_predict(), sdts_train()
# This is a fast toy example and the results are useless. For a complete result,
# run the code in the 'Not run' section below.
w <- c(110, 220)
subs <- 11000:20000
tr_data <- mp_test_data$train$data[subs]
tr_label <- mp_test_data$train$label[subs]
te_data <- mp_test_data$test$data[subs]
te_label <- mp_test_data$test$label[subs]
model <- sdts_train(tr_data, tr_label, w, verbose = 0)
predict <- sdts_predict(model, te_data, round(mean(w)))
sdts_score(predict, te_label, 1)

## Not run:
windows <- c(110, 220, 330)
model <- sdts_train(mp_test_data$train$data, mp_test_data$train$label, windows)
predict <- sdts_predict(model, mp_test_data$test$data, round(mean(windows)))
sdts_score(predict, mp_test_data$test$label, 1)
## End(Not run)
This function trains a model that uses a dictionary to predict state changes. Unlike
fluss(), it does not look for semantic changes (of which there may be several), but for binary
states like "on" or "off". Imagine, for example, a human annotator who presses a switch whenever
they think the recorded data is relevant and releases it when they think the data is noise. The
algorithm learns these switching points (possibly more precisely than the annotator) and tries to
predict them on new data.
sdts_train( data, label, window_size, beta = 1, pat_max = Inf, parallel = FALSE, verbose = getOption("tsmp.verbose", 2) )
data |
a |
label |
a |
window_size |
an |
beta |
a |
pat_max |
an |
parallel |
a |
verbose |
an |
beta is used to balance the F-score towards recall (>1) or precision (<1). verbose changes
how much information is printed by this function: 0 means nothing, 1 means text, 2 adds the
progress bar, 3 adds the finish sound.
Returns a list with the learned dictionary: score (estimated score), score_hist (history of
scores), pattern (shape features) and thold (threshold values).
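The pairing of pattern and thold suggests how a dictionary can drive prediction: a region is flagged when some subsequence is closer to a learned pattern than that pattern's threshold. The base R sketch below shows this idea only; predict_with_dictionary, the z-normalized Euclidean distance, and the rule of flagging the whole matching window are assumptions for illustration and are not taken from the package's implementation.

```r
# Hypothetical sketch (not the tsmp implementation) of threshold-based prediction.
znorm <- function(x) {
  s <- sd(x)
  if (s == 0) rep(0, length(x)) else (x - mean(x)) / s
}

predict_with_dictionary <- function(data, patterns, tholds) {
  pred <- rep(FALSE, length(data))
  for (k in seq_along(patterns)) {
    pat <- znorm(patterns[[k]])
    m <- length(pat)
    for (i in seq_len(max(0, length(data) - m + 1))) {
      window <- i:(i + m - 1)
      d <- sqrt(sum((znorm(data[window]) - pat)^2))
      # Assumption: a match below the threshold flags the whole window
      if (d < tholds[k]) pred[window] <- TRUE
    }
  }
  pred
}

set.seed(1)
pattern <- sin(seq(0, 2 * pi, length.out = 20))
data <- rnorm(100, sd = 0.1) # noise ("off" state)
data[41:60] <- pattern       # one embedded occurrence ("on" state)
pred <- predict_with_dictionary(data, list(pattern), tholds = 1)
```

The output has the same length as the input and is logical, which matches the shape of the value described for sdts_predict().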
Yeh C-CM, Kavantzas N, Keogh E. Matrix profile IV: Using Weakly Labeled Time Series to Predict Outcomes. Proc VLDB Endow. 2017 Aug 1;10(12):1802-12.
Website: https://sites.google.com/view/weaklylabeled
Other Scalable Dictionaries: sdts_predict(), sdts_score()
# This is a fast toy example and the results are useless. For a complete result,
# run the code in the 'Not run' section below.
w <- c(110, 220)
subs <- 11000:20000
tr_data <- mp_test_data$train$data[subs]
tr_label <- mp_test_data$train$label[subs]
te_data <- mp_test_data$test$data[subs]
te_label <- mp_test_data$test$label[subs]
model <- sdts_train(tr_data, tr_label, w, verbose = 0)
predict <- sdts_predict(model, te_data, round(mean(w)))
sdts_score(predict, te_label, 1)

## Not run:
windows <- c(110, 220, 330)
model <- sdts_train(mp_test_data$train$data, mp_test_data$train$label, windows)
predict <- sdts_predict(model, mp_test_data$test$data, round(mean(windows)))
sdts_score(predict, mp_test_data$test$label, 1)
## End(Not run)
This may be useful if you want to include the data later or remove the included data (set it to NULL).
set_data(.mp, data)
.mp |
a TSMP object. |
data |
a |
Silently returns the original TSMP object with the data changed.
mp <- tsmp(mp_toy_data$data[1:200, 1], window_size = 30, verbose = 0)
# remove the data stored in the object
mp <- set_data(mp, NULL)
Computes the join similarity for sound data.
simple_fast( ..., window_size, exclusion_zone = getOption("tsmp.exclusion_zone", 1/2), verbose = getOption("tsmp.verbose", 2) )
... |
a |
window_size |
an |
exclusion_zone |
a |
verbose |
an |
verbose changes how much information is printed by this function: 0 means nothing, 1 means
text, 2 adds the progress bar, 3 adds the finish sound.
Returns a SimpleMatrixProfile object, a list with the matrix profile mp, profile index pi,
number of dimensions n_dim, window size w and exclusion zone ez.
Silva D, Yeh C, Batista G, Keogh E. Simple: Assessing Music Similarity Using Subsequences Joins. Proc 17th ISMIR Conf. 2016;23-30.
Silva DF, Yeh C-CM, Zhu Y, Batista G, Keogh E. Fast Similarity Matrix Profile for Music Analysis and Exploration. IEEE Trans Multimed. 2018;14(8):1-1.
Website: https://sites.google.com/view/simple-fast
Website: https://sites.google.com/site/ismir2016simple/home
w <- 30
data <- mp_toy_data$data # 3 dimensions matrix
result <- simple_fast(data, window_size = w, verbose = 0)
Computes the best so far Matrix Profile and Profile Index for Univariate Time Series.
stamp_par( ..., window_size, exclusion_zone = getOption("tsmp.exclusion_zone", 1/2), verbose = getOption("tsmp.verbose", 2), s_size = Inf, n_workers = 2, weight = NULL )
stamp( ..., window_size, exclusion_zone = getOption("tsmp.exclusion_zone", 1/2), verbose = getOption("tsmp.verbose", 2), s_size = Inf, weight = NULL )
... |
a |
window_size |
an |
exclusion_zone |
a |
verbose |
an |
s_size |
a |
n_workers |
an |
weight |
a |
The Matrix Profile has the potential to revolutionize time series data mining because of its
generality, versatility, simplicity and scalability. In particular, it has implications for time
series motif discovery, time series joins, shapelet discovery (classification), density
estimation, semantic segmentation, visualization, rule discovery, clustering, etc. The anytime
STAMP computes the Matrix Profile and Profile Index in such a manner that it can be stopped before
the calculation completes and still return the best-so-far results, allowing ultra-fast approximate
solutions. verbose changes how much information is printed by this function: 0 means nothing,
1 means text, 2 adds the progress bar, 3 adds the finish sound. exclusion_zone is used to
avoid trivial matches; if query data is provided (join similarity), this parameter is ignored.
Returns a MatrixProfile object, a list with the matrix profile mp, profile index pi,
left and right matrix profile lmp, rmp and profile index lpi, rpi, window size w and
exclusion zone ez.
stamp_par(): Parallel version.
stamp(): Single thread version.
Yeh CCM, Zhu Y, Ulanova L, Begum N, Ding Y, Dau HA, et al. Matrix profile I: All pairs similarity joins for time series: A unifying view that includes motifs, discords and shapelets. Proc - IEEE Int Conf Data Mining, ICDM. 2017;1317-22.
Zhu Y, Imamura M, Nikovski D, Keogh E. Matrix Profile VII: Time Series Chains: A New Primitive for Time Series Data Mining. Knowl Inf Syst. 2018 Jun 2;1-27.
Website: http://www.cs.ucr.edu/~eamonn/MatrixProfile.html
Other matrix profile computations: mstomp_par(), scrimp(), stomp_par(), tsmp(), valmod()
mp <- stamp(mp_toy_data$data[1:200, 1], window_size = 30, verbose = 0)

# using threads
mp <- stamp_par(mp_toy_data$data[1:200, 1], window_size = 30, verbose = 0)

ref_data <- mp_toy_data$data[, 1]
query_data <- mp_toy_data$data[, 2]
# NROW() is used because these are plain vectors, for which nrow() returns NULL
# self similarity
mp <- stamp(ref_data, window_size = 30, s_size = round(NROW(ref_data) * 0.1))
# join similarity
mp <- stamp(ref_data, query_data, window_size = 30, s_size = round(NROW(query_data) * 0.1))
Computes the Matrix Profile and Profile Index for Univariate Time Series.
stomp_par( ..., window_size, exclusion_zone = getOption("tsmp.exclusion_zone", 1/2), verbose = getOption("tsmp.verbose", 2), n_workers = 2 )
stomp( ..., window_size, exclusion_zone = getOption("tsmp.exclusion_zone", 1/2), verbose = getOption("tsmp.verbose", 2) )
... |
a |
window_size |
an |
exclusion_zone |
a |
verbose |
an |
n_workers |
an |
The Matrix Profile has the potential to revolutionize time series data mining because of its
generality, versatility, simplicity and scalability. In particular, it has implications for time
series motif discovery, time series joins, shapelet discovery (classification), density
estimation, semantic segmentation, visualization, rule discovery, clustering, etc. verbose
changes how much information is printed by this function: 0 means nothing, 1 means text, 2
adds the progress bar, 3 adds the finish sound. exclusion_zone is used to avoid trivial
matches; if query data is provided (join similarity), this parameter is ignored.
Returns a MatrixProfile object, a list with the matrix profile mp, profile index pi,
left and right matrix profile lmp, rmp and profile index lpi, rpi, window size w and
exclusion zone ez.
stomp_par(): Parallel version.
stomp(): Single thread version.
Zhu Y, Zimmerman Z, Senobari NS, Yeh CM, Funning G. Matrix Profile II : Exploiting a Novel Algorithm and GPUs to Break the One Hundred Million Barrier for Time Series Motifs and Joins. Icdm. 2016 Jan 22;54(1):739-48.
Website: http://www.cs.ucr.edu/~eamonn/MatrixProfile.html
Other matrix profile computations: mstomp_par(), scrimp(), stamp_par(), tsmp(), valmod()
mp <- stomp(mp_toy_data$data[1:200, 1], window_size = 30, verbose = 0)

# using threads
mp <- stomp_par(mp_toy_data$data[1:400, 1], window_size = 30, verbose = 0)

ref_data <- mp_toy_data$data[, 1]
query_data <- mp_toy_data$data[, 2]
# self similarity
mp <- stomp(ref_data, window_size = 30)
# join similarity
mp2 <- stomp(ref_data, query_data, window_size = 30)
Real-time STOMP algorithm
stompi_update(.mp, new_data, history_size = FALSE)
.mp |
a TSMP object of class |
new_data |
new data to append to original data. |
history_size |
an |
Returns the input .mp updated with the new information.
# compute on the first 200 points, then append 100 more incrementally
mp <- tsmp(mp_toy_data$data[1:200, 1], window_size = 30, verbose = 0)
mpi <- stompi_update(mp, mp_toy_data$data[201:300, 1])
# compare with a full computation on all 300 points
mp <- tsmp(mp_toy_data$data[1:300, 1], window_size = 30, verbose = 0)
all.equal(mp, mpi, check.attributes = FALSE)
This is a wrapper function that makes it easy to use all available algorithms to compute the Matrix Profile and Profile Index for multiple purposes.
tsmp( ..., window_size, exclusion_zone = getOption("tsmp.exclusion_zone", 1/2), mode = c("stomp", "stamp", "simple", "mstomp", "scrimp", "valmod", "pmp"), verbose = getOption("tsmp.verbose", 2), n_workers = 1, s_size = Inf, must_dim = NULL, exc_dim = NULL, heap_size = 50, paa = 1, .keep_data = TRUE )
... |
a |
window_size |
an |
exclusion_zone |
a |
mode |
the algorithm that will be used to compute the matrix profile. (Default is |
verbose |
an |
n_workers |
an |
s_size |
a |
must_dim |
an |
exc_dim |
an |
heap_size |
an |
paa |
an |
.keep_data |
a |
The Matrix Profile has the potential to revolutionize time series data mining because of its generality, versatility, simplicity and scalability. In particular, it has implications for time series motif discovery, time series joins, shapelet discovery (classification), density estimation, semantic segmentation, visualization, rule discovery, clustering, etc.
The first algorithm invented was stamp(), which uses mass() as an ultra-fast algorithm for
similarity search and made it possible to compute the Matrix Profile in reasonable time. One of
its main features is its Anytime property: using a randomized approach, it can return a
"best-so-far" matrix profile that gives the correct answer almost every time (using, for
example, 1/10 of all iterations).
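The Anytime idea can be sketched in a few lines of base R: distance profiles are computed in random order, each one can only lower the running minimum, and stopping early leaves an upper bound on the exact profile. This is a naive illustration, not the package's code; the names dist_profile and anytime_mp are hypothetical, it uses a brute-force z-normalized Euclidean distance instead of mass(), and the exclusion zone of round(m / 2) is an assumption.

```r
znorm <- function(x) (x - mean(x)) / sd(x)

# Brute-force z-normalised Euclidean distance from subsequence i to all others
dist_profile <- function(ts, i, m) {
  n_sub <- length(ts) - m + 1
  q <- znorm(ts[i:(i + m - 1)])
  vapply(
    seq_len(n_sub),
    function(j) sqrt(sum((znorm(ts[j:(j + m - 1)]) - q)^2)),
    numeric(1)
  )
}

# Hypothetical anytime self-join: process at most s_size profiles in random order
anytime_mp <- function(ts, m, s_size = Inf, ez = round(m / 2)) {
  n_sub <- length(ts) - m + 1
  mp <- rep(Inf, n_sub)
  idx <- rep(NA_integer_, n_sub)
  ord <- sample(n_sub)
  ord <- ord[seq_len(min(s_size, n_sub))]
  for (i in ord) {
    dp <- dist_profile(ts, i, m)
    dp[max(1, i - ez):min(n_sub, i + ez)] <- Inf # skip trivial matches
    better <- dp < mp # the distance is symmetric, so dp updates every position
    mp[better] <- dp[better]
    idx[better] <- i
  }
  list(mp = mp, pi = idx)
}

set.seed(42)
ts <- sin(seq(0, 8 * pi, length.out = 120)) + rnorm(120, sd = 0.1)
approx <- anytime_mp(ts, m = 20, s_size = 10) # about 1/10 of the iterations
exact <- anytime_mp(ts, m = 20)               # all iterations
max(approx$mp - exact$mp)                     # gap left by stopping early
```

Because each iteration only lowers values, the approximate profile never falls below the exact one, and more iterations close the gap.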
The next algorithm was stomp(), which is currently the most used. Researchers noticed that the
dot products do not need to be recalculated from scratch for each subsequence. Instead, the
values calculated for the previous subsequence can be reused to speed up the next iteration.
The idea is to exploit the overlap between the products required in consecutive iterations.
This approach reduced the time to compute the Matrix Profile to about 3% of that of stamp(),
but at the cost of the Anytime property.
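The reuse of dot products can be written as a one-line recurrence: with QT[i, j] the dot product between subsequences i and j of length m, QT[i, j] = QT[i-1, j-1] - ts[i-1] * ts[j-1] + ts[i+m-1] * ts[j+m-1]. The base R sketch below checks this recurrence against a direct computation. The helper names sliding_dot and next_dot are illustrative and not part of the package, and the conversion from dot products to distances is omitted.

```r
# Direct dot products between subsequence i and every subsequence of length m
sliding_dot <- function(ts, i, m) {
  n_sub <- length(ts) - m + 1
  vapply(
    seq_len(n_sub),
    function(j) sum(ts[i:(i + m - 1)] * ts[j:(j + m - 1)]),
    numeric(1)
  )
}

# Dot products for subsequence i derived from those of subsequence i - 1
next_dot <- function(ts, qt_prev, first_row, i, m) {
  n_sub <- length(ts) - m + 1
  qt <- numeric(n_sub)
  j <- 2:n_sub
  # drop the product that left the window, add the one that entered it
  qt[j] <- qt_prev[j - 1] - ts[i - 1] * ts[j - 1] + ts[i + m - 1] * ts[j + m - 1]
  qt[1] <- first_row[i] # by symmetry, QT[i, 1] equals QT[1, i]
  qt
}

set.seed(7)
ts <- rnorm(50)
m <- 8
first_row <- sliding_dot(ts, 1, m)
qt <- first_row
for (i in 2:5) qt <- next_dot(ts, qt, first_row, i, m)

all.equal(qt, sliding_dot(ts, 5, m)) # recurrence agrees with direct computation
```

Each update costs a constant amount of work per position instead of m multiplications. The rows must be computed in order, though, which is why the random-order Anytime property is lost.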
Currently there is a new algorithm that I will not explain further here. It is called scrimp(),
is as fast as stomp(), and has the Anytime property. This algorithm is implemented in
this package, but is still awaiting publication of an article.
Further, there is mstomp(), which computes a multidimensional Matrix Profile that allows
meaningful motif discovery in Multivariate Time Series, and simple_fast(), which also handles
Multivariate Time Series but is focused on Music Analysis and Exploration.
valmod() uses a new pruning algorithm that allows a similarity search over a range of sliding
window sizes.
pmp() is a new concept that creates several profiles from a range of windows.
Some parameters are global across the algorithms:
...: One or two time series (except for mstomp()). The second time series can be smaller than the first.
window_size: The sliding window size.
exclusion_zone: Used to avoid trivial matches; if query data is provided (join similarity), this parameter is ignored.
verbose: Changes how much information is printed by this function: 0 means nothing, 1 means text, 2 adds the progress bar, 3 adds the finish sound.
n_workers: Number of threads for parallel computing (except simple_fast, scrimp and valmod). If the value is 2 or more, the '_par' version of the algorithm will be used.
s_size is used only in Anytime algorithms: stamp() and scrimp().
must_dim and exc_dim are used only in mstomp().
heap_size is used only for valmod().
mode can be any of the following: stomp, stamp, simple, mstomp, scrimp, valmod, pmp.
Returns the matrix profile mp and profile index pi. It also returns the left and
right matrix profile lmp, rmp and profile index lpi, rpi that may be used to detect
Time Series Chains. mstomp() returns a multidimensional Matrix Profile.
Silva D, Yeh C, Batista G, Keogh E. Simple: Assessing Music Similarity Using Subsequences Joins. Proc 17th ISMIR Conf. 2016;23-30.
Silva DF, Yeh C-CM, Zhu Y, Batista G, Keogh E. Fast Similarity Matrix Profile for Music Analysis and Exploration. IEEE Trans Multimed. 2018;14(8):1-1.
Yeh CM, Kavantzas N, Keogh E. Matrix Profile VI : Meaningful Multidimensional Motif Discovery.
Yeh CCM, Zhu Y, Ulanova L, Begum N, Ding Y, Dau HA, et al. Matrix profile I: All pairs similarity joins for time series: A unifying view that includes motifs, discords and shapelets. Proc - IEEE Int Conf Data Mining, ICDM. 2017;1317-22.
Zhu Y, Imamura M, Nikovski D, Keogh E. Matrix Profile VII: Time Series Chains: A New Primitive for Time Series Data Mining. Knowl Inf Syst. 2018 Jun 2;1-27.
Zhu Y, Zimmerman Z, Senobari NS, Yeh CM, Funning G. Matrix Profile II : Exploiting a Novel Algorithm and GPUs to Break the One Hundred Million Barrier for Time Series Motifs and Joins. Icdm. 2016 Jan 22;54(1):739-48.
Website: https://sites.google.com/view/simple-fast
Website: https://sites.google.com/site/ismir2016simple/home
Website: http://www.cs.ucr.edu/~eamonn/MatrixProfile.html
Other matrix profile computations: mstomp_par(), scrimp(), stamp_par(), stomp_par(), valmod()
# default with stomp()
mp <- tsmp(mp_toy_data$data[1:200, 1], window_size = 30, verbose = 0)

# Anytime STAMP
mp <- tsmp(mp_toy_data$data[1:200, 1], window_size = 30, mode = "stamp", s_size = 50, verbose = 0)

# mstomp()
mp <- tsmp(mp_toy_data$data[1:200, ], window_size = 30, mode = "mstomp", verbose = 0)

# simple_fast()
mp <- tsmp(mp_toy_data$data[1:200, ], window_size = 30, mode = "simple", verbose = 0)

# parallel with stomp_par()
mp <- tsmp(mp_test_data$train$data[1:1000, 1], window_size = 30, n_workers = 2, verbose = 0)
Computes the Matrix Profile and Profile Index for a range of query window sizes
valmod( ..., window_min, window_max, heap_size = 50, exclusion_zone = getOption("tsmp.exclusion_zone", 1/2), lb = TRUE, verbose = getOption("tsmp.verbose", 2) )
... |
a |
window_min |
an |
window_max |
an |
heap_size |
an |
exclusion_zone |
a |
lb |
a |
verbose |
an |
This function uses an exact algorithm based on a novel lower bounding technique, which is
specifically designed for the motif discovery problem. verbose changes how much information
is printed by this function: 0 means nothing, 1 means text, 2 adds the progress bar,
3 adds the finish sound. exclusion_zone is used to avoid trivial matches; if query data
is provided (join similarity), this parameter is ignored.
The paper that implements skimp() suggests that a window_max / window_min ratio greater than
1.24 begins to weaken pruning in valmod().
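A small guard can flag window ranges that exceed this ratio before a long computation is started. The helper below is hypothetical (it is not part of the package), and the 1.24 cut-off is taken from the note above as a rule of thumb rather than a hard limit.

```r
# Hypothetical helper: TRUE when the window range stays within the suggested ratio
window_ratio_ok <- function(window_min, window_max, max_ratio = 1.24) {
  stopifnot(window_min > 0, window_max >= window_min)
  (window_max / window_min) <= max_ratio
}

narrow <- window_ratio_ok(30, 37) # ratio of about 1.23
wide <- window_ratio_ok(30, 40)   # ratio of about 1.33
```

A range that fails the check still produces a valid result; per the note, pruning is merely expected to become less effective.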
Returns a Valmod object, a list with the matrix profile mp, profile index pi,
left and right matrix profile lmp, rmp and profile index lpi, rpi, best window size w
for each index and exclusion zone ez. Additionally: evolution_motif, the best motif distance
per window size, and non-length-normalized versions of mp, pi and w: mpnn, pinn and wnn.
Linardi M, Zhu Y, Palpanas T, Keogh E. VALMOD: A Suite for Easy and Exact Detection of Variable Length Motifs in Data Series. In: Proceedings of the 2018 International Conference on Management of Data - SIGMOD '18. New York, New York, USA: ACM Press; 2018. p. 1757-60.
Website: http://www.cs.ucr.edu/~eamonn/MatrixProfile.html
Other matrix profile computations: mstomp_par(), scrimp(), stamp_par(), stomp_par(), tsmp()
mp <- valmod(mp_toy_data$data[1:200, 1], window_min = 30, window_max = 40, verbose = 0)

ref_data <- mp_toy_data$data[, 1]
query_data <- mp_toy_data$data[, 2]
# self similarity
mp <- valmod(ref_data, window_min = 30, window_max = 40)
# join similarity
mp <- valmod(ref_data, query_data, window_min = 30, window_max = 40)
Plots an object generated by one of the algorithms. In some cases, multiple plots will be generated.
visualize(profile)
profile |
a |
Website: http://www.cs.ucr.edu/~eamonn/MatrixProfile.html
Other Main API: analyze(), compute(), discords(), motifs()
result <- compute(mp_toy_data$data[, 1], 80)
visualize(result)
Write a TSMP object to JSON file.
write(x, ...)

## S3 method for class 'MatrixProfile'
write(x, file, ...)

## S3 method for class 'PMP'
write(x, file, ...)
x |
a |
... |
other arguments to be passed forward. |
file |
a |
result <- compute(mp_toy_data$data[, 1], 80)
# write the object to a JSON file in the session's temporary directory
write(result, file = file.path(tempdir(), "output.json"))