I'm in the process of learning about NNs and LSTMs in the context of developing global time series forecasting models. I am struggling to answer this question:
What is the precise meaning of "sample" in a global time series context, and how does this compare to the meaning of "sample" in non-sequence data or univariate time series contexts?
Can anyone help?
Non-Sequence Data
In this context, my understanding is that each data point (i.e. each row in the dataframe) is known as a "sample". These data points can be put into "batches" (here short for "mini-batches"). The parameter batch_size specifies how many data points are in each "(mini) batch". The total number of batches (i.e. iterations) that must be processed for the model to be exposed to the entire dataset once within an epoch during training is:
Number of Batches per Epoch = (Number of Samples / batch_size) = (Number of data points in dataset / batch_size)
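As a quick check of the formula above, here is a minimal sketch of the batch count when the number of samples is not an exact multiple of batch_size. The function name batches_per_epoch is hypothetical; the drop_last flag is modelled on the option of the same name in PyTorch's DataLoader, which discards the final partial batch.

```python
import math

def batches_per_epoch(num_samples: int, batch_size: int, drop_last: bool = False) -> int:
    """Number of iterations needed to pass over num_samples once."""
    if drop_last:
        # The final, smaller batch is discarded.
        return num_samples // batch_size
    # The final, smaller batch is kept, so round up.
    return math.ceil(num_samples / batch_size)

full = batches_per_epoch(1000, 32)                     # keeps the partial batch
trimmed = batches_per_epoch(1000, 32, drop_last=True)  # drops the partial batch
print(full, trimmed)
```

With 1000 rows and batch_size of 32, the last batch holds only 8 rows, so the count depends on whether that batch is kept.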
Univariate TS Context
In a univariate time series modelling context, "sample" seems to be defined in relation to the parameter sequence_length. This post and this post seem to say that a "sample" in a time series context is not a single data point within the time series. Instead, each data point is a "time step". These posts point out that once a sequence_length has been specified, a time series can be decomposed into a series of overlapping sub-sequences, each of length sequence_length. A "sample" is one of those sub-sequences. In this case, batch_size indicates how many of those sub-sequences will be selected in each iteration.
Number of Batches per Epoch = (Number of sub-sequences of length sequence_length formed from the time series / batch_size)
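To make the sub-sequence idea concrete, here is a minimal sketch using NumPy's sliding_window_view. It assumes a stride of one time step between consecutive windows and ignores any separate forecast horizon; the toy series and the sizes are made up for illustration.

```python
import math
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

series = np.arange(10, dtype=float)  # toy series with 10 time steps
sequence_length = 4
batch_size = 3

# Each row is one overlapping sub-sequence, i.e. one "sample".
windows = sliding_window_view(series, sequence_length)

num_samples = windows.shape[0]  # len(series) - sequence_length + 1
num_batches = math.ceil(num_samples / batch_size)
print(windows.shape, num_samples, num_batches)
```

Under these assumptions a series of length T yields T - sequence_length + 1 samples, so the 10 time steps above give 7 samples rather than 10.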
Global TS Context
In a global time series modelling context, the dataset consists of a suite of related (target) time series. Each of these (target) time series can have exogenous features (dynamic or static). The goal is to produce a single model from this suite of related (target) time series (and any exogenous features) in order to be able to forecast any of the (target) series. This adds an extra dimension to consider: the number of (target) time series in the dataset (i.e. the number of "data entries" in the dataset).
I presume that batch_size still refers to the number of "samples" drawn each iteration. However, it's not clear to me if the definition of "sample" changes in light of this added dimension. Are any of the following correct?
Possibility (A) "Sample" is a multi-series construct. All (target) time series are decomposed into sub-sequences of length sequence_length. A single "sample" is formed by selecting a time index, and then taking the sub-sequence that starts at that index in each (target) series simultaneously. The sample would have length equal to sequence_length and another dimension equal to however many (target) series are in the dataset. Assuming all the (target) series are the same length, then:
Number of Batches per Epoch = (Number of sub-sequences of length sequence_length in any one time series in the dataset / batch_size)
Possibility (B) "Sample" is a single-series construct (as in the univariate context), but samples come from multiple series in an uncoordinated fashion such that not all (target) series may be represented in each iteration, and not all samples within an iteration are aligned with each other in time.
Number of Batches per Epoch = (Number of sub-sequences of length sequence_length in the full dataset / batch_size)
Possibility (C) "Sample" refers to the (target) time series in the dataset (i.e. 1 "sample" = 1 "data entry"), and not to sub-sequences within any (target) time series. In this case, to cover the entire dataset an epoch would require the following number of batches:
Number of Batches per Epoch = (Number of (target) time series in the full dataset / batch_size)
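To compare the three possibilities side by side, here is a minimal sketch on a toy dataset of equal-length series. It only illustrates what the sample shapes and batch counts would be under each definition; it does not claim which definition any particular library uses. The sizes, variable names and the stride-1 windowing are assumptions made for illustration.

```python
import math
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

num_series, series_length = 5, 10  # toy sizes, chosen for illustration
sequence_length, batch_size = 4, 2

# Toy dataset: one row per (target) series, one column per time step.
data = np.arange(num_series * series_length, dtype=float).reshape(num_series, series_length)

# Stride-1 windows along the time axis:
# shape (num_series, windows_per_series, sequence_length).
windows = sliding_window_view(data, sequence_length, axis=1)

# (A) one sample per start index, covering every series at once:
# shape (windows_per_series, sequence_length, num_series).
samples_a = windows.transpose(1, 2, 0)

# (B) one sample per (series, start index) pair:
# shape (num_series * windows_per_series, sequence_length).
samples_b = windows.reshape(-1, sequence_length)

# (C) one sample per whole series: shape (num_series, series_length).
samples_c = data

def batches_per_epoch(num_samples: int, batch_size: int) -> int:
    # Keep the final partial batch, so round up.
    return math.ceil(num_samples / batch_size)

counts = {
    "A": batches_per_epoch(len(samples_a), batch_size),
    "B": batches_per_epoch(len(samples_b), batch_size),
    "C": batches_per_epoch(len(samples_c), batch_size),
}
print(samples_a.shape, samples_b.shape, samples_c.shape, counts)
```

The three definitions differ only in which axis is treated as the sample axis, which is why the number of batches per epoch differs so much between them for the same data.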
I've looked widely and haven't found anything that directly addresses this question. I've looked for these clarifications in software documentation and code bases (e.g. GluonTS, PyTorch Lightning) and other online sources. If it's there, I've missed it or misunderstood it. It often seems like these terms are used on the assumption that the reader already knows their precise definitions.