diff --git a/joss.06294/10.21105.joss.06294.crossref.xml b/joss.06294/10.21105.joss.06294.crossref.xml new file mode 100644 index 0000000000..db4d716640 --- /dev/null +++ b/joss.06294/10.21105.joss.06294.crossref.xml @@ -0,0 +1,268 @@ + + + + 20240330T141705-0ea8bc011eff66513ce82ce067a9b9d231417e15 + 20240330141705 + + JOSS Admin + admin@theoj.org + + The Open Journal + + + + + Journal of Open Source Software + JOSS + 2475-9066 + + 10.21105/joss + https://joss.theoj.org + + + + + 03 + 2024 + + + 9 + + 95 + + + + fABBA: A Python library for the fast symbolic +approximation of time series + + + + Xinye + Chen + https://orcid.org/0000-0003-1778-393X + + + Stefan + Güttel + https://orcid.org/0000-0003-1494-4478 + + + + 03 + 30 + 2024 + + + 6294 + + + 10.21105/joss.06294 + + + http://creativecommons.org/licenses/by/4.0/ + http://creativecommons.org/licenses/by/4.0/ + http://creativecommons.org/licenses/by/4.0/ + + + + Software archive + 10.5281/zenodo.10885652 + + + GitHub review issue + https://github.com/openjournals/joss-reviews/issues/6294 + + + + 10.21105/joss.06294 + https://joss.theoj.org/papers/10.21105/joss.06294 + + + https://joss.theoj.org/papers/10.21105/joss.06294.pdf + + + + + + ABBA: adaptive Brownian bridge-based symbolic +aggregation of time series + Elsworth + Data Mining and Knowledge +Discovery + 34 + 10.1007/s10618-020-00689-6 + 2020 + Elsworth, S., & Güttel, S. +(2020). ABBA: adaptive Brownian bridge-based symbolic aggregation of +time series. Data Mining and Knowledge Discovery, 34, 1175–1200. +https://doi.org/10.1007/s10618-020-00689-6 + + + Least squares quantization in +PCM + Lloyd + Transactions on Information +Theory + 28 + 10.1109/tit.1982.1056489 + 1982 + Lloyd, S. P. (1982). Least squares +quantization in PCM. Transactions on Information Theory, 28, 129–137. 
+https://doi.org/10.1109/tit.1982.1056489 + + + A symbolic representation of time series, +with implications for streaming algorithms + Lin + Proceedings of the 8th ACM SIGMOD workshop on +research issues in data mining and knowledge discovery + 10.1145/882082.882086 + 2003 + Lin, J., Keogh, E., Lonardi, S., +& Chiu, B. (2003). A symbolic representation of time series, with +implications for streaming algorithms. Proceedings of the 8th ACM SIGMOD +Workshop on Research Issues in Data Mining and Knowledge Discovery, +2–11. https://doi.org/10.1145/882082.882086 + + + An efficient aggregation method for the +symbolic representation of temporal data + Chen + ACM Transactions on Knowledge Discovery from +Data + 1 + 17 + 10.1145/3532622 + 2023 + Chen, X., & Güttel, S. (2023). An +efficient aggregation method for the symbolic representation of temporal +data. ACM Transactions on Knowledge Discovery from Data, 17(1), 1–22. +https://doi.org/10.1145/3532622 + + + ECG classification with learning ensemble +based on symbolic discretization + Taktak + Information Systems + 120 + 10.1016/j.is.2023.102294 + 2024 + Taktak, M., Ltifi, H., & Ayed, M. +B. (2024). ECG classification with learning ensemble based on symbolic +discretization. Information Systems, 120, 102294. +https://doi.org/10.1016/j.is.2023.102294 + + + Fast time series classification with random +symbolic subsequences + Nguyen + Advanced analytics and learning on temporal +data: 7th ECML PKDD workshop + 10.1007/978-3-031-24378-3_4 + 2023 + Nguyen, T. L., & Ifrim, G. +(2023). Fast time series classification with random symbolic +subsequences. Advanced Analytics and Learning on Temporal Data: 7th ECML +PKDD Workshop, 50–65. +https://doi.org/10.1007/978-3-031-24378-3_4 + + + Experiencing SAX: A novel symbolic +representation of time series + Lin + Data Mining and Knowledge +Discovery + 2 + 15 + 10.1007/s10618-007-0064-z + 2007 + Lin, J., Keogh, E., Wei, L., & +Lonardi, S. (2007). 
Experiencing SAX: A novel symbolic representation of +time series. Data Mining and Knowledge Discovery, 15(2), 107–144. +https://doi.org/10.1007/s10618-007-0064-z + + + Scikit-learn: Machine learning in +python + Pedregosa + Journal of Machine Learning +Research + 85 + 12 + 2011 + Pedregosa, F., Varoquaux, G., +Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., +Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., +Cournapeau, D., Brucher, M., Perrot, M., & Duchesnay, Édouard. +(2011). Scikit-learn: Machine learning in python. Journal of Machine +Learning Research, 12(85), 2825–2830. +http://jmlr.org/papers/v12/pedregosa11a.html + + + Foreseer: Efficiently forecasting malware +event series with long short-term memory + Gogineni + IEEE international symposium on secure and +private execution environment design + 10.1109/seed55351.2022.00016 + 2022 + Gogineni, K., Derasari, P., & +Venkataramani, G. (2022). Foreseer: Efficiently forecasting malware +event series with long short-term memory. IEEE International Symposium +on Secure and Private Execution Environment Design, 97–108. +https://doi.org/10.1109/seed55351.2022.00016 + + + Time series forecasting using LSTM networks: +A symbolic approach + Elsworth + 10.48550/arXiv.2003.05672 + 2020 + Elsworth, S., & Güttel, S. +(2020). Time series forecasting using LSTM networks: A symbolic approach +(No. arXiv:2003.05672v1; p. 12). +https://doi.org/10.48550/arXiv.2003.05672 + + + Data-driven prognostics based on +time-frequency analysis and symbolic recurrent neural network for fuel +cells under dynamic load + Wang + Reliability Engineering & System +Safety + 233 + 10.1016/j.ress.2023.109123 + 2023 + Wang, C., Dou, M., Li, Z., Outbib, +R., Zhao, D., Zuo, J., Wang, Y., Liang, B., & Wang, P. (2023). +Data-driven prognostics based on time-frequency analysis and symbolic +recurrent neural network for fuel cells under dynamic load. Reliability +Engineering & System Safety, 233, 109123. 
+https://doi.org/10.1016/j.ress.2023.109123 + + + A framework for generating summaries from +temporal personal health data + Harris + ACM Transactions on Computing for +Healthcare + 3 + 2 + 10.1145/3448672 + 2021 + Harris, J. J., Chen, C.-H., & +Zaki, M. J. (2021). A framework for generating summaries from temporal +personal health data. ACM Transactions on Computing for Healthcare, +2(3). https://doi.org/10.1145/3448672 + + + + + + diff --git a/joss.06294/10.21105.joss.06294.jats b/joss.06294/10.21105.joss.06294.jats new file mode 100644 index 0000000000..6d2afcc7f0 --- /dev/null +++ b/joss.06294/10.21105.joss.06294.jats @@ -0,0 +1,473 @@ + + +
+ + + + +Journal of Open Source Software +JOSS + +2475-9066 + +Open Journals + + + +6294 +10.21105/joss.06294 + +fABBA: A Python library for the fast +symbolic approximation of time series + + + +https://orcid.org/0000-0003-1778-393X + +Chen +Xinye + + + + +https://orcid.org/0000-0003-1494-4478 + +Güttel +Stefan + + + + + +Department of Numerical Mathematics, Charles University +Prague, Czech Republic + + + + +Department of Mathematics, The University of Manchester, +United Kingdom + + + + +6 +12 +2023 + +9 +95 +6294 + +Authors of papers retain copyright and release the +work under a Creative Commons Attribution 4.0 International License (CC +BY 4.0) +2022 +The article authors + +Authors of papers retain copyright and release the work under +a Creative Commons Attribution 4.0 International License (CC BY +4.0) + + + +Python +time series +dimensionality reduction +symbolic representation +data science + + + + + + Summary +

Adaptive Brownian bridge-based aggregation (ABBA) + (Elsworth + & Güttel, 2020a) is a symbolic time series representation + approach that is applicable to general time series. It is based on a + tolerance-controlled polygonal chain approximation of the time series, + followed by a mean-based clustering of the polygonal pieces into + groups. With the increasing need for faster time series processing, + much effort has gone into deriving new time series + representations in order to reduce the time complexity of similarity + search or enhance the forecasting performance of machine learning models. + Compared to working on the raw time series data, symbolizing time + series with ABBA provides numerous benefits including but not limited + to (1) dimensionality reduction, (2) smoothing and noise reduction, + and (3) explainable feature discretization. The time series features + extracted by ABBA enable fast time series forecasting + (Elsworth + & Güttel, 2020b), anomaly detection + (Chen + & Güttel, 2023; + Elsworth + & Güttel, 2020a), event prediction + (Gogineni + et al., 2022), classification + (Nguyen + & Ifrim, 2023; + Taktak + et al., 2024), and other data-driven tasks in time series + analysis + (Harris + et al., 2021; + Wang + et al., 2023). An example illustration of an ABBA symbolization + is shown in + [fig:enter-label].

+ +

ABBA symbolization with 4 + symbols.

+ +
+

ABBA follows a two-phase approach to symbolize time series, namely + compression and digitization. The first phase aims to reduce the time + series dimension by polygonal chain approximation, and the second + phase assigns symbols to the polygonal pieces. Both phases operate + together to ensure that the essential time series features are best + reflected by the symbols, controlled by a user-chosen error tolerance. + The advantages of the ABBA representation over other symbolic + representations include (1) better preservation of essential shape + features, e.g., when compared with the popular SAX representation + (Elsworth + & Güttel, 2020a; + Lin et + al., 2003); (2) effective representation of local up and down + trends in the time series, which supports motif detection; (3) + demonstrably reduced sensitivity to hyperparameters of neural network + models and the initialization of random weights in forecasting + applications + (Elsworth + & Güttel, 2020b).

+
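The compression phase described above can be illustrated with a short, self-contained sketch. This is not the code of the fABBA library: the function names `segment_error`, `compress`, and `reconstruct` are hypothetical, and the acceptance rule (squared deviation from the straight line bounded by `(length - 1) * tol**2`) is an assumption modelled on the description in Elsworth & Güttel (2020a). Each polygonal piece is stored as a `(length, increment)` pair.

```python
import math


def segment_error(ts, start, end):
    """Squared distance between ts[start..end] and the line joining its endpoints."""
    length = end - start
    inc = ts[end] - ts[start]
    return sum(
        (ts[start] + inc * k / length - ts[start + k]) ** 2
        for k in range(length + 1)
    )


def compress(ts, tol):
    """Greedy polygonal chain approximation; returns (length, increment) pieces.

    Assumed rule: a piece covering `length` steps is accepted while its
    squared error stays at or below (length - 1) * tol**2.
    """
    pieces = []
    start = 0
    while start < len(ts) - 1:
        end = start + 1  # a piece of length 1 always has zero error
        while (end + 1 < len(ts)
               and segment_error(ts, start, end + 1) <= (end - start) * tol ** 2):
            end += 1
        pieces.append((end - start, ts[end] - ts[start]))
        start = end
    return pieces


def reconstruct(pieces, first_value):
    """Stitch the pieces back into a polygonal chain starting at first_value."""
    values = [first_value]
    for length, inc in pieces:
        base = values[-1]
        values.extend(base + inc * k / length for k in range(1, length + 1))
    return values


# Usage: compress the sine wave used in the paper's examples.
ts = [math.sin(0.05 * i) for i in range(1000)]
pieces = compress(ts, tol=0.1)
approx = reconstruct(pieces, ts[0])
```

The reconstructed chain has the same length as the input and passes through the original values at the piece boundaries. A smaller `tol` yields more, shorter pieces.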

fABBA is a Python library to compute ABBA symbolic + time series representations on Linux, Windows, and macOS systems. With + Cython compilation and typed memoryviews, it significantly outperforms + existing ABBA implementations. The fABBA library also + includes a new ABBA variant, fABBA + (Chen + & Güttel, 2023), which uses a fast alternative digitization + method (i.e., greedy aggregation) instead of k-means clustering + (Lloyd, + 1982), providing a significant speedup and improved + tolerance-based digitization (without the need to specify the number + + + k + of symbols a priori). The experiments in Chen & Güttel + (2023) + demonstrate that fABBA runs significantly faster than the original + ABBA module at + https://github.com/nla-group/ABBA/. + fABBA is an open-source library licensed under + the 3-Clause BSD License. Its redistribution and use, with or without + modification, are permitted under the conditions described in + https://opensource.org/license/bsd-3-clause/.

+
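To make the idea of tolerance-based digitization concrete, here is a minimal sketch of a greedy aggregation in the spirit of Chen & Güttel (2023). It is a simplification under stated assumptions, not the library's implementation: the names `greedy_aggregate` and `to_symbols` are hypothetical, the pieces are treated as plain 2-D points without the scaling the library applies, and the symbol alphabet is chosen only for illustration. Points are sorted by their 2-norm, each unassigned point in turn starts a new group, and every unassigned point within distance `alpha` of that starting point joins the group.

```python
import math


def greedy_aggregate(points, alpha):
    """Return one group label per 2-D point (labels start at 0)."""
    # Visit points in order of increasing 2-norm.
    order = sorted(range(len(points)), key=lambda i: math.hypot(*points[i]))
    labels = [-1] * len(points)
    group = 0
    for pos, i in enumerate(order):
        if labels[i] != -1:
            continue  # already absorbed by an earlier group
        labels[i] = group
        norm_i = math.hypot(*points[i])
        for j in order[pos + 1:]:
            # Sorted by norm: once the norm gap exceeds alpha, no later point
            # can lie within alpha of the starting point (reverse triangle
            # inequality), so the scan can stop early.
            if math.hypot(*points[j]) - norm_i > alpha:
                break
            if labels[j] == -1 and math.dist(points[i], points[j]) <= alpha:
                labels[j] = group
        group += 1
    return labels


def to_symbols(labels):
    """Map group labels to letters (illustrative alphabet, up to 26 groups)."""
    alphabet = "abcdefghijklmnopqrstuvwxyz"
    return "".join(alphabet[label] for label in labels)


# Usage: five (length, increment)-like points forming two tight groups.
points = [(1.0, 0.0), (1.05, 0.0), (5.0, 5.0), (5.0, 5.05), (1.0, 0.02)]
symbols = to_symbols(greedy_aggregate(points, alpha=0.1))
```

The number of groups is not fixed in advance. It follows from `alpha`, which is the contrast with k-means drawn in the paragraph above.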
+ + Examples +

fABBA can be installed via the Python Package Index + or conda-forge. Detailed documentation for its installation, usage, + API reference, and quick start examples can be found on + https://fabba.readthedocs.io/en/latest/. Below + we provide a brief demonstration.

+ + Compress and reconstruct a time series +

The following example approximately transforms a time series into + a symbolic string representation (using method + transform()) and then converts the string + back into a numerical format (using method + inverse_transform()). fABBA requires two + parameters, tol and + alpha. The tolerance + tol determines how closely the polygonal + chain approximation follows the original time series. The parameter + alpha controls how similar time series pieces + need to be in order to be represented by the same symbol. A smaller + tol means that more polygonal pieces are used + and the polygonal chain approximation is more accurate; but on the + other hand, it will increase the length of the string + representation. Similarly, a smaller alpha + typically results in more accurate symbolic digitization but a + larger number of symbols.

+ import numpy as np +import matplotlib.pyplot as plt +from fABBA import fABBA + +# original time series +ts = [np.sin(0.05*i) for i in range(1000)] +fabba = fABBA(tol=0.1, alpha=0.1, sorting='2-norm', scl=1, verbose=0) + +# symbolic representation of the time series +string = fabba.fit_transform(ts) +# prints aBbCbCbCbCbCbCbCA +print(string) + +# reconstruct numerical time series +inverse_ts = fabba.inverse_transform(string, ts[0]) +
+ + More ABBA variants +

Other clustering-based ABBA variants are also provided, supported + by the clustering methods in the scikit-learn + library + (Pedregosa + et al., 2011). Below is a basic code example.

+ import numpy as np +from sklearn.cluster import KMeans +from fABBA import ABBAbase + +# original time series +ts = [np.sin(0.05*i) for i in range(1000)] +# k-means clustering with 5 symbols +kmeans = KMeans(n_clusters=5, random_state=0, init='k-means++', verbose=0) +abba = ABBAbase(tol=0.1, scl=1, clustering=kmeans) + +# symbolic representation of the time series +string = abba.fit_transform(ts) +# prints BbAaAaAaAaAaAaAaC +print(string) + +# reconstruct numerical time series +inverse_ts = abba.inverse_transform(string) +
+
+ + Statement of Need +

Symbolic representations open time series processing to a large + number of powerful techniques developed, e.g., by the natural language + processing and bioinformatics communities + (Lin et + al., 2003, + 2007). + fABBA is a Python module for computing such symbolic + time series representations very efficiently, enabling their use for + downstream tasks such as time series classification, forecasting, and + anomaly detection.

+
+ + Acknowledgement +

Stefan Güttel acknowledges a Royal Society Industry Fellowship + IF/R1/231032. Xinye Chen is supported by the European Union (ERC, + inEXASCALE, 101075632). Views and opinions expressed are those of the + authors only and do not necessarily reflect those of the European + Union or the European Research Council. Neither the European Union nor + the granting authority can be held responsible for them.

+
+ + + + + + + ElsworthSteven + GüttelStefan + + ABBA: adaptive Brownian bridge-based symbolic aggregation of time series + Data Mining and Knowledge Discovery + 2020 + 34 + 10.1007/s10618-020-00689-6 + 1175 + 1200 + + + + + + LloydStuart P. + + Least squares quantization in PCM + Transactions on Information Theory + IEEE + 1982 + 28 + 10.1109/tit.1982.1056489 + 129 + 137 + + + + + + LinJessica + KeoghEamonn + LonardiStefano + ChiuBill + + A symbolic representation of time series, with implications for streaming algorithms + Proceedings of the 8th ACM SIGMOD workshop on research issues in data mining and knowledge discovery + ACM + 2003 + 10.1145/882082.882086 + 2 + 11 + + + + + + ChenXinye + GüttelStefan + + An efficient aggregation method for the symbolic representation of temporal data + ACM Transactions on Knowledge Discovery from Data + 2023 + 17 + 1 + 10.1145/3532622 + 1 + 22 + + + + + + TaktakMariem + LtifiHela + AyedMounir Ben + + ECG classification with learning ensemble based on symbolic discretization + Information Systems + 2024 + 120 + 10.1016/j.is.2023.102294 + 102294 + + + + + + + NguyenThach Le + IfrimGeorgiana + + Fast time series classification with random symbolic subsequences + Advanced analytics and learning on temporal data: 7th ECML PKDD workshop + Springer + 2023 + 10.1007/978-3-031-24378-3_4 + 50 + 65 + + + + + + LinJessica + KeoghEamonn + WeiLi + LonardiStefano + + Experiencing SAX: A novel symbolic representation of time series + Data Mining and Knowledge Discovery + Springer + 2007 + 15 + 2 + 10.1007/s10618-007-0064-z + 107 + 144 + + + + + + PedregosaFabian + VaroquauxGaël + GramfortAlexandre + MichelVincent + ThirionBertrand + GriselOlivier + BlondelMathieu + PrettenhoferPeter + WeissRon + DubourgVincent + VanderplasJake + PassosAlexandre + CournapeauDavid + BrucherMatthieu + PerrotMatthieu + Duchesnay + + Scikit-learn: Machine learning in python + Journal of Machine Learning Research + 2011 + 12 + 85 + 
http://jmlr.org/papers/v12/pedregosa11a.html + 2825 + 2830 + + + + + + GogineniKailash + DerasariPreet + VenkataramaniGuru + + Foreseer: Efficiently forecasting malware event series with long short-term memory + IEEE international symposium on secure and private execution environment design + 2022 + + 10.1109/seed55351.2022.00016 + 97 + 108 + + + + + + ElsworthSteven + GüttelStefan + + Time series forecasting using LSTM networks: A symbolic approach + 2020 + 10.48550/arXiv.2003.05672 + 12 + + + + + + + WangChu + DouManfeng + LiZhongliang + OutbibRachid + ZhaoDongdong + ZuoJian + WangYuanlin + LiangBin + WangPeng + + Data-driven prognostics based on time-frequency analysis and symbolic recurrent neural network for fuel cells under dynamic load + Reliability Engineering & System Safety + 2023 + 233 + 10.1016/j.ress.2023.109123 + 109123 + + + + + + + HarrisJonathan J. + ChenChing-Hua + ZakiMohammed J. + + A framework for generating summaries from temporal personal health data + ACM Transactions on Computing for Healthcare + ACM + 2021 + 2 + 3 + 10.1145/3448672 + + + + +
diff --git a/joss.06294/10.21105.joss.06294.pdf b/joss.06294/10.21105.joss.06294.pdf new file mode 100644 index 0000000000..d2aaac7d71 Binary files /dev/null and b/joss.06294/10.21105.joss.06294.pdf differ diff --git a/joss.06294/media/abba.png b/joss.06294/media/abba.png new file mode 100644 index 0000000000..c922ca959b Binary files /dev/null and b/joss.06294/media/abba.png differ