diff --git a/joss.05755/10.21105.joss.05755.crossref.xml b/joss.05755/10.21105.joss.05755.crossref.xml
new file mode 100644
index 0000000000..8448954d26
--- /dev/null
+++ b/joss.05755/10.21105.joss.05755.crossref.xml
@@ -0,0 +1,339 @@
+
+
+
+ 20231116T170444-68d5f67ef7febb92d7b6b965d3d76e36b965d0c5
+ 20231116170444
+
+ JOSS Admin
+ admin@theoj.org
+
+ The Open Journal
+
+
+
+
+ Journal of Open Source Software
+ JOSS
+ 2475-9066
+
+ 10.21105/joss
+ https://joss.theoj.org
+
+
+
+
+ 11
+ 2023
+
+
+ 8
+
+ 91
+
+
+
+ pvOps: a Python package for empirical analysis of
+photovoltaic field data
+
+
+
+ Kirk L.
+ Bonney
+ https://orcid.org/0009-0006-2383-1634
+
+
+ Thushara
+ Gunda
+ https://orcid.org/0000-0003-1945-4064
+
+
+ Michael W.
+ Hopwood
+ https://orcid.org/0000-0001-6190-1767
+
+
+ Hector
+ Mendoza
+ https://orcid.org/0009-0007-5812-606X
+
+
+ Nicole D.
+ Jackson
+ https://orcid.org/0000-0002-3814-9906
+
+
+
+ 11
+ 16
+ 2023
+
+
+ 5755
+
+
+ 10.21105/joss.05755
+
+
+ http://creativecommons.org/licenses/by/4.0/
+ http://creativecommons.org/licenses/by/4.0/
+ http://creativecommons.org/licenses/by/4.0/
+
+
+
+ Software archive
+ 10.5281/zenodo.10126530
+
+
+ GitHub review issue
+ https://github.com/openjournals/joss-reviews/issues/5755
+
+
+
+ 10.21105/joss.05755
+ https://joss.theoj.org/papers/10.21105/joss.05755
+
+
+ https://joss.theoj.org/papers/10.21105/joss.05755.pdf
+
+
+
+
+
+ RdTools: An open source python library for PV
+degradation analysis
+ Deceglie
+ 2018
+ Deceglie, M. G., Jordan, D., Nag, A.,
+Deline, C. A., & Shinn, A. (2018). RdTools: An open source python
+library for PV degradation analysis. National Renewable Energy
+Lab.(NREL), Golden, CO (United States).
+
+
+ A machine learning evaluation of maintenance
+records for common failure modes in PV inverters
+ Gunda
+ IEEE Access
+ 8
+ 10.1109/ACCESS.2020.3039182
+ 2020
+ Gunda, T., Hackett, S., Kraus, L.,
+Downs, C., Jones, R., McNalley, C., Bolen, M., & Walker, A. (2020).
+A machine learning evaluation of maintenance records for common failure
+modes in PV inverters. IEEE Access, 8, 211610–211620.
+https://doi.org/10.1109/ACCESS.2020.3039182
+
+
+ Pvlib python: A python package for modeling
+solar energy systems
+ Holmgren
+ Journal of Open Source
+Software
+ 29
+ 3
+ 10.21105/joss.00884
+ 2018
+ Holmgren, W. F., Hansen, C. W., &
+Mikofski, M. A. (2018). Pvlib python: A python package for modeling
+solar energy systems. Journal of Open Source Software, 3(29), 884.
+https://doi.org/10.21105/joss.00884
+
+
+ Neural network-based classification of
+string-level IV curves from physically-induced failures of photovoltaic
+modules
+ Hopwood
+ IEEE Access
+ 8
+ 10.1109/ACCESS.2020.3021577
+ 2020
+ Hopwood, M. W., Gunda, T., Seigneur,
+H., & Walters, J. (2020). Neural network-based classification of
+string-level IV curves from physically-induced failures of photovoltaic
+modules. IEEE Access, 8, 161480–161487.
+https://doi.org/10.1109/ACCESS.2020.3021577
+
+
+ Classification of photovoltaic failures with
+hidden markov modeling, an unsupervised statistical
+approach
+ Hopwood
+ Energies
+ 14
+ 15
+ 10.3390/en15145104
+ 2022
+ Hopwood, M. W., Patel, L., &
+Gunda, T. (2022). Classification of photovoltaic failures with hidden
+markov modeling, an unsupervised statistical approach. Energies, 15(14),
+5104. https://doi.org/10.3390/en15145104
+
+
+ Generation of data-driven expected energy
+models for photovoltaic systems
+ Hopwood
+ Applied Sciences
+ 4
+ 12
+ 10.3390/app12041872
+ 2022
+ Hopwood, M. W., & Gunda, T.
+(2022). Generation of data-driven expected energy models for
+photovoltaic systems. Applied Sciences, 12(4), 1872.
+https://doi.org/10.3390/app12041872
+
+
+ Physics-based method for generating fully
+synthetic IV curve training datasets for machine learning classification
+of PV failures
+ Hopwood
+ Energies
+ 14
+ 15
+ 10.3390/en15145085
+ 2022
+ Hopwood, M. W., Stein, J. S., Braid,
+J. L., & Seigneur, H. P. (2022). Physics-based method for generating
+fully synthetic IV curve training datasets for machine learning
+classification of PV failures. Energies, 15(14), 5085.
+https://doi.org/10.3390/en15145085
+
+
+ pvOps: Improving operational assessments
+through data fusion
+ Mendoza
+ 2021 IEEE 48th photovoltaic specialists
+conference (PVSC)
+ 10.1109/PVSC43889.2021.9518439
+ 2021
+ Mendoza, H., Hopwood, M., &
+Gunda, T. (2021). pvOps: Improving operational assessments through data
+fusion. 2021 IEEE 48th Photovoltaic Specialists Conference (PVSC),
+0112–0119.
+https://doi.org/10.1109/PVSC43889.2021.9518439
+
+
+ Pandas-dev/pandas: pandas
+ The pandas development team
+ 10.5281/zenodo.3509134
+ 2020
+ The pandas development team. (2020).
+Pandas-dev/pandas: pandas (latest). Zenodo.
+https://doi.org/10.5281/zenodo.3509134
+
+
+ Identifying degradation modes of photovoltaic
+modules using unsupervised machine learning on electroluminescense
+images
+ Pierce
+ 2020 47th IEEE photovoltaic specialists
+conference (PVSC)
+ 10.1109/PVSC45281.2020.9301021
+ 2020
+ Pierce, B. G., Karimi, A. M., Liu,
+J., French, R. H., & Braid, J. L. (2020). Identifying degradation
+modes of photovoltaic modules using unsupervised machine learning on
+electroluminescense images. 2020 47th IEEE Photovoltaic Specialists
+Conference (PVSC), 1850–1855.
+https://doi.org/10.1109/PVSC45281.2020.9301021
+
+
+ Performance monitoring using pecos (v.
+0.1)
+ Klise
+ 10.2172/1734479
+ 2016
+ Klise, K. A., & Stein, J. S.
+(2016). Performance monitoring using pecos (v. 0.1). Sandia National
+Laboratories. https://doi.org/10.2172/1734479
+
+
+ Collaborative data science
+ Plotly Technologies Inc.
+ 2015
+ Plotly Technologies Inc. (2015).
+Collaborative data science. Plotly Technologies Inc.
+https://plot.ly
+
+
+ Seaborn: Statistical data
+visualization
+ Waskom
+ Journal of Open Source
+Software
+ 60
+ 6
+ 10.21105/joss.03021
+ 2021
+ Waskom, M. L. (2021). Seaborn:
+Statistical data visualization. Journal of Open Source Software, 6(60),
+3021. https://doi.org/10.21105/joss.03021
+
+
+ Matplotlib: A 2D graphics
+environment
+ Hunter
+ Computing in Science &
+Engineering
+ 3
+ 9
+ 10.1109/MCSE.2007.55
+ 2007
+ Hunter, J. D. (2007). Matplotlib: A
+2D graphics environment. Computing in Science & Engineering, 9(3),
+90–95. https://doi.org/10.1109/MCSE.2007.55
+
+
+ Natural language processing with
+python
+ Bird
+ 2009
+ Bird, S., Klein, E., & Loper, E.
+(2009). Natural language processing with python. O’Reilly
+Media.
+
+
+ Keras
+ Chollet
+ 2015
+ Chollet, F., & others. (2015).
+Keras. https://keras.io.
+
+
+ Scikit-learn: Machine learning in
+Python
+ Pedregosa
+ Journal of Machine Learning
+Research
+ 12
+ 2011
+ Pedregosa, F., Varoquaux, G.,
+Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M.,
+Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A.,
+Cournapeau, D., Brucher, M., Perrot, M., & Duchesnay, E. (2011).
+Scikit-learn: Machine learning in Python. Journal of Machine Learning
+Research, 12, 2825–2830.
+
+
+ PVAnalytics: A python package for automated
+processing of solar time series data
+ Perry
+ 2022
+ Perry, K., Vining, W., Anderson, K.,
+Muller, M., & Hansen, C. (2022). PVAnalytics: A python package for
+automated processing of solar time series data. National Renewable
+Energy Lab.(NREL), Golden, CO (United States).
+
+
+
+
+
+
diff --git a/joss.05755/10.21105.joss.05755.jats b/joss.05755/10.21105.joss.05755.jats
new file mode 100644
index 0000000000..2ae8495b7a
--- /dev/null
+++ b/joss.05755/10.21105.joss.05755.jats
@@ -0,0 +1,661 @@
+
+
+
+
+
+
+
+Journal of Open Source Software
+JOSS
+
+2475-9066
+
+Open Journals
+
+
+
+5755
+10.21105/joss.05755
+
+pvOps: a Python package for empirical analysis of
+photovoltaic field data
+
+
+
+https://orcid.org/0009-0006-2383-1634
+
+Bonney
+Kirk L.
+
+
+*
+
+
+https://orcid.org/0000-0003-1945-4064
+
+Gunda
+Thushara
+
+
+
+
+https://orcid.org/0000-0001-6190-1767
+
+Hopwood
+Michael W.
+
+
+
+
+https://orcid.org/0009-0007-5812-606X
+
+Mendoza
+Hector
+
+
+
+
+https://orcid.org/0000-0002-3814-9906
+
+Jackson
+Nicole D.
+
+
+
+
+
+Sandia National Laboratories, USA
+
+
+
+
+University of Central Florida, USA
+
+
+
+
+* E-mail:
+
+
+4
+4
+2023
+
+8
+91
+5755
+
+Authors of papers retain copyright and release the
+work under a Creative Commons Attribution 4.0 International License (CC
+BY 4.0)
+2022
+The article authors
+
+Authors of papers retain copyright and release the work under
+a Creative Commons Attribution 4.0 International License (CC BY
+4.0)
+
+
+
+Python
+photovoltaic
+time series
+machine learning
+natural language processing
+
+
+
+
+
+ Summary
+
The purpose of pvOps is to support empirical
+ evaluations of data collected in the field related to the operations
+ and maintenance (O&M) of photovoltaic (PV) power plants.
+ pvOps presently contains modules that address
+ the diversity of field data, including text-based maintenance logs,
+ current-voltage (IV) curves, and timeseries of production information.
+ The package functions leverage machine learning, visualization, and
+ other techniques to enable cleaning, processing, and fusion of these
+ datasets. These capabilities are intended to facilitate easier
+ evaluation of field patterns and extraction of relevant insights to
+ support reliability-related decision-making for PV sites. The
+ open-source code, examples, and instructions for installing the
+ package from PyPI are available in the
+ GitHub
+ repository.
+
+
+ Statement of Need
+
Continued interest in PV deployment across the world has resulted
+ in increased awareness of needs associated with managing reliability
+ and performance of these systems during operation. Current open-source
+ packages for PV analysis focus on theoretical evaluations of solar
+ power simulations (e.g., pvlib
+ (Holmgren
+ et al., 2018)), data cleaning and feature development for
+ production data (e.g., pvanalytics
+ (Perry
+ et al., 2022)), specific use cases of empirical evaluations
+ (e.g., RdTools
+ (Deceglie
+ et al., 2018) and Pecos
+ (Klise
+ & Stein, 2016) for degradation analysis), or analysis of
+ electroluminescence images (e.g., PVimage
+ (Pierce
+ et al., 2020)); see
+ openpvtools
+ for a list of additional open source PV packages. However, a general
+ package that can support data-driven, exploratory evaluations of
+ diverse field-collected information is currently lacking. For example,
+ a maintenance log that describes an inverter failure may be temporally
+ correlated to a dip in production levels. Identifying such
+ relationships across different types of field data can improve
+ understanding of the impacts of certain types of failures on a PV
+ plant. To address this gap, we present pvOps,
+ an open-source Python package that can be used by researchers and
+ industry analysts alike to evaluate and extract insights from
+ different types of data routinely collected during PV field
+ operations.
+
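The inverter example above can be sketched in a few lines. This is a minimal illustration using only the Python standard library, not pvOps code: the production readings, the ticket fields, and the helper `split_by_ticket` are all hypothetical and chosen only to show how a maintenance window can be lined up against production timestamps.

```python
from datetime import datetime

# Hypothetical hourly production readings (timestamp, kWh) for one site.
production = [
    (datetime(2023, 6, 1, 9), 48.0),
    (datetime(2023, 6, 1, 10), 52.0),
    (datetime(2023, 6, 1, 11), 12.0),
    (datetime(2023, 6, 1, 12), 10.0),
    (datetime(2023, 6, 1, 13), 55.0),
]

# Hypothetical maintenance ticket with a start and an end time.
ticket = {
    "description": "inverter failure",
    "start": datetime(2023, 6, 1, 10, 30),
    "end": datetime(2023, 6, 1, 12, 30),
}


def split_by_ticket(readings, start, end):
    """Split readings into those inside and outside the ticket window."""
    inside = [kwh for ts, kwh in readings if start <= ts <= end]
    outside = [kwh for ts, kwh in readings if not (start <= ts <= end)]
    return inside, outside


def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


inside, outside = split_by_ticket(production, ticket["start"], ticket["end"])

# Fractional drop in mean production while the ticket was open.
dip_fraction = 1 - mean(inside) / mean(outside)
print(f"Production during the ticket was {dip_fraction:.0%} lower")
```

Comparing against the mean of all other readings is a deliberately crude baseline; a real analysis would account for irradiance and time of day before attributing the dip to the failure.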
PV data collected in the field varies greatly in structure (e.g.,
+ timeseries and text records) and quality (e.g., completeness and
+ consistency). The data available for analysis is frequently
+ semi-structured. Furthermore, the level of detail collected by
+ different owners/operators can vary. For example, some may capture a
+ general start and end time for an associated event whereas others
+ might include additional time details for different resolution
+ activities. This diversity in data types and structures often leads to
+ data being under-utilized due to the amount of manual processing
+ required. To address these issues, pvOps
+ provides a suite of data processing, cleaning, and visualization
+ methods to leverage insights across a broad range of data types,
+ including operations and maintenance records, production timeseries,
+ and IV curves. The functions within pvOps
+ enable users to better parse available data to understand patterns in
+ outages and production losses.
+
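To make the completeness and consistency issues above concrete, the sketch below reviews a handful of O&M records for timestamp problems. It is an assumption-laden illustration in plain Python rather than the pvOps implementation: the record fields, the expected date format, and the function names `parse_timestamp` and `review_quality` are hypothetical.

```python
from datetime import datetime

# Hypothetical O&M records; field names and date formats are illustrative only.
records = [
    {"id": 1, "start": "2023-06-01 10:30", "end": "2023-06-01 12:30"},
    {"id": 2, "start": "2023-06-03 08:00", "end": ""},  # missing end time
    {"id": 3, "start": "06/05/2023", "end": "2023-06-05 09:00"},  # other format
    {"id": 4, "start": "2023-06-07 14:00", "end": "2023-06-07 13:00"},  # reversed
]

# Assumed canonical format for this sketch.
DATE_FORMAT = "%Y-%m-%d %H:%M"


def parse_timestamp(text):
    """Return a datetime, or None when text is empty or not in DATE_FORMAT."""
    try:
        return datetime.strptime(text, DATE_FORMAT)
    except ValueError:
        return None


def review_quality(rows):
    """Map each record id to the list of timestamp issues found in it."""
    issues = {}
    for row in rows:
        start = parse_timestamp(row["start"])
        end = parse_timestamp(row["end"])
        found = []
        if start is None:
            found.append("unparseable start")
        if end is None:
            found.append("unparseable end")
        if start is not None and end is not None and end < start:
            found.append("end before start")
        issues[row["id"]] = found
    return issues


report = review_quality(records)
for record_id, found in report.items():
    print(record_id, found or "ok")
```

A report like this makes it easier to decide which records can be used as-is and which need manual review or gap filling.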
+
+ Package Overview
+
The following table summarizes the four modules within
+ pvOps by presenting: the type of data they
+ analyze, example data features, and highlights of relevant
+ functions.
+
Table 1. Summary of modules and functions within
+ pvOps
+
+ Module | Type of data | Example data features | Highlights of functions
+ text | O&M records | | fill data gaps in dates and categorical records, visualize word clusters and patterns over time
+ timeseries | Production data | site, timestamp, power production, irradiance | estimate expected energy with multiple models, evaluate inverter clipping
+ text2time | O&M records and production data | see entries for text and timeseries modules above | analyze overlaps between O&M and production (timeseries) records, visualize overlaps between O&M records and production data
+ iv | IV records | current, voltage, irradiance, temperature | simulate IV curves with physical faults, extract diode parameters from IV curves, classify faults using IV curves
+
+
+
+
+
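As a small illustration of the kind of IV record listed in the table, the sketch below derives a few standard summary quantities from a sampled IV curve: short-circuit current, open-circuit voltage, maximum power, and fill factor. It is not the diode-parameter extraction performed by the iv module, which is more involved. The function name `iv_summary` and the sample curve are hypothetical, and the sketch assumes the first sample is taken at zero voltage.

```python
def iv_summary(voltage, current):
    """Summarize an IV curve sampled at increasing voltage.

    Assumes voltage[0] == 0, so the first current sample is Isc, and that
    the current crosses zero somewhere within the sampled range.
    """
    isc = current[0]

    # Open-circuit voltage: linear interpolation at the zero crossing.
    voc = None
    for k in range(len(voltage) - 1):
        i0, i1 = current[k], current[k + 1]
        if i0 >= 0 > i1:
            v0, v1 = voltage[k], voltage[k + 1]
            voc = v0 + (v1 - v0) * i0 / (i0 - i1)
            break
    if voc is None:
        raise ValueError("current never crosses zero in the sampled range")

    # Maximum power over the sampled points (no interpolation).
    pmax = max(v * i for v, i in zip(voltage, current))

    return {
        "isc": isc,
        "voc": voc,
        "pmax": pmax,
        "fill_factor": pmax / (isc * voc),
    }


# Hypothetical string-level curve (volts, amps).
voltage = [0.0, 10.0, 20.0, 30.0, 35.0, 40.0]
current = [9.0, 8.9, 8.7, 7.5, 4.0, -1.0]

summary = iv_summary(voltage, current)
print(summary)
```

Summary quantities like these are often a useful first check on curve quality before fitting a physical model.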
The functions within each module can be used to build pipelines
+ that integrate relevant data processing, fusion, and visualization
+ capabilities to support user end goals. For example, a user with IV
+ curve data could build a pipeline that leverages functions within the
+ iv module to process IV curves, extract their diode
+ parameters, and train models that classify curves by fault
+ type. A pipeline could also be built
+ that leverages functions across modules if a user has access to
+ multiple types of data (e.g., both O&M and production records). A
+ sample end-to-end workflow using pvOps modules
+ could be:
+
+
+
Use functions within the text module to
+ systematically review data quality issues within O&M records,
+ train a machine learning model on available records, and use the
+ model to estimate possible labels for missing entries
+
+
+
Use functions within the
+ timeseries module to establish a basis for
+ evaluating possible production losses: train a machine learning
+ model of expected energy for a given time series of irradiance and
+ system size details, use a pre-trained expected energy model
+ (Hopwood
+ & Gunda, 2022), or apply industry standard equations
+
+
+
Couple outputs from the above two analyses (using functions in
+ the text2time module) based on timestamps
+ to develop summaries and visualizations of production impacts
+ observed during these periods
+
+
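The second and third steps of the workflow above can be sketched as follows. This is a simplified stand-in, not the pvOps API: an ordinary least-squares line fit replaces the machine learning models and industry standard equations mentioned in the text, and every name and number (`fit_line`, `estimate_loss`, the training and event data) is hypothetical.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def estimate_loss(model, irradiance, measured):
    """Sum of (expected - measured) energy over an event window."""
    slope, intercept = model
    expected = [slope * g + intercept for g in irradiance]
    return sum(e - m for e, m in zip(expected, measured))


# Hypothetical training data from normal operation:
# plane-of-array irradiance (W/m^2) and hourly energy (kWh).
train_irradiance = [200.0, 400.0, 600.0, 800.0, 1000.0]
train_energy = [10.0, 20.0, 30.0, 40.0, 50.0]
model = fit_line(train_irradiance, train_energy)

# Hypothetical readings taken while an O&M ticket was open.
event_irradiance = [700.0, 900.0]
event_energy = [5.0, 6.0]

loss_kwh = estimate_loss(model, event_irradiance, event_energy)
print(f"Estimated production loss during the event: {loss_kwh:.1f} kWh")
```

Fitting the baseline only on periods without open tickets keeps the failure itself from pulling the expected-energy model downward.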
+
The
+ package
+ documentation for pvOps provides
+ thorough examples exploring the various capabilities of each module.
+ Additional details about the iv module
+ capabilities are captured in
+ (Hopwood
+ et al., 2020;
+ Hopwood,
+ Stein, et al., 2022) while more information about the design
+ and development of the text,
+ timeseries, and
+ text2time modules is captured in
+ (Mendoza
+ et al., 2021). Key package dependencies of
+ pvOps include pandas
+ (The
+ pandas development team, 2020), sklearn
+ (Pedregosa
+ et al., 2011), nltk
+ (Bird
+ et al., 2009), and keras
+ (Chollet
+ & others, 2015) for analysis and
+ matplotlib
+ (Hunter,
+ 2007), seaborn
+ (Waskom,
+ 2021), and plotly
+ (Plotly
+ Technologies Inc., 2015) for visualization.
+
+
+ Ongoing Development
+
The pvOps functionality and documentation
+ continue to be improved and updated as new empirical techniques are
+ identified. For example, research efforts have demonstrated the utility of
+ natural language processing techniques (e.g., topic modeling) and
+ survival analyses to support evaluation of patterns in O&M records
+ (Gunda
+ et al., 2020). Additional statistical methods, such as Hidden
+ Markov Modeling, have also been successfully used to support
+ classification of failures within production data
+ (Hopwood,
+ Patel, et al., 2022). These and other capabilities will
+ continue to be added to the package to improve its utility for
+ supporting empirical analyses of field data.
This material is supported by the U.S. Department of Energy, Office
+ of Energy Efficiency and Renewable Energy - Solar Energy Technologies
+ Office. Sandia National Laboratories is a multimission laboratory
+ managed and operated by National Technology and Engineering Solutions
+ of Sandia LLC, a wholly owned subsidiary of Honeywell International
+ Inc., for the U.S. Department of Energy’s National Nuclear Security
+ Administration under contract DE-NA0003525.
+
+
+
+
+
+
+
+ Deceglie, Michael G
+ Jordan, Dirk
+ Nag, Ambarish
+ Deline, Christopher A
+ Shinn, Adam
+
+ RdTools: An open source python library for PV degradation analysis
+ National Renewable Energy Lab.(NREL), Golden, CO (United States)
+ 2018
+
+
+
+
+
+ Gunda, Thushara
+ Hackett, Sean
+ Kraus, Laura
+ Downs, Christopher
+ Jones, Ryan
+ McNalley, Christopher
+ Bolen, Michael
+ Walker, Andy
+
+ A machine learning evaluation of maintenance records for common failure modes in PV inverters
+
+ IEEE
+ 2020
+ 8
+ 10.1109/ACCESS.2020.3039182
+ 211610
+ 211620
+
+
+
+
+
+ Holmgren, William F
+ Hansen, Clifford W
+ Mikofski, Mark A
+
+ Pvlib python: A python package for modeling solar energy systems
+
+ 2018
+ 3
+ 29
+ 10.21105/joss.00884
+ 884
+
+
+
+
+
+
+ Hopwood, Michael W
+ Gunda, Thushara
+ Seigneur, Hubert
+ Walters, Joseph
+
+ Neural network-based classification of string-level IV curves from physically-induced failures of photovoltaic modules
+
+ IEEE
+ 2020
+ 8
+ 10.1109/ACCESS.2020.3021577
+ 161480
+ 161487
+
+
+
+
+
+ Hopwood, Michael W
+ Patel, Lekha
+ Gunda, Thushara
+
+ Classification of photovoltaic failures with hidden markov modeling, an unsupervised statistical approach
+
+ MDPI
+ 2022
+ 15
+ 14
+ 10.3390/en15145104
+ 5104
+
+
+
+
+
+
+ Hopwood, Michael W
+ Gunda, Thushara
+
+ Generation of data-driven expected energy models for photovoltaic systems
+
+ MDPI
+ 2022
+ 12
+ 4
+ 10.3390/app12041872
+ 1872
+
+
+
+
+
+
+ Hopwood, Michael W
+ Stein, Joshua S
+ Braid, Jennifer L
+ Seigneur, Hubert P
+
+ Physics-based method for generating fully synthetic IV curve training datasets for machine learning classification of PV failures
+
+ MDPI
+ 2022
+ 15
+ 14
+ 10.3390/en15145085
+ 5085
+
+
+
+
+
+
+ Mendoza, Hector
+ Hopwood, Michael
+ Gunda, Thushara
+
+ pvOps: Improving operational assessments through data fusion
+
+ IEEE
+ 2021
+ 10.1109/PVSC43889.2021.9518439
+ 0112
+ 0119
+
+
+
+
+
+ The pandas development team
+
+ Pandas-dev/pandas: pandas
+ Zenodo
+ 2020-02
+ https://doi.org/10.5281/zenodo.3509134
+ 10.5281/zenodo.3509134
+
+
+
+
+
+ Pierce, Benjamin G
+ Karimi, Ahmad Maroof
+ Liu, JiQi
+ French, Roger H
+ Braid, Jennifer L
+
+ Identifying degradation modes of photovoltaic modules using unsupervised machine learning on electroluminescense images
+
+ IEEE
+ 2020
+ 10.1109/PVSC45281.2020.9301021
+ 1850
+ 1855
+
+
+
+
+
+ Klise, Katherine A
+ Stein, Joshua S
+
+ Performance monitoring using pecos (v. 0.1)
+ Sandia National Laboratories
+ 2016
+ 10.2172/1734479
+
+
+
+
+
+ Plotly Technologies Inc.
+
+ Collaborative data science
+ Plotly Technologies Inc.
+ Montreal, QC
+ 2015
+ https://plot.ly
+
+
+
+
+
+ Waskom, Michael L.
+
+ Seaborn: Statistical data visualization
+
+ The Open Journal
+ 2021
+ 6
+ 60
+ https://doi.org/10.21105/joss.03021
+ 10.21105/joss.03021
+ 3021
+
+
+
+
+
+
+ Hunter, J. D.
+
+ Matplotlib: A 2D graphics environment
+
+ IEEE COMPUTER SOC
+ 2007
+ 9
+ 3
+ 10.1109/MCSE.2007.55
+ 90
+ 95
+
+
+
+
+
+ Bird, Steven
+ Klein, Ewan
+ Loper, Edward
+
+
+ O’Reilly Media
+ 2009
+
+
+
+
+
+ Chollet, François
+ others
+
+ Keras
+ https://keras.io
+ 2015
+
+
+
+
+
+ Pedregosa, F.
+ Varoquaux, G.
+ Gramfort, A.
+ Michel, V.
+ Thirion, B.
+ Grisel, O.
+ Blondel, M.
+ Prettenhofer, P.
+ Weiss, R.
+ Dubourg, V.
+ Vanderplas, J.
+ Passos, A.
+ Cournapeau, D.
+ Brucher, M.
+ Perrot, M.
+ Duchesnay, E.
+
+ Scikit-learn: Machine learning in Python
+
+ 2011
+ 12
+ 2825
+ 2830
+
+
+
+
+
+ Perry, Kirsten
+ Vining, William
+ Anderson, Kevin
+ Muller, Matthew
+ Hansen, Cliff
+
+ PVAnalytics: A python package for automated processing of solar time series data
+ National Renewable Energy Lab.(NREL), Golden, CO (United States)
+ 2022
+
+
+
+
+
diff --git a/joss.05755/10.21105.joss.05755.pdf b/joss.05755/10.21105.joss.05755.pdf
new file mode 100644
index 0000000000..84deacbb47
Binary files /dev/null and b/joss.05755/10.21105.joss.05755.pdf differ