i #308 Add Scitools Understand Parser
Adds Scitools Understand Dependencies parser for files and classes.

---------

Signed-off-by: Carlos Paradis <[email protected]>
Co-authored-by: Nicholas Beydler <[email protected]>
Co-authored-by: Carlos Paradis <[email protected]>
3 people authored Dec 8, 2024
1 parent 513a3f0 commit ac522b6
Showing 38 changed files with 607 additions and 42 deletions.
3 changes: 2 additions & 1 deletion DESCRIPTION
@@ -2,7 +2,7 @@ Package: kaiaulu
Type: Package
Title: Kaiaulu
Version: 0.0.0.9700
Description: Kaiaulu is an R package and common interface that helps with understanding evolving software development communities, and the artifacts (gitlog, mailing list, files, etc.) which developers collaborate and communicate about. See Paradis et al., (2012) <doi:10.1007/978-3-031-15116-3_6>.
Description: Kaiaulu is an R package and common interface that helps with understanding evolving software development communities, and the artifacts (gitlog, mailing list, files, etc.) which developers collaborate and communicate about. See Paradis et al., (2012) <doi:10.1007/978-3-031-15116-3_6>.
Authors@R: c(
person('Carlos', 'Paradis', role = c('aut', 'cre'),
email = '[email protected]',
@@ -21,6 +21,7 @@ Authors@R: c(
person('Anthony', 'Lau', role = c('ctb')),
person('Sean', 'Sunoo', role = c('ctb')),
person('Ian Jaymes', 'Iwata', role= c('ctb')),
person('Raven', 'Quiddaoen', role= c('ctb')),
person('Nicholas', 'Beydler', role = c('ctb')),
person('Mark', 'Burgess', role = c('ctb'))
)
4 changes: 4 additions & 0 deletions NAMESPACE
@@ -3,6 +3,7 @@
export(annotate_src_text)
export(assign_exact_identity)
export(bipartite_graph_projection)
export(build_understand_project)
export(commit_message_id_coverage)
export(community_oslom)
export(convert_pipermail_to_mbox)
@@ -42,6 +43,7 @@ export(example_notebook_alternating_function_in_files)
export(example_notebook_function_in_code_blocks)
export(example_renamed_file)
export(example_test_example_src_repo)
export(export_understand_dependencies)
export(filter_by_commit_interval)
export(filter_by_commit_size)
export(filter_by_file_extension)
@@ -189,6 +191,7 @@ export(parse_r_dependencies)
export(parse_r_function_definition)
export(parse_r_function_dependencies)
export(parse_rfile_ast)
export(parse_understand_dependencies)
export(query_src_text)
export(query_src_text_class_names)
export(query_src_text_namespace)
@@ -214,6 +217,7 @@ export(transform_gitlog_to_temporal_network)
export(transform_r_dependencies_to_network)
export(transform_reply_to_bipartite_network)
export(transform_temporal_gitlog_to_adsmj)
export(transform_understand_dependencies_to_network)
export(weight_scheme_count_deleted_nodes)
export(weight_scheme_cum_temporal)
export(weight_scheme_pairwise_cum_temporal)
1 change: 1 addition & 0 deletions NEWS.md
@@ -3,6 +3,7 @@ __kaiaulu 0.0.0.9700 (in development)__

### NEW FEATURES

* `build`, `export`, `parse`, and `transform` functions for Scitools Understand have been added. [#308](https://github.com/sailuh/kaiaulu/issues/308)
* The GitHUB API has been expanded to use refresh, along with other functions. `github_api_project_issue_search` has been added that makes the search/issues endpoint API calls. `github_api_project_issue_or_pr_comments_by_date` and `github_api_project_issue_by_date` have been added to download issue data and comments by date ranges. `github_parse_search_issues_refresh` has been added that parses the issue data downloaded from the search endpoint in the refresh_issues folder. `github_api_project_issue_refresh` and `github_api_project_issue_or_pr_comment_refresh` were added to download issue data or comments respectively that have not already been downloaded. `format_created_at_from_file` was added to retrieve the greatest date from a JSON file. See the Reference Docs on GitHub section for more details. [#282](https://github.com/sailuh/kaiaulu/issues/282)
* `config.R` now contains a set of getter functions used to centralize the gathering of configuration data and these getter functions are used to refactor configuration file information gathering. For example, loading configuration file information with variable assignment is as follows `git_repo_path <- config_file[["version_control"]][["log"]]` but refactoring with a config.R getter function becomes `git_repo_path <- get_git_repo_path(config_file)`. [#230](https://github.com/sailuh/kaiaulu/issues/230)
* `refresh_jira_issues()` has been added. It is a wrapper function for the previous downloader and downloads only issues greater than the greatest key already downloaded. [#275](https://github.com/sailuh/kaiaulu/issues/275)
202 changes: 202 additions & 0 deletions R/src.R
@@ -4,8 +4,174 @@
# License, v. 2.0. If a copy of the MPL was not distributed with this
# file, You can obtain one at https://mozilla.org/MPL/2.0/.

############## Understand Project Builder ##############

#' Build Understand DB
#'
#' Uses Scitools Understand to create a source code project Und Database.
#'
#' @param scitools_path path to the scitools binary `und`
#' @param project_path path to the project source code folder to create the Understand DB.
#' @param language the primary language of the project (language must be supported by Understand)
#' @param output_dir path to output directory (formatted output_path/)
#'
#' @return The created Scitools Understand DB path
#' @references See pg. 352 in https://documentation.scitools.com/pdf/understand.pdf Sept. 2024 Edition
#' @export
#' @family parsers
build_understand_project <- function(scitools_path, project_path, language, output_dir){

scitools_path <- path.expand(scitools_path)

# Create variables for command line
command <- scitools_path
project_path <- shQuote(project_path) # Quoting the project path
db_dir <- file.path(output_dir, "Understand.und")
args <- c("create", "-db", db_dir, "-languages", language)

# Build the Understand project by parsing through using Understand's und command
build_output <- system2(command, args)
args <- c("-db", db_dir, "add", project_path)
db_output <- system2(command, args)
args <- c("analyze", db_dir)
analyze_output <- system2(command, args)

return(db_dir)

}
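
The builder above issues three `und` invocations in sequence: `create`, `add`, and `analyze`. A minimal sketch of how those argument vectors are assembled is given below, so they can be inspected without Understand installed. The helper name `und_build_args` and the example paths are hypothetical and not part of kaiaulu.

```r
# Hypothetical helper (not part of kaiaulu): assembles the three argument
# vectors that build_understand_project() passes to `und`, without running them.
und_build_args <- function(project_path, language, output_dir) {
  db_dir <- file.path(output_dir, "Understand.und")
  list(
    create  = c("create", "-db", db_dir, "-languages", language),
    add     = c("-db", db_dir, "add", shQuote(project_path)),
    analyze = c("analyze", db_dir)
  )
}

# Example paths are made up for illustration.
und_args <- und_build_args("rawdata/helix", "java", "analysis/understand")

# Print each step roughly as it would appear on a shell.
for (step in names(und_args)) {
  cat("und", paste(und_args[[step]], collapse = " "), "\n")
}
```

When `system2()` is called without capturing stdout or stderr, as in the function above, its return value is the exit status of the command, so each step could be checked for a non-zero status before moving on to the next.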

#' Extract Understand Dependencies
#'
#' Extract the XML dependency file for either class or file granularity from
#' an Understand DB.
#'
#' @param scitools_path path to the scitools binary `und`
#' @param db_filepath path to the scitools DB (see \code{\link{build_understand_project}})
#' @param parse_type Type of dependencies to generate into xml (either "file" or "class")
#' @param output_filepath path to the output XML filepath of dependencies
#'
#' @return The path to the exported XML dependencies file, i.e. the output_filepath parameter.
#' @references See pg. 352 in https://documentation.scitools.com/pdf/understand.pdf Sept. 2024 Edition
#' @export
#' @family parsers
export_understand_dependencies <- function(scitools_path, db_filepath, parse_type = c("file", "class"), output_filepath){

scitools_path <- path.expand(scitools_path)

# Before running, check if parse_type is correct
parse_type <- match.arg(parse_type)

# Create the variables used in command lines
#db_dir <- file.path(understand_dir, "Understand.und")

#file_name <- paste0(parse_type, "Dependencies.xml")
#xml_dir <- file.path(db_dir, file_name)

# Generate the XML file
# Derived from pg. 352 in https://documentation.scitools.com/pdf/understand.pdf Sept. 2024 Edition
args <- c("export", "-dependencies", parse_type, "cytoscape", output_filepath, db_filepath)
output <- system2(scitools_path, args)

return(output_filepath)

# Generated XML file is assumed to be in this approximate format (regardless of parse_type) using Understand Build 1202
# <graph ...>
# ... [Irrelevant graph attributes and rdf grandchildren]
# <node id="67" label="ObjectMapper id:67">
# <att type="string" name="node.shape" value="rect"/>
# <att type="string" name="node.fontSize" value="5"/>
# <att type="string" name="node.label" value="ObjectMapper"/>
# <att type="string" name="longName" value="com.fasterxml.jackson.databind.ObjectMapper"/>
# <att type="string" name="kind" value="Unknown Class"/>
# <graphics type="RECTANGLE" h="35" w="35" x="0" y="0" fill="#ffffff" width="1" outline="#000000" cy:nodeTransparency="1.0" cy:nodeLabelFont="Default-0-8" cy:borderLineType="solid"/>
# </node>
# ... [Other nodes sharing the format]
# <edge source="2" target="9" label="App(Depends On)CalculatorUI">
# <att type="string" name="edge.targetArrowShape" value="ARROW"/>
# <att type="string" name="edge.color" value="#0000FF"/>
# <att type="string" name="canonicalName" value="App(Depends On)CalculatorUI"/>
# <att type="string" name="interaction" value="Depends On"/>
# <att type="string" name="dependency kind" value="Call, Create"/>
# </edge>
# ... [Other edges sharing the format]


}
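
The `parse_type = c("file", "class")` default combined with `match.arg()` is a standard R idiom: the first choice becomes the default and any other value is rejected before `und` is invoked. A small illustration follows; the wrapper name `pick_type` is hypothetical.

```r
# Hypothetical wrapper that isolates the match.arg() validation step.
pick_type <- function(parse_type = c("file", "class")) {
  match.arg(parse_type)
}

default_type <- pick_type()         # no argument: first choice is used
class_type   <- pick_type("class")  # explicit valid choice

# An unsupported value raises an error, which is caught here for illustration.
invalid <- tryCatch(pick_type("method"), error = function(e) "rejected")

print(c(default_type, class_type, invalid))
```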

############## Parsers ##############

#' Parse Scitools Understand Dependencies XML
#'
#' Parses a file- or class-level Scitools Understand dependency XML into node and edge tables.
#'
#' @param dependencies_path path to the exported Understand dependencies file (see \code{\link{export_understand_dependencies}}).
#' @return A list with two data.tables: `node_list` and `edge_list`.
#' @export
#' @family parsers
parse_understand_dependencies <- function(dependencies_path) {

# Parse the XML file
xml_data <- xmlParse(dependencies_path) # Creates pointer to file
xml_nodes <- xmlRoot(xml_data) # Finds the head: graph
xml_nodes <- xmlChildren(xml_nodes)
# xml_nodes now contains the nodes and edges (which were children of graph) and also graph's atts

# From child nodes- filter for those with name "node"
# Create a list by iterating through all the children in xml_nodes
node_elements <- lapply(xml_nodes, function(child) {
if (xmlName(child) == "node") { # We're searching for nodes, not att or edges
id <- xmlGetAttr(child, "id") # Extract the id from the node line
att_nodes <- xmlChildren(child) # To access the atts of the node
node_label <- xmlGetAttr(att_nodes[[3]], "value") # Relevant att is the 3rd line
long_name <- xmlGetAttr(att_nodes[[4]], "value") # Relevant att is the 4th line
return(data.table(node_label = node_label, id = id, long_name = long_name)) # Returns the table containing the filtered node data
} else {
return(NULL) # Return NULL for the entry to be filtered out later
}
})

# Remove NULLs and combine the results from the node_elements list
node_list <- rbindlist(node_elements[!sapply(node_elements, is.null)], use.names = TRUE, fill = TRUE)

# From child nodes- filter for those with name "edge"
# Create a list by iterating through all the children in xml_nodes
edge_elements <- lapply(xml_nodes, function(child) {
if (xmlName(child) == "edge") { # We're searching for edges, not att or nodes
# Extract the id_from and id_to from the edge line
id_from <- xmlGetAttr(child, "source")
id_to <- xmlGetAttr(child, "target")
att_nodes <- xmlChildren(child) # To access the atts of the edge
dependency_kind <- xmlGetAttr(att_nodes[[5]], "value") # Relevant att is the 5th line
# Skip edges whose dependency_kind is NULL or empty. These do occur even in well-formed exports,
# and the split below raises an error without this check.
if (!is.null(dependency_kind) && dependency_kind != "") {
dependency_kind <- unlist(stri_split(dependency_kind, regex = ",\\s*")) # Separates the string into a vector
return(data.table(id_from = id_from, id_to = id_to, dependency_kind = dependency_kind)) # Returns the table containing the filtered edge data
} else {
return(NULL) # Return NULL for the entry to be filtered out later
}
} else {
return(NULL) # Return NULL for the entry to be filtered out later
}
})

# Remove NULLs and combine the results from the edge_elements list
edge_list <- rbindlist(edge_elements[!sapply(edge_elements, is.null)], use.names = TRUE, fill = TRUE)

# Merge edges with nodes to get label_from
edge_list <- merge(edge_list, node_list[, .(id, node_label)], by.x = "id_from", by.y = "id", all.x = TRUE)
setnames(edge_list, "node_label", "label_from")

# Merge again to get label_to
edge_list <- merge(edge_list, node_list[, .(id, node_label)], by.x = "id_to", by.y = "id", all.x = TRUE)
setnames(edge_list, "node_label", "label_to")

# Reorder columns to have label_from and label_to on the left
edge_list <- edge_list[, .(label_from, label_to, id_from, id_to, dependency_kind)]

# Create a list of the network to return
graph <- list(node_list = node_list, edge_list = edge_list)
return(graph)
}
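
The parser's two key steps, splitting a comma-separated `dependency kind` value into one row per kind and then attaching node labels by id, can be sketched in base R without the XML or data.table packages. The sample values are taken from the XML format comment in `export_understand_dependencies`. Note that this sketch swaps in `strsplit()` and `match()` for the `stri_split()` and `merge()` calls used in the function above.

```r
# Sample node and edge values, copied from the XML format comment.
nodes <- data.frame(id = c("2", "9"),
                    node_label = c("App", "CalculatorUI"),
                    stringsAsFactors = FALSE)
edge <- list(id_from = "2", id_to = "9", dependency_kind = "Call, Create")

# One row per dependency kind: "Call, Create" becomes two rows.
kinds <- unlist(strsplit(edge$dependency_kind, ",\\s*"))
edges <- data.frame(id_from = edge$id_from,
                    id_to = edge$id_to,
                    dependency_kind = kinds,
                    stringsAsFactors = FALSE)

# Look up labels by id (base R stand-in for the two merge() calls).
edges$label_from <- nodes$node_label[match(edges$id_from, nodes$id)]
edges$label_to   <- nodes$node_label[match(edges$id_to, nodes$id)]

# Same column order as the returned edge_list.
edges <- edges[, c("label_from", "label_to", "id_from", "id_to", "dependency_kind")]
print(edges)
```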

#' Parse dependencies from Depends
#'
@@ -215,6 +381,42 @@ parse_r_dependencies <- function(folder_path){

############## Network Transform ##############

#' Transform Understand Dependencies
#'
#' @description This function subsets a parsed table from parse_understand_dependencies
#'
#' @param parsed Parsed table from \code{\link{parse_understand_dependencies}}
#' @param weight_types The dependency kinds to keep (e.g. "Call", "Create"), as reported by Understand. Accepts a single string or a vector.
#' @export
#' @family edgelists
transform_understand_dependencies_to_network <- function(parsed, weight_types) {

nodes <- parsed[["node_list"]]
edges <- parsed[["edge_list"]]

# Append the node id to each label so that labels are unique, since the
# same file or class name may occur again in other parts of the code.

nodes$node_label <- stringi::stri_c(nodes$node_label,"|",nodes$id)

edges$label_from <- stringi::stri_c(edges$label_from,"|",edges$id_from)
edges$label_to <- stringi::stri_c(edges$label_to,"|",edges$id_to)

# Filter out by weights if vector provided
if (length(weight_types) > 0) {
edges <- edges[dependency_kind %in% weight_types]
}

# If filter removed all edges:
if (nrow(edges) == 0) {
stop("No edges found under weight_types.")
}

# Create a list to return
graph <- list(node_list = nodes, edge_list = edges)
return(graph)
}
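
The effect of the transform can be shown on a small hand-made edge list: labels are suffixed with `|id` to keep them unique, and rows are then kept only when their `dependency_kind` is in `weight_types`. The sketch below uses a base R data frame with made-up rows, rather than the data.table the function operates on.

```r
# Made-up edge list in the shape returned by parse_understand_dependencies().
edges <- data.frame(
  label_from = c("App", "App", "Util"),
  label_to = c("CalculatorUI", "CalculatorUI", "App"),
  id_from = c("2", "2", "5"),
  id_to = c("9", "9", "2"),
  dependency_kind = c("Call", "Create", "Import"),
  stringsAsFactors = FALSE
)

# Disambiguate labels by appending the node id.
edges$label_from <- paste0(edges$label_from, "|", edges$id_from)
edges$label_to   <- paste0(edges$label_to, "|", edges$id_to)

# Keep only the requested dependency kinds.
weight_types <- c("Call", "Import")
kept <- edges[edges$dependency_kind %in% weight_types, ]
print(kept)
```

Because the filter runs after the labels are rewritten, the returned node and edge labels stay consistent with each other regardless of which kinds are kept.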

#' Transform parsed dependencies into a network
#'
#' @param depends_parsed A parsed mbox by \code{\link{parse_dependencies}}.
4 changes: 4 additions & 0 deletions _pkgdown.yml
@@ -25,8 +25,12 @@ reference:
Notebooks for examples.
- contents:
- parse_dependencies
- build_understand_project
- export_understand_dependencies
- parse_understand_dependencies
- parse_r_dependencies
- transform_dependencies_to_network
- transform_understand_dependencies_to_network
- transform_r_dependencies_to_network
- subtitle: __Gang of Four Patterns__
desc: >
15 changes: 14 additions & 1 deletion conf/helix.yml
@@ -219,7 +219,20 @@ tool:
# project_path: ../../rawdata/kaiaulu/git_repo/understand/
# # Where the output for the understands analysis is stored
# output_path: ../../analysis/kaiaulu/understand/

understand:
# Accepts one language at a time: ada, assembly, c/c++, c#, fortran, java, jovial, delphi/pascal, python, vhdl, basic, javascript
code_language: java
# Specify which types of Dependencies to keep
keep_dependencies_type:
- Import
- Call
- Create
- Use
- Type GenericArgument
# Where the files to analyze should be stored
project_path: ../../rawdata/helix/git_repo/helix/
# Where the output for the understands analysis is stored
output_path: ../../analysis/helix/understand/
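
As a rough sketch of how these settings might be consumed, the block below mirrors the `understand` section as a nested R list and reads the fields the new functions need. The in-memory list stands in for a parsed YAML file, and the output file name `file_dependencies.xml` is an assumption made for illustration; neither is taken from the commit.

```r
# Hypothetical in-memory mirror of the `tool: understand:` block above.
conf <- list(tool = list(understand = list(
  code_language = "java",
  keep_dependencies_type = c("Import", "Call", "Create", "Use",
                             "Type GenericArgument"),
  project_path = "../../rawdata/helix/git_repo/helix/",
  output_path = "../../analysis/helix/understand/"
)))

und_conf   <- conf[["tool"]][["understand"]]
language   <- und_conf[["code_language"]]
keep_types <- und_conf[["keep_dependencies_type"]]

# output_path already ends in "/", so paste0() avoids a doubled separator.
# The file name is an assumption, not something the commit prescribes.
xml_path <- paste0(und_conf[["output_path"]], "file_dependencies.xml")

print(language)
print(keep_types)
print(xml_path)
```

Here `language` would feed the `language` argument of `build_understand_project`, and `keep_types` the `weight_types` argument of `transform_understand_dependencies_to_network`.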
# Analysis Configuration #
analysis:
# You can specify the intervals in 2 ways: window, or enumeration
15 changes: 0 additions & 15 deletions conf/kaiaulu.yml
@@ -208,21 +208,6 @@ tool:
# 3. Use sudo ./gradlew build
# 4. After building, locate the engine class files and specify as the class_folder_path:
# in this case they are in: /path/to/junit5/analysis/junit-platform-engine/build/classes/java/main/org/junit/platform/engine/
understand:
# Accepts one language at a time: ada, assembly, c/c++, c#, fortran, java, jovial, delphi/pascal, python, vhdl, basic, javascript
code_language: java
# Specify which types of Dependencies to keep
keep_dependencies_type:
- Import
- Call
- Create
- Use
- Type GenericArgument
# Where the files to analyze should be stored
project_path: ../../rawdata/kaiaulu/git_repo/understand/
# Where the output for the understands analysis is stored
output_path: ../../analysis/kaiaulu/understand/


# Analysis Configuration #
analysis:
46 changes: 46 additions & 0 deletions man/build_understand_project.Rd
