Incorporating Gradle tasks for downloading and automatically installing JEP binaries for the Python frontend

### What's Done:
1) Introduced the `installJep` Gradle task, which installs JEP automatically, downloading it from [https://github.com/icemachined/jep-distro/](https://github.com/icemachined/jep-distro/) when no local installation is found.

2) Centralized the logic for finding and installing JEP: all of it now lives in the Gradle task instead of the main code. At runtime, the main code simply loads JEP from the `build/jep` directory.
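A minimal sketch of that runtime side, assuming the same `os.name` checks that the new `JepSingleton` code uses. The helper names `jepBinaryName` and `jepBinaryPath` are hypothetical and not part of CPG; they only show how the expected JEP library file is derived from the host OS and resolved below `build/jep`.

```kotlin
import java.nio.file.Path
import java.nio.file.Paths

// Hypothetical helper mirroring the OS check in the commit:
// maps a JVM `os.name` value to the file name of the JEP native library.
fun jepBinaryName(osName: String): String =
    "libjep." +
        when {
            osName.contains("Mac") -> "jnilib"
            osName.contains("Linux") -> "so"
            osName.contains("Windows") -> "dll"
            else -> throw IllegalStateException("Cannot set up JEP for this operating system: [$osName]")
        }

// Resolves the JEP library inside the directory that `installJep` populates (default: build/jep).
fun jepBinaryPath(osName: String, jepDir: Path = Paths.get("build", "jep")): Path =
    jepDir.resolve(jepBinaryName(osName))

fun main() {
    val os = System.getProperty("os.name")
    println("Expecting JEP at: ${jepBinaryPath(os).toAbsolutePath()}")
}
```

Keeping the OS-to-extension mapping in one small function makes it easy to test without a real JEP installation.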
orchestr7 committed Dec 24, 2023
1 parent d1d86e6 commit 20d86bc
Showing 4 changed files with 182 additions and 81 deletions.
31 changes: 31 additions & 0 deletions cpg-language-python/README.md
@@ -0,0 +1,31 @@
### CPG Python

This module provides the Python language frontend for the CPG.
Python code is parsed by native Python, which the frontend accesses from the JVM through [JEP](https://github.com/ninia/jep) (Java Embedded Python) using JNI. JEP therefore has to be installed for the frontend to work. Follow the instructions below to install JEP and enable the frontend.

### Installation

1. The module is disabled by default. Enable it by adding the following line to the root `gradle.properties`:

   ```plaintext
   enablePythonFrontend=true
   ```

2. On x86 machines, no further action is required: JEP is installed automatically for Python 3.9-3.11
   from [https://github.com/icemachined/jep-distro/releases/](https://github.com/icemachined/jep-distro/releases/).
   Check that page for the currently supported Python and JEP versions. The Python version can be set with the `-Ppython` property
   (for example, `./gradlew cpg-language-python:build -Ppython=3.12`); the default is `3.10`.

   If you have a different architecture, or want to use another version of JEP or Python, you need to install JEP manually. Possible options:
   - Use Homebrew and pip to install JEP:
     ```plaintext
     brew install python
     pip3 install jep
     ```
   - Create a [virtual environment](https://docs.python.org/3/library/venv.html) in `~/.virtualenvs/<name>` and set the environment variable `CPG_PYTHON_VIRTUALENV` to `<name>` (the default name is `cpg`).
   - Install JEP manually and set the `CPG_JEP_LIBRARY` environment variable to the path of the installation.
3. Build the module: `./gradlew cpg-language-python:build`
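The lookup order that `installJep` follows before falling back to the jep-distro archive can be sketched as a pure function. This is a hypothetical helper (`resolveLocalJep` is not part of CPG) that takes the environment and an existence check as parameters, so it can be exercised without touching the file system. It mirrors the priority in `build.gradle.kts`: `CPG_JEP_LIBRARY` first, then the virtual environment, then Homebrew under `/opt/homebrew`.

```kotlin
import java.nio.file.Files
import java.nio.file.Path
import java.nio.file.Paths

// Hypothetical sketch of the lookup order used by `installJep`:
// 1. CPG_JEP_LIBRARY, 2. ~/.virtualenvs/<CPG_PYTHON_VIRTUALENV or "cpg">, 3. Homebrew under /opt/homebrew.
// Returns null when no local JEP is found; the build would then fall back to the jep-distro archive.
fun resolveLocalJep(
    env: Map<String, String>,
    userHome: String,
    pythonVersion: String,
    exists: (Path) -> Boolean,
): Path? {
    val relativeJep = Paths.get("lib", "python$pythonVersion", "site-packages", "jep")

    env["CPG_JEP_LIBRARY"]?.let { explicit ->
        val path = Paths.get(explicit)
        // An explicitly configured path that does not exist is treated as an error.
        check(exists(path)) { "CPG_JEP_LIBRARY is set to '$explicit' but the path does not exist." }
        return path
    }

    val virtualEnvName = env["CPG_PYTHON_VIRTUALENV"] ?: "cpg"
    val inVirtualEnv = Paths.get(userHome, ".virtualenvs", virtualEnvName).resolve(relativeJep)
    val inHomebrew = Paths.get("/opt", "homebrew").resolve(relativeJep)

    return listOf(inVirtualEnv, inHomebrew).firstOrNull(exists)
}

fun main() {
    val found = resolveLocalJep(System.getenv(), System.getProperty("user.home"), "3.10") { Files.exists(it) }
    println(found ?: "No local JEP found; the jep-distro archive would be used instead")
}
```

Passing `exists` as a parameter is a design choice for testability; the real Gradle task calls `Path.exists()` directly.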
118 changes: 117 additions & 1 deletion cpg-language-python/build.gradle.kts
@@ -1,3 +1,8 @@
import java.nio.file.Path
import kotlin.io.path.*
import kotlin.io.path.Path
import kotlin.io.path.exists

/*
* Copyright (c) 2021, Fraunhofer AISEC. All rights reserved.
*
@@ -23,6 +28,15 @@
* \______/ \__| \______/
*
*/

val jepDistroVersion = "4.1.1"
// Python version can be set with `-Ppython` property: `./gradlew cpg-language-python:build -Ppython=3.12`, default is 3.10
val pythonVersion = if (project.hasProperty("python")) project.property("python") else "3.10"
val jepCpgDir: Path = Path(projectDir.path, "build", "jep")
val regularJepPathInVirtualEnv: Path = Path(
"lib", "python${pythonVersion}", "site-packages", "jep"
)

plugins {
id("cpg.frontend-conventions")
}
@@ -39,10 +53,112 @@ publishing {
}
}

/**
* Installs JEP into the `build/jep` directory: either copies an existing local JEP installation or extracts
* the JEP native binaries provided by https://github.com/icemachined/jep-distro. Python version can be set with `-Ppython` property:
* `./gradlew cpg-language-python:build -Ppython=3.12`, the default is `3.10`
*/
tasks.register("installJep", Copy::class) {
// If the `build/jep` directory already exists, we will remove it; otherwise, this could
// cause cache-like issues when this directory is cached and not updated
if (jepCpgDir.exists()) {
@OptIn(ExperimentalPathApi::class) jepCpgDir.deleteRecursively()
}

// Python and JEP from a virtual environment: lib/python<version>/site-packages/jep
val jepInVirtualEnvPath = Path(
System.getProperty("user.home"), ".virtualenvs", System.getenv("CPG_PYTHON_VIRTUALENV") ?: "cpg"
) / regularJepPathInVirtualEnv

// On macOS, JEP is easy to install with Homebrew; below the Homebrew prefix it uses the same relative directory as in a virtual environment.
val homeBrewPath = Path("/opt", "homebrew") / regularJepPathInVirtualEnv

// If the user has specified the environment variable CPG_JEP_LIBRARY with the path to JEP, we will prioritize reading it
val explicitJepLibraryPathFromEnv = System.getenv("CPG_JEP_LIBRARY")?.let { Path(it) }

// Priority order:
// 1. explicitly provided JEP Path from env. variable
// 2. JEP from virtual env.
// 3. JEP from mac's homebrew
val jepPath = when {
explicitJepLibraryPathFromEnv != null -> if (explicitJepLibraryPathFromEnv.exists()) explicitJepLibraryPathFromEnv else throw IllegalStateException(
"CPG_JEP_LIBRARY environment variable set as '${explicitJepLibraryPathFromEnv}' but the path does not exist."
)

jepInVirtualEnvPath.exists() -> jepInVirtualEnvPath
homeBrewPath.exists() -> homeBrewPath
else -> null
}

// Based on the host OS we will determine the extension for the JEP binary
val os = System.getProperty("os.name")
val jepBinary = "libjep." + when {
os.contains("Mac") -> "jnilib"
os.contains("Linux") -> "so"
os.contains("Windows") -> "dll"
else -> throw IllegalStateException("Cannot install JEP for this operating system: [$os]")
}

// If JEP already exists on the current host machine, there's no need to copy it from the distribution archive.
// We will copy it as-is to the build/jep directory. Otherwise, we will extract an archive from the distribution.
// Please note that this distribution contains only x86 builds, meaning it won't work for ARM machines (like Mac M1 and later generations).
if (jepPath != null && (jepPath / jepBinary).exists()) {
from(fileTree(jepPath))
into(jepCpgDir)
} else {
// As mentioned above, ARM machines are not supported, so you need to provide your own JEP distribution
// in that case with CPG_JEP_LIBRARY, or other installations like homebrew, virtual env., etc.
val arch = System.getProperty("os.arch")
// The JVM reports 64-bit x86 as "amd64" on Linux/Windows and as "x86_64" on macOS
if (!(arch.contains("x86") || arch == "amd64")) {
throw IllegalStateException(
"""
| Your system architecture, identified as <$arch>, is not supported in icemachined/jep-distro.
| Consequently, you are required to install JEP manually. Here are the potential solutions:
| 1. Utilize Homebrew/pip package manager to facilitate the JEP installation process;
| 2. Create a virtual environment in (user.home)/.virtualenvs/<name> and set the environment variable CPG_PYTHON_VIRTUALENV to <name> (default: cpg);
| 3. Manually install JEP and specify the CPG_JEP_LIBRARY environment variable with the appropriate path to the installation.
""".trimMargin()
)
}

// We added com.icemachined:jep-distro as a dependency, so the TGZ archive containing the JEP distribution is downloaded by Gradle.
// This archive is stored in the Gradle dependency storage, allowing us to unpack it and copy its contents to the build/jep directory.
val jepDistroFromDependencies = File(configurations.compileClasspath.get().asFileTree.map { it.path }
.find { it.endsWith("jep-distro-cp$pythonVersion-$jepDistroVersion.tgz") }!!)
from(tarTree(jepDistroFromDependencies))
into(jepCpgDir.parent)
}
}

/**
* All tasks that are related to the compilation will depend on the installation of JEP
*/
tasks.named("compileKotlin") {
dependsOn("installJep")
}

tasks.named("sourcesJar") {
dependsOn("installJep")
}

tasks.named("processTestResources") {
dependsOn("installJep")
}

tasks.named("spotlessKotlin") {
dependsOn("installJep")
}

// The spotlessGolang task is not needed for the Python frontend, but it is incorrectly added to every project, so we disable it here
tasks.named("spotlessGolang").configure {
enabled = false
}

dependencies {
// jep for python support
api(libs.jep)

// JNI binaries for JEP built by @icemachined and published to Maven Central (all binaries in one archive)
implementation("com.icemachined:jep-distro-cp$pythonVersion:$jepDistroVersion")
// to evaluate some test cases
testImplementation(project(":cpg-analysis"))
}
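The jep-distro archive only ships x86 builds, so `installJep` needs an architecture check before it falls back to the archive. One detail worth keeping in mind: on 64-bit x86 the JVM typically reports `os.arch` as `amd64` on Linux and Windows and as `x86_64` on macOS, so testing only for the substring `x86` would not match every x86 host. A minimal sketch of a more tolerant check; the function name `isX86` and the accepted set of values are assumptions, not taken from the commit.

```kotlin
// Hypothetical helper: returns true when the JVM-reported architecture is x86.
fun isX86(osArch: String): Boolean {
    val arch = osArch.lowercase()
    // "amd64" is what the JVM usually reports for 64-bit x86 on Linux and Windows;
    // macOS on Intel reports "x86_64", and 32-bit JVMs may report "x86".
    return arch == "amd64" || arch.contains("x86")
}

fun main() {
    val arch = System.getProperty("os.arch")
    println("os.arch=$arch, x86=${isX86(arch)}")
}
```

ARM values such as `aarch64` are rejected, which matches the intent that ARM users provide their own JEP installation.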

@@ -25,96 +25,50 @@
*/
package de.fraunhofer.aisec.cpg.frontends.python

import java.io.File
import java.lang.RuntimeException
import java.nio.file.Path
import java.nio.file.Paths
import java.nio.file.FileSystems
import jep.JepConfig
import jep.MainInterpreter
import jep.SharedInterpreter
import kotlin.io.path.absolutePathString
import kotlin.io.path.div
import kotlin.io.path.exists

/**
* Takes care of configuring Jep according to some well known paths on popular operating systems.
*/
object JepSingleton {
init {
// TODO logging
// TODO: add proper logging
val config = JepConfig()

config.redirectStdErr(System.err)
config.redirectStdout(System.out)

System.getenv("CPG_JEP_LIBRARY")?.let {
val library = File(it)
if (library.exists()) {
MainInterpreter.setJepLibraryPath(library.path)
config.addIncludePaths(library.path)
} else {
throw RuntimeException(
"CPG_JEP_LIBRARY environment variable defined as '${library}' but it does not exist."
)
}
}

val virtualEnvName = System.getenv("CPG_PYTHON_VIRTUALENV") ?: "cpg"
val virtualEnvPath =
Paths.get(System.getProperty("user.home"), ".virtualenvs", virtualEnvName)
val pythonVersions = listOf("3.9", "3.10", "3.11", "3.12", "3.13")
val wellKnownPaths = mutableListOf<Path>()
pythonVersions.forEach { version ->
// Linux
wellKnownPaths.add(
Paths.get(
"$virtualEnvPath",
"lib",
"python${version}",
"site-packages",
"jep",
"libjep.so"
)
)
// Mac OS
wellKnownPaths.add(
Paths.get(
"$virtualEnvPath",
"lib",
"python${version}",
"site-packages",
"jep",
"libjep.jnilib"
)
)
wellKnownPaths.add(
Paths.get(
"$virtualEnvPath",
"lib",
"python${version}",
"site-packages",
"jep",
"libjep.dll"
)
)
}
// try system-wide paths, too
// TODO: is this still needed?
wellKnownPaths.add(Paths.get("/", "usr", "lib", "libjep.so"))
wellKnownPaths.add(Paths.get("/", "Library", "Java", "Extensions", "libjep.jnilib"))

wellKnownPaths.forEach {
if (it.exists()) {
// Jep's configuration must be set before the first instance is created. Later
// calls to setJepLibraryPath and co result in failures.
MainInterpreter.setJepLibraryPath(it.toString())
// To understand how JEP is installed under the hood, check `installJep` task in
// build.gradle.kts. But the main idea is that it is always copied to `build/jep` directory.
val jepLocation = FileSystems.getDefault().getPath("build", "jep")
// Based on the host OS we will determine the extension for the JEP binary
val os = System.getProperty("os.name")
val jepBinaryPath =
jepLocation /
("libjep." +
when {
os.contains("Mac") -> "jnilib"
os.contains("Linux") -> "so"
os.contains("Windows") -> "dll"
else ->
throw IllegalStateException(
"Cannot setup JEP for this operating system: [$os]"
)
})
if (jepBinaryPath.exists()) {
// Jep's configuration must be set before the first instance is created. Later
// calls to setJepLibraryPath and co result in failures.
MainInterpreter.setJepLibraryPath(jepBinaryPath.absolutePathString())

// also add include path so that Python can find jep in case of virtual environment
// fixes: jep.JepException: <class 'ModuleNotFoundError'>: No module named 'jep'
if (
it.parent.fileName.toString() == "jep" &&
(Paths.get(it.parent.toString(), "__init__.py").exists())
) {
config.addIncludePaths(it.parent.parent.toString())
}
// also add include path so that Python can find jep in case of virtual environment
// fixes: jep.JepException: <class 'ModuleNotFoundError'>: No module named 'jep'
if ((jepLocation / "__init__.py").exists()) {
config.addIncludePaths(jepLocation.parent.absolutePathString())
}
}
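The include-path rule in the new `JepSingleton` code can be isolated for testing. A minimal sketch using only `java.nio.file` (the function name `jepIncludePath` is hypothetical): if `build/jep` contains `__init__.py`, it is a Python package named `jep`, and its parent directory is what has to be on Python's import path so that `import jep` resolves.

```kotlin
import java.nio.file.Files
import java.nio.file.Path

// Hypothetical helper: returns the directory that must be added to Python's include
// paths so that `import jep` works, or null if `jepLocation` is not a Python package.
fun jepIncludePath(jepLocation: Path): Path? =
    if (Files.exists(jepLocation.resolve("__init__.py"))) jepLocation.toAbsolutePath().parent else null

fun main() {
    // Simulate the layout produced by `installJep` in a temporary directory.
    val root = Files.createTempDirectory("cpg-jep")
    val jepDir = Files.createDirectories(root.resolve("build").resolve("jep"))
    println(jepIncludePath(jepDir)) // null: no __init__.py yet
    Files.createFile(jepDir.resolve("__init__.py"))
    println(jepIncludePath(jepDir)) // the absolute path of the `build` directory
}
```

In the real code the returned directory would be passed to `JepConfig.addIncludePaths` before the first interpreter is created.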

10 changes: 5 additions & 5 deletions tutorial.md
@@ -12,7 +12,7 @@ A code property graph (CPG) is a representation of source code in form of a labe

## CPG Console

While the CPG tool is mostly used as a libary in external tools, such as [Codyze](http://github.com/Fraunhofer-AISEC/codyze), we decided to showcase its functionalities with a simple CLI based console that can be used to query the graph and run simple analysis steps.
While the CPG tool is mostly used as a library in external tools, such as [Codyze](http://github.com/Fraunhofer-AISEC/codyze), we decided to showcase its functionalities with a simple CLI based console that can be used to query the graph and run simple analysis steps.

To launch the console, first build it according to the instructions in our `README.md` and then run `bin/cpg-console`. You will be greeted by the interactive prompt of our console, which is implemented by the kotlin `ki` interactive shell. The commands on this shell follow the syntax of the Kotlin language. For more information please see the [Kotlin documentation](https://kotlinlang.org/docs/home.html).

@@ -82,7 +82,7 @@ Most interesting for the user is the `result` object which holds the complete tr

### Querying the translation result

In the following, we will use the aforementioned objects to query the source code for interesting patterns. To do so, we will explore several built-in functions that can be used in exploring the graph. The first of these, is the `all` function, it returns a list of all nodes that are direct descendents of a particular node, basicically flattening the hierarchy.
In the following, we will use the aforementioned objects to query the source code for interesting patterns. To do so, we will explore several built-in functions that can be used in exploring the graph. The first of these, is the `all` function, it returns a list of all nodes that are direct descendants of a particular node, basically flattening the hierarchy.

```kotlin
[3] result.all()
@@ -126,7 +126,7 @@ This also demonstrates quite nicely, that queries on the CPG work independently

### Looking for software errors

In a next step, we want to identify, which of those expression are accessing an array index that is greater than its capacity, thus leading to an error. From the code output we have seen before we can already identify two array indicies: `0` and `11`. But the other two are using a variable `b` as the index. Using the `evaluate` function, we can try to evaluate the variable `b`, to check if it has a constant value.
In a next step, we want to identify, which of those expressions are accessing an array index that is greater than its capacity, thus leading to an error. From the code output we have seen before we can already identify two array indices: `0` and `11`. But the other two are using a variable `b` as the index. Using the `evaluate` function, we can try to evaluate the variable `b`, to check if it has a constant value.

```kotlin
[6] result.all<SubscriptExpression>().map { it.subscriptExpression.evaluate() }
@@ -184,7 +184,7 @@ Using the already known `:code` command, we can also show the relevant code loca

### Further analysis

Because the manual analyis we have shown can be quite tedious, we already included several example analyis steps that can be performed on the currently loaded graph. They can be executed by running the `:run` command. This includes the aforementioned check for out of bounds as well as check for null pointers and will be extended in the future.
Because the manual analysis we have shown can be quite tedious, we already included several example analysis steps that can be performed on the currently loaded graph. They can be executed by running the `:run` command. This includes the aforementioned check for out-of-bounds as well as check for null pointers and will be extended in the future.

```kotlin
[11] :run
@@ -233,6 +233,6 @@ Then, additional tools, such as the Neo4j browser can be used to further explore

## Conclusion

In conclusion, the CPG tool can be used to translate source code of different programming languages to a uniform, language-independed represetation in the form of a code property graph. It can either be used as a library, in which it forms the underlying basis of the [Codyze](http://github.com/Fraunhofer-AISEC/codyze) analyizer or it's console can be used to quickly explore source code and find weaknesses.
In conclusion, the CPG tool can be used to translate source code of different programming languages to a uniform, language-independent representation in the form of a code property graph. It can either be used as a library, in which it forms the underlying basis of the [Codyze](http://github.com/Fraunhofer-AISEC/codyze) analyzer or its console can be used to quickly explore source code and find weaknesses.

It is available as open source on GitHub: https://github.com/Fraunhofer-AISEC/cpg
