Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Policy API Integration #499

Open
wants to merge 12 commits into
base: main
Choose a base branch
from
Open

Policy API Integration #499

wants to merge 12 commits into from

Conversation

rlxdev
Copy link
Collaborator

@rlxdev rlxdev commented Nov 14, 2024

ScubaGoggles with Google's Policy API Implementation

This is the complete integration of the version of ScubaGoggles containing the
implementation of Googles's Policy API.

  • All Rego Tests Pass, including 185 new tests covering the baselines
    implemented by the Policy API.
  • Regal runs on all Rego code with no issues.
  • PYLint runs on all Python code with no issues.
  • Mitchel's "smoke" tests pass (locally, after modifying for credentials &
    scubagws tenant).
  • This version has been tested on Windows 10, with varying Python versions,
    and RedHat RHEL 8 linux.

Changes Introduced with this Version

Google Policy API

The Google Policy API is used for all baselines that can be assessed by the data
returned by the API. Google returns all settings available for the top-level
orgunit, and then only the settings that differ from the top-level orgunit for
sub-orgunits and groups.

The Policy API data is available to the rego code in the input data in the
"policies" section. The settings are grouped by orgunit or orgunit/group name.
There is no distinction between orgunits and groups.

The Policy API is queried using Google's AuthorizedSession class because
access to the API is not yet available using the Google API client, which is
used for the other APIs queried by ScubaGoggles (reports, directory, groups).
There have been no issues using this method to access the Policy API.

Because of the way Rego handles undefined variables, basically returning
undefined when a variable is not defined, the Python code fetching the Policy
API data must verify that all expected settings are present in the top-level
orgunit's data. It's expected that missing settings will be rare, but they
can't be ignored because missing settings will lead to incorrect results.
ScubaGoggles doesn't have to abort on any missing settings, so currently a
warning is issued for any missing settings.

As with the log events, the Rego code does the work to determine whether the
requirement has been met for each baseline covered by the Policy API. In
implementing the Policy API in Rego, the log event code has been kept along
with the new Policy API code for debugging/testing during the development of
the Policy API implementation. This code will be removed in a subsequent
release.

ScubaGoggles Package Distribution & Installation

This version of ScubaGoggles is able to be packaged using the standard Python
packaging software, using the setuptools back-end. Being able to install
ScubaGoggles and its dependencies as a package using pip makes the
installation much easier for users and provides a separation between the
distributed code and the user's data (credentials and reports).

Unfortunately, the current configuration in the repository does not conform to a
structure that can be packaged. This has led to subtle issues, such as circular
import errors when running ScubaGoggles, and the incorrect placement of
ScubaGoggles-related directories in the top-level site-packages directory in
the Python environment.

For this version, directories that are included in the package have been
moved below the scubagoggles directory:

  • baselines
  • docs
  • rego
  • sample-config-files
  • sample-report
  • Testing
  • utils

The current use of setup.py, with package configuration included in the file,
is deprecated by setuptools. This version now uses pyproject.toml for
package configuration.

The package is built using the scubagoggles/utils/build.sh script. It
produces a "wheel" file (.whl) that is installable using pip.

In addition to installing ScubaGoggles, users have to specify:

  • location where the reports will be created.
  • Google OAuth credentials file (credentials.json).
  • location of the OPA executable.

For this version, users will run scubagoggles setup to provide those
locations. Unless the user explicitly changes a location, the setup is a
one-time step, even after software updates.

What are the advantages of making ScubaGoggles an installable package:

  • Package installation is easier for users. They will not have to install
    dependent packages separately.
  • User's will not have to change their working directory to run ScubaGoggles.
  • The scubagoggles command will always work and there won't be a need to have
    them try Python scuba.py.
  • Upgrading ScubaGoggles will be easy and can be done in an existing (virtual)
    environment. User's will not have to relocate files, such as the
    credentials.json or OPA executable.

Versioning

There are two forms of the version number used for ScubaGoggles: the software
version number (e.g., "0.3.0"), and what I call the "version suffix" that
proliferates throughout the files referencing baseline identifiers. There are
multiple places in the code that define the version number. Whenever there is a
version upgrade, the change touches nearly every file in the repository.

For this version, the "official" version number is defined in only one place
(scubagoggles/__init__.py). All other references to the version number are
derived from this value. The version number is managed using the scubagoggles version command.

The Python code must use the Version class when referencing the version
number. For the Rego code, the version suffix used in the baseline identifiers
is accessible in the input written by the Python code.

Baseline identifiers in the Rego code are defined once in a variable for each
baseline, and all references to the identifier are to use this variable
(including the test Rego code). For example:

CommonControlsId3_2 := utils.PolicyIdWithSuffix("GWS.COMMONCONTROLS.3.2")

The utils.PolicyIdWithSuffix() function handles the addition of the version
suffix at the end of the identifier. There must be no reference to the
version suffix in any comments in the Rego code (in fact, there should be
no occurrences of the hard-coded version suffix in any Rego file). It
is also not necessary to reference a specific version suffix in the user
documentation.

The "drift rules" and baseline Markdown files will continue to have the
version suffix included in the baseline version. The version suffixes in
these files will be modified as part of the version upgrade process.

When a version upgrade occurs, these are the only files that are modified
in the process:

  • README.md
  • drift-rules/GWS Drift Monitoring Rules - Calendar.csv
  • drift-rules/GWS Drift Monitoring Rules - Chat.csv
  • drift-rules/GWS Drift Monitoring Rules - Classroom.csv
  • drift-rules/GWS Drift Monitoring Rules - Common Controls as of 11-14-23.csv
  • drift-rules/GWS Drift Monitoring Rules - Drive and Docs.csv
  • drift-rules/GWS Drift Monitoring Rules - Gmail.csv
  • drift-rules/GWS Drift Monitoring Rules - Groups.csv
  • drift-rules/GWS Drift Monitoring Rules - Meet.csv
  • drift-rules/GWS Drift Monitoring Rules - Sites.csv
  • scubagoggles/__init__.py
  • scubagoggles/baselines/calendar.md
  • scubagoggles/baselines/chat.md
  • scubagoggles/baselines/classroom.md
  • scubagoggles/baselines/commoncontrols.md
  • scubagoggles/baselines/drive.md
  • scubagoggles/baselines/gmail.md
  • scubagoggles/baselines/groups.md
  • scubagoggles/baselines/meet.md
  • scubagoggles/baselines/sites.md

Pandas Dependency Dropped

The pandas module was included as a requirement because it was used to create
an HTML table for the reports. Pandas is a "beast" of a module, and also
requires another "beast" in numpy. The table creation is easy in this case,
and it allows us to drop a module that really isn't being used for its
intended purpose (data analysis). Having pandas dropped as a requirement also
lessens the potential issues users will have when installing ScubaGoggles.

No "sudo" for Running OPA

There was some issue with OPA in the past that apparently required the use of
"sudo" (privilege escalation) to run it on macOS and perhaps linux. This
issue seems to be fixed, so the ability to use "sudo" for running OPA has
been removed. OPA is basically a language interpreter like Python - they are
user mode programs that should never require privilege escalation. If this
happens again, the onus is on OPA to fix the issue, rather than trying to
work-around it.

Python Logging Facility Enabled

The Python logging facility has been enabled, so logging may be inserted into
the code where appropriate using log.{debug, info, warning, error, critical}()
function calls. The current default level that is displayed to the user is
warning (meaning that warning level messages and above will be displayed and
not lower level (debug, info)), but this can be altered on the command line
using the --log option.

Downloading OPA

With users installing ScubaGoggles as a Python package, they can't easily
run the download_opa.py script. To continue to provide users the ability
to download OPA, the scubagoggles getopa command performs the same task, but
also ensures the executable has the correct permissions to run on macOS and
linux.

Ability to Purge Output Directories

This might be more useful for developers running ScubaGoggles often for testing,
but there's now the ability via scubagoggles purge to delete older versions
of the reports output directories.

Closes #498
Closes #449
Closes #438
Closes #360
Closes #430
Closes #412
Closes #305

Follow-up issues

🧪 Testing

These changes have been continuously tested during the implementation of the Policy API settings released by Google.
As mentioned in the description, all automated tests (PYLint, regal, Rego tests) pass.

✅ Pre-approval checklist

  • This PR has an informative and human-readable title.
  • If applicable, All future TODOs are captured in issues, which are referenced in the PR description.
  • The relevant issues PR resolves are linked preferably via closing keywords.
  • All relevant type-of-change labels have been added.
  • I have read and agree to the CONTRIBUTING.md document.
  • These code changes follow cisagov code standards.
  • All relevant repo and/or project documentation has been updated to reflect the changes in this PR.
  • Tests have been added and/or modified to cover the changes in this PR.
  • All new and existing tests pass.

✅ Pre-merge Checklist

  • This PR has been smoke tested to ensure main is in a functional state when this PR is merged.
  • Squash all commits into one PR level commit using the Squash and merge button.

✅ Post-merge Checklist

  • Delete the branch to clean up.
  • Close issues resolved by this PR if the closing keywords did not activate.

@rlxdev rlxdev linked an issue Nov 14, 2024 that may be closed by this pull request
@adhilto
Copy link
Collaborator

adhilto commented Nov 18, 2024

Tagging @mdueltgen @jkaufman-mitre for awareness, this PR contains baseline changes. No substantive changes to the policies, just changes along the line of changing the policies section headers to "Policies" for consistency.

Copy link
Collaborator

@adhilto adhilto left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've barely scratched the surface of the testing that needs to be done, but here's a first round of feedback.

Two types of feedback:

  • Change now
  • Change soon but after this PR.

I've left individual comments on changes that should be made before this PR is merged.
I'm listing the changes that can wait here in this comment. In the spirit of keeping the scope of this PR from expanding any further, these changes can wait but should be tracked as issues before this PR is merged.

Simplify the rego code
Controls that have policy API support should exclusively use the policy API. No need to maintain the old reports code. Would greatly simplify the code. (I know you note this in the description but we still need to track it as an issue).

Rework the baseline version logic.
This code assumes that all controls will have the same version number. This is not a safe assumption. Pre-version 1.0, this will always be the case. But after the 1.0 release, if a control needs to be updated, only the version number of that control will be updated. For reference, see the Exchange Online baseline. Note that not all controls have the same version number (e.g., MS.EXO.1.1v1 and MS.EXO.2.2v2).

Update the "prerequisites" field of the rego tests
As noted in a comment in the reporter, "If Prerequisites is not defined, assume the test just depends on the reports API." This is no longer a safe assumption.

Update the report details message for controls relying on log events
Clearly indicate the controls that can only be checked via the log events. These checks give imperfect visibility which should be highlighted to the users. For example, if the log events for the top level OU is show compliance, ScubaGoggles will currently report "Requirement met for all OUs and groups," but in reality we cannot know that with full confidence.

scubagoggles/auth.py Outdated Show resolved Hide resolved
scubagoggles/docs/prerequisites/Prerequisites.md Outdated Show resolved Hide resolved
scubagoggles/docs/troubleshooting/Troubleshooting.md Outdated Show resolved Hide resolved
scubagoggles/docs/usage/Config.md Outdated Show resolved Hide resolved
scubagoggles/docs/usage/Config.md Outdated Show resolved Hide resolved
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs to be reworked. With this update, ScubaGoggles now takes two conflicting config files, each playing different roles.
The ".scubagoggles" config created by the setup utility and the config file users can create and indicate using the
-config parameter. The .scubagoggles config only takes credentials, opa_dir, and output_dir. The config file
pointed to by the -config parameter can specify all CLI args (including those three locations, but using the names
spelled out by the cli: credentials, opapath , and outputpath).

  • First, the .scubagoggles config parameters should match the parameters used for the CLI.
  • Second, why do we need the .scubaconfig config file at all? All the features it covers are already provided by the
    config file option already built into ScubaGoggles.
    I'd recommend removing this feature altogether as the funtionality of .scubagoggles is already built in to ScubaGoggles.

I understand the desire to improve the user experience, but this misses the mark.

My recommendations:

  • Short term: cut this feature from ScubaGoggles
  • Long term: After this PR is merged in, we discuss as a team what approach we want to take to improve the setup experience, then implement whatever approach we decide on as a team.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see this as conflicting. The one I've implemented is basically a one-time configuration for the user to provide the 3 things we need from them in order to run: the output directory location, the OPA executable directory, and the location of the credentials file. The other configuration is basically a run-time configuration that alters how ScubaGoggles runs, and I can see that the user may have more than one of those configuration files in use for different runs.

The user does not even need to really know about the "tool" configuration file and the intent was for it to be modified by the setup command. These settings apply across runs of ScubaGoggles. It's possible that the user might want to change the output directory location, but in practice I think it's not that likely but it can be done on the command line.

This does simplify the user's initial setup and I don't see a conflict with the baseline configuration, so I don't see the need to change this.

Copy link
Collaborator

@adhilto adhilto Nov 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The approach I would recommend we take:

  • The setup utility creates a .scubagoggles folder in the user's home directory
  • The setup utility automatically downloads the OPA executable and stores it in the .scubagoggles folder. `
  • ~/.scubagoggles is made the default OPA location.

Further notes:

  • When I say the setup utility should automatically download the OPA executable, I'm not saying we should remove the separate scubagoggles getopa function, users should still be able to call that function directly if desired. But if the setup utility automatically invokes the getopa function, that's one less thing for the user to worry about.
  • We could make the setup utility also create or ask the user to specify a default output folder, but I don't see that as essential. If we did that then yes, we still would need a .scubagoggles file, but within the .scubagoggles folder.
  • We could also make the setup utility ask for the location of the user's credentials file, but I think that should be optional, as some users might have multiple organizations they manage and would prefer to specify which credentials to use each time. Either way, if the setup utility is going to ask the user for the location of the credentials file, it should do so after the user has created it (currently the README tells them to run the setup utility first, which I think will confuse users, since it asks them for a file they don't have yet).

scubagoggles/scuba_argument_parser.py Outdated Show resolved Hide resolved
scubagoggles/auth.py Show resolved Hide resolved
scubagoggles/docs/prerequisites/Prerequisites.md Outdated Show resolved Hide resolved
scubagoggles/main.py Outdated Show resolved Hide resolved
@rlxdev
Copy link
Collaborator Author

rlxdev commented Nov 19, 2024

Simplify the rego code Controls that have policy API support should exclusively use the policy API. No need to maintain the old reports code. Would greatly simplify the code. (I know you note this in the description but we still need to track it as an issue).

This has been the plan all along. I approached this conservatively and preserved the log event code to keep the testing working.
My thought is to remove this in the 1.1 version, and it can be done in a methodical way throughout.

Rework the baseline version logic. This code assumes that all controls will have the same version number. This is not a safe assumption. Pre-version 1.0, this will always be the case. But after the 1.0 release, if a control needs to be updated, only the version number of that control will be updated. For reference, see the Exchange Online baseline. Note that not all controls have the same version number (e.g., MS.EXO.1.1v1 and MS.EXO.2.2v2).

The way I've implemented the "version suffix" changes doesn't preclude the possibility of having different suffixes for certain baseline policies. When we do end up having different suffixes on the baselines, it will not require much additional coding. It must be implemented such that there is only one source of the baseline version, most likely the markdown file where the baseline is defined. The correct suffix for the policies can be obtained from the source and driven across the system through the code instead of hard-coding the suffix in multiple places.

@adhilto adhilto mentioned this pull request Nov 25, 2024
9 tasks
@@ -29,42 +33,47 @@ We use a three-step process:
3. **Report**. Package the results as HTML and JSON.

## Limitations of the tool
The majority of the conformance checks done by ScubaGoggles rely on [GWS Admin log events](https://support.google.com/a/answer/4579579?hl=en). If there is no log event corresponding to a SCuBA baseline policy, ScubaGoggles will indicate that the setting currently can not be checked on its HTML report output. In this situation, we recommend you manually review your GWS security configurations with the SCuBA security baselines. See [Limitations](/docs/usage/Limitations.md) for more details.
The majority of the conformance checks done by ScubaGoggles rely on
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
The majority of the conformance checks done by ScubaGoggles rely on
Some of the conformance checks done by ScubaGoggles rely on

--version <OPA-version>, -v <OPA-version>
Version of OPA to download (default: latest version)
--opa_directory <directory>, -r <directory>
Directory containing OPA executable (default: location established by setup
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Directory containing OPA executable (default: location established by setup
Directory containing OPA executable (default: location established by setup)

```
```
# example
scubagoggles getopa -v v0.60.0 -c
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
scubagoggles getopa -v v0.60.0 -c
scubagoggles getopa -v v0.60.0

Comment on lines +20 to +21
Admin role automatically have the privilege to consent to these scopes.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sentence is duplicated

Suggested change
time you will be prompted to consent to these API scopes. Users with the Super
Admin role automatically have the privilege to consent to these scopes.
time you will be prompted to consent to these API scopes.

return

@staticmethod
def validate_config(args : argparse.Namespace) -> None:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Specifying opapath in a config file now gives AttributeError: 'str' object has no attribute 'resolve'

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
2 participants