Merge pull request #3 from datamol-io/refactoring
API Refactoring
hadim authored Jul 10, 2023
2 parents 1832035 + f18ba22 commit 68caf43
Showing 91 changed files with 14,809 additions and 23,094 deletions.
6 changes: 3 additions & 3 deletions .github/workflows/test.yml
@@ -42,13 +42,13 @@ jobs:
python=${{ matrix.python-version }}
- name: Install library
run: python -m pip install --no-deps .
run: python -m pip install --no-deps -e .

- name: Run tests
run: pytest

# - name: Test CLI
# run: medchem --help
- name: Test CLI
run: medchem --help

- name: Test building the doc
run: mkdocs build
Empty file removed CHANGELOGS.md
Empty file.
13 changes: 1 addition & 12 deletions README.md
@@ -12,13 +12,6 @@

Medchem is a Python library that provides multiple molecular medchem filters for a wide range of use cases relevant to drug discovery.

- ✅ xxxx
- 🐍 xxxx
- ⚗️ xxxx
- 🧠 xxxx
- ⮔ xxxx
- 🔌 xxxx

## Installation

```bash
@@ -37,7 +30,7 @@ Visit <https://medchem-docs.datamol.io/>.
micromamba create -n medchem -f env.yml
micromamba activate medchem

pip install -e .
pip install --no-deps -e .
```

### Tests
@@ -48,10 +41,6 @@ You can run tests locally with:
pytest
```

## Changelogs

See the latest changelogs at [CHANGELOGS.md](./CHANGELOGS.md).

## License

Under the Apache-2.0 license. See [LICENSE.md](LICENSE.md).
16 changes: 16 additions & 0 deletions docs/_assets/css/custom.css
@@ -63,6 +63,7 @@ td p {
--md-footer-fg-color--light: var(--custom-secondary);
--md-footer-fg-color--lighter: var(--custom-secondary);


}

.md-header {
@@ -124,3 +125,18 @@ td p {
.md-container .jp-Cell-inputWrapper .jp-InputPrompt.jp-InputArea-prompt {
display: none !important;
}

/* Increase the width */
.md-grid {
max-width: 75rem;
}

/* Jupyter Tweaks */

.jupyter-wrapper {
--jp-code-font-size: 0.8em !important;
}

.jupyter-wrapper table.dataframe {
font-size: 0.7em !important;
}
3 changes: 0 additions & 3 deletions docs/api/medchem.alerts.md

This file was deleted.

3 changes: 0 additions & 3 deletions docs/api/medchem.catalog.md

This file was deleted.

6 changes: 6 additions & 0 deletions docs/api/medchem.catalogs.md
@@ -0,0 +1,6 @@
# `medchem.catalogs`

::: medchem.catalogs.list_named_catalogs
::: medchem.catalogs.merge_catalogs
::: medchem.catalogs.catalog_from_smarts
::: medchem.catalogs.NamedCatalogs
12 changes: 6 additions & 6 deletions docs/api/medchem.complexity.md
@@ -1,7 +1,7 @@
# `medchem.rules`
# `medchem.complexity`

::: medchem.complexity.complexity_filter

---

::: medchem.complexity._complexity_calc
::: medchem.complexity.ComplexityFilter
::: medchem.complexity.WhitlockCT
::: medchem.complexity.BaroneCT
::: medchem.complexity.SMCM
::: medchem.complexity.TWC
3 changes: 3 additions & 0 deletions docs/api/medchem.constraints.md
@@ -0,0 +1,3 @@
# `medchem.constraints`

::: medchem.constraints.Constraints
3 changes: 0 additions & 3 deletions docs/api/medchem.demerits.md

This file was deleted.

7 changes: 0 additions & 7 deletions docs/api/medchem.filter.md

This file was deleted.

19 changes: 19 additions & 0 deletions docs/api/medchem.functional.md
@@ -0,0 +1,19 @@
# `medchem.functional`

::: medchem.functional.alert_filter
::: medchem.functional.nibr_filter
::: medchem.functional.catalog_filter
::: medchem.functional.chemical_group_filter
::: medchem.functional.rules_filter
::: medchem.functional.complexity_filter
::: medchem.functional.bredt_filter
::: medchem.functional.molecular_graph_filter
::: medchem.functional.lilly_demerit_filter
::: medchem.functional.protecting_groups_filter
::: medchem.functional.macrocycle_filter
::: medchem.functional.atom_list_filter
::: medchem.functional.ring_infraction_filter
::: medchem.functional.num_atom_filter
::: medchem.functional.num_stereo_center_filter
::: medchem.functional.halogenicity_filter
::: medchem.functional.symmetry_filter
5 changes: 4 additions & 1 deletion docs/api/medchem.groups.md
@@ -1,3 +1,6 @@
# `medchem.groups`

::: medchem.groups
::: medchem.groups.list_default_chemical_groups
::: medchem.groups.list_functional_group_names
::: medchem.groups.get_functional_group_map
::: medchem.groups.ChemicalGroup
134 changes: 4 additions & 130 deletions docs/api/medchem.query.md
@@ -1,132 +1,6 @@
# `medchem.query`

This module helps build a filter based on a query language that can be parsed.
By default, the built-in query parser is used. It supports the instructions below, which can be combined with boolean operators (`or`, `and`, `not`) and parentheses.

## Example

```python
import datamol as dm
from medchem.query.eval import QueryFilter

query = """HASPROP("tpsa" < 120) AND HASSUBSTRUCTURE("[OH]", True)"""
chemical_filter = QueryFilter(query, parser="lalr")
mols = dm.data.cdk2().mol[:10]
chemical_filter(mols, n_jobs=-1) # [False, False, False, False, False, True, True, True, False, False]
```

## Syntax

Any string provided inside the `query` argument must be double-quoted (as in JSON) to avoid parsing ambiguity.

* An example of a valid query is `"""(HASPROP("tpsa" > 120 ) | HASSUBSTRUCTURE("c1ccccc1")) AND NOT HASALERT("pains") OR HASSUBSTRUCTURE("[OH]", max, 2)"""`.
* Examples of invalid queries are:
    * `"""HASPROP("tpsa" > 120) OR HASSUBSTRUCTURE("[OH]", True, >, 3)"""`: unexpected operator `>`
    * `"""HASPROP(tpsa > 120)"""`: `tpsa` is not quoted
    * `"""HASPROP("tpsa") > 120"""`: this is not part of the language specification
    * `"""(HASPROP("tpsa" > 120) AND HASSUBSTRUCTURE("[OH]", True, max, 3 )"""`: unmatched parenthesis `(`

* `"""HASPROP("tpsa" > 120) OR HASSUBSTRUCTURE("CO")"""`, `"""(HASPROP("tpsa" > 120)) OR (HASSUBSTRUCTURE("CO"))"""` and `"""(HASPROP("tpsa" > 120) OR HASSUBSTRUCTURE("CO"))"""` are equivalent.
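The quoting and parenthesis rules above can be checked before a query is parsed. The snippet below is a minimal sketch, not medchem's parser: `has_balanced_parentheses` is a hypothetical helper, and it assumes string literals are always double-quoted and never contain escaped quotes. It skips everything inside quotes so that parentheses in a SMILES such as `"c1ccc(O)cc1"` do not affect the count.

```python
def has_balanced_parentheses(query: str) -> bool:
    """Return True if every '(' outside double-quoted strings has a matching ')'.

    Hypothetical helper; assumes string literals are double-quoted and
    contain no escaped quote characters.
    """
    depth = 0
    in_string = False
    for char in query:
        if char == '"':
            # Toggle string mode; parentheses inside SMILES/SMARTS are ignored.
            in_string = not in_string
        elif in_string:
            continue
        elif char == "(":
            depth += 1
        elif char == ")":
            depth -= 1
            if depth < 0:
                # A closing parenthesis appeared before its opening one.
                return False
    # Balanced only if every group is closed and no string is left open.
    return depth == 0 and not in_string


valid = '(HASPROP("tpsa" > 120 ) | HASSUBSTRUCTURE("c1ccccc1")) AND NOT HASALERT("pains")'
invalid = '(HASPROP("tpsa" > 120) AND HASSUBSTRUCTURE("[OH]", True, max, 3 )'
print(has_balanced_parentheses(valid))    # True
print(has_balanced_parentheses(invalid))  # False
```

Ignoring quoted content is what makes the double-quoting requirement useful: without it, ring-closure and branch parentheses in SMILES would be indistinguishable from grouping parentheses.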


### HASALERT
Check whether a molecule has an `alert` from a catalog.
```python
# alert is one of the alert catalogs supported by `medchem`, for example `pains`
HASALERT(alert:str)
```

### HASGROUP
Check whether a molecule has a specific functional group from a catalog.

```python
# group is one of the functional groups provided by `medchem`
HASGROUP(group:str)
```


### MATCHRULE
Check whether a molecule matches a predefined drug-likeness `rule` from a catalog.
```python
# rule is one of the rules provided by `medchem`, for example `rule_of_five`
MATCHRULE(rule:str)
```

### HASSUPERSTRUCTURE
Check whether a molecule has `query` as a superstructure.
```python
# query is a SMILES
HASSUPERSTRUCTURE(query:str)
```

### HASSUBSTRUCTURE
Check whether a molecule has `query` as a substructure.
**Note that the comma separator `,` is _mandatory_ here, since each value is a separate argument.**

```python
# query is a SMILES or a SMARTS, operator is defined below, is_smarts is a boolean

HASSUBSTRUCTURE(query:str, is_smarts:Optional[bool], operator:Optional[str], limit:Optional[int])

# which corresponds to these default values
HASSUBSTRUCTURE(query:str, is_smarts=False, operator="min", limit=1)
# same as
HASSUBSTRUCTURE(query:str, is_smarts=None, operator=None, limit=None)
```

Optional arguments can be omitted, but the ones that are provided must follow the order shown above. Thus:

* `HASSUBSTRUCTURE("CO")`
* `HASSUBSTRUCTURE("CO", False)`
* `HASSUBSTRUCTURE("CO", False, min)`
* `HASSUBSTRUCTURE("CO", False, min, 1)`

are all valid and equivalent (given the default values).

Furthermore, since the argument mapping can be inferred when there is no ambiguity, the following are valid but discouraged:

* `HASSUBSTRUCTURE("CO", False, 1)`
* `HASSUBSTRUCTURE("CO", min, 1)`

This, however, is invalid:

* `HASSUBSTRUCTURE("CO", min, False, 1)`
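The ordering rule can be illustrated with a small resolver that walks the optional slots (`is_smarts`, `operator`, `limit`) left to right and assigns each value to the next slot whose type fits, never going back. This is a hypothetical sketch of the documented behaviour, not medchem's implementation: `resolve_substructure_args` and `passes_count` are invented names, the unquoted `min`/`max` tokens are represented as Python strings, and reading `min`/`max` as "at least"/"at most" `limit` matches is an assumption.

```python
from typing import Any, Dict

# Optional HASSUBSTRUCTURE slots, in the documented order, with their defaults.
SLOTS = (("is_smarts", False), ("operator", "min"), ("limit", 1))


def _fits(slot: str, value: Any) -> bool:
    """Check whether a positional value can fill the given slot."""
    if slot == "is_smarts":
        return type(value) is bool
    if slot == "operator":
        return isinstance(value, str) and value.lower() in ("min", "max")
    # bool is a subclass of int in Python, so compare the exact type.
    return type(value) is int


def resolve_substructure_args(query: str, *args: Any) -> Dict[str, Any]:
    """Map positional arguments onto slots while preserving their order."""
    resolved = {"query": query, **dict(SLOTS)}
    position = 0
    for value in args:
        # None keeps the current slot's default; any other value searches
        # forward for the first slot it fits (a slot may be skipped).
        while value is not None and position < len(SLOTS) and not _fits(SLOTS[position][0], value):
            position += 1
        if position == len(SLOTS):
            raise ValueError(f"argument {value!r} is out of order or invalid")
        if value is not None:
            name = SLOTS[position][0]
            resolved[name] = value.lower() if name == "operator" else value
        position += 1
    return resolved


def passes_count(n_matches: int, operator: str, limit: int) -> bool:
    """Assumed semantics: 'min' = at least `limit` matches, 'max' = at most."""
    return n_matches >= limit if operator == "min" else n_matches <= limit
```

With this sketch, `("CO", False, 1)` and `("CO", "min", 1)` both resolve to the defaults, while `("CO", "min", False, 1)` raises `ValueError` because `False` would have to move back to the `is_smarts` slot.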


### HASPROP
Check whether a molecule's `prop` property satisfies the comparison with `limit`.
**Any comma `,` provided between arguments will be ignored**

```python
# prop is a valid datamol.descriptors property; comparator is a required comparison operator (defined below)
HASPROP(prop:str comparator:str limit:float)
```

### LIKE
Check whether a molecule is similar enough to another molecule.
**Any comma `,` provided between arguments will be ignored**

```python
# query is a SMILES
LIKE(query:str comparator:str limit:float)
```

### Basic operators

* comparator: one of `=`, `==`, `!=`, `<`, `>`, `<=`, `>=`
* misc: the following boolean values are accepted and parsed: `true`, `false`, `True`, `False`, `TRUE`, `FALSE`
* operator (can be quoted or unquoted):
    * MIN: `min`, `MIN`
    * MAX: `max`, `MAX`
* boolean operator:
    * AND operator: `AND` or `&` or `&&` or `and`
    * OR operator: `OR` or `|` or `||` or `or`
    * NOT operator: `NOT` or `!` or `~` or `not`
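The operator aliases above map naturally onto Python's `operator` module. The sketch below is hypothetical (`COMPARATORS`, `BOOLEAN_OPERATORS`, `parse_boolean` and `compare` are not medchem's API); it only shows how the listed spellings could be canonicalized before evaluating a clause such as `HASPROP("tpsa" < 120)` once the property value is known.

```python
import operator
from typing import Callable, Dict

# Comparators listed above; "=" and "==" are both treated as equality.
COMPARATORS: Dict[str, Callable[[float, float], bool]] = {
    "=": operator.eq,
    "==": operator.eq,
    "!=": operator.ne,
    "<": operator.lt,
    ">": operator.gt,
    "<=": operator.le,
    ">=": operator.ge,
}

# Every accepted spelling of each boolean operator, mapped to a canonical name.
BOOLEAN_OPERATORS: Dict[str, str] = {
    **dict.fromkeys(("AND", "&", "&&", "and"), "AND"),
    **dict.fromkeys(("OR", "|", "||", "or"), "OR"),
    **dict.fromkeys(("NOT", "!", "~", "not"), "NOT"),
}


def parse_boolean(token: str) -> bool:
    """Accept true/false, True/False and TRUE/FALSE, but not e.g. 'tRuE'."""
    if token not in ("true", "True", "TRUE", "false", "False", "FALSE"):
        raise ValueError(f"not a boolean literal: {token!r}")
    return token.lower() == "true"


def compare(value: float, comparator: str, limit: float) -> bool:
    """Evaluate a HASPROP-style comparison such as tpsa < 120."""
    return COMPARATORS[comparator](value, limit)
```

Keeping the aliases in lookup tables means the evaluator only has to handle one canonical form per operator.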



## API

::: medchem.query.parser

---

::: medchem.query.eval
::: medchem.query.QueryFilter
::: medchem.query.QueryOperator
::: medchem.query.EvaluableQuery
::: medchem.query.QueryParser
13 changes: 11 additions & 2 deletions docs/api/medchem.rules.md
@@ -1,7 +1,16 @@
# `medchem.rules`

::: medchem.rules.RuleFilters

## Basic Rules

::: medchem.rules.basic_rules

---
## Utilities

::: medchem.rules.rule_filter
::: medchem.rules.in_range
::: medchem.rules.n_heavy_metals
::: medchem.rules.has_spider_chains
::: medchem.rules.n_fused_aromatic_rings
::: medchem.rules.fraction_atom_in_scaff
::: medchem.rules.list_descriptors
5 changes: 5 additions & 0 deletions docs/api/medchem.structural.md
@@ -0,0 +1,5 @@
# `medchem.structural`

::: medchem.structural.CommonAlertsFilters
::: medchem.structural.NIBRFilters
::: medchem.structural.lilly_demerits.LillyDemeritsFilters
6 changes: 1 addition & 5 deletions docs/api/medchem.utils.md
@@ -4,12 +4,8 @@

---

::: medchem.utils.matches

---

::: medchem.utils.loader

---

::: medchem.utils.graph
::: medchem.utils.graph
15 changes: 15 additions & 0 deletions docs/cli.md
@@ -0,0 +1,15 @@
# Medchem CLI

Medchem provides CLI commands to filter molecules directly from file paths. CSV, JSON, Excel, Parquet and SDF formats are supported.

Available commands can be found with:

```bash
medchem --help
```

To learn more about a specific command:

```bash
medchem common-alerts --help
```
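Supporting several input formats usually comes down to dispatching on the file extension. The helper below is a hypothetical sketch, not the medchem CLI's code: `detect_format` and `FORMATS` are invented names, and the exact extensions assigned to each format (for example `.xls` and `.xlsx` for Excel) are assumptions.

```python
from pathlib import Path

# Assumed mapping from file extensions to the input formats listed above.
FORMATS = {
    ".csv": "csv",
    ".json": "json",
    ".xls": "excel",
    ".xlsx": "excel",
    ".parquet": "parquet",
    ".sdf": "sdf",
}


def detect_format(path: str) -> str:
    """Guess the input format from the file extension (case-insensitive)."""
    suffix = Path(path).suffix.lower()
    if suffix not in FORMATS:
        raise ValueError(f"unsupported file type: {path}")
    return FORMATS[suffix]


print(detect_format("hits.CSV"))     # csv
print(detect_format("library.sdf"))  # sdf
```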