Reading metadata using Python? #159

Open
molpopgen opened this issue Aug 28, 2021 · 5 comments
Labels: good first issue, help wanted

Comments

@molpopgen
Member

It is currently unclear how to read metadata from Python, and nothing has been tested. Metadata encoded as JSON will very likely work, but we should actually demonstrate that. For the other encoding methods it is anyone's guess, because it is unclear whether we can predict the byte layout.
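To illustrate the layout concern, here is a minimal sketch of an encoding whose byte layout is fixed by hand rather than left to a serializer. The `MutationMetadata` type and its fields are hypothetical and not part of this crate; only the standard library is used. Under these assumptions, a Python reader could decode the 12 bytes with `struct.unpack("<dI", raw)`.

```rust
/// Hypothetical metadata record; the name and fields are illustrative only.
#[derive(Debug, PartialEq)]
struct MutationMetadata {
    effect_size: f64,
    origin_time: u32,
}

impl MutationMetadata {
    /// Encode with an explicit layout:
    /// 8 bytes little-endian f64, then 4 bytes little-endian u32.
    fn encode(&self) -> Vec<u8> {
        let mut buf = Vec::with_capacity(12);
        buf.extend_from_slice(&self.effect_size.to_le_bytes());
        buf.extend_from_slice(&self.origin_time.to_le_bytes());
        buf
    }

    /// Decode the same layout; returns None if the length is wrong.
    fn decode(bytes: &[u8]) -> Option<MutationMetadata> {
        if bytes.len() != 12 {
            return None;
        }
        let mut effect = [0u8; 8];
        let mut origin = [0u8; 4];
        effect.copy_from_slice(&bytes[0..8]);
        origin.copy_from_slice(&bytes[8..12]);
        Some(MutationMetadata {
            effect_size: f64::from_le_bytes(effect),
            origin_time: u32::from_le_bytes(origin),
        })
    }
}
```

Writing the byte order and field order out explicitly is what makes the layout predictable; an encoding that delegates these choices to a serializer gives the Python side nothing to rely on.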

@molpopgen added the "good first issue" and "help wanted" labels on Dec 13, 2021
@molpopgen
Member Author

To do this:

  1. Add "set schema" functions.
  2. Figure out how to run an external test that executes a binary
    and writes a trees file containing metadata.
  3. Add Python-based CI to the tests.
    For sanity, we can do this on Linux only.
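Steps 2 and 3 could be driven from the Rust side. The sketch below only builds the command that would hand a trees file to Python; it assumes `python3` is on the PATH with tskit installed, and the function name and file path are hypothetical.

```rust
use std::process::Command;

/// Build (but do not run) a command that loads a trees file in Python
/// and prints its top-level metadata.
/// Assumes `python3` with tskit installed; `trees_path` is supplied by the caller.
fn python_check_command(trees_path: &str) -> Command {
    // With `python3 -c SCRIPT ARG`, ARG is available as sys.argv[1].
    let script = "import sys, tskit; ts = tskit.load(sys.argv[1]); print(ts.metadata)";
    let mut cmd = Command::new("python3");
    cmd.arg("-c").arg(script).arg(trees_path);
    cmd
}
```

A test would first run the binary that writes the file, then call `.status()` or `.output()` on the returned command and check for a successful exit.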

@molpopgen
Member Author

#294 made some progress on this, but what's still missing is an idiomatic Rust way of managing the schema. This needs more thought.
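One possible shape for this, offered only as a sketch: tie the schema text to the metadata type through a trait with an associated constant. The `HasSchema` trait, `TopLevelMetadata` type, and `schema_of` helper are all hypothetical names, not existing API in this crate or in #294.

```rust
/// Hypothetical trait: ties a metadata type to the schema text stored with it.
trait HasSchema {
    /// The schema as JSON text, i.e. what a "set schema" function would write.
    const SCHEMA: &'static str;
}

/// Hypothetical metadata type with a single field.
struct TopLevelMetadata {
    generation: u32,
}

impl TopLevelMetadata {
    /// Hand-written JSON encoding, to keep the sketch free of dependencies.
    fn to_json(&self) -> String {
        format!("{{\"generation\":{}}}", self.generation)
    }
}

impl HasSchema for TopLevelMetadata {
    const SCHEMA: &'static str = r#"{"codec":"json","type":"object","properties":{"generation":{"type":"integer"}},"required":["generation"]}"#;
}

/// Look up the schema for a type without needing an instance of it.
fn schema_of<T: HasSchema>() -> &'static str {
    T::SCHEMA
}
```

With this design, a function that sets a table's schema could take a type parameter `T: HasSchema` instead of a free-form string, so the schema and the encoder cannot drift apart silently.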

@molpopgen
Member Author

This crate seems helpful re: schema stuff.

@molpopgen
Member Author

jsonschema may also be helpful.

@molpopgen
Member Author

molpopgen commented May 16, 2023

Working example of schema validation:

use serde_json::json;

#[derive(serde::Serialize, serde::Deserialize, Debug)]
struct Fwdpy11TopLevelMetadata {
    generation: u32,
    model_params: Option<String>,
    seed: Option<u32>,
    demes_graph: Option<String>,
    // Skip `data` when it is None. Without this attribute, None
    // serializes to null, and the schema would have to list
    // "null" as one of the valid types for `data`.
    #[serde(skip_serializing_if = "Option::is_none")]
    data: Option<String>,
}

fn main() {
    // NOTE: the schema for "data" does not allow null, and "data"
    // is one of several fields that are not "required".
    // Without skip_serializing_if on the struct field, None would
    // serialize to null and fail validation.
    let fwdpy11_individual_metadata: serde_json::Value = json!(
        {
            "codec": "json",
            "title": "Top-level metadata for table collection/tree sequence.",
            "type": "object",
            "properties": {
                "generation": {
                    "type": "integer",
                    "description": "The value of pop.generation at the time data were exported to tskit",
                },
                "model_params": {
                    "type": ["string", "null"],
                    "description": "One or more fwdpy11.ModelParams instances.",
                },
                "seed": {
                    "type": ["integer", "null"],
                    "description": "Random number seed.",
                },
                "demes_graph": {
                    "type": ["string", "null"],
                    "description": "A demographic model specified using demes.",
                },
                "data": {
                    "type": ["string", "object"],
                    "description": "This field is reserved for the user to fill.",
                },
            },
            "required": ["generation"],
        }
    );
    let cfg = jsonschema_valid::Config::from_schema(
        &fwdpy11_individual_metadata,
        Some(jsonschema_valid::schemas::Draft::Draft7),
    )
    .unwrap();
    assert!(cfg.validate_schema().is_ok());

    let md = Fwdpy11TopLevelMetadata {
        generation: 0,
        model_params: Some("".to_string()),
        seed: Some(0),
        demes_graph: Some("".to_string()),
        data: Some("".to_string()),
    };
    let data = serde_json::to_string(&md).unwrap();
    let data: serde_json::Value = serde_json::from_str(&data).unwrap();
    println!("{data:?}");
    assert!(cfg.validate(&data).is_ok());

    let md = Fwdpy11TopLevelMetadata {
        generation: 101,
        model_params: None,
        seed: None,
        demes_graph: None,
        data: None,
    };
    let data = serde_json::to_string(&md).unwrap();
    let data: serde_json::Value = serde_json::from_str(&data).unwrap();
    println!("{data:?}");
    //assert!(cfg.validate(&data).is_ok());
    if let Err(errors) = cfg.validate(&data) {
        for error in errors {
            println!("{error}");
        }
    }

    // Repeat the validation with the jsonschema crate.
    let compiled = jsonschema::JSONSchema::options()
        .with_draft(jsonschema::Draft::Draft7)
        .compile(&fwdpy11_individual_metadata)
        .expect("A valid schema");
    let result = compiled.validate(&data);
    if let Err(errors) = result {
        for error in errors {
            println!("Validation error: {}", error);
            println!("Instance path: {}", error.instance_path);
        }
    }
    let md: Fwdpy11TopLevelMetadata =
        serde_json::from_str(&serde_json::to_string(&md).unwrap()).unwrap();
    println!("decoded = {md:?}");
}
