Create metadata fields and elements for data file summary statistics #104

Open
matthewhorridge opened this issue Jun 11, 2024 · 1 comment
@matthewhorridge

We would like to summarize data using descriptive statistics in the metadata for a data file. The basic idea is that we have quantitative and categorical data that can be summarized. For context:

Quantitative data and categorical data are two fundamental types of data used in statistical analysis and research. Here's a detailed comparison between the two:

Quantitative Data:

Quantitative data refers to numerical information that can be measured and quantified. This type of data represents quantities and can be subjected to various mathematical operations.

Key Characteristics:

  1. Numerical Values: Quantitative data consists of numbers.
  2. Measurement: Represents measurable quantities.
  3. Arithmetic Operations: Can be used in mathematical computations such as addition, subtraction, multiplication, and division.

Types of Quantitative Data:

  1. Discrete Data: Consists of distinct, separate values. Often counts of items.
    • Examples: Number of students in a class, number of cars in a parking lot.
  2. Continuous Data: Can take any value within a range. Often measurements.
    • Examples: Height, weight, temperature, time.

Examples of Quantitative Data:

  • Age of individuals.
  • Salary of employees.
  • Test scores.
  • Temperature readings.

Categorical Data:

Categorical data refers to information that can be grouped into categories but not measured numerically. This type of data represents characteristics or attributes.

Key Characteristics:

  1. Non-Numerical: Often involves names, labels, or categories.
  2. Grouping: Represents groups or categories.
  3. No Arithmetic Operations: Arithmetic operations cannot be meaningfully applied.

Types of Categorical Data:

  1. Nominal Data: Categories with no inherent order.
    • Examples: Gender (male, female), color (red, blue, green), nationality.
  2. Ordinal Data: Categories with a meaningful order or ranking.
    • Examples: Education level (high school, bachelor's, master's, doctorate), satisfaction rating (satisfied, neutral, unsatisfied).

Examples of Categorical Data:

  • Marital status (single, married, divorced).
  • Type of cuisine (Italian, Chinese, Mexican).
  • Customer feedback (positive, neutral, negative).

Summary:

  • Quantitative Data: Numerical, measurable, allows for arithmetic operations, includes discrete and continuous data.
  • Categorical Data: Non-numerical, represents categories or groups, does not allow for arithmetic operations, includes nominal and ordinal data.

Understanding the difference between quantitative and categorical data is crucial for selecting appropriate statistical methods and accurately interpreting research results.


A representation might be based on something like this:

Categorical data

fieldId: sex
fieldLabel: Sex at birth
categoricalType: Nominal
mode: 0
count: 34
valueSummary:
  - valueCode: 0
    valueLabel: Male
    count: 23
    percentage: 67.65
  - valueCode: 1
    valueLabel: Female
    count: 11
    percentage: 32.35

Quantitative data

fieldId: age
fieldLabel: Age in years
quantitativeType: Continuous
count: 20
mode: 23
min: 23
q1: 29.75
median: 37.5
q3: 45.75
max: 60
range: 37
interQuartileRange: 16.0
mean: 38.55
variance: 116.26
standardDeviation: 10.78
skewness: 0.39
kurtosis: -0.83
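
The numeric fields above could be computed along the following lines. This is a sketch under stated assumptions, not a proposed implementation: the function name `summarize_quantitative` is hypothetical, the type key is spelled out as `quantitativeType`, quartiles use linear interpolation between data points (the `inclusive` method of Python's `statistics.quantiles`), variance and standard deviation use the sample (n - 1) form, and skewness and kurtosis are moment-based with kurtosis reported as excess kurtosis. Other definitions are equally valid, so the metadata specification would need to state which ones it expects.

```python
import math
import statistics


def summarize_quantitative(field_id, field_label, quantitative_type, values):
    """Build a dict shaped like the proposed quantitative summary."""
    xs = sorted(values)
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two values to compute variance")

    # Quartiles by linear interpolation between data points (assumption).
    q1, median, q3 = statistics.quantiles(xs, n=4, method="inclusive")

    mean = statistics.fmean(xs)
    variance = statistics.variance(xs)  # sample variance, divides by n - 1

    # Population central moments, used for moment-based shape statistics.
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    skewness = m3 / m2 ** 1.5 if m2 else 0.0
    kurtosis = m4 / m2 ** 2 - 3 if m2 else 0.0  # excess kurtosis

    return {
        "fieldId": field_id,
        "fieldLabel": field_label,
        "quantitativeType": quantitative_type,
        "count": n,
        # With sorted input, ties for the mode resolve to the smallest value.
        "mode": statistics.mode(xs),
        "min": xs[0],
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": xs[-1],
        "range": xs[-1] - xs[0],
        "interQuartileRange": q3 - q1,
        "mean": mean,
        "variance": variance,
        "standardDeviation": math.sqrt(variance),
        "skewness": skewness,
        "kurtosis": kurtosis,
    }


# Made-up ages for illustration only; not the data behind the example above.
age_summary = summarize_quantitative(
    "age", "Age in years", "Continuous", [23, 23, 31, 38, 44, 52, 60]
)
```

Rounding is left to the serialization step so that the stored precision can be decided separately from the computation.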
@matthewhorridge matthewhorridge self-assigned this Jun 11, 2024
@matthewhorridge

See also #67
