🍻 An open-source dataset of breweries, cideries, brewpubs, and bottleshops.

Jocce-Nilsson/openbrewerydb

🍻 Open Brewery DB Dataset

This is the open-source dataset behind the Open Brewery DB API, a REST API built with Ruby on Rails.

🎯 Purpose

Provide an approval-based pipeline to update the dataset and API.

🗄 Data Formats

🚀 Getting Started

  1. git clone git@github.com:openbrewerydb/openbrewerydb.git
  2. cd openbrewerydb && npm install

⚙️ Scripts

The following npm scripts help maintain and manage the dataset:

Data Management

  • npm run validate

    • Validates all CSV files against the JSON Schema
    • Checks for required fields and data format consistency
    • Reports any validation errors that need attention
  • npm run csv:combine

    • Combines all individual CSV files from country/state-region folders into a single breweries.csv
    • Useful when you've made changes to individual state files and need to update the main dataset
  • npm run csv:split

    • Splits the main breweries.csv into separate files by country/state-region
    • Helps maintain organized, manageable data files for each region
    • Creates directories if they don't exist
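
To make the split step concrete, here is a minimal Python sketch of the same idea. It is not the repository's script (the npm scripts remain the source of truth); the function name `split_by_region`, the lowercase/underscore file naming, and the sample rows are all assumptions made for illustration.

```python
import csv
import tempfile
from collections import defaultdict
from pathlib import Path


def split_by_region(rows, out_dir):
    """Write one CSV per (country, state_province) under out_dir.

    The [country]/[state_province].csv layout and the lowercase/underscore
    naming are assumptions of this sketch, not the project's documented rules.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[(row["country"], row["state_province"])].append(row)

    written = []
    for (country, state), group in sorted(groups.items()):
        folder = Path(out_dir) / country.lower().replace(" ", "_")
        folder.mkdir(parents=True, exist_ok=True)  # create directories if missing
        path = folder / (state.lower().replace(" ", "_") + ".csv")
        with path.open("w", newline="", encoding="utf-8") as fh:
            writer = csv.DictWriter(fh, fieldnames=list(group[0].keys()))
            writer.writeheader()
            writer.writerows(group)
        written.append(path)
    return written


# Made-up sample rows, for illustration only.
sample = [
    {"name": "Example Ales", "brewery_type": "micro", "city": "Denver",
     "state_province": "Colorado", "country": "United States"},
    {"name": "Sample Brewing", "brewery_type": "brewpub", "city": "San Diego",
     "state_province": "California", "country": "United States"},
    {"name": "Demo Cider", "brewery_type": "micro", "city": "Boulder",
     "state_province": "Colorado", "country": "United States"},
]

with tempfile.TemporaryDirectory() as tmp:
    paths = split_by_region(sample, tmp)
    relative = [p.relative_to(tmp).as_posix() for p in paths]

print(relative)
```

Grouping first and writing afterwards keeps each regional file in a single write, which is convenient when the combined file is small enough to hold in memory.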

Data Generation

  • npm run generate:ids

    • Creates unique OBDB IDs for each brewery based on name and city
    • Automatically updates breweries.csv with new IDs
    • Ensures no duplicate IDs exist in the dataset
  • npm run generate:json

    • Converts breweries.csv into a JSON format (breweries.json)
    • Useful for applications that prefer working with JSON data
    • Maintains data consistency across formats
  • npm run generate:sql

    • Creates PostgreSQL SQL file from breweries.csv
    • Includes table creation and data insertion statements
    • Perfect for database implementations
  • npm run generate:stats

    • Generates comprehensive dataset statistics
    • Shows brewery counts by state/city
    • Displays brewery type distribution
    • Reports data completeness metrics
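
The README says IDs are derived from name and city but does not spell out the format. The sketch below shows one hypothetical scheme (a name-city slug with a numeric suffix on collision) purely to illustrate how uniqueness can be enforced; `slugify`, `assign_ids`, and the sample rows are assumptions, and the real `generate:ids` script may work differently.

```python
import re


def slugify(text):
    """Lowercase text and replace runs of non-alphanumerics with hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def assign_ids(rows):
    """Fill in missing 'id' values from name and city, keeping IDs unique.

    Hypothetical scheme: "<name-slug>-<city-slug>", with a numeric suffix
    appended when two rows would otherwise collide.
    """
    seen = {row["id"] for row in rows if row.get("id")}
    for row in rows:
        if row.get("id"):
            continue  # keep existing IDs untouched
        base = f"{slugify(row['name'])}-{slugify(row['city'])}"
        candidate, n = base, 2
        while candidate in seen:
            candidate = f"{base}-{n}"
            n += 1
        row["id"] = candidate
        seen.add(candidate)
    return rows


# Made-up rows, for illustration only.
rows = assign_ids([
    {"id": "", "name": "Example Ales", "city": "Denver"},
    {"id": "", "name": "Example Ales", "city": "Denver"},
    {"id": "", "name": "St. Sample's Brewing Co.", "city": "San Diego"},
])
ids = [r["id"] for r in rows]
print(ids)
```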

Contributor Management

  • npm run contributors:add

    • Interactive CLI tool to add new contributors
    • Prompts for contributor information and contribution type
    • Updates .all-contributorsrc file
  • npm run contributors:check

    • Verifies if any contributors are missing from the list
    • Helps maintain accurate recognition of all contributors
  • npm run contributors:generate

    • Updates the Contributors section in README.md
    • Generates contributor table with avatars and contribution types

Workflow

  • npm run workflow:maintain
    • Comprehensive maintenance workflow that:
      1. Validates all CSV files
      2. Combines all CSV files
      3. Generates new IDs if needed
      4. Creates JSON and SQL files
      5. Splits back into individual state files
    • Run this after making any dataset updates

🤝 Contributing

For information on contributing to this project, please see the contributing guide and our code of conduct.

  1. Fork the repository
  2. Add or update breweries in the CSV files (a spreadsheet tool such as Excel or Google Sheets works)
  3. Submit a Pull Request

Tips

First and foremost, don't worry about messing up! 🙂 Thank you so much for contributing! 🙌

  • CSVs are organized by data/[country]/[state_province]
  • Required fields/columns: name, brewery_type, city, state_province, and country
  • When adding a brewery, do not include an id. This will be created after review.
  • Please either add to breweries.csv (preferred if adding breweries for a new country) or the individual state/province CSV file. Adding to both at the same time may introduce duplicates/errors.
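
If you are unsure whether an entry already exists in both places, a quick duplicate check can help before opening a pull request. The following is a rough sketch, not part of the project's tooling: `duplicate_keys` is a hypothetical helper, and matching on case-insensitive name, city, and state_province is an assumption about what counts as a duplicate.

```python
from collections import Counter


def duplicate_keys(rows):
    """Return (name, city, state_province) keys that occur more than once.

    Comparison ignores case and surrounding whitespace; that normalization
    is an assumption of this sketch, not the project's documented rule.
    """
    def key(row):
        return tuple(row[f].strip().lower()
                     for f in ("name", "city", "state_province"))

    counts = Counter(key(row) for row in rows)
    return sorted(k for k, n in counts.items() if n > 1)


# Made-up rows standing in for breweries.csv and one state file.
main_rows = [
    {"name": "Example Ales", "city": "Denver", "state_province": "Colorado"},
]
state_rows = [
    {"name": "example ales ", "city": "Denver", "state_province": "Colorado"},
    {"name": "Demo Cider", "city": "Boulder", "state_province": "Colorado"},
]

dupes = duplicate_keys(main_rows + state_rows)
print(dupes)
```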

👾 Community

📫 Feedback

If you have any feedback, please email me.

Cheers! 🍻

📊 Project Status

  • Status: Active
  • Last Dataset Update: 2024
  • Maintenance: Actively maintained through community contributions
  • Dataset Size: 8,000+ breweries
  • Coverage: United States, with growing international data

🔧 Requirements

  • Node.js v18 or higher
  • npm package manager
  • Git

📚 Data Schema

Each brewery entry contains the following fields:

| Field | Type | Description | Required |
|---|---|---|---|
| id | String | Unique identifier | Yes |
| name | String | Name of the brewery | Yes |
| brewery_type | String | Type of brewery (micro, regional, brewpub, etc.) | Yes |
| street | String | Street address | No |
| city | String | City | Yes |
| state_province | String | State/Province | Yes |
| postal_code | String | Postal code | Yes |
| country | String | Country | Yes |
| longitude | String | Decimal longitude coordinate | No |
| latitude | String | Decimal latitude coordinate | No |
| phone | String | Phone number | No |
| website_url | String | Website URL | No |
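
As a rough illustration of how the Required column can be applied to a single entry, here is a small Python sketch. It does not replace `npm run validate`, which checks against the project's JSON Schema; `REQUIRED_FIELDS`, `missing_required`, and the `new_submission` switch are names invented for this example.

```python
# Fields marked "Yes" in the Required column of the schema table.
REQUIRED_FIELDS = ("id", "name", "brewery_type", "city",
                   "state_province", "postal_code", "country")


def missing_required(row, new_submission=False):
    """Return the required fields that are absent or blank in row.

    new_submission=True skips 'id', since contributors are asked not to
    supply one; that switch is this sketch's own convention.
    """
    fields = [f for f in REQUIRED_FIELDS
              if not (new_submission and f == "id")]
    return [f for f in fields if not str(row.get(f) or "").strip()]


# Made-up entry, for illustration only.
entry = {
    "name": "Example Ales",
    "brewery_type": "micro",
    "city": "Denver",
    "state_province": "Colorado",
    "postal_code": "",
    "country": "United States",
}

print(missing_required(entry, new_submission=True))
print(missing_required(entry))
```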

📖 Usage Examples

Python

import pandas as pd

# Read CSV
breweries_df = pd.read_csv('breweries.csv')

# Filter by state
california_breweries = breweries_df[breweries_df['state_province'] == 'California']

JavaScript/Node.js

const fs = require('fs');

// Read JSON
const breweries = JSON.parse(fs.readFileSync('breweries.json', 'utf8'));

// Filter by type
const microBreweries = breweries.filter(b => b.brewery_type === 'micro');

SQL

-- After importing breweries.sql
SELECT name, city, state_province
FROM breweries
WHERE brewery_type = 'brewpub'
ORDER BY state_province, city;

🔄 Versioning

The dataset is updated regularly through community contributions. Each update goes through the following process:

  1. Community members submit new breweries or updates via pull requests
  2. Changes are reviewed and validated
  3. Upon approval, changes are merged and new dataset files are generated
  4. The API is automatically updated with the new data

Latest dataset version: 2024.1

Contributors ✨

Thanks go to these wonderful people (emoji key):

  • Mike Putnam 🔣
  • Andrew A. Barber 🔣
  • Jason Allen 🔣
  • Juicob 🔣
  • Will Karnasiewicz 🔣
  • Dylan T. Vavra 🔣
  • Madison Martinez 🔣
  • Daniel Eremchuk 🔣
  • Alex Chong 🔣
  • Matt S 🔣
  • Samuel Rusher 🔣
  • Evan Caraway 🔣
  • Tyler K Kuromiya Parker 🔣
  • Chris Mears 💬 💻 🔣 🚧 📆 🔧 ✅
  • donkeyslaps 🔣
  • Pranav Davar 🔧
  • Alexandre Hernandes Barrozo 🔣
  • Resten 🔣
  • Matt Higgins 🔣
  • Alex Justesen 🔣
  • Craig Kelly 🔣
  • Krzysztof Rewak 🔣
  • John Baumert 🔣
  • Charlie Cox 🔣
  • Miles Kane 🔣
  • Anthony Laflamme 💻
  • Georg Engelsmann 🔣
  • Clinton Williams 🔣
  • Brent Busby 🔣
  • kenster89 🔣
  • Adilet Sarsembayev 🔣
  • b-mc2 🔣
  • Nicole 🔣
  • Nicholas Hance 🔣
  • Joachim Nilsson 🔣
  • Alejandro Lopez Rocha 🔣
  • zshapleigh 🔣
  • Praval Visvanath 🔣
  • JohnHenry 🔣
  • Alfredo Garcia 🔣
  • Qerewe 🔣
  • Nathan Peters 🔣
  • Erich Cervantez 🔣
  • Ronald Sahagun 🔣

This project follows the all-contributors specification. Contributions of any kind are welcome!

📊 Statistics

Last updated: 2024-11-01

Overview

  • Total Breweries: 8,355
  • Data Completeness: 78.0%

🏛 Top 10 States by Brewery Count

| State | Count |
|---|---|
| California | 918 |
| Washington | 486 |
| Colorado | 448 |
| New York | 419 |
| Michigan | 375 |
| Texas | 352 |
| Pennsylvania | 345 |
| Florida | 312 |
| North Carolina | 307 |
| Ohio | 303 |

🍺 Brewery Types Distribution

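| Type | Count | Percentage |
|---|---|---|
| micro | 4,305 | 51.5% |
| brewpub | 2,500 | 29.9% |
| planning | 684 | 8.2% |
| regional | 225 | 2.7% |
| closed | 216 | 2.6% |
| contract | 192 | 2.3% |
| large | 90 | 1.1% |
| proprietor | 69 | 0.8% |
| bar | 37 | 0.4% |
| taproom | 20 | 0.2% |
| nano | 13 | 0.2% |
| beergarden | 3 | 0.0% |
| location | 1 | 0.0% |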
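
The percentages in this table follow directly from the counts: each type's count divided by the total, rounded to one decimal place. The short Python sketch below recomputes them from the figures listed here; the variable names are arbitrary, and this is not the `generate:stats` script itself.

```python
# Counts copied from the Brewery Types Distribution table.
counts = {
    "micro": 4305, "brewpub": 2500, "planning": 684, "regional": 225,
    "closed": 216, "contract": 192, "large": 90, "proprietor": 69,
    "bar": 37, "taproom": 20, "nano": 13, "beergarden": 3, "location": 1,
}

total = sum(counts.values())

# Share of each type as a percentage of all breweries, one decimal place.
shares = {name: round(100 * n / total, 1) for name, n in counts.items()}

print(total)
for name, share in shares.items():
    print(f"{name}: {share}%")
```

The counts sum to 8,355, which matches the Total Breweries figure in the Overview.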

🌆 Top 10 Cities by Brewery Count

| City | Count |
|---|---|
| Denver, Colorado | 92 |
| San Diego, California | 91 |
| Portland, Oregon | 85 |
| Seattle, Washington | 80 |
| Chicago, Illinois | 64 |
| Austin, Texas | 49 |
| Houston, Texas | 40 |
| San Francisco, California | 39 |
| Minneapolis, Minnesota | 38 |
| Cincinnati, Ohio | 34 |

📋 Data Completeness by Field

| Field | Completeness |
|---|---|
| name | 100.0% |
| brewery_type | 100.0% |
| city | 100.0% |
| state_province | 100.0% |
| postal_code | 100.0% |
| country | 100.0% |
| address_1 | 91.0% |
| phone | 90.0% |
| website_url | 86.0% |
| longitude | 72.0% |
| latitude | 72.0% |
| address_2 | 1.0% |
| address_3 | 0.0% |
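
A per-field completeness figure like the ones listed here can be computed as the share of rows with a non-blank value. The sketch below assumes that definition (the README does not state how the project measures completeness); `completeness` and the sample rows are hypothetical.

```python
def completeness(rows, fields):
    """Percentage of rows with a non-blank value, per field.

    "Non-blank after stripping whitespace" is this sketch's assumption
    about what counts as complete.
    """
    total = len(rows)
    if total == 0:
        return {f: 0.0 for f in fields}
    return {
        f: round(100 * sum(1 for r in rows if str(r.get(f) or "").strip()) / total, 1)
        for f in fields
    }


# Made-up rows, for illustration only.
rows = [
    {"name": "Example Ales", "phone": "5550100", "address_2": ""},
    {"name": "Sample Brewing", "phone": "5550101", "address_2": ""},
    {"name": "Demo Cider", "phone": "", "address_2": ""},
    {"name": "Test Taproom", "phone": "5550103", "address_2": None},
]

report = completeness(rows, ["name", "phone", "address_2"])
print(report)
```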
