This is the open-source dataset behind Open Brewery DB. The data is served through a REST API built with Ruby on Rails.
Provide an approval-based pipeline to update the dataset and API.
git clone git@github.com:openbrewerydb/openbrewerydb.git
cd openbrewerydb && npm install
The following npm scripts help maintain and manage the dataset:
- npm run validate
  - Validates all CSV files against the JSON Schema
  - Checks for required fields and data format consistency
  - Reports any validation errors that need attention
- npm run csv:combine
  - Combines all individual CSV files from country/state-region folders into a single breweries.csv
  - Useful when you've made changes to individual state files and need to update the main dataset
- npm run csv:split
  - Splits the main breweries.csv into separate files by country/state-region
  - Helps maintain organized, manageable data files for each region
  - Creates directories if they don't exist
- npm run generate:ids
  - Creates unique OBDB IDs for each brewery based on name and city
  - Automatically updates breweries.csv with new IDs
  - Ensures no duplicate IDs exist in the dataset
- npm run generate:json
  - Converts breweries.csv into a JSON format (breweries.json)
  - Useful for applications that prefer working with JSON data
  - Maintains data consistency across formats
- npm run generate:sql
  - Creates a PostgreSQL SQL file from breweries.csv
  - Includes table creation and data insertion statements
  - Well suited to database implementations
- npm run generate:stats
  - Generates comprehensive dataset statistics
  - Shows brewery counts by state/city
  - Displays brewery type distribution
  - Reports data completeness metrics
- npm run contributors:add
  - Interactive CLI tool to add new contributors
  - Prompts for contributor information and contribution type
  - Updates the .all-contributorsrc file
- npm run contributors:check
  - Verifies if any contributors are missing from the list
  - Helps maintain accurate recognition of all contributors
- npm run contributors:generate
  - Updates the Contributors section in README.md
  - Generates contributor table with avatars and contribution types
- npm run workflow:maintain
  - Comprehensive maintenance workflow that:
    - Validates all CSV files
    - Combines all CSV files
    - Generates new IDs if needed
    - Creates JSON and SQL files
    - Splits back into individual state files
  - Run this after making any dataset updates
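
To make the combine and split steps described above more concrete, here is a minimal Python sketch of the round trip. It is not the project's implementation (the real scripts run through npm). The one-CSV-per-region layout under `data/<country>/`, the lower-case hyphenated file names, and the function names `combine` and `split` are all assumptions made for illustration.

```python
import csv
import tempfile
from collections import defaultdict
from pathlib import Path


def combine(data_dir: Path, combined: Path) -> int:
    """Merge every CSV under data_dir/<country>/ into one file; return the row count.

    Assumes all regional files share the same header row.
    """
    rows = []
    for path in sorted(data_dir.glob("*/*.csv")):
        with path.open(newline="", encoding="utf-8") as handle:
            rows.extend(csv.DictReader(handle))
    if not rows:
        return 0
    with combined.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)


def split(combined: Path, data_dir: Path) -> list[Path]:
    """Write one CSV per (country, state_province) pair, creating folders as needed."""
    with combined.open(newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        fieldnames = reader.fieldnames or []
        groups = defaultdict(list)
        for row in reader:
            groups[(row["country"], row["state_province"])].append(row)

    written = []
    for (country, state), rows in sorted(groups.items()):
        # Hypothetical naming scheme: lower-case, spaces replaced by hyphens.
        folder = data_dir / country.lower().replace(" ", "-")
        folder.mkdir(parents=True, exist_ok=True)
        target = folder / f"{state.lower().replace(' ', '-')}.csv"
        with target.open("w", newline="", encoding="utf-8") as handle:
            writer = csv.DictWriter(handle, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(rows)
        written.append(target)
    return written


if __name__ == "__main__":
    # Round trip on made-up rows in a temporary directory.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        source = root / "breweries.csv"
        source.write_text(
            "name,brewery_type,city,state_province,country\n"
            "Example Brewing,micro,Bend,Oregon,United States\n"
            "Sample Ales,brewpub,Austin,Texas,United States\n",
            encoding="utf-8",
        )
        print(split(source, root / "data"))
        print(combine(root / "data", root / "combined.csv"))
```

Editing either the combined file or a regional file, but not both, keeps a round trip like this free of duplicates.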
For information on contributing to this project, please see the contributing guide and our code of conduct.
- Fork the repository
- Add or update breweries in the CSV (Excel, Google Sheets)
- Submit a Pull Request
First and foremost, don't worry about messing up! Thank you so much for contributing!
- CSVs are organized by data/[country]/[state_province]
- Required fields/columns: name, brewery_type, city, state_province, and country
- When adding a brewery, do not include an id. This will be created after review.
- Please either add to breweries.csv (preferred if adding breweries for a new country) or the individual state/province CSV file. Adding to both at the same time may introduce duplicates/errors.
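
The column rules above can be checked locally before opening a pull request. The following is a small sketch in Python, not part of the repository's tooling; the function name `check_new_brewery` and the sample row are hypothetical.

```python
# Required columns as listed in the contribution guidelines.
REQUIRED_COLUMNS = ("name", "brewery_type", "city", "state_province", "country")


def check_new_brewery(row: dict) -> list[str]:
    """Return a list of problems found in a new brewery row (empty list means OK)."""
    problems = []
    for column in REQUIRED_COLUMNS:
        # Treat missing keys, None, and whitespace-only values as empty.
        if not (row.get(column) or "").strip():
            problems.append(f"missing required column: {column}")
    # New entries should not carry an id; it is created after review.
    if (row.get("id") or "").strip():
        problems.append("id should be left empty; it is assigned after review")
    return problems


if __name__ == "__main__":
    # Made-up row with the country left blank.
    candidate = {
        "name": "Example Brewing",
        "brewery_type": "micro",
        "city": "Bend",
        "state_province": "Oregon",
        "country": "",
    }
    for problem in check_new_brewery(candidate):
        print(problem)
```

This only mirrors the rules stated in this section; `npm run validate` remains the authoritative check against the JSON Schema.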
If you have any feedback, please email me.
Cheers! 🍻
- Status: Active
- Last Dataset Update: 2024
- Maintenance: Actively maintained through community contributions
- Dataset Size: 8,000+ breweries
- Coverage: United States, with growing international data
- Node.js v18 or higher
- npm package manager
- Git
Each brewery entry contains the following fields:
Field | Type | Description | Required |
---|---|---|---|
id | String | Unique identifier | Yes |
name | String | Name of the brewery | Yes |
brewery_type | String | Type of brewery (micro, regional, brewpub, etc.) | Yes |
street | String | Street address | No |
city | String | City | Yes |
state_province | String | State/Province | Yes |
postal_code | String | Postal code | Yes |
country | String | Country | Yes |
longitude | String | Decimal longitude coordinate | No |
latitude | String | Decimal latitude coordinate | No |
phone | String | Phone number | No |
website_url | String | Website URL | No |
import pandas as pd
# Read CSV
breweries_df = pd.read_csv('breweries.csv')
# Filter by state
california_breweries = breweries_df[breweries_df['state_province'] == 'California']
const fs = require('fs');
// Read JSON
const breweries = JSON.parse(fs.readFileSync('breweries.json', 'utf8'));
// Filter by type
const microBreweries = breweries.filter(b => b.brewery_type === 'micro');
-- After importing breweries.sql
SELECT name, city, state_province
FROM breweries
WHERE brewery_type = 'brewpub'
ORDER BY state_province, city;
The dataset is updated regularly through community contributions. Each update goes through the following process:
- Community members submit new breweries or updates via pull requests
- Changes are reviewed and validated
- Upon approval, changes are merged and new dataset files are generated
- The API is automatically updated with the new data
Latest dataset version: 2024.1
Thanks go to these wonderful people (emoji key):
This project follows the all-contributors specification. Contributions of any kind welcome!
Last updated: 2024-11-01
- Total Breweries: 8,355
- Data Completeness: 78.0%
State | Count |
---|---|
California | 918 |
Washington | 486 |
Colorado | 448 |
New York | 419 |
Michigan | 375 |
Texas | 352 |
Pennsylvania | 345 |
Florida | 312 |
North Carolina | 307 |
Ohio | 303 |
Type | Count | Percentage |
---|---|---|
micro | 4,305 | 51.5% |
brewpub | 2,500 | 29.9% |
planning | 684 | 8.2% |
regional | 225 | 2.7% |
closed | 216 | 2.6% |
contract | 192 | 2.3% |
large | 90 | 1.1% |
proprietor | 69 | 0.8% |
bar | 37 | 0.4% |
taproom | 20 | 0.2% |
nano | 13 | 0.2% |
beergarden | 3 | 0.0% |
location | 1 | 0.0% |
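
A distribution like the one in the table above can be reproduced from the rows of breweries.csv or breweries.json. The sketch below shows one way to do it in Python; it is not the code behind `npm run generate:stats`, and the function name `type_distribution` is hypothetical.

```python
from collections import Counter


def type_distribution(rows):
    """Return (brewery_type, count, percentage) tuples, most common type first."""
    counts = Counter(row["brewery_type"] for row in rows)
    total = sum(counts.values())
    # most_common() is empty when there are no rows, so no division by zero occurs.
    return [
        (brewery_type, count, round(100 * count / total, 1))
        for brewery_type, count in counts.most_common()
    ]


if __name__ == "__main__":
    # Made-up rows; real input would come from breweries.csv or breweries.json.
    sample = [
        {"brewery_type": "micro"},
        {"brewery_type": "micro"},
        {"brewery_type": "micro"},
        {"brewery_type": "brewpub"},
    ]
    for brewery_type, count, percentage in type_distribution(sample):
        print(f"{brewery_type} | {count} | {percentage}%")
```

The percentage is the type's count divided by the total number of breweries, so 4,305 micro breweries out of 8,355 gives 51.5%.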
City | Count |
---|---|
Denver, Colorado | 92 |
San Diego, California | 91 |
Portland, Oregon | 85 |
Seattle, Washington | 80 |
Chicago, Illinois | 64 |
Austin, Texas | 49 |
Houston, Texas | 40 |
San Francisco, California | 39 |
Minneapolis, Minnesota | 38 |
Cincinnati, Ohio | 34 |
Field | Completeness |
---|---|
name | 100.0% |
brewery_type | 100.0% |
city | 100.0% |
state_province | 100.0% |
postal_code | 100.0% |
country | 100.0% |
address_1 | 91.0% |
phone | 90.0% |
website_url | 86.0% |
longitude | 72.0% |
latitude | 72.0% |
address_2 | 1.0% |
address_3 | 0.0% |
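
Per-field completeness of this kind is the share of rows in which a field has a non-empty value. The Python sketch below assumes that definition; the project's own statistics script may compute it differently, and the function name `field_completeness` is hypothetical.

```python
def field_completeness(rows, fields):
    """Return {field: percentage of rows where the field is non-empty}."""
    total = len(rows)
    result = {}
    for field in fields:
        # Missing keys, None, and whitespace-only strings all count as empty.
        filled = sum(1 for row in rows if (row.get(field) or "").strip())
        result[field] = round(100 * filled / total, 1) if total else 0.0
    return result


if __name__ == "__main__":
    # Made-up rows for illustration only.
    sample = [
        {"name": "Example Brewing", "phone": "5550100", "website_url": ""},
        {"name": "Sample Ales", "phone": "", "website_url": ""},
    ]
    for field, percentage in field_completeness(sample, ["name", "phone", "website_url"]).items():
        print(f"{field} | {percentage}%")
```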