{
"Fifa 23 Players Data": [
"Kaggle",
4847,
2023,
"Fifa 23 Players Data",
"FIFA 23 is a football simulation video game published by Electronic Arts. It is the 30th and final installment in the FIFA series that is developed by EA Sports, and released worldwide on 30 September 2022 for PC, Nintendo Switch, PlayStation 4, PlayStation 5, Xbox One, Xbox Series X/S and Google Stadia. The role of performance analysis within football is more important than ever. Whether it’s the opposition, potential transfer targets or last weekend’s fixture, analysing performances and data can be the difference between success and failure.",
"https://www.kaggle.com/datasets/sanjeetsinghnaik/fifa-23-players-dataset"
],
"Forbes Richest Atheletes (Forbes Richest Athletes 1990-2020)": [
"Kaggle",
7112,
2021,
"Forbes Richest Athletes 1990-2020",
"Here is a complete list of the world's highest-paid athletes since the first list published by Forbes in 1990. In 2002, they changed the reporting period from the full calendar year to June-to-June, and consequently, there are no records for 2001.",
"https://www.kaggle.com/datasets/parulpandey/forbes-highest-paid-athletes-19902019"
],
"Olympic Swimming Results 1912to2020": [
"Kaggle",
1695,
2023,
"Olympic Swimming Results 1912to2020",
"Swimming is one of the most popular Olympic sports where individual competitors race over various distance events in butterfly, backstroke, breaststroke, freestyle, and individual medley. Additionally, four swimmers can take part in a freestyle or medley relay. A medley relay consists of four swimmers each swimming a different stroke - ordered as backstroke, breaststroke, butterfly, and freestyle - over a set distance. Swimming has been a sport at every modern Summer Olympics but has only been open to women since 1912.",
"https://www.kaggle.com/datasets/datasciencedonut/olympic-swimming-1912-to-2020"
],
"Sport car price": [
"Kaggle",
8202,
2023,
"Sport car price",
"This dataset contains information about the prices of different sports cars from various manufacturers. The dataset includes the make and model of the car, the year of production, the engine size, the horsepower, the torque, the 0-60 MPH time, and the price in USD. The dataset is useful for analyzing the prices of different sports cars and identifying trends in the market.",
"https://www.kaggle.com/datasets/bhanupratapbiswas/olympic-data"
],
"dataset olympics": [
"Kaggle",
6437,
2023,
"dataset olympics",
"The Olympic Games are an international multi-sport event held every four years in which thousands of athletes from around the world participate in various sports competitions. The Olympics are one of the most significant and prestigious sporting events globally, promoting unity, friendship, and fair play among nations.",
"https://www.kaggle.com/datasets/bhanupratapbiswas/olympic-data"
],
"fifa eda stats": [
"Kaggle",
1912,
2022,
"FIFA EDA stats",
"You want to create your own football club named ‘ultralearnManral’. Your club doesn't have a team yet, so it needs to hire players for its roster. You want to make player selection decisions using past data. Create reports that recommend data-backed players for the main team. To start with, a total of 14-16 players are required. The collected data contains information about players, the clubs they currently play for, and various performance measures. NOTE: As always, assume the budget for hiring players is limited; the team needs 18-22 possible players to choose from. Formulating a report will help management/stakeholders make decisions regarding potential players.",
"https://www.kaggle.com/datasets/mukeshmanral/fifa-data-for-eda-and-stats"
],
"sports clubs operations and costs": [
"Kaggle",
128,
2023,
"sports clubs operations and costs",
"This table contains information about sports clubs registered with the Chamber of Commerce (excluding water sports and professional sports clubs). Information about their members, accommodation, deployment of volunteers and employees (not) in salaried employment and the composition of income and expenses is presented in this table. The data can be broken down into the various disciplines: Power lifting and combat sports clubs, individual indoor sports clubs, indoor team sports clubs, swimming and diving clubs, other indoor sports clubs, athletic clubs, golf clubs, sport fishing clubs, horse riding clubs, tennis clubs, outdoor team sports clubs (excluding football), football clubs, cycling clubs and other outdoor sports clubs.",
"https://www.kaggle.com/datasets/niramay/sports-clubs-operation-costs-and-revenues"
],
"Microsoft_Stock": [
"Kaggle",
7261,
2021,
"Microsoft_Stock",
"This file contains the stock information of Microsoft from 04/01/2015 to 04/01/2021",
"https://www.kaggle.com/datasets/vijayvvenkitesh/microsoft-stock-time-series-analysis"
],
"Room Occupancy Detection Data IOT": [
"Kaggle",
1239,
2020,
"Room Occupancy Detection Data IOT",
"Accurate occupancy detection of an office room from light, temperature, humidity and CO2 measurements using statistical learning models.",
"https://www.kaggle.com/datasets/kukuroo3/room-occupancy-detection-data-iot-sensor"
],
"Solar Power Generation Data": [
"Kaggle",
41300,
2022,
"Solar Power Generation Data",
"This data has been gathered at two solar power plants in India over a 34 day period. It has two pairs of files - each pair has one power generation dataset and one sensor readings dataset. The power generation datasets are gathered at the inverter level - each inverter has multiple lines of solar panels attached to it. The sensor data is gathered at a plant level - single array of sensors optimally placed at the plant.",
"https://www.kaggle.com/datasets/anikannal/solar-power-generation-data"
],
"Baggage Complaints": [
"Kaggle",
1725,
2023,
"Baggage Complaints",
"Anyone who travels by air knows that occasional problems are inevitable. Flights can be delayed or cancelled due to weather conditions, mechanical problems, or labor strikes, and baggage can be lost, delayed, damaged, or pilfered. Given that many airlines are now charging for bags, issues with baggage are particularly annoying. Baggage problems can have a serious impact on customer loyalty, and can be costly to the airlines (airlines often have to deliver bags). Air carriers report flight delays, cancellations, overbookings, late arrivals, baggage complaints, and other operating statistics to the U.S. government, which compiles the data and reports it to the public. The data set contains monthly observations from 2004 to 2010 for United Airlines, American Eagle, and Hawaiian Airlines.",
"https://www.kaggle.com/datasets/gabrielsantello/airline-baggage-complaints-time-series-dataset"
],
"Cinema Tickets": [
"Kaggle",
9331,
2021,
"Cinema Tickets",
"About eight months of sales history from different cinemas during 2018, with detailed screening data and encoded, anonymized locations.",
"https://www.kaggle.com/datasets/arashnic/cinema-ticket"
],
"data science survey": [
"Kaggle",
20800,
2018,
"2018 Kaggle Machine Learning and Data Science Survey",
"The most comprehensive dataset available on the state of ML and data science",
"https://www.kaggle.com/datasets/kaggle/kaggle-survey-2018"
],
"CAMEL AI- Biology Problems:Solutions": [
"Kaggle",
25,
2023,
"CAMEL AI- Biology Problems:Solutions",
"Biology Problem-Solution Pairs for LLM Training",
"https://www.kaggle.com/datasets/thedevastator/synbio-problem-solution-dataset"
],
"Computer Science Conferences and Ranking": [
"Kaggle",
41,
2024,
"Computer Science Conferences and Ranking",
"Computer science conferences play a crucial role in fostering collaboration, knowledge sharing, and innovation within the field. Among the top-tier conferences, the Association for Computing Machinery (ACM) International Conference on Computer Science and Information Technology (ICCSIT) holds a prominent position, often earning an 'A' rank. This conference attracts researchers and professionals globally, providing a platform for discussing cutting-edge research and emerging trends.",
"https://www.kaggle.com/datasets/azminetoushikwasi/top-computer-science-conference-and-ranking"
],
"Data_Science_Job_Postings_And_Skills": [
"Kaggle",
1736,
2024,
"catapulthacks/science_datasets/Data_Science_Job_Postings_And_Skills.csv",
"LinkedIn is a popular professional networking platform with millions of job postings across various industries. This dataset provides a raw dump of data science-related job postings collected from LinkedIn. It includes information about job titles, companies, locations, search parameters, and other relevant details. The main objective of this dataset is not only to provide insights into the data science job market and the skills required by professionals in this field but also to offer users an opportunity to practice their data cleaning skills. By working with this dataset, users can gain hands-on experience in cleaning and preprocessing raw data, a critical skill for aspiring data scientists.",
"https://www.kaggle.com/datasets/asaniczka/data-science-job-postings-and-skills"
],
"Healthcare NLP- LLMs, Transformers": [
"Kaggle",
2997,
2024,
"Healthcare NLP- LLMs, Transformers",
"MedQuAD includes 47,457 medical question-answer pairs created from 12 NIH websites (e.g. cancer.gov, niddk.nih.gov, GARD, MedlinePlus Health Topics). The collection covers 37 question types (e.g. Treatment, Diagnosis, Side Effects) associated with diseases, drugs and other medical entities such as tests.",
"https://www.kaggle.com/datasets/jpmiller/layoutlm"
],
"Job details of popular jobs in data science field in india": [
"Kaggle",
232,
2022,
"Job details of popular jobs in data science field in india",
"This data is entirely RAW with only the duplicate entries removed. This dataset will be suitable for beginners looking for a portfolio project, as one needs to clean it before drawing useful insights. For reference, see my similar analysis of another Naukri dataset on data science jobs.",
"https://www.kaggle.com/datasets/sridharstreaks/popular-jobs-titles-in-data-science-field-in-india"
],
"palmer_penguins_extended": [
"Kaggle",
873,
2023,
"palmer_penguins_extended",
"This dataset is an extended version of the classic Palmer's Penguins dataset, providing a more comprehensive view of penguin characteristics and their environment. It includes new features such as diet, year of observation, life stage, and health metrics, in addition to the original attributes. The dataset spans from 2021 to 2025.",
"https://www.kaggle.com/datasets/samybaladram/palmers-penguin-dataset-extended"
],
"SciTail Multiple Choice Science Exams": [
"Kaggle",
74,
2023,
"SciTail Multiple Choice Science Exams",
"The Scitail dataset is your gateway to unlocking powerful and advanced Sci-Fi Natural Language Inference (NLI) algorithms. With data sourced from popular books, movies, and TV shows in the genre, this dataset gives you the opportunity to develop and train NLI algorithms capable of understanding complex sci-fi conversations. Containing seven distinct formats including training sets for both predictor format and datagem format as well as testing sets in tsv format and SNLI format - all containing the same fields but in varied structures - this is an essential resource for any scientist looking to explore the realm of sci-fi NLI! Train your algorithm today with Scitail; unlock a future of supercharged Sci-Fi language processing!",
"https://www.kaggle.com/datasets/thedevastator/futuristic-natural-language-inference-with-the-s"
]
}