🎉📝🌐 update, website content.
imarranz committed Jun 6, 2024
1 parent eb696de commit 49fdd2b
Showing 62 changed files with 3,008 additions and 0 deletions.
16 changes: 16 additions & 0 deletions srcsite/01_introduction/011_introduction.md
@@ -0,0 +1,16 @@

## Introduction

![](../figures/chapters/010_introduction.png)

In recent years, the amount of data generated by businesses, organizations, and individuals has increased exponentially. With the rise of the Internet, mobile devices, and social media, we are now generating more data than ever before. This data can be incredibly valuable, providing insights that can inform decision-making, improve processes, and drive innovation. However, the sheer volume and complexity of this data also present significant challenges.

Data science has emerged as a discipline that helps us make sense of this data. It involves using statistical and computational techniques to extract insights from data and communicate them in a way that is actionable and relevant. With the increasing availability of powerful computers and software tools, data science has become an essential part of many industries, from finance and healthcare to marketing and manufacturing.

However, data science is not just about applying algorithms and models to data. It also involves a complex and often iterative process of data acquisition, cleaning, exploration, modeling, and implementation. This process is commonly known as the data science workflow.

Managing the data science workflow can be a challenging task. It requires coordinating the efforts of multiple team members, integrating various tools and technologies, and ensuring that the workflow is well-documented, reproducible, and scalable. This is where data science workflow management comes in.

Data science workflow management is especially important in the era of big data. As we collect and analyze ever-larger amounts of data, robust workflows, backed by sound mathematical and statistical knowledge, become essential for analyzing it effectively. Furthermore, as data-driven decision making grows in importance, it is critical that data scientists and the other professionals involved in the data science workflow have the tools and techniques needed to manage the process effectively.

To achieve these goals, data science workflow management relies on a combination of best practices, tools, and technologies. Popular tools include Jupyter Notebooks, GitHub, Docker, and various project management platforms.
14 changes: 14 additions & 0 deletions srcsite/01_introduction/012_introduction.md
@@ -0,0 +1,14 @@

## What is Data Science Workflow Management?

Data science workflow management is the practice of organizing and coordinating the various tasks and activities involved in the data science workflow. It encompasses everything from data collection and cleaning to analysis, modeling, and implementation. Effective data science workflow management requires a deep understanding of the data science process, as well as the tools and technologies used to support it.

At its core, data science workflow management is about making the data science workflow more efficient, effective, and reproducible. This can involve creating standardized processes and protocols for data collection, cleaning, and analysis; implementing quality control measures to ensure data accuracy and consistency; and utilizing tools and technologies that make it easier to collaborate and communicate with other team members.
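
For instance, one simple quality-control measure is an automated validation check that runs before any analysis begins. The sketch below is a minimal illustration in Python with pandas; the file name, column names, and rules are hypothetical.

```python
import pandas as pd

def validate(df: pd.DataFrame) -> list[str]:
    """Return a list of data-quality problems found in the DataFrame."""
    problems = []
    # Required columns must be present.
    for column in ("customer_id", "order_date", "amount"):
        if column not in df.columns:
            problems.append(f"missing column: {column}")
    # Key identifiers must not be null, and amounts must be non-negative.
    if "customer_id" in df.columns and df["customer_id"].isna().any():
        problems.append("null customer_id values")
    if "amount" in df.columns and (df["amount"] < 0).any():
        problems.append("negative amounts")
    return problems

issues = validate(pd.read_csv("data/orders.csv"))  # hypothetical input file
if issues:
    raise ValueError("Data-quality checks failed: " + "; ".join(issues))
```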

One of the key challenges of data science workflow management is ensuring that the workflow is well-documented and reproducible. This involves keeping detailed records of all the steps taken in the data science process, from the data sources used to the models and algorithms applied. By doing so, it becomes easier to reproduce the results of the analysis and verify the accuracy of the findings.
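
As an illustration, one lightweight way to keep such records is to write a small run log alongside each analysis. The snippet below is a minimal sketch using only the Python standard library; the file paths, step names, and parameter values are hypothetical.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path

def file_checksum(path: Path) -> str:
    """Return a SHA-256 checksum so the exact input data can be verified later."""
    return hashlib.sha256(path.read_bytes()).hexdigest()

# Hypothetical data source and settings for this run.
data_file = Path("data/raw/customers.csv")
run_record = {
    "timestamp": datetime.now(timezone.utc).isoformat(),
    "data_sources": [{"path": str(data_file), "sha256": file_checksum(data_file)}],
    "steps": ["load", "clean_missing_values", "train_logistic_regression"],
    "parameters": {"test_size": 0.2, "random_state": 42},
}

# Append the record to a JSON Lines log that is versioned with the project.
with open("run_log.jsonl", "a", encoding="utf-8") as log:
    log.write(json.dumps(run_record) + "\n")
```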

Another important aspect of data science workflow management is ensuring that the workflow is scalable. As the amount of data being analyzed grows, it becomes increasingly important to have a workflow that can handle large volumes of data without sacrificing performance. This may involve using distributed computing frameworks like Apache Hadoop or Apache Spark, or utilizing cloud-based data processing services like Amazon Web Services (AWS) or Google Cloud Platform (GCP).
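
As a concrete example, a first step toward a scalable workflow might look like the PySpark sketch below, which distributes a simple aggregation across however many workers are available. It assumes a working Spark installation and a hypothetical `events.csv` file.

```python
from pyspark.sql import SparkSession

# Start (or attach to) a Spark session; on a real cluster this is typically
# configured by the resource manager rather than run locally.
spark = SparkSession.builder.appName("scalable-aggregation-example").getOrCreate()

# Read a hypothetical CSV file; Spark partitions the data across workers.
events = spark.read.csv("data/events.csv", header=True, inferSchema=True)

# A simple distributed aggregation: event counts per user.
events.groupBy("user_id").count().show(10)

spark.stop()
```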

Effective data science workflow management also requires a strong understanding of the various tools and technologies used to support the data science process. This may include programming languages like Python and R, statistical software packages like SAS and SPSS, and data visualization tools like Tableau and Power BI. In addition, data science workflow management may involve using project management tools like Jira or Asana to coordinate the efforts of multiple team members.

Overall, data science workflow management is an essential aspect of modern data science. By implementing best practices and utilizing the right tools and technologies, data scientists and other professionals involved in the data science process can ensure that their workflows are efficient, effective, and scalable. This, in turn, can lead to more accurate and actionable insights that drive innovation and improve decision-making across a wide range of industries and domains.
27 changes: 27 additions & 0 deletions srcsite/01_introduction/013_introduction.md
@@ -0,0 +1,27 @@

## References

### Books

* Peng, R. D. (2016). R programming for data science. Available at [https://bookdown.org/rdpeng/rprogdatascience/](https://bookdown.org/rdpeng/rprogdatascience/)

* Wickham, H., & Grolemund, G. (2017). R for data science: import, tidy, transform, visualize, and model data. Available at [https://r4ds.had.co.nz/](https://r4ds.had.co.nz/)

* Géron, A. (2019). Hands-on machine learning with Scikit-Learn, Keras, and TensorFlow: Concepts, tools, and techniques to build intelligent systems. Available at [https://www.oreilly.com/library/view/hands-on-machine-learning/9781492032632/](https://www.oreilly.com/library/view/hands-on-machine-learning/9781492032632/)

* Shrestha, S. (2020). Data Science Workflow Management: From Basics to Deployment. Available at [https://www.springer.com/gp/book/9783030495362](https://www.springer.com/gp/book/9783030495362)

* Grollman, D., & Spencer, B. (2018). Data science project management: from conception to deployment. Apress.

* Kelleher, J. D., & Tierney, B. (2018). Data science in R: a case studies approach to computational reasoning and problem solving. CRC Press.

* VanderPlas, J. (2016). Python data science handbook: Essential tools for working with data. O'Reilly Media, Inc.

* Kluyver, T., Ragan-Kelley, B., Pérez, F., Granger, B., Bussonnier, M., Frederic, J., ... & Ivanov, P. (2016). Jupyter Notebooks – a publishing format for reproducible computational workflows. In Positioning and Power in Academic Publishing: Players, Agents and Agendas, 87.

* Pérez, F., & Granger, B. E. (2007). IPython: a system for interactive scientific computing. Computing in Science & Engineering, 9(3), 21-29.

* Rule, A., Tabard-Cossa, V., & Burke, D. T. (2018). Open science goes microscopic: an approach to knowledge sharing in neuroscience. Scientific Data, 5(1), 180268.

* Shen, H. (2014). Interactive notebooks: Sharing the code. Nature, 515(7525), 151-152.

10 changes: 10 additions & 0 deletions srcsite/02_fundamentals/021_fundamentals_of_data_science.md
@@ -0,0 +1,10 @@

## Fundamentals of Data Science

![](../figures/chapters/020_fundamentals_of_data_science.png)

Data science is an interdisciplinary field that combines techniques from statistics, mathematics, and computer science to extract knowledge and insights from data. The rise of big data and the increasing complexity of modern systems have made data science an essential tool for decision-making across a wide range of industries, from finance and healthcare to transportation and retail.

The field of data science has a rich history, with roots in statistics and data analysis dating back to the 19th century. However, it was not until the 21st century that data science truly came into its own, as advancements in computing power and the development of sophisticated algorithms made it possible to analyze larger and more complex datasets than ever before.

This chapter will provide an overview of the fundamentals of data science, including the key concepts, tools, and techniques used by data scientists to extract insights from data. We will cover topics such as data visualization, statistical inference, machine learning, and deep learning, as well as best practices for data management and analysis.
12 changes: 12 additions & 0 deletions srcsite/02_fundamentals/022_fundamentals_of_data_science.md
@@ -0,0 +1,12 @@

## What is Data Science?

Data science is a multidisciplinary field that uses techniques from mathematics, statistics, and computer science to extract insights and knowledge from data. It involves a variety of skills and tools, including data collection and storage, data cleaning and preprocessing, exploratory data analysis, statistical inference, machine learning, and data visualization.

The goal of data science is to provide a deeper understanding of complex phenomena, identify patterns and relationships, and make predictions or decisions based on data-driven insights. This is done by leveraging data from various sources, including sensors, social media, scientific experiments, and business transactions, among others.

Data science has become increasingly important in recent years due to the exponential growth of data and the need for businesses and organizations to extract value from it. The rise of big data, cloud computing, and artificial intelligence has opened up new opportunities and challenges for data scientists, who must navigate complex and rapidly evolving landscapes of technologies, tools, and methodologies.

To be successful in data science, one needs a strong foundation in mathematics and statistics, as well as programming skills and domain-specific knowledge. Data scientists must also be able to communicate effectively and work collaboratively with teams of experts from different backgrounds.

Overall, data science has the potential to revolutionize the way we understand and interact with the world around us, from improving healthcare and education to driving innovation and economic growth.
16 changes: 16 additions & 0 deletions srcsite/02_fundamentals/023_fundamentals_of_data_science.md
@@ -0,0 +1,16 @@

## Data Science Process

The data science process is a systematic approach for solving complex problems and extracting insights from data. It involves a series of steps, from defining the problem to communicating the results, and requires a combination of technical and non-technical skills.

The data science process typically begins with understanding the problem and defining the research question or hypothesis. Once the question is defined, the data scientist must gather and clean the relevant data, which can involve working with large and messy datasets. The data is then explored and visualized, which can help to identify patterns, outliers, and relationships between variables.
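
To make this step concrete, the following is a minimal sketch of cleaning and exploring a dataset with pandas; the file name and column names are hypothetical.

```python
import pandas as pd

# Load a hypothetical raw dataset.
df = pd.read_csv("data/sales.csv")

# Basic cleaning: drop exact duplicates and rows missing the target column.
df = df.drop_duplicates()
df = df.dropna(subset=["revenue"])

# Quick exploration: summary statistics and pairwise correlations
# help surface outliers and relationships between variables.
print(df.describe())
print(df.select_dtypes("number").corr())
```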

Once the data is understood, the data scientist can begin to build models and perform statistical analysis. This often involves using machine learning techniques to train predictive models or perform clustering analysis. The models are then evaluated and tested to ensure they are accurate and robust.
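
A minimal modelling-and-evaluation step might look like the scikit-learn sketch below. Synthetic data stands in for the cleaned dataset so that the example runs on its own; in practice the feature matrix and labels would come from the previous step.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a cleaned feature matrix and binary labels.
X, y = make_classification(n_samples=1_000, n_features=10, random_state=42)

# Hold out a test set so the model is judged on data it has never seen.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = RandomForestClassifier(n_estimators=200, random_state=42)
model.fit(X_train, y_train)

# Evaluate on the held-out data as a first check of accuracy and robustness.
print("Test accuracy:", accuracy_score(y_test, model.predict(X_test)))
```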

Finally, the results are communicated to stakeholders, which can involve creating visualizations, dashboards, or reports that are accessible and understandable to a non-technical audience. This is an important step, as the ultimate goal of data science is to drive action and decision-making based on data-driven insights.
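
For example, a result that will be shared with stakeholders can be turned into a simple chart and saved for a report or dashboard. The sketch below uses matplotlib with hypothetical summary figures.

```python
import matplotlib.pyplot as plt

# Hypothetical summary figures to be shared with stakeholders.
segments = ["New", "Returning", "At risk"]
conversion_rate = [0.12, 0.27, 0.05]

fig, ax = plt.subplots(figsize=(6, 4))
ax.bar(segments, conversion_rate, color="steelblue")
ax.set_ylabel("Conversion rate")
ax.set_title("Conversion by customer segment")

# Save a figure that can be dropped into a report or dashboard.
fig.savefig("conversion_by_segment.png", dpi=150, bbox_inches="tight")
```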

The data science process is often iterative, as new insights or questions may arise during the analysis that require revisiting previous steps. The process also requires a combination of technical and non-technical skills, including programming, statistics, and domain-specific knowledge, as well as communication and collaboration skills.

To support the data science process, there are a variety of software tools and platforms available, including programming languages such as Python and R, machine learning libraries such as scikit-learn and TensorFlow, and data visualization tools such as Tableau and D3.js. There are also dedicated data science platforms and environments, such as Jupyter Notebook and Apache Spark, which provide a comprehensive set of tools for data scientists.

Overall, the data science process is a powerful approach for solving complex problems and driving decision-making based on data-driven insights, drawing on a broad mix of skills and a rich ecosystem of software tools and platforms.