Tutorial on atomic commits
IanniMuliterno committed Jul 31, 2024
1 parent bd638a0 commit 150598a
Showing 6 changed files with 193 additions and 0 deletions.
@@ -0,0 +1,16 @@
{
"hash": "1aa6f4bb8672a3388bd0001cbbfa5b97",
"result": {
"markdown": "---\ntitle: \"Commits atômicos e profissionais de dados\"\nsubtitle: \"Aprenda a commitar e entenda porque um profissional de dados precisa dar atenção a isto.\"\nauthor: \n - \"[Ían Muliterno](https://imuliterno.netlify.app/)\" \ndate: \"2024-04-24\" \ncategories: \n# exemplo de categorias:\n - \"Tutorial\"\ntoc: true # isso habilita o sumário ao lado do post\nimage: \"images/logo.jpg\" # imagem usada na página inicial junto ao post\nbibliography: \"pacotes.bib\" # arquivo de bibliografia. Pode adicionar mais arquivos!\ndraft: true # enquanto estiver draft: true, o post é um rascunho\n---\n\n\n::: {.callout-note collapse=\"false\"}\n## Autoria\n\n<center>\n\n![Foto](https://avatars.githubusercontent.com/u/21000314?v=4){style=\"width: 30%; border-radius: 50%;\"}\n\n[<i class=\"bi bi-house-fill\"></i>](https://) [<i class=\"fab fa-github\"></i>](https://github.com/USER) [<i class=\"fab fa-linkedin\"></i>](https://www.linkedin.com/in/USER/) [<i class=\"fab fa-instagram\"></i>](https://www.instagram.com/USER/)\n\n</center>\n\nEste post foi escrito por ....\n:::\n\nAqui podemos adicionar o conteúdo do post! Segue abaixo alguns exemplos que podem facilitar a criação do post:\n\n## Exemplo de texto com marcações\n\nO pacote `{dados}` [@R-dados] disponibiliza a base de dados `pinguins`, uma versão traduzida para português do pacote `{palmerpenguins}` [@R-palmerpenguins]. 
A tradução dessa base foi feita por Jean Prado, que faz parte da co-organização da R-Ladies São Paulo!\n\n[![Arte por \\@allison_horst .](images/paste-9B5D36BA.png){fig-align=\"center\"}](https://allisonhorst.github.io/palmerpenguins/)\n\nAlém destes pacotes, nesse arquivo foram usados o `{knitr}` [@R-knitr] e `{tidyverse}` [@R-tidyverse].\n\n## Carregando pacotes\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlibrary(tidyverse)\nlibrary(dados)\n```\n\n::: {.cell-output .cell-output-stderr}\n```\nWarning: package 'dados' was built under R version 4.3.3\n```\n:::\n:::\n\n\n## Exemplo de código em linha\n\nA base pinguins apresenta 344 pinguins (pois cada linha representa um pinguim). As colunas presentes na base são: especie, ilha, comprimento_bico, profundidade_bico, comprimento_nadadeira, massa_corporal, sexo e ano.\n\n## Exemplo de gráfico\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\npinguins |> \n ggplot() +\n aes(x = massa_corporal, y = comprimento_nadadeira, color = especie) +\n geom_point() +\n theme_light() + \n scale_color_viridis_d()\n```\n\n::: {.cell-output-display}\n![](index_files/figure-html/unnamed-chunk-2-1.png){fig-align='center' width=672}\n:::\n:::\n\n\n## Exemplo de tabela\n\n\n::: {.cell}\n\n```{.r .cell-code}\npinguins |> \n count(especie, ilha, name = \"quantidade\") |> \n knitr::kable()\n```\n\n::: {.cell-output-display}\n|especie |ilha | quantidade|\n|:-------------------|:---------|----------:|\n|Pinguim-de-adélia |Biscoe | 44|\n|Pinguim-de-adélia |Dream | 56|\n|Pinguim-de-adélia |Torgersen | 52|\n|Pinguim-de-barbicha |Dream | 68|\n|Pinguim-gentoo |Biscoe | 124|\n:::\n:::\n\n::: {.cell}\n\n:::\n\n\n\n\n<!-- inicio font awesome -->\n\n\n```{=html}\n<script src=\"https://kit.fontawesome.com/1f72d6921a.js\" crossorigin=\"anonymous\"></script>\n```\n\n<!-- final font awesome -->\n",
"supporting": [
"index_files"
],
"filters": [
"rmarkdown/pagebreak.lua"
],
"includes": {},
"engineDependencies": {},
"preserve": {},
"postProcess": true
}
}
111 changes: 111 additions & 0 deletions posts/2024-04-DataScience_and_DevOps/index.qmd
@@ -0,0 +1,111 @@
---
title: "Atomic commits and data professionals"
subtitle: "Learn how to commit and understand why a data professional needs to pay attention to it."
author:
- "[Ían Muliterno](https://imuliterno.netlify.app/)"
date: "2024-04-24"
categories:
# example categories:
- "Tutorial"
toc: true # enables the table of contents beside the post
image: "images/logo.jpg" # image shown on the home page next to the post
bibliography: "pacotes.bib" # bibliography file. More files can be added!
draft: FALSE
---

::: {.callout-note collapse="false"}
## Author

<center>

![Foto](https://avatars.githubusercontent.com/u/21000314?v=4){style="width: 30%; border-radius: 50%;"}

[<i class="bi bi-house-fill"></i>](https://) [<i class="fab fa-github"></i>](https://github.com/USER) [<i class="fab fa-linkedin"></i>](https://www.linkedin.com/in/USER/) [<i class="fab fa-instagram"></i>](https://www.instagram.com/USER/)

</center>

This post was written by Ían Muliterno, co-organizer of the R-Ladies São Paulo community. Ían holds a degree in statistics from UFPE and works as a data science consultant and ML engineer. Outside of work, he is currently taking part in a Chainlink hackathon and studying Solidity in order to enter the Web3 market.
:::

## Introduction

As data science and machine learning evolve, integrating DevOps practices becomes increasingly essential. If you doubt it, take a look at what DevOps covers and consider how you could deploy models and work in the cloud without applying these concepts, which are basic for any software developer.

[![DevOps loop.](images/paste-9B5D36BA.png){fig-align="center"}](https://marvel-b1-cdn.bc0a.com/f00000000236551/dt-cdn.net/wp-content/uploads/2021/07/13429_ILL_DevOpsLoop.png)

DevOps is a blend of the words "Development" and "Operations". It is a strategy that brings together the teams that build software and the team that makes sure the software works well for the end user. Its main goal is to speed up the delivery of quality software.

The main pillars of DevOps are:

1. Automation
2. CI/CD
3. Collaboration
4. Testing and monitoring
5. IaC

- **Automation:** This is about making repetitive tasks run by themselves without needing a person to do them every time. For example, testing new parts of an app automatically to make sure they work as expected.
- **Continuous Integration and Continuous Deployment (CI/CD):** This means constantly adding small changes to the software and making sure they're good to go live to users. "Integration" is about blending these new changes smoothly with the old ones, and "Deployment" is about putting them into action so people can use them.
- **Collaboration:** Everyone involved, from those who write the code to those who deploy it, works closely together. This way, they can solve problems faster and make better software.
- **Monitoring and Testing:** Regularly checking the software to make sure it’s working right and finding any issues before they become big problems. This includes testing new updates to ensure they don’t mess anything up.
- **Infrastructure as Code (IaC):** This involves managing and setting up computer servers and networks in a way that’s as easy as writing a script or a piece of code. This helps in setting up and changing technical resources fast and consistently.

By focusing on these pillars, DevOps helps create better software, faster, and with fewer problems, making everyone’s experience smoother and more enjoyable.

# draft

### Blog Post: Embracing Atomic Commits for Data Professionals

#### Introduction

As the fields of data science, machine learning, and artificial intelligence continue to evolve, the integration of DevOps practices into these domains has become increasingly essential. Among the many practices of DevOps, the concept of atomic commits is particularly vital for maintaining the integrity and manageability of code changes. This blog post aims to demystify atomic commits and discuss their importance for data professionals.

#### What are Atomic Commits?

An atomic commit refers to a version control practice where each commit contains a single functional change. This means that each commit is self-contained, with a clear, specific purpose. An atomic commit should be able to pass all tests by itself, ensuring that it does not break the codebase.
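
As a minimal sketch of what "a single functional change" can look like in practice, the snippet below takes two unrelated edits made in the same work session and records them as two separate commits by staging one file at a time. The repository is a throwaway directory, and the file names, file contents, and commit messages are hypothetical placeholders.

```bash
set -eu

# Throwaway repository so the sketch does not touch a real project.
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Example Author"        # placeholder identity
git config user.email "author@example.com"

# Starting point: two small scripts (contents are placeholders).
printf 'load data\n' > load.R
printf 'fit model\n' > model.R
git add load.R model.R
git commit -q -m "Initial project skeleton"

# Two unrelated edits made in the same work session.
printf 'load data\nrename columns\n' > load.R
printf 'fit model\ntune threshold\n' > model.R

# Stage only the first change and check what is about to be committed.
git add load.R
staged="$(git diff --cached --name-only)"
git commit -q -m "Rename columns after loading"

# The second change gets its own commit.
git add model.R
git commit -q -m "Tune classification threshold"

# Show the resulting history, newest first.
git log --format=%s
```

When two changes live in the same file, `git add -p` lets you stage individual hunks interactively, which serves the same purpose.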

#### Why Atomic Commits?

**1. Clarity:** Atomic commits make it easier to understand the history of changes, which is crucial when debugging or when multiple people are working on the same project.

**2. Revertibility:** With atomic commits, if a particular change introduces a bug, it can be reverted without affecting other unrelated changes.

**3. Code Review:** Smaller, well-defined changes are easier to review, leading to more effective peer reviews.
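
The revertibility point can be illustrated with a small sketch. It assumes a throwaway repository with three hypothetical commits, the middle one being the faulty change; `git revert` then adds a new commit that undoes only that change and leaves the others in place.

```bash
set -eu

# Throwaway repository; names and contents are placeholders.
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Example Author"
git config user.email "author@example.com"

echo "clean" > clean.txt
git add clean.txt
git commit -q -m "Add data cleaning step"

echo "buggy" > feature.txt
git add feature.txt
git commit -q -m "Add tenure feature"
bad_commit="$(git rev-parse HEAD)"   # remember the commit we will undo

echo "plot" > plot.txt
git add plot.txt
git commit -q -m "Add churn plot"

# Undo only the faulty commit. History is not rewritten:
# git revert records a new commit that reverses the old one.
git revert --no-edit "$bad_commit" > /dev/null

# feature.txt is gone; clean.txt and plot.txt are untouched.
ls
```

Had the three changes been bundled into one commit, reverting the bug would also have removed the cleaning step and the plot.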

#### Examples of Atomic Commits

Let’s consider a project where you’re developing a machine learning model for predicting customer churn. Here are examples of atomic commits:

- **Commit 1:** Add a new feature for extracting the customer's tenure length.
- **Commit 2:** Implement logistic regression for churn prediction.
- **Commit 3:** Fix a bug where missing values in the tenure feature caused a crash.

Each of these commits addresses a specific task or fix, making them atomic.
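
One way the three commits above might look on the command line is sketched below. The file names and their contents are placeholders invented for illustration; only the one-change-per-commit pattern matters.

```bash
set -eu

# Throwaway repository so the sketch does not touch a real project.
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Example Author"        # placeholder identity
git config user.email "author@example.com"

# Commit 1: the new feature, nothing else.
echo "# placeholder: compute customer tenure" > features.R
git add features.R
git commit -q -m "Add customer tenure feature"

# Commit 2: the model, in its own file and its own commit.
echo "# placeholder: logistic regression for churn" > model.R
git add model.R
git commit -q -m "Implement logistic regression for churn prediction"

# Commit 3: a bug fix that touches only the tenure feature.
echo "# placeholder: skip rows with missing tenure" >> features.R
git add features.R
git commit -q -m "Fix crash on missing values in tenure feature"

# One subject line per commit, oldest first.
history="$(git log --reverse --format=%s)"
echo "$history"
```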

#### DevOps Requirements in Data Roles

As data professionals venture further into roles that overlap with software engineering, understanding and implementing DevOps practices becomes crucial. Here are some DevOps requirements commonly expected in data roles:

**1. Continuous Integration/Continuous Deployment (CI/CD):** Data scientists need to ensure that their models can be seamlessly integrated and deployed into production systems. This requires regular commits, automated testing, and frequent integration of changes.

**2. Version Control:** Effective use of version control systems like Git is essential for managing code changes, especially in collaborative environments.

**3. Testing and Monitoring:** Automated testing of code and monitoring of the model performance in production are critical to ensure reliability and accuracy of data products.

**4. Containerization:** Tools like Docker are increasingly used to containerize data applications, ensuring consistency across different computing environments.

#### How to Make Atomic Commits

**1. Keep Changes Small and Focused:** Before committing, ask yourself if your changes can be logically divided into smaller parts.

**2. Write Clear Commit Messages:** Each commit should be accompanied by a concise yet descriptive message that explains the rationale behind the change.

**3. Test Before Committing:** Ensure that your changes do not break existing functionality by running tests before committing.
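
The "test before committing" step can be automated with a Git `pre-commit` hook, which Git runs before creating a commit and which aborts the commit when it exits with a non-zero status. The sketch below is only an illustration: `run_tests.sh` is a hypothetical stand-in for a real test suite, and it "fails" whenever a file named `FAILING` exists.

```bash
set -eu

# Throwaway repository so the sketch does not touch a real project.
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Example Author"        # placeholder identity
git config user.email "author@example.com"

# Hypothetical test runner: exits non-zero while a file named FAILING exists.
cat > run_tests.sh <<'EOF'
#!/bin/sh
[ ! -e FAILING ]
EOF
chmod +x run_tests.sh

# The hook simply delegates to the test runner.
mkdir -p .git/hooks
cat > .git/hooks/pre-commit <<'EOF'
#!/bin/sh
exec ./run_tests.sh
EOF
chmod +x .git/hooks/pre-commit

# With failing tests, the commit is refused.
touch FAILING
git add run_tests.sh
if git commit -q -m "Add test runner" 2> /dev/null; then
  blocked=no
else
  blocked=yes
fi
echo "commit blocked: $blocked"

# Once the tests pass, the same commit goes through.
rm FAILING
git commit -q -m "Add test runner"
```

Hooks live in `.git/hooks` and are not shared when a repository is cloned, so teams usually pair a local hook like this with the same checks in CI.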

#### Conclusion

Adopting atomic commits is not just about following a best practice; it’s about making your workflow more manageable and transparent, especially in fast-paced environments where data scientists and AI engineers collaborate closely with software developers. By incorporating these practices, data professionals can enhance the quality, reliability, and scalability of their data solutions, bridging the gap between data science and operational deployment.

------------------------------------------------------------------------

This blog post could be adapted further based on the specific interests or typical projects of the R-Ladies community, incorporating more R-specific examples or tools.

<!-- inicio font awesome -->

```{=html}
<script src="https://kit.fontawesome.com/1f72d6921a.js" crossorigin="anonymous"></script>
```
<!-- final font awesome -->
66 changes: 66 additions & 0 deletions posts/2024-04-DataScience_and_DevOps/pacotes.bib
@@ -0,0 +1,66 @@
@Manual{R-dados,
title = {dados: Translate Datasets to Portuguese},
author = {Riva Quiroga and Sara Mortara and Beatriz Milz and Andrea Sánchez-Tapia and Alejandra Andrea {Tapia Silva} and Beatriz {Maurer Costa} and Jean Prado and Renata Hirota and William Amorim and Emmanuelle {Rodrigues Nunes}},
year = {2022},
note = {R package version 0.1.0},
url = {https://github.com/cienciadedatos/dados},
}

@Manual{R-knitr,
title = {knitr: A General-Purpose Package for Dynamic Report Generation in R},
author = {Yihui Xie},
year = {2023},
note = {R package version 1.45},
url = {https://yihui.org/knitr/},
}

@Manual{R-palmerpenguins,
title = {palmerpenguins: Palmer Archipelago (Antarctica) Penguin Data},
author = {Allison Horst and Alison Hill and Kristen Gorman},
year = {2022},
note = {R package version 0.1.1,
https://github.com/allisonhorst/palmerpenguins},
url = {https://allisonhorst.github.io/palmerpenguins/},
}

@Manual{R-tidyverse,
title = {tidyverse: Easily Install and Load the Tidyverse},
author = {Hadley Wickham},
year = {2023},
note = {R package version 2.0.0,
https://github.com/tidyverse/tidyverse},
url = {https://tidyverse.tidyverse.org},
}

@Book{knitr2015,
title = {Dynamic Documents with {R} and knitr},
author = {Yihui Xie},
publisher = {Chapman and Hall/CRC},
address = {Boca Raton, Florida},
year = {2015},
edition = {2nd},
note = {ISBN 978-1498716963},
url = {https://yihui.org/knitr/},
}

@InCollection{knitr2014,
booktitle = {Implementing Reproducible Computational Research},
editor = {Victoria Stodden and Friedrich Leisch and Roger D. Peng},
title = {knitr: A Comprehensive Tool for Reproducible Research in {R}},
author = {Yihui Xie},
publisher = {Chapman and Hall/CRC},
year = {2014},
note = {ISBN 978-1466561595},
}

@Article{tidyverse2019,
title = {Welcome to the {tidyverse}},
author = {Hadley Wickham and Mara Averick and Jennifer Bryan and Winston Chang and Lucy D'Agostino McGowan and Romain François and Garrett Grolemund and Alex Hayes and Lionel Henry and Jim Hester and Max Kuhn and Thomas Lin Pedersen and Evan Miller and Stephan Milton Bache and Kirill Müller and Jeroen Ooms and David Robinson and Dana Paige Seidel and Vitalie Spinu and Kohske Takahashi and Davis Vaughan and Claus Wilke and Kara Woo and Hiroaki Yutani},
year = {2019},
journal = {Journal of Open Source Software},
volume = {4},
number = {43},
pages = {1686},
doi = {10.21105/joss.01686},
}
