Commit

Fix links from setup to docs, add in new article about publishing to medium
pflooky committed Dec 19, 2024
1 parent aafbbf8 commit 05e412f
Showing 20 changed files with 236 additions and 30 deletions.
Binary file modified docs/diagrams/.DS_Store
Binary file not shown.
4 changes: 2 additions & 2 deletions docs/docs/guide/scenario/data-validation.md
@@ -146,7 +146,7 @@ validation considers an acceptable error threshold before marking it as failed.
- Customisation
- Adjust the regex pattern and error threshold based on your specific data schema and validation requirements.
- For the full list of types of basic validations that can be
used, [check this page](../../../setup/validation/basic-validation.md).
used, [check this page](../../../docs/validation/basic-validation.md).
- Understanding Tolerance
- Be mindful of the error threshold, as it directly influences what percentage of deviations from the pattern is
acceptable.
@@ -220,7 +220,7 @@ Line 2: `validation.groupBy("account_id").max("balance").lessThan(900)`
- Adjust the `errorThreshold` or validation to your specification scenario. The full list
of [types of validations can be found here](../../validation.md).
- For the full list of types of group by validations that can be
used, [check this page](../../../setup/validation/group-by-validation.md).
used, [check this page](../../../docs/validation/group-by-validation.md).

=== "Java"

4 changes: 2 additions & 2 deletions docs/get-started/quick-start.md
@@ -112,6 +112,6 @@ Check the report generated under `docker/data/custom/report/index.html`.

### Guided tour

[**Check out the starter guide here**](../setup/guide/scenario/first-data-generation.md) that will take you through
step by step. You can also check the other guides [**here**](../setup/guide/index.md) to see the other possibilities of
[**Check out the starter guide here**](../docs/guide/scenario/first-data-generation.md) that will take you through
step by step. You can also check the other guides [**here**](../docs/guide/index.md) to see the other possibilities of
what Data Caterer can achieve for you.
18 changes: 13 additions & 5 deletions docs/index.md
@@ -26,11 +26,11 @@ alt="Data Caterer generate, validate and clean data testing flow">

## Main features

- :material-connection: [Connect to any data source](setup/connection.md)
- :material-auto-fix: [Auto generate production-like data from data connections or metadata sources](setup/guide/scenario/auto-generate-connection.md)
- :material-relation-many-to-one: [Relationships across data sources](setup/generator/foreign-key.md)
- :material-test-tube: [Validate based on data generated](setup/validation.md)
- :material-delete-sweep: [Clean up generated and downstream data](setup/delete-data.md)
- :material-connection: [Connect to any data source](docs/connection.md)
- :material-auto-fix: [Auto generate production-like data from data connections or metadata sources](docs/guide/scenario/auto-generate-connection.md)
- :material-relation-many-to-one: [Relationships across data sources](docs/generator/foreign-key.md)
- :material-test-tube: [Validate based on data generated](docs/validation.md)
- :material-delete-sweep: [Clean up generated and downstream data](docs/delete-data.md)

<span class="center-content">
[Try now](get-started/quick-start.md){ .md-button .md-button--primary .button-spaced }
@@ -89,6 +89,14 @@ alt="Data Caterer generate, validate and clean data testing flow">

</div>

## Who can use it

| Type | Interface | User |
|-----------|--------------------------------------------------------|--------------------------------------|
| No Code | [UI](get-started/quick-start.md#windows) | QA, Testers, Data Scientist, Analyst |
| Low Code | [YAML](get-started/quick-start.md#yaml) | DevOps, Kubernetes Fans |
| High Code | [Java/Scala](get-started/quick-start.md#javascala-api) | Software Developers, Data Engineers |

<span class="center-content">
[Try now](get-started/quick-start.md){ .md-button .md-button--primary .button-spaced }
[Demo](sample/ui/index.html){ .md-button .md-button--primary .button-spaced }
6 changes: 3 additions & 3 deletions docs/use-case.md
@@ -46,7 +46,7 @@ data gets generated and is consumed, you can also run validations to ensure your
These scenarios can be put together from existing tasks or data sources can be enabled/disabled based on your
requirement. Built into Data Caterer and controlled via feature flags, is the ability to test edge cases based on the
data type of the fields used for data generation (`enableEdgeCases` flag within `<field>.generator.options`, see more
[**here**](setup/generator/data-generator.md)).
[**here**](docs/generator/data-generator.md)).

## Data debugging

@@ -59,7 +59,7 @@ in whichever environment you want to test your changes against.
## Data profiling

When using Data Caterer with the feature flag `enableGeneratePlanAndTasks` enabled
(see [**here**](setup/configuration.md)), metadata relating to all the fields defined in the data sources you have
(see [**here**](docs/configuration.md)), metadata relating to all the fields defined in the data sources you have
configured will be generated via data profiling. You can run this as a standalone job (can disable `enableGenerateData`)
so that you can focus on the profile of the data you are utilising. This can be run against your production data sources
to ensure the metadata can be used to accurately generate data in other environments. This is a key feature of Data
@@ -69,6 +69,6 @@ lead to serious concerns about data security as seen [**here**](use-case/busines
## Schema gathering

When using Data Caterer with the feature flag `enableGeneratePlanAndTasks` enabled
(see [**here**](setup/configuration.md)), all schemas of the data sources defined will be tracked in a common format (as
(see [**here**](docs/configuration.md)), all schemas of the data sources defined will be tracked in a common format (as
tasks). This data, along with the data profiling metadata, could then feed back into your schema registries to help keep
them up to date with your system.
198 changes: 198 additions & 0 deletions docs/use-case/blog/a-year-of-getting-paid-from-medium-articles.md
@@ -0,0 +1,198 @@
---
title: "A year of getting paid from Medium articles"
description: "Analysing my Medium article data to see how my trial of one year of paid articles performed."
image: "https://data.catering/diagrams/logo/data_catering_logo.svg"
---

# Creating Articles

At the end of last year, I decided to put more effort into creating articles, both to help boost awareness of my
new business Data Catering and to consolidate my knowledge and exploration of technology topics. I had used
Medium for a couple of articles before and was enticed by the fact that you can monetize articles via people
viewing, reading and reacting to what you have written. So I signed up to the Medium Partner Program for a year to
find out how much my articles were worth, hoping to at least break even at the end of it.

A small thing to note: Medium refers to articles as "stories". So wherever you see "story", just replace it with
"article", as I'm not a story writer.

## Monetizing an Article

To earn money from an article, you have to restrict access to the article to other members of the Medium Partner
Program. As the writer of the article, you also have the option to share a friend's link, which allows others to bypass
this restriction, but you don't earn anything from the interactions that come via the friend's link.

So your only chances of earning from an article are essentially based on people outside your network, who are members,
interacting with your article.

## Medium Dashboards

You have access to a few dashboards which give you summary statistics on how users are interacting with your articles.

### Partner Program Dashboard

Shows you a summary of the current month's earnings and gives you an overview on a per-story basis. Another thing to
note is that you now only get paid once you have reached $10 USD. Previously, you would get paid each month no matter
the amount.

![Medium Partner Program dashboard](../../../docs/diagrams/blog/paid-medium-articles/partner_program_dashboard.png)

### Audience Dashboard

An overview of subscribers to your articles. You can see a jump in October because I wrote an article that month.

![Medium audience dashboard](../../../docs/diagrams/blog/paid-medium-articles/audience_stats_dashboard.png)

### Story Dashboard

Details on how many people have viewed and read your articles.

![Story statistics dashboard](../../../docs/diagrams/blog/paid-medium-articles/story_stats_dashboard.png)

Basic sorting options for story statistics.

![Story statistics sorting options](../../../docs/diagrams/blog/paid-medium-articles/story_stats_sorting.png)

### Per Story Dashboard

You can drill down into more details at the article level.

![Per story overview with earnings](../../../docs/diagrams/blog/paid-medium-articles/per_story_stats_top.png)

The breakdown of member/non-member reads and views per day.

![Per story member/non-member reads and views per day](../../../docs/diagrams/blog/paid-medium-articles/per_story_stats_middle.png)

The sources of traffic to your article and the interests of your readers.

![Per story traffic sources and reader interests](../../../docs/diagrams/blog/paid-medium-articles/per_story_stats_bottom.png)

## I Want More Insights

I wanted to run some further analysis on my article data. So I went searching for export options within the Medium
website but could only find that you can export your [audience statistics](#audience-dashboard). Eventually, I found
something more comprehensive in [this GitHub repo called medium_stats](https://github.com/otosky/medium_stats). Great!

When I ran it after installing it via pip, it ran into a JSON decoding error. Most likely Medium has changed its API
and the project needs to be updated. Using my internet skills, I opened up "Inspect" in my browser, went to the
"Network" tab and tried to find out which API call contains all the juicy information. After a few clicks, I found this
GraphQL response.

![GraphQL query to get all story stats](../../../docs/diagrams/blog/paid-medium-articles/inspect_graphql_query.png)

I could see the
[medium_stats project was already making some GraphQL calls](https://github.com/otosky/medium_stats/blob/master/medium_stats/scraper.py#L15).
So I quickly cloned the repo, added in the missing GraphQL calls and got it to kinda work. Now I had my
stats exported in JSON format. How can we analyse this data quickly?

### DuckDB

Without putting too much thought into it, I knew I could easily use DuckDB to query this JSON data via SQL.
Now I can run queries like this to get a nice compact view of the stats I'm interested in.
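Before writing anything complicated, it helps to see what fields DuckDB infers from the exports. A minimal sketch,
assuming the same `/tmp/stats_exports` layout used in the queries below:

```sql
-- Inspect the schema DuckDB infers from the aggregated stats export
DESCRIBE SELECT * FROM read_json('/tmp/stats_exports/*/agg_stats/*.json');

-- Quick sanity check on how many story records were exported
SELECT COUNT(*) AS stories
FROM read_json('/tmp/stats_exports/*/agg_stats/*.json');
```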

#### Views, reads and earnings per story

```sql
SELECT node.title AS title,
node.totalStats.views AS views,
node.totalStats.reads AS reads,
node.readingTime,
CAST(CONCAT(node.earnings.total.units, '.',
LEFT(CAST(node.earnings.total.nanos AS string), 2)) AS DOUBLE) AS earnings
FROM
read_json('/tmp/stats_exports/*/agg_stats/*.json')
ORDER BY earnings DESC;
```

![DuckDB query results for earnings per story](../../../docs/diagrams/blog/paid-medium-articles/duckdb_earnings_per_story.png)
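One note on the string-based earnings calculation above: if `units` and `nanos` follow the common money convention
where `nanos` is billionths of a unit, an arithmetic version avoids any surprises with leading zeros in `nanos`. A
sketch under that assumption (not something Medium documents):

```sql
-- Assumes earnings.total uses the units + nanos (billionths of a dollar) convention
SELECT node.title AS title,
       node.totalStats.views AS views,
       node.totalStats.reads AS reads,
       ROUND(node.earnings.total.units + node.earnings.total.nanos / 1e9, 2) AS earnings
FROM read_json('/tmp/stats_exports/*/agg_stats/*.json')
ORDER BY earnings DESC;
```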

#### Earnings per day per story

```sql
SELECT a.node.title AS title, e.date AS date, e.total_earnings AS earn
FROM (SELECT d.id AS id,
STRFTIME(MAKE_TIMESTAMP(CAST(d.daily_earning.periodStartedAt AS BIGINT) * 1000),
'%Y-%m-%d') AS date,
ROUND(SUM(d.daily_earning.amount / 100.0), 2) AS total_earnings
FROM (SELECT p.post.id AS id, UNNEST(p.post.earnings.dailyEarnings) AS daily_earning
FROM (SELECT UNNEST(data.post) AS post
FROM
read_json('/tmp/stats_exports/*/post_events/*.json')) p) d
GROUP BY id,
date) e
JOIN read_json('/tmp/stats_exports/*/agg_stats/*.json') a ON a.node.id = e.id
ORDER BY earn DESC;
```

![DuckDB query results for earnings per day per story](../../../docs/diagrams/blog/paid-medium-articles/duckdb_earnings_per_day_per_story.png)
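The same daily earnings can also be rolled up by month to see how spread out the payouts were. A rough sketch reusing
the `post_events` export from the query above:

```sql
-- Monthly rollup of daily earnings, assuming amount is in cents as above
SELECT STRFTIME(MAKE_TIMESTAMP(CAST(d.daily_earning.periodStartedAt AS BIGINT) * 1000),
                '%Y-%m') AS month,
       ROUND(SUM(d.daily_earning.amount / 100.0), 2) AS total_earnings
FROM (SELECT p.post.id AS id, UNNEST(p.post.earnings.dailyEarnings) AS daily_earning
      FROM (SELECT UNNEST(data.post) AS post
            FROM read_json('/tmp/stats_exports/*/post_events/*.json')) p) d
GROUP BY month
ORDER BY month;
```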

#### Earnings per member interaction per story

```sql
SELECT id,
STRFTIME(MAKE_TIMESTAMP(CAST(earnings.periodStartedAt AS BIGINT) * 1000),
'%Y-%m-%d') AS date,
earnings.amount AS amount,
stats.readersThatReadCount AS reads,
stats.readersThatViewedCount AS views,
stats.readersThatClappedCount AS claps,
stats.readersThatRepliedCount AS replies,
stats.readersThatHighlightedCount AS highlights,
stats.readersThatInitiallyFollowedAuthorFromThisPostCount AS follows
FROM (SELECT d.id AS id,
d.stats AS stats,
UNNEST(d.earnings) AS earnings
FROM (SELECT t.post.id AS id,
t.post.earnings.dailyEarnings AS earnings,
UNNEST(t.post.postStatsDailyBundle.buckets) AS stats
FROM (SELECT UNNEST(data.post) AS post
FROM read_json('/tmp/stats_exports/*/post_earnings_breakdown/*.json')) t) d
WHERE earnings NOT NULL AND stats.membershipType = 'MEMBER')
WHERE earnings.periodStartedAt = stats.dayStartsAt
ORDER BY amount DESC;
```

![DuckDB query results for earnings with interactions from members](../../../docs/diagrams/blog/paid-medium-articles/duckdb_earnings_per_interaction.png)
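Before fitting anything, a simpler ratio gives a feel for what each type of interaction is worth on average. A rough
sketch over the same member-only daily buckets, assuming `amount` is in cents (as implied by the `/ 100.0` conversion
earlier):

```sql
-- Average cents earned per member read and per clap across all daily buckets
SELECT ROUND(SUM(earnings.amount) / NULLIF(SUM(stats.readersThatReadCount), 0), 2)    AS cents_per_read,
       ROUND(SUM(earnings.amount) / NULLIF(SUM(stats.readersThatClappedCount), 0), 2) AS cents_per_clap
FROM (SELECT d.id AS id,
             d.stats AS stats,
             UNNEST(d.earnings) AS earnings
      FROM (SELECT t.post.id AS id,
                   t.post.earnings.dailyEarnings AS earnings,
                   UNNEST(t.post.postStatsDailyBundle.buckets) AS stats
            FROM (SELECT UNNEST(data.post) AS post
                  FROM read_json('/tmp/stats_exports/*/post_earnings_breakdown/*.json')) t) d
      WHERE earnings IS NOT NULL AND stats.membershipType = 'MEMBER')
WHERE earnings.periodStartedAt = stats.dayStartsAt;
```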

#### Linear regression of member interactions with earnings

I don't think I have enough data to get an accurate estimate of the formula Medium uses to calculate earnings, but
the query below gives a rough indication. Someone else with a larger following could get a better estimate.

```sql
SELECT REGR_SLOPE(earnings.amount, stats.readersThatReadCount) AS slope_read,
REGR_SLOPE(earnings.amount, stats.readersThatViewedCount) AS slope_view,
REGR_SLOPE(earnings.amount, stats.readersThatClappedCount) AS slope_clap,
REGR_SLOPE(earnings.amount, stats.readersThatRepliedCount) AS slope_reply,
REGR_SLOPE(earnings.amount, stats.readersThatHighlightedCount) AS slope_highlight,
REGR_SLOPE(earnings.amount, stats.readersThatInitiallyFollowedAuthorFromThisPostCount) AS slope_follow,
REGR_INTERCEPT(earnings.amount, stats.readersThatReadCount) AS intercept
FROM (SELECT d.id AS id,
d.stats AS stats,
UNNEST(d.earnings) AS earnings
FROM (SELECT t.post.id AS id,
t.post.earnings.dailyEarnings AS earnings,
UNNEST(t.post.postStatsDailyBundle.buckets) AS stats
FROM (SELECT UNNEST(data.post) AS post
FROM read_json('/tmp/stats_exports/*/post_earnings_breakdown/*.json')) t) d
WHERE earnings NOT NULL AND stats.membershipType = 'MEMBER')
WHERE earnings.periodStartedAt = stats.dayStartsAt;
```

![DuckDB query results for linear regression between member interactions and earnings](../../../docs/diagrams/blog/paid-medium-articles/duckdb_linear_regression_interactions.png)
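DuckDB also has `REGR_R2`, so it is worth checking how much any single interaction type explains on its own before
trusting the slopes. Again only a rough sketch on the same filtered data, and with this little data the numbers are
indicative at best:

```sql
-- R-squared of earnings against individual interaction counts, member buckets only
SELECT REGR_R2(earnings.amount, stats.readersThatReadCount)    AS r2_read,
       REGR_R2(earnings.amount, stats.readersThatClappedCount) AS r2_clap,
       REGR_R2(earnings.amount, stats.readersThatRepliedCount) AS r2_reply,
       REGR_COUNT(earnings.amount, stats.readersThatReadCount) AS sample_size
FROM (SELECT d.id AS id,
             d.stats AS stats,
             UNNEST(d.earnings) AS earnings
      FROM (SELECT t.post.id AS id,
                   t.post.earnings.dailyEarnings AS earnings,
                   UNNEST(t.post.postStatsDailyBundle.buckets) AS stats
            FROM (SELECT UNNEST(data.post) AS post
                  FROM read_json('/tmp/stats_exports/*/post_earnings_breakdown/*.json')) t) d
      WHERE earnings IS NOT NULL AND stats.membershipType = 'MEMBER')
WHERE earnings.periodStartedAt = stats.dayStartsAt;
```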

## Did I Reach Break Even?

So I wasn't really expecting much out of it but was at least hoping to break even with my initial $50 USD investment.

#### Total earnings across all articles

```sql
SELECT SUM(CAST(CONCAT(node.earnings.total.units, '.',
LEFT(CAST(node.earnings.total.nanos AS string), 2)) AS DOUBLE)) AS total_earnings
FROM
read_json('/tmp/stats_exports/*/agg_stats/*.json');
```

![DuckDB query results for total earnings](../../../docs/diagrams/blog/paid-medium-articles/duckdb_total_earnings.png)

Nope!