Commit

Fix links from setup to docs, add in new article about publishing to medium
pflooky committed Dec 19, 2024
1 parent aafbbf8 commit 05e412f
Showing 20 changed files with 236 additions and 30 deletions.
Binary file modified docs/diagrams/.DS_Store
Binary file not shown.
4 changes: 2 additions & 2 deletions docs/docs/guide/scenario/data-validation.md
@@ -146,7 +146,7 @@ validation considers an acceptable error threshold before marking it as failed.
- Customisation
- Adjust the regex pattern and error threshold based on your specific data schema and validation requirements.
- For the full list of types of basic validations that can be
used, [check this page](../../../setup/validation/basic-validation.md).
used, [check this page](../../../docs/validation/basic-validation.md).
- Understanding Tolerance
- Be mindful of the error threshold, as it directly influences what percentage of deviations from the pattern is
acceptable.
@@ -220,7 +220,7 @@ Line 2: `validation.groupBy("account_id").max("balance").lessThan(900)`
- Adjust the `errorThreshold` or validation to your specification scenario. The full list
of [types of validations can be found here](../../validation.md).
- For the full list of types of group by validations that can be
used, [check this page](../../../setup/validation/group-by-validation.md).
used, [check this page](../../../docs/validation/group-by-validation.md).

=== "Java"

4 changes: 2 additions & 2 deletions docs/get-started/quick-start.md
@@ -112,6 +112,6 @@ Check the report generated under `docker/data/custom/report/index.html`.

### Guided tour

[**Check out the starter guide here**](../setup/guide/scenario/first-data-generation.md) that will take you through
step by step. You can also check the other guides [**here**](../setup/guide/index.md) to see the other possibilities of
[**Check out the starter guide here**](../docs/guide/scenario/first-data-generation.md) that will take you through
step by step. You can also check the other guides [**here**](../docs/guide/index.md) to see the other possibilities of
what Data Caterer can achieve for you.
18 changes: 13 additions & 5 deletions docs/index.md
@@ -26,11 +26,11 @@ alt="Data Caterer generate, validate and clean data testing flow">

## Main features

- :material-connection: [Connect to any data source](setup/connection.md)
- :material-auto-fix: [Auto generate production-like data from data connections or metadata sources](setup/guide/scenario/auto-generate-connection.md)
- :material-relation-many-to-one: [Relationships across data sources](setup/generator/foreign-key.md)
- :material-test-tube: [Validate based on data generated](setup/validation.md)
- :material-delete-sweep: [Clean up generated and downstream data](setup/delete-data.md)
- :material-connection: [Connect to any data source](docs/connection.md)
- :material-auto-fix: [Auto generate production-like data from data connections or metadata sources](docs/guide/scenario/auto-generate-connection.md)
- :material-relation-many-to-one: [Relationships across data sources](docs/generator/foreign-key.md)
- :material-test-tube: [Validate based on data generated](docs/validation.md)
- :material-delete-sweep: [Clean up generated and downstream data](docs/delete-data.md)

<span class="center-content">
[Try now](get-started/quick-start.md){ .md-button .md-button--primary .button-spaced }
@@ -89,6 +89,14 @@ alt="Data Caterer generate, validate and clean data testing flow">

</div>

## Who can use it

| Type | Interface | User |
|-----------|--------------------------------------------------------|--------------------------------------|
| No Code | [UI](get-started/quick-start.md#windows) | QA, Testers, Data Scientist, Analyst |
| Low Code | [YAML](get-started/quick-start.md#yaml) | DevOps, Kubernetes Fans |
| High Code | [Java/Scala](get-started/quick-start.md#javascala-api) | Software Developers, Data Engineers |

<span class="center-content">
[Try now](get-started/quick-start.md){ .md-button .md-button--primary .button-spaced }
[Demo](sample/ui/index.html){ .md-button .md-button--primary .button-spaced }
6 changes: 3 additions & 3 deletions docs/use-case.md
@@ -46,7 +46,7 @@ data gets generated and is consumed, you can also run validations to ensure your
These scenarios can be put together from existing tasks or data sources can be enabled/disabled based on your
requirement. Built into Data Caterer and controlled via feature flags, is the ability to test edge cases based on the
data type of the fields used for data generation (`enableEdgeCases` flag within `<field>.generator.options`, see more
[**here**](setup/generator/data-generator.md)).
[**here**](docs/generator/data-generator.md)).

## Data debugging

@@ -59,7 +59,7 @@ in whichever environment you want to test your changes against.
## Data profiling

When using Data Caterer with the feature flag `enableGeneratePlanAndTasks` enabled
(see [**here**](setup/configuration.md)), metadata relating to all the fields defined in the data sources you have
(see [**here**](docs/configuration.md)), metadata relating to all the fields defined in the data sources you have
configured will be generated via data profiling. You can run this as a standalone job (can disable `enableGenerateData`)
so that you can focus on the profile of the data you are utilising. This can be run against your production data sources
to ensure the metadata can be used to accurately generate data in other environments. This is a key feature of Data
@@ -69,6 +69,6 @@ lead to serious concerns about data security as seen [**here**](use-case/busines
## Schema gathering

When using Data Caterer with the feature flag `enableGeneratePlanAndTasks` enabled
(see [**here**](setup/configuration.md)), all schemas of the data sources defined will be tracked in a common format (as
(see [**here**](docs/configuration.md)), all schemas of the data sources defined will be tracked in a common format (as
tasks). This data, along with the data profiling metadata, could then feed back into your schema registries to help keep
them up to date with your system.
198 changes: 198 additions & 0 deletions docs/use-case/blog/a-year-of-getting-paid-from-medium-articles.md
@@ -0,0 +1,198 @@
---
title: "A year of getting paid from Medium articles"
description: "Analysing my Medium article data to see how my trial of one year of paid articles performed."
image: "https://data.catering/diagrams/logo/data_catering_logo.svg"
---

# Creating Articles

At the end of last year, I decided to put more effort into creating articles, both to help boost awareness of my
new business Data Catering and to consolidate my knowledge and exploration of technology topics. I had used
Medium for a couple of articles before and was enticed by the fact that you can monetize articles via people
viewing, reading and reacting to what you have written. So I signed up to the Medium Partner Program for a year to
find out how much my articles were worth, hoping to at least break even at the end of it.

A small thing to note: Medium refers to articles as "stories". So wherever you see "story", just replace it with
"article", as I'm not a story writer.

## Monetizing an Article

To earn money from an article, you have to restrict access to the article to other members of the Medium Partner
Program. As the writer of the article, you also have the option to share a friend's link, which allows others to bypass
this restriction, but you don't earn anything from the interactions that come via the friend's link.

So your only chances of earning from an article are essentially based on people outside your network, who are members,
interacting with your article.

## Medium Dashboards

You have access to a few dashboards which give you summary statistics on how users are interacting with your articles.

### Partner Program Dashboard

Shows you a summary of the current month's earnings and gives you an overview on a per-story basis. Another thing to
note is that you now only get paid once you have reached $10 USD. Previously, you would get paid each month no matter
the amount.

![Medium Partner Program dashboard](../../../docs/diagrams/blog/paid-medium-articles/partner_program_dashboard.png)

### Audience Dashboard

An overview of subscribers to your articles. You can see a jump in October because I wrote an article that month.

![Medium audience dashboard](../../../docs/diagrams/blog/paid-medium-articles/audience_stats_dashboard.png)

### Story Dashboard

Details on how many people have viewed and read your articles.

![Story statistics dashboard](../../../docs/diagrams/blog/paid-medium-articles/story_stats_dashboard.png)

Basic sorting options for story statistics.

![Story statistics sorting options](../../../docs/diagrams/blog/paid-medium-articles/story_stats_sorting.png)

### Per Story Dashboard

You can drill down into more details at the article level.

![Per story overview with earnings](../../../docs/diagrams/blog/paid-medium-articles/per_story_stats_top.png)

The breakdown of member/non-member reads and views per day.

![Per story member/non-member reads and views per day](../../../docs/diagrams/blog/paid-medium-articles/per_story_stats_middle.png)

The sources of traffic to your article and the interests of your readers.

![Per story traffic sources and reader interests](../../../docs/diagrams/blog/paid-medium-articles/per_story_stats_bottom.png)

## I Want More Insights

I wanted to run some further analysis on my article data. So I went searching for export options within the Medium
website but could only find that you can export your [audience statistics](#audience-dashboard). Eventually, I found
something more comprehensive in [this GitHub repo called medium_stats](https://github.com/otosky/medium_stats). Great!

When I ran it after installing it via pip, it ran into a JSON decoding error. Most likely Medium has changed its API
and the project needs to be updated. Using my internet skills, I opened up "Inspect" in my browser, went to the
"Network" tab and tried to find out which API call contains all the juicy information. After a few clicks, I found this
GraphQL response.

![GraphQL query to get all story stats](../../../docs/diagrams/blog/paid-medium-articles/inspect_graphql_query.png)

I could see the
[medium_stats project was already making some GraphQL calls](https://github.com/otosky/medium_stats/blob/master/medium_stats/scraper.py#L15).
So I quickly cloned the repo, added in the missing GraphQL calls and got it to kinda work. Now I had my
stats exported in JSON format. How can we analyse this data quickly?

### DuckDB

Without putting too much thought into it, I knew I could easily use DuckDB to query this JSON data via SQL.
Now I can run queries like this to get a nice compact view of the stats I'm interested in.
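Before writing anything complicated, it helps to see what fields DuckDB infers from the exports. A minimal sketch,
assuming the same `/tmp/stats_exports` layout used in the queries below:

```sql
-- Inspect the schema DuckDB infers from the aggregated stats export
DESCRIBE SELECT * FROM read_json('/tmp/stats_exports/*/agg_stats/*.json');

-- Quick sanity check on how many story records were exported
SELECT COUNT(*) AS stories
FROM read_json('/tmp/stats_exports/*/agg_stats/*.json');
```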

#### Views, reads and earnings per story

```sql
SELECT node.title AS title,
node.totalStats.views AS views,
node.totalStats.reads AS reads,
node.readingTime,
CAST(CONCAT(node.earnings.total.units, '.',
LEFT(CAST(node.earnings.total.nanos AS string), 2)) AS DOUBLE) AS earnings
FROM
read_json('/tmp/stats_exports/*/agg_stats/*.json')
ORDER BY earnings DESC;
```

![DuckDB query results for earnings per story](../../../docs/diagrams/blog/paid-medium-articles/duckdb_earnings_per_story.png)
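One note on the string-based earnings calculation above: if `units` and `nanos` follow the common money convention
where `nanos` is billionths of a unit, an arithmetic version avoids any surprises with leading zeros in `nanos`. A
sketch under that assumption (not something Medium documents):

```sql
-- Assumes earnings.total uses the units + nanos (billionths of a dollar) convention
SELECT node.title AS title,
       node.totalStats.views AS views,
       node.totalStats.reads AS reads,
       ROUND(node.earnings.total.units + node.earnings.total.nanos / 1e9, 2) AS earnings
FROM read_json('/tmp/stats_exports/*/agg_stats/*.json')
ORDER BY earnings DESC;
```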

#### Earnings per day per story

```sql
SELECT a.node.title AS title, e.date AS date, e.total_earnings AS earn
FROM (SELECT d.id AS id,
STRFTIME(MAKE_TIMESTAMP(CAST(d.daily_earning.periodStartedAt AS BIGINT) * 1000),
'%Y-%m-%d') AS date,
ROUND(SUM(d.daily_earning.amount / 100.0), 2) AS total_earnings
FROM (SELECT p.post.id AS id, UNNEST(p.post.earnings.dailyEarnings) AS daily_earning
FROM (SELECT UNNEST(data.post) AS post
FROM
read_json('/tmp/stats_exports/*/post_events/*.json')) p) d
GROUP BY id,
date) e
JOIN read_json('/tmp/stats_exports/*/agg_stats/*.json') a ON a.node.id = e.id
ORDER BY earn DESC;
```

![DuckDB query results for earnings per day per story](../../../docs/diagrams/blog/paid-medium-articles/duckdb_earnings_per_day_per_story.png)
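The same daily earnings can also be rolled up by month to see how spread out the payouts were. A rough sketch reusing
the `post_events` export from the query above:

```sql
-- Monthly rollup of daily earnings, assuming amount is in cents as above
SELECT STRFTIME(MAKE_TIMESTAMP(CAST(d.daily_earning.periodStartedAt AS BIGINT) * 1000),
                '%Y-%m') AS month,
       ROUND(SUM(d.daily_earning.amount / 100.0), 2) AS total_earnings
FROM (SELECT p.post.id AS id, UNNEST(p.post.earnings.dailyEarnings) AS daily_earning
      FROM (SELECT UNNEST(data.post) AS post
            FROM read_json('/tmp/stats_exports/*/post_events/*.json')) p) d
GROUP BY month
ORDER BY month;
```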

#### Earnings per member interaction per story

```sql
SELECT id,
STRFTIME(MAKE_TIMESTAMP(CAST(earnings.periodStartedAt AS BIGINT) * 1000),
'%Y-%m-%d') AS date,
earnings.amount AS amount,
stats.readersThatReadCount AS reads,
stats.readersThatViewedCount AS views,
stats.readersThatClappedCount AS claps,
stats.readersThatRepliedCount AS replies,
stats.readersThatHighlightedCount AS highlights,
stats.readersThatInitiallyFollowedAuthorFromThisPostCount AS follows
FROM (SELECT d.id AS id,
d.stats AS stats,
UNNEST(d.earnings) AS earnings
FROM (SELECT t.post.id AS id,
t.post.earnings.dailyEarnings AS earnings,
UNNEST(t.post.postStatsDailyBundle.buckets) AS stats
FROM (SELECT UNNEST(data.post) AS post
FROM read_json('/tmp/stats_exports/*/post_earnings_breakdown/*.json')) t) d
WHERE earnings NOT NULL AND stats.membershipType = 'MEMBER')
WHERE earnings.periodStartedAt = stats.dayStartsAt
ORDER BY amount DESC;
```

![DuckDB query results for earnings with interactions from members](../../../docs/diagrams/blog/paid-medium-articles/duckdb_earnings_per_interaction.png)
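Before fitting anything, a simpler ratio gives a feel for what each type of interaction is worth on average. A rough
sketch over the same member-only daily buckets, assuming `amount` is in cents (as implied by the `/ 100.0` conversion
earlier):

```sql
-- Average cents earned per member read and per clap across all daily buckets
SELECT ROUND(SUM(earnings.amount) / NULLIF(SUM(stats.readersThatReadCount), 0), 2)    AS cents_per_read,
       ROUND(SUM(earnings.amount) / NULLIF(SUM(stats.readersThatClappedCount), 0), 2) AS cents_per_clap
FROM (SELECT d.id AS id,
             d.stats AS stats,
             UNNEST(d.earnings) AS earnings
      FROM (SELECT t.post.id AS id,
                   t.post.earnings.dailyEarnings AS earnings,
                   UNNEST(t.post.postStatsDailyBundle.buckets) AS stats
            FROM (SELECT UNNEST(data.post) AS post
                  FROM read_json('/tmp/stats_exports/*/post_earnings_breakdown/*.json')) t) d
      WHERE earnings IS NOT NULL AND stats.membershipType = 'MEMBER')
WHERE earnings.periodStartedAt = stats.dayStartsAt;
```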

#### Linear regression of member interactions with earnings

I don't think I have enough data to get an accurate estimate of the formula Medium uses to calculate earnings, but
the query below gives a rough indication. Someone else with a larger following could get a better estimate.

```sql
SELECT REGR_SLOPE(earnings.amount, stats.readersThatReadCount) AS slope_read,
REGR_SLOPE(earnings.amount, stats.readersThatViewedCount) AS slope_view,
REGR_SLOPE(earnings.amount, stats.readersThatClappedCount) AS slope_clap,
REGR_SLOPE(earnings.amount, stats.readersThatRepliedCount) AS slope_reply,
REGR_SLOPE(earnings.amount, stats.readersThatHighlightedCount) AS slope_highlight,
REGR_SLOPE(earnings.amount, stats.readersThatInitiallyFollowedAuthorFromThisPostCount) AS slope_follow,
REGR_INTERCEPT(earnings.amount, stats.readersThatReadCount) AS intercept
FROM (SELECT d.id AS id,
d.stats AS stats,
UNNEST(d.earnings) AS earnings
FROM (SELECT t.post.id AS id,
t.post.earnings.dailyEarnings AS earnings,
UNNEST(t.post.postStatsDailyBundle.buckets) AS stats
FROM (SELECT UNNEST(data.post) AS post
FROM read_json('/tmp/stats_exports/*/post_earnings_breakdown/*.json')) t) d
WHERE earnings NOT NULL AND stats.membershipType = 'MEMBER')
WHERE earnings.periodStartedAt = stats.dayStartsAt;
```

![DuckDB query results for linear regression between member interactions and earnings](../../../docs/diagrams/blog/paid-medium-articles/duckdb_linear_regression_interactions.png)
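DuckDB also has `REGR_R2`, so it is worth checking how much any single interaction type explains on its own before
trusting the slopes. Again only a rough sketch on the same filtered data, and with this little data the numbers are
indicative at best:

```sql
-- R-squared of earnings against individual interaction counts, member buckets only
SELECT REGR_R2(earnings.amount, stats.readersThatReadCount)    AS r2_read,
       REGR_R2(earnings.amount, stats.readersThatClappedCount) AS r2_clap,
       REGR_R2(earnings.amount, stats.readersThatRepliedCount) AS r2_reply,
       REGR_COUNT(earnings.amount, stats.readersThatReadCount) AS sample_size
FROM (SELECT d.id AS id,
             d.stats AS stats,
             UNNEST(d.earnings) AS earnings
      FROM (SELECT t.post.id AS id,
                   t.post.earnings.dailyEarnings AS earnings,
                   UNNEST(t.post.postStatsDailyBundle.buckets) AS stats
            FROM (SELECT UNNEST(data.post) AS post
                  FROM read_json('/tmp/stats_exports/*/post_earnings_breakdown/*.json')) t) d
      WHERE earnings IS NOT NULL AND stats.membershipType = 'MEMBER')
WHERE earnings.periodStartedAt = stats.dayStartsAt;
```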

## Did I Reach Break Even?

So I wasn't really expecting much out of it but was at least hoping to break even with my initial $50 USD investment.

#### Total earnings across all articles

```sql
SELECT SUM(CAST(CONCAT(node.earnings.total.units, '.',
LEFT(CAST(node.earnings.total.nanos AS string), 2)) AS DOUBLE)) AS total_earnings
FROM
read_json('/tmp/stats_exports/*/agg_stats/*.json');
```

![DuckDB query results for total earnings](../../../docs/diagrams/blog/paid-medium-articles/duckdb_total_earnings.png)

Nope!