
Fixed some factory issues to address Percy's false positive flags #11771

Open · wants to merge 10 commits into main from 10328-percy-flags-date

Conversation

mmmavis (Collaborator) commented Jan 25, 2024

I used this past Percy test as a reference to identify the false-positive flags we have recently been getting that are caused by factory configuration.

This PR

  • fixes how the factory generates dates so they are relative to a fixed date (2020-01-01) instead of "now"
  • fixes how the factory generates the cost value of MozFest's ticket snippet

Related to #10328

┆Issue is synchronized with this Jira Task

@mmmavis mmmavis force-pushed the 10328-percy-flags-date branch from d079754 to c794da8 Compare January 25, 2024 23:05
@@ -18,17 +18,12 @@ class Meta:
"footer_sentence",
)

class Params:
mmmavis (Collaborator, Author) · Jan 25, 2024:

These details seem a bit unnecessary to me. I'm deleting them to avoid complication.

Contributor:

These are helpful if creating these in a test. Instead of doing something like HighlightFactory(publish_after=datetime.now() + timedelta(days=1)) to indicate an unpublished highlight, we can just do HighlightFactory(unpublished=True) and it will create one for us with a publish_after in the future.
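A stdlib-only sketch of what that trait buys us (the real code uses factory_boy's `Trait` inside `class Params`; the `Highlight` model and its defaults below are simplified stand-ins, not the actual implementation):

```python
# Sketch: one `unpublished` flag expands into the right publish_after
# override, instead of hand-writing a future datetime at every call site.
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class Highlight:
    publish_after: datetime


def make_highlight(unpublished: bool = False, **overrides) -> Highlight:
    now = datetime.now(tz=timezone.utc)
    defaults = {"publish_after": now - timedelta(days=1)}    # published by default
    if unpublished:                                          # the "trait" flag
        defaults["publish_after"] = now + timedelta(days=1)  # future -> unpublished
    defaults.update(overrides)
    return Highlight(**defaults)
```

With factory_boy the same idea is declarative: the trait bundles the override and callers only pass `unpublished=True`.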

Contributor:

I don't think it's worth deleting them; they are helpful factory traits.

mmmavis (Collaborator, Author):

Cool I'll add them back!

@@ -32,34 +32,15 @@ class Meta:
exclude = ("headline_sentence",)

class Params:
unpublished = Trait(
mmmavis (Collaborator, Author):

Similarly, these details seem a bit unnecessary to me. I'm deleting them to avoid complication.

Contributor:

I would like to keep these as well

mmmavis (Collaborator, Author):

Will add them back!

video = Trait(is_video=True)

headline = LazyAttribute(lambda o: o.headline_sentence.rstrip("."))
outlet = Faker("company")
-    date = Faker("date") if RANDOM_SEED and not TESTING else Faker("past_date", start_date="-30d")
+    date = Faker("past_date", start_date=datetime(2020, 1, 1), tzinfo=timezone.utc)
mmmavis (Collaborator, Author):

I'm unsure why we needed to have two ways to generate values for date here. Any insights?

Contributor:

I'm not sure... it seems to generate a random date in the past 30 days for testing, while for all other generations it uses a random date with no restrictions.

-        if RANDOM_SEED and not TESTING
-        else Faker("past_datetime", start_date="-30d", tzinfo=timezone.utc)
-    )
+    publish_after = Faker("past_datetime", start_date=datetime(2020, 1, 1), tzinfo=timezone.utc)
mmmavis (Collaborator, Author):

Same question here. I'm curious why RANDOM_SEED and TESTING matter when it comes to generating value for publish_after

Contributor:

I'm not sure either. I think we're safe to simplify this if the Python tests pass.

-        Faker("date_time", tzinfo=timezone.utc)
-        if RANDOM_SEED and not TESTING
-        else Faker("past_datetime", start_date="-30d", tzinfo=timezone.utc)
+    first_published_at = Faker("past_datetime", start_date=datetime(2020, 1, 1), tzinfo=timezone.utc)
mmmavis (Collaborator, Author):

Same q as above.

Contributor:

See above

@mmmavis mmmavis force-pushed the 10328-percy-flags-date branch from c794da8 to faf23d0 Compare January 25, 2024 23:32
@mmmavis mmmavis force-pushed the 10328-percy-flags-date branch from 0c04f73 to 5cd4015 Compare January 25, 2024 23:40
@mmmavis mmmavis changed the title [WIP] Fixed how to factory generate dates [WIP] Fixed some factory issues to address Percy's false positive flags Jan 25, 2024
@mmmavis mmmavis changed the title [WIP] Fixed some factory issues to address Percy's false positive flags Fixed some factory issues to address Percy's false positive flags Jan 26, 2024
@mmmavis mmmavis marked this pull request as ready for review January 26, 2024 07:16
jhonatan-lopes (Contributor) left a review comment:

Hey @mmmavis, thanks for these changes. I'm a bit concerned that these won't be enough to solve the issue, though. The dates randomly changing on RCC/Research Hub do not seem to be caused by the factory date generation that you changed here, for instance.

I think it relates more to controlling the randomness. If the randomness is indeed reproducible, the dates shouldn't be changing like that from one Percy test to another.
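A minimal stdlib sketch of that idea (plain `random` standing in for faker's seeded generator): once seeded, the generator itself is fully reproducible, so any run-to-run drift has to come from inputs that change between runs, such as "today".

```python
import random


def generate_values(seed: int, n: int = 5) -> list[int]:
    # A seeded generator standing in for faker: same seed, same sequence.
    rng = random.Random(seed)
    return [rng.randrange(1000) for _ in range(n)]


# Re-running with the same seed reproduces exactly the same data.
assert generate_values(123) == generate_values(123)
```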

Comment on lines -31 to +26
-    publish_after = Faker("past_datetime", start_date="-30d", tzinfo=timezone.utc)
+    publish_after = Faker("past_datetime", start_date=datetime(2020, 1, 1), tzinfo=timezone.utc)
Contributor:

I'm not sure how this would solve the problem, though... 🤔 We would still get a random date, but now between Jan 1st 2020 and the current date instead of within the last 30 days. We might need a different approach.
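A small stdlib sketch of this concern (the helper below is a hypothetical stand-in, not faker's actual implementation): even with a fixed start date, the sampling interval still ends at "now", so the same seed can map to a different value on a different day.

```python
import random
from datetime import datetime, timedelta


def fake_past_datetime(seed: int, start: datetime, now: datetime) -> datetime:
    # Hypothetical stand-in for Faker("past_datetime", start_date=start):
    # pick a seeded point in the interval [start, now).
    rng = random.Random(seed)
    span_seconds = (now - start).total_seconds()
    return start + timedelta(seconds=rng.random() * span_seconds)


start = datetime(2020, 1, 1)
run_1 = fake_past_datetime(7, start, now=datetime(2023, 1, 1))
run_2 = fake_past_datetime(7, start, now=datetime(2023, 1, 2))
# Same seed and fixed start date -- but a different "now" still shifts the result.
assert run_1 != run_2
```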

mmmavis commented Jan 26, 2024

Hi @jhonatan-lopes , thanks for the feedback! I'll add the Trait stuff back.

Regarding the date generation issue: from my past observation, the date difference always seemed to be "x days" from when the main branch was last updated. For example, if the last Percy build for main was on January 1, 2023 and an article's published date was generated as June 1, 2020, then running a Percy build again the next day (Jan 2, 2023) would change the same article's published date to June 2, 2020.

I suspected this was because we used "today" as the baseline to generate the date (e.g., Faker("past_datetime", start_date="-30d", tzinfo=timezone.utc)), and that baseline became a variable (whose value changes every day) in faker's algorithm, which ultimately caused the "x days" difference when Percy builds ran x days apart. I'm thinking that using a fixed date as the baseline will always give us the expected value, provided the same SEED value is supplied, just like how get_random_option works.

That said, as a proof of concept I ran another Percy build today on this PR and compared the results with yesterday's build. There are still some leftover issues I need to address. I'm going to revise my code and keep this PR open for another few days to continue the proof of concept before re-requesting a review from you. 😁
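That hypothesis can be sketched with the stdlib alone (the helper below is a hypothetical stand-in for faker's date provider, using the same seed throughout): an offset anchored to the build date moves day-for-day with it, while an offset anchored to a fixed baseline is identical on every run.

```python
import random
from datetime import date, timedelta


def fake_past_date(seed: int, baseline: date, window_days: int = 30) -> date:
    # Hypothetical stand-in for Faker("past_date", ...): a seeded offset
    # subtracted from whatever baseline the factory uses.
    rng = random.Random(seed)
    return baseline - timedelta(days=rng.randrange(window_days))


build_1 = date(2023, 1, 1)  # day of the last Percy build on main
build_2 = date(2023, 1, 2)  # the same build re-run one day later

# Baseline = "today": same seed, yet the generated date drifts by one day.
assert fake_past_date(42, build_2) - fake_past_date(42, build_1) == timedelta(days=1)

# Baseline = fixed date: same seed gives the identical date on every run.
assert fake_past_date(42, date(2020, 1, 1)) == fake_past_date(42, date(2020, 1, 1))
```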

jhonatan-lopes (Contributor):

That sounds great, thanks @mmmavis! I tried some things locally but couldn't get to the bottom of the problem either; your comment definitely makes sense. I'll try some other approaches with that in mind (trying to "fudge today's date" and re-generating the data) to see if I can reproduce the problem.
