
Sweep: Modify di_website/publications/mixins.py or di_website/publications/models.py so stories.search(search_filter) only matches whole words #1339

Open
akmiller01 opened this issue Aug 4, 2023 · 2 comments
Labels
bug (Something isn't working), sweep (Assigns Sweep to an issue or pull request), wontfix (This will not be worked on)

Comments

akmiller01 (Contributor) commented Aug 4, 2023

Modify di_website/publications/mixins.py or di_website/publications/models.py so stories.search(search_filter) only matches whole words. Elasticsearch is interpreting this line https://github.com/devinit/DIwebsite-redesign/blob/develop/di_website/publications/mixins.py such that the search_filter variable here https://github.com/devinit/DIwebsite-redesign/blob/develop/di_website/publications/models.py returns matches for "disbursement" when search_filter is equal to "disability". The code should be modified so that a search_filter equal to "disability" only returns results for closely related words like "disabilities".

The search uses Elasticsearch, so the solution may involve passing an es_extra dict to index.SearchField to change an argument such as the analyzer.
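For illustration, a possible shape of that fix in di_website/publications/mixins.py is sketched below. It assumes the partial matching comes from the partial_match=True flag on these fields and that the Elasticsearch backend applies an es_extra mapping override as suggested above; the built-in "english" analyzer is used purely as an example of a whole-word, stemming analyzer (so "disability" should still match "disabilities" but not "disbursement"). Both assumptions would need checking against the Wagtail version in use.

# Hypothetical sketch only, not the actual fix
from wagtail.models import Page
from wagtail.search import index

class PublicationPageSearchMixin(object):
    search_fields = Page.search_fields + [
        index.FilterField('slug'),
        # partial_match removed so queries are not matched on edge-ngram prefixes;
        # es_extra is assumed to override the field's analyzer in the Elasticsearch mapping
        index.SearchField('title', es_extra={'analyzer': 'english'}),
        index.SearchField('hero_text', es_extra={'analyzer': 'english'}),
    ]

The index would need rebuilding (python manage.py update_index) for any mapping change to take effect.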

akmiller01 added the bug label on Aug 4, 2023
sweep-ai bot added the sweep label on Aug 4, 2023

sweep-ai bot commented Aug 4, 2023

Here's the PR! #1341.

⚡ Sweep Free Trial: I used GPT-4 to create this ticket. You have 5 GPT-4 tickets left. For more GPT-4 tickets, visit our payment portal. To get Sweep to recreate this ticket, leave a comment prefixed with "sweep:" or edit the issue.


Step 1: 🔍 Code Search

I found the following snippets in your repository. I will now analyze these snippets and come up with a plan.

Some code snippets I looked at. If a file is missing from here, you can mention the path in the ticket description.

from django.db import models
from django.utils.functional import cached_property
from di_website.common.constants import MAX_RELATED_LINKS
from wagtail.models import Page
from wagtail.contrib.redirects.models import Redirect
from wagtail.search import index
from di_website.common.base import get_related_pages
from di_website.common.templatetags.string_utils import uid
from .fields import flexible_content_streamfield, content_streamfield, pub_foreword_flexible_content_streamfield
from .utils import WagtailImageField, get_downloads

RELATED_CHOICES = (
    ('manual', 'Manual'),
    ('country', 'Country'),
    ('topic', 'Topic')
)

class FilteredDatasetMixin(object):
    @cached_property
    def filtered_datasets(self):
        results = []
        all_pub_datasets = self.publication_datasets.all()
        for pub_dataset in all_pub_datasets:
            if type(pub_dataset.dataset.specific).__name__ == "DatasetPage":
                results.append(pub_dataset)
        return results

class UniquePageMixin(object):
    @classmethod
    def can_create_at(cls, parent) -> bool:
        return super(UniquePageMixin, cls).can_create_at(parent) and not cls.objects.exists()

class UniqueForParentPageMixin(object):
    @classmethod
    def can_create_at(cls, parent) -> bool:
        return super(UniqueForParentPageMixin, cls).can_create_at(parent) \
            and not parent.get_children().type(cls).exists()

class ParentPageSearchMixin(object):
    search_fields = Page.search_fields + [
        index.FilterField('slug')
    ]

class PageSearchMixin(object):
    search_fields = Page.search_fields + [
        index.FilterField('slug'),
        index.SearchField('content', partial_match=True)
    ]

class LegacyPageSearchMixin(object):
    search_fields = Page.search_fields + [
        index.FilterField('slug'),
        index.SearchField('raw_content', partial_match=True),
        index.SearchField('content', partial_match=True)
    ]

class PublicationPageSearchMixin(object):
    search_fields = Page.search_fields + [
        index.FilterField('slug'),
        index.SearchField('title', partial_match=True),
        index.SearchField('hero_text', partial_match=True)
    ]

def CustomPageSearchFields(fields):
    return Page.search_fields + [index.SearchField(x, partial_match=True) for x in fields]
class PublishedDateMixin(models.Model):
class Meta:
abstract = True
published_date = models.DateTimeField(
blank=True,
null=True,
help_text='This date will be used for display and ordering',
)
@cached_property
def publication_date(self):
if self.published_date:
return self.published_date
parent = self.get_parent()
if parent and hasattr(parent, 'published_date') and parent.published_date:
return parent.published_date
return None
class UUIDMixin(models.Model):
class Meta:
abstract = True
uuid = models.CharField(max_length=6, default=uid)
def save(self, *args, **kwargs):
old_path = '/%s' % self.uuid
# using Redirect to enforce uuid uniqueness as using a unique field is prone to validation errors on page revisions
existing_redirect = Redirect.objects.filter(old_path=old_path).first()
if existing_redirect and existing_redirect.redirect_page.id == self.id:
super(UUIDMixin, self).save(*args, **kwargs)
else:
self.uuid = uid()
super(UUIDMixin, self).save(*args, **kwargs)
old_path = '/%s' % self.uuid
redirect = Redirect.objects.filter(old_path=old_path).first()
if not redirect:
Redirect(old_path=old_path, redirect_page=self).save()
else:
self.save(*args, **kwargs)
class ReportChildMixin(models.Model):
class Meta:
abstract = True
@cached_property
def publication_downloads_title(self):
return 'Downloads'
@cached_property
def publication_downloads_list(self):
return get_downloads(self, with_parent=True)
@cached_property
def data_downloads_title(self):
return 'Data downloads'
@cached_property
def data_downloads_list(self):
return get_downloads(self, with_parent=True, data=True)
@cached_property
def page_publication_downloads(self):
return self.publication_downloads.all()
@cached_property
def page_data_downloads(self):
return self.data_downloads.all()
class FlexibleContentMixin(models.Model):
class Meta:
abstract = True
content = flexible_content_streamfield()
class PubForewordFlexibleContentMixin(models.Model):
class Meta:
abstract = True
content = pub_foreword_flexible_content_streamfield()
class ContentMixin(models.Model):
class Meta:
abstract = True
content = content_streamfield()
class OptionalContentMixin(models.Model):
class Meta:
abstract = True
content = content_streamfield(blank=True)
class ReportDownloadMixin(models.Model):
class Meta:
abstract = True
download_report_cover = WagtailImageField(verbose_name='Report cover')
download_report_title = models.CharField(
max_length=255, blank=True,
default="Download this report", verbose_name='Section title')
download_report_body = models.TextField(blank=True, verbose_name='Section body', default='')
download_report_button_text = models.CharField(
max_length=255, blank=True,
default="Download now", verbose_name='Button caption')
report_download = models.ForeignKey(
'wagtaildocs.Document',
null=True,
blank=True,
on_delete=models.SET_NULL,
related_name='+'
)
class InheritCTAMixin(models.Model):
"""
Used by child pages of the PublicationPage to inherit Call To Action
"""
class Meta:
abstract = True
@cached_property
def call_to_action(self):
return self.get_parent().specific.publication_cta.filter(inherit=True)
class HeroButtonMixin(models.Model):
download_button_caption = models.CharField(
max_length=100, default="Downloads", blank=True, verbose_name='Downloads')
read_online_button_text = models.CharField(
max_length=256, default="Read Online", blank=True, verbose_name='Read Online')
request_hard_copy_text = models.CharField(
max_length=256, default="Request a hard copy", blank=True,
verbose_name='Read Hard Copy')
class Meta:
abstract = True
class RelatedLinksMixin(models.Model):
related_option_handler = models.CharField(
max_length=253, choices=RELATED_CHOICES, default='manual', verbose_name='Show By')
class Meta:
abstract = True
def sort_pages(self, combined_queryset):
pages_with_published_date = [d for d in combined_queryset if getattr(d.specific, 'published_date', 0) != 0 and d.specific.published_date]
pages_with_published_date.sort(key=lambda x: x.specific.published_date, reverse=True)
return pages_with_published_date[:MAX_RELATED_LINKS] if len(pages_with_published_date) > MAX_RELATED_LINKS else pages_with_published_date
def get_related_links(self, objects=None):
if not objects:
return None
if self.related_option_handler == 'topic' or self.related_option_handler == 'Topic':
combined_queryset = []
for key in objects:
results = objects[key].live().filter(topics__in=self.topics.get_queryset()).exclude(id=self.id).distinct()
for item in results:
if not item.alias_of:
combined_queryset.append(item)
slice_queryset = self.sort_pages(combined_queryset)
return get_related_pages(self, slice_queryset, objects)
elif self.related_option_handler == 'country' or self.related_option_handler == 'Country':
countries = [country.country.name for country in self.page_countries.all()]
combined_queryset = []
for key in objects:
results = objects[key].live().filter(page_countries__country__name__in=countries).exclude(id=self.id).distinct()
for item in results:
if not item.alias_of:
combined_queryset.append(item)
slice_queryset = self.sort_pages(combined_queryset)
return get_related_pages(self, slice_queryset, objects)
elif self.related_option_handler == 'manual' or self.related_option_handler == 'Manual':
return get_related_pages(self, self.publication_related_links.all(), objects)

import operator
from functools import reduce
from itertools import chain
from num2words import num2words
from taggit.models import Tag, TaggedItemBase
from django import forms
from django.utils.timezone import now
from django.contrib.contenttypes.models import ContentType
from django.core.paginator import EmptyPage, PageNotAnInteger, Paginator
from django.db import models
from django.utils.functional import cached_property
from django.utils.text import slugify
from modelcluster.contrib.taggit import ClusterTaggableManager
from modelcluster.fields import ParentalKey
from modelcluster.models import ClusterableModel
from wagtail.admin.panels import (FieldPanel, InlinePanel, PageChooserPanel)
from wagtail.contrib.redirects.models import Redirect
from wagtail.contrib.search_promotions.templatetags.wagtailsearchpromotions_tags import get_search_promotions
from wagtail import hooks
from wagtail.blocks import (CharBlock, PageChooserBlock, StructBlock, URLBlock)
from wagtail.fields import RichTextField, StreamField
from wagtail.models import Orderable, Page
from wagtail.images.blocks import ImageChooserBlock
from wagtail.search.models import Query
from wagtail.snippets.models import register_snippet
from di_website.common.base import (get_paginator_range, get_related_pages, hero_panels, call_to_action_panel)
from di_website.common.constants import (MAX_PAGE_SIZE, MAX_RELATED_LINKS, PODCAST_PROVIDERS, RICHTEXT_FEATURES)
from di_website.common.mixins import (HeroMixin, OtherPageMixin, SectionBodyMixin, TypesetBodyMixin, CallToActionMixin)
from di_website.downloads.utils import DownloadsPanel
from .edit_handlers import MultiFieldPanel
from .inlines import *
from .mixins import (
FilteredDatasetMixin, FlexibleContentMixin, HeroButtonMixin, InheritCTAMixin, PubForewordFlexibleContentMixin, PublicationPageSearchMixin,
PublishedDateMixin, ReportChildMixin, RelatedLinksMixin, ReportDownloadMixin, UniqueForParentPageMixin, UUIDMixin)
from .utils import (
ContentPanel, HeroButtonPanel, PublishedDatePanel, ReportDownloadPanel, UUIDPanel, WagtailImageField, RelatedLinksPanel,
get_downloads, get_first_child_of_type, get_ordered_children_of_type)
RED = 'poppy'
BLUE = 'bluebell'
PINK = 'rose'
YELLOW = 'sunflower'
ORANGE = 'marigold'
PURPLE = 'lavendar'
GREEN = 'leaf'
COLOUR_CHOICES = (
(RED, 'Red'),
(BLUE, 'Blue'),
(PINK, 'Pink'),
(YELLOW, 'Yellow'),
(ORANGE, 'Orange'),
(PURPLE, 'Purple'),
(GREEN, 'Green')
)
class PublicationTopic(TaggedItemBase):
content_object = ParentalKey('publications.PublicationPage', on_delete=models.CASCADE, related_name='publication_topics')
class LegacyPublicationTopic(TaggedItemBase):
content_object = ParentalKey('publications.LegacyPublicationPage', on_delete=models.CASCADE, related_name='legacy_publication_topics')
class ShortPublicationTopic(TaggedItemBase):
content_object = ParentalKey('publications.ShortPublicationPage', on_delete=models.CASCADE, related_name='short_publication_topics')
class AudioVisualMediaTopic(TaggedItemBase):
content_object = ParentalKey('publications.AudioVisualMedia', on_delete=models.CASCADE, related_name='audio_visual_media_topics')
@hooks.register('construct_media_chooser_queryset')
def show_my_uploaded_media_only(media, request):
# Only show uploaded audio files
media = media.filter(type='audio')
return media
@register_snippet
class Region(ClusterableModel):
name = models.CharField(max_length=255, unique=True)
panels = [
FieldPanel('name'),
]
class Meta:
ordering = ["name"]
def __str__(self):
return self.name
@register_snippet
class Country(ClusterableModel):
name = models.CharField(max_length=255, unique=True)
region = models.ForeignKey(
Region, related_name="+", on_delete=models.CASCADE)
slug = models.SlugField(
max_length=255, blank=True, null=True,
help_text="Optional. Will be auto-generated from name if left blank.")
panels = [
FieldPanel('name'),
FieldPanel('region'),
FieldPanel('slug'),
]
class Meta:
ordering = ["name"]
verbose_name_plural = 'Countries'
def __str__(self):
return self.name
def save(self, *args, **kwargs):
if not self.slug:
self.slug = slugify(self.name)
super(Country, self).save(*args, **kwargs)
@register_snippet
class PodcastProvider(models.Model):
podcast_provider_platform = models.CharField(
max_length=100,
choices=PODCAST_PROVIDERS
)
link_url = models.URLField(max_length=255)
panels = [
FieldPanel('podcast_provider_platform'),
FieldPanel('link_url')
]
def __str__(self):
provider = [choice[1] for choice in PODCAST_PROVIDERS if choice[0] == self.podcast_provider_platform]
return '%s - %s' % (provider[0], self.link_url)
class Meta():
verbose_name = 'Podcast Provider'
verbose_name_plural = 'Podcast Providers'
class PageCountry(Orderable):
page = ParentalKey(
Page, related_name='page_countries', on_delete=models.CASCADE
)
country = models.ForeignKey(
Country, related_name="+", null=True, blank=True, on_delete=models.CASCADE)
def __str__(self):
return self.country.name
@register_snippet
class ResourceCategory(ClusterableModel):
name = models.CharField(max_length=255, unique=True)
panels = [
FieldPanel('name'),
]
class Meta:
ordering = ["name"]
verbose_name = 'Resource Category'
verbose_name_plural = 'Resource Categories'
def __str__(self):
return self.name
@register_snippet
class PublicationType(ClusterableModel):
name = models.CharField(max_length=255, unique=True)
resource_category = models.ForeignKey(
ResourceCategory, related_name="+", on_delete=models.CASCADE, null=True, blank=False)
slug = models.SlugField(
max_length=255, blank=True, null=True,
help_text="Optional. Will be auto-generated from name if left blank.")
show_in_filter = models.BooleanField(
default=True,
help_text='Used to exclude obsolete tags that are still tied to resources',
verbose_name='Show in filter')
panels = [
FieldPanel('name'),
FieldPanel('resource_category'),
FieldPanel('show_in_filter'),
FieldPanel('slug'),
]
class Meta:
ordering = ["name"]
verbose_name = 'Resource Type'
def __str__(self):
return self.name
def save(self, *args, **kwargs):
if not self.slug:
self.slug = slugify(self.name)
super(PublicationType, self).save(*args, **kwargs)
class PublicationIndexPage(HeroMixin, Page):
content_panels = Page.content_panels + [
hero_panels(),
InlinePanel('page_notifications', label='Notifications')
]
subpage_types = ['PublicationPage', 'LegacyPublicationPage', 'ShortPublicationPage', 'general.General', 'AudioVisualMedia']
parent_page_types = ['home.HomePage']
def get_context(self, request):
context = super(PublicationIndexPage, self).get_context(request)
search_filter = request.GET.get('q', None)
if search_filter:
sort_options = [
('date_desc', 'newest first'),
('date_asc', 'oldest first'),
('score', 'relevance')
]
else:
sort_options = [
('date_desc', 'newest first'),
('date_asc', 'oldest first')
]
sort_ids = [sort_opt[0] for sort_opt in sort_options]
page = request.GET.get('page', None)
topic_filter = request.GET.get('topic', None)
country_filter = request.GET.get('country', None)
types_filter = request.GET.get('types', None)
selected_sort = request.GET.get('sort', 'date_desc')
if selected_sort not in sort_ids:
selected_sort = 'date_desc'
if topic_filter:
stories = PublicationPage.objects.descendant_of(self).live().filter(topics__slug=topic_filter)
legacy_pubs = LegacyPublicationPage.objects.descendant_of(self).live().filter(topics__slug=topic_filter)
short_pubs = ShortPublicationPage.objects.descendant_of(self).live().filter(topics__slug=topic_filter)
audio_visual_media = AudioVisualMedia.objects.descendant_of(self).live().filter(topics__slug=topic_filter)
else:
stories = PublicationPage.objects.descendant_of(self).live()
legacy_pubs = LegacyPublicationPage.objects.descendant_of(self).live()
short_pubs = ShortPublicationPage.objects.descendant_of(self).live()
audio_visual_media = AudioVisualMedia.objects.descendant_of(self).live()
if not request.user.is_authenticated:
stories = stories.public()
legacy_pubs = legacy_pubs.public()
short_pubs = short_pubs.public()
audio_visual_media = audio_visual_media.public()
if country_filter:
stories = stories.filter(page_countries__country__slug=country_filter)
legacy_pubs = legacy_pubs.filter(page_countries__country__slug=country_filter)
short_pubs = short_pubs.filter(page_countries__country__slug=country_filter)
audio_visual_media = audio_visual_media.filter(page_countries__country__slug=country_filter)
if types_filter:
stories = stories.filter(publication_type__slug=types_filter)
legacy_pubs = legacy_pubs.filter(publication_type__slug=types_filter)
short_pubs = short_pubs.filter(publication_type__slug=types_filter)
audio_visual_media = audio_visual_media.filter(publication_type__slug=types_filter)
if search_filter:
query = Query.get(search_filter)
query.add_hit()
if stories:
child_count = reduce(operator.add, [len(pub.get_children()) for pub in stories])
if child_count:
pub_children = reduce(operator.or_, [pub.get_children() for pub in stories]).live().specific().search(search_filter).annotate_score("_child_score")
if pub_children:
matching_parents = reduce(operator.or_, [stories.parent_of(child).annotate(_score=models.Value(child._child_score, output_field=models.FloatField())) for child in pub_children])
stories = list(chain(stories.exclude(id__in=matching_parents.values_list('id', flat=True)).search(search_filter).annotate_score("_score"), matching_parents))
else:
stories = stories.search(search_filter).annotate_score("_score")
else:
stories = stories.search(search_filter).annotate_score("_score")
legacy_pubs = legacy_pubs.search(search_filter).annotate_score("_score")
short_pubs = short_pubs.search(search_filter).annotate_score("_score")
audio_visual_media = audio_visual_media.search(search_filter).annotate_score('_score')
story_list = list(chain(stories, legacy_pubs, short_pubs, audio_visual_media))
elasticsearch_is_active = True
for story in story_list:
if hasattr(story, "_score"):
if story._score is None:
elasticsearch_is_active = False
if selected_sort == "score" and elasticsearch_is_active:
story_list.sort(key=lambda x: x._score, reverse=True)
elif selected_sort == "date_asc":
story_list.sort(key=lambda x: x.published_date, reverse=False)
else:
story_list.sort(key=lambda x: x.published_date, reverse=True)
promos = get_search_promotions(search_filter)
promo_pages = [promo.page.specific for promo in promos if promo.page.live and isinstance(promo.page.specific, (PublicationPage, ShortPublicationPage, LegacyPublicationPage))]
if promo_pages:
story_list = [story for story in story_list if story not in promo_pages]
story_list = list(chain(promo_pages, story_list))
paginator = Paginator(story_list, MAX_PAGE_SIZE)
try:
context['stories'] = paginator.page(page)
except PageNotAnInteger:
context['stories'] = paginator.page(1)
except EmptyPage:
context['stories'] = paginator.page(paginator.num_pages)
pubs_content_type = ContentType.objects.get_for_model(PublicationPage)
leg_pubs_content_type = ContentType.objects.get_for_model(LegacyPublicationPage)
short_pubs_content_type = ContentType.objects.get_for_model(ShortPublicationPage)
context['topics'] = Tag.objects.filter(
models.Q(publications_publicationtopic_items__content_object__content_type=pubs_content_type) |
models.Q(publications_legacypublicationtopic_items__content_object__content_type=leg_pubs_content_type) |
models.Q(publications_shortpublicationtopic_items__content_object__content_type=short_pubs_content_type)
).distinct().order_by('name')
resource_types = PublicationType.objects.filter(show_in_filter=True).order_by('resource_category', 'name')
context['resource_types'] = resource_types
# ensure only used resource types are pushed to the page
for resource_type in resource_types:
if not self.resource_type_has_publications(resource_type):
context['resource_types'] = context['resource_types'].exclude(id=resource_type.id)
context['selected_type'] = types_filter
context['selected_topic'] = topic_filter
context['countries'] = Country.objects.all().order_by('region', 'name')
context['selected_country'] = country_filter
context['search_filter'] = search_filter
context['selected_sort'] = selected_sort
context['sort_options'] = sort_options
context['is_filtered'] = search_filter or topic_filter or country_filter or types_filter
context['paginator_range'] = get_paginator_range(paginator, context['stories'])
return context
def resource_type_has_publications(self, resource_type):
return (PublicationPage.objects.filter(publication_type=resource_type).first() or
ShortPublicationPage.objects.filter(publication_type=resource_type).first() or
LegacyPublicationPage.objects.filter(publication_type=resource_type).first() or
AudioVisualMedia.objects.filter(publication_type=resource_type).first())
class Meta():
verbose_name = 'Resources Index Page'
class PublicationPage(
HeroMixin, HeroButtonMixin, PublishedDateMixin, PublicationPageSearchMixin, UUIDMixin,
FilteredDatasetMixin, RelatedLinksMixin, ReportDownloadMixin, Page):
class Meta:
verbose_name = 'Publication Page'
template = 'publications/publication_page_c.html'
parent_page_types = ['PublicationIndexPage', 'general.General']
subpage_types = [
'PublicationForewordPage',
'PublicationSummaryPage',
'PublicationChapterPage',
'PublicationAppendixPage',
]
colour = models.CharField(max_length=256, choices=COLOUR_CHOICES, default=RED)
authors = StreamField([
('internal_author', PageChooserBlock(
required=False,
target_model='ourteam.TeamMemberPage',
icon='heroicons-user-solid',
label='Internal Author'
)),
('external_author', StructBlock([
('name', CharBlock(required=False)),
('title', CharBlock(required=False)),
('photograph', ImageChooserBlock(required=False)),
('page', URLBlock(required=False))
], icon='heroicons-user-solid', label='External Author'))
], blank=True, use_json_field=True)
publication_type = models.ForeignKey(
PublicationType, related_name="+", null=True, blank=False, on_delete=models.SET_NULL, verbose_name="Resource Type")
topics = ClusterTaggableManager(through=PublicationTopic, blank=True, verbose_name="Topics")
content_panels = Page.content_panels + [
FieldPanel('colour'),
hero_panels(),
HeroButtonPanel(),
FieldPanel('authors'),
FieldPanel('publication_type'),
FieldPanel('topics'),
InlinePanel('publication_datasets', label='Datasets'),
InlinePanel('page_countries', label="Countries"),
PublishedDatePanel(),
DownloadsPanel(
heading='Downloads',
description='Downloads for this report.'
),
DownloadsPanel(
related_name='data_downloads',
heading='Data downloads',
description='Optional: data download for this report.'
),
ReportDownloadPanel(),
UUIDPanel(),
InlinePanel('page_notifications', label='Notifications'),
RelatedLinksPanel(),
InlinePanel('publication_cta', label='Call To Action', max_num=2),
]
@cached_property
def publication_downloads_title(self):
return 'Downloads'
@cached_property
def publication_downloads_list(self):
return get_downloads(self)
@cached_property
def data_downloads_title(self):
return 'Data downloads'
@cached_property
def data_downloads_list(self):
return get_downloads(self, with_parent=False, data=True)
@cached_property
def page_publication_downloads(self):
return self.publication_downloads.all()
@cached_property
def page_data_downloads(self):
return self.data_downloads.all()
@cached_property
def foreword(self):
return get_first_child_of_type(self, PublicationForewordPage)
@cached_property
def summary(self):
return get_first_child_of_type(self, PublicationSummaryPage)
@cached_property
def chapters(self):
return get_ordered_children_of_type(self, PublicationChapterPage, 'publicationchapterpage__chapter_number')
@cached_property
def appendices(self):
return get_ordered_children_of_type(self, PublicationAppendixPage, 'publicationappendixpage__appendix_number')
@cached_property
def listing(self):
children = [self.foreword, self.summary]
children += list(self.chapters)
return list(filter(None, children))
@cached_property
def meta_and_appendices(self):
children = list()
children += list(self.appendices)
return list(filter(None, children))
@cached_property
def listing_and_appendicies(self):
return self.listing + self.meta_and_appendices
@cached_property
def chapter_max(self):
try:
return max([chapter.chapter_number for chapter in self.chapters])
except ValueError:
return 0
@cached_property
def call_to_action(self):
return self.publication_cta.all()
@cached_property
def call_to_action_has_top_position(self):
for cta in self.publication_cta.all():
if cta.position == 'top':
return True
return False
def save(self, *args, **kwargs):
if not self.published_date:
self.published_date = now()
super(PublicationPage, self).save(*args, **kwargs)
old_path = '/%s' % self.slug
redirect = Redirect.objects.filter(old_path=old_path).first()
if not redirect:
Redirect(old_path=old_path, redirect_page=self).save()
def get_context(self, request, *args, **kwargs):
context = super().get_context(request, *args, **kwargs)
context['related_pages'] = self.get_related_links({
'audio': AudioVisualMedia.objects,
'legacy': LegacyPublicationPage.objects,
'publication': PublicationPage.objects,
'short': ShortPublicationPage.objects
})
return context
class PublicationForewordPage(
HeroMixin, ReportChildMixin, PubForewordFlexibleContentMixin, PublishedDateMixin, PublicationPageSearchMixin, UniqueForParentPageMixin,
UUIDMixin, FilteredDatasetMixin, ReportDownloadMixin, InheritCTAMixin, RelatedLinksMixin, Page):
class Meta:
verbose_name = 'Publication Foreword'
parent_page_types = ['PublicationPage']
subpage_types = []
colour = models.CharField(max_length=256, choices=COLOUR_CHOICES, default=RED)
content_panels = Page.content_panels + [
hero_panels(),
FieldPanel('colour'),
ContentPanel(),
InlinePanel('publication_datasets', label='Datasets'),
PublishedDatePanel(),
DownloadsPanel(
heading='Downloads',
description='Downloads for this foreword.'
),
DownloadsPanel(
related_name='data_downloads',
heading='Data downloads',
description='Optional: data download for this foreword.',
max_num=1,
),
ReportDownloadPanel(),
RelatedLinksPanel(),
InlinePanel('page_notifications', label='Notifications')
]
@cached_property
def label(self):
return 'the foreword'
@cached_property
def label_type(self):
return 'foreword'
@cached_property
def nav_label(self):
return 'foreword'
@cached_property
def publication_downloads_title(self):
return 'Downloads'
@cached_property
def publication_downloads_list(self):
return get_downloads(self)
@cached_property
def data_downloads_title(self):
return 'Data downloads'
@cached_property
def data_downloads_list(self):
return get_downloads(self, with_parent=False, data=True)
def get_context(self, request, *args, **kwargs):
context = super().get_context(request, *args, **kwargs)
context['related_pages'] = self.get_related_links({})
return context
class PublicationSummaryPage(
HeroMixin, ReportChildMixin, FlexibleContentMixin, PublishedDateMixin, PublicationPageSearchMixin, UniqueForParentPageMixin,
UUIDMixin, FilteredDatasetMixin, ReportDownloadMixin, RelatedLinksMixin, InheritCTAMixin, Page):
class Meta:
verbose_name = 'Publication Summary'
verbose_name_plural = 'Publication Summaries'
parent_page_types = ['PublicationPage']
subpage_types = []
colour = models.CharField(max_length=256, choices=COLOUR_CHOICES, default=RED)
content_panels = Page.content_panels + [
FieldPanel('colour'),
hero_panels(),
ContentPanel(),
InlinePanel('publication_datasets', label='Datasets'),
PublishedDatePanel(),
DownloadsPanel(
heading='Downloads',
description='Downloads for this summary.'
),
DownloadsPanel(
related_name='data_downloads',
heading='Data downloads',
description='Optional: data download for this summary.'
),
ReportDownloadPanel(),
InlinePanel('page_notifications', label='Notifications'),
RelatedLinksPanel(),
]
@cached_property
def label_type(self):
return 'summary'
@cached_property
def label(self):
return 'the executive summary'
@cached_property
def nav_label(self):
return 'executive summary'
@cached_property
def publication_downloads_title(self):
return 'Downloads'
@cached_property
def publication_downloads_list(self):
return get_downloads(self)
@cached_property
def data_downloads_title(self):
return 'Data downloads'
@cached_property
def data_downloads_list(self):
return get_downloads(self, with_parent=False, data=True)
@cached_property
def page_publication_downloads(self):
return self.publication_downloads.all()
@cached_property
def page_data_downloads(self):
return self.data_downloads.all()
@cached_property
def sections(self):
sections = []
for block in self.content:
if block.block_type == 'section_heading':
sections.append(block)
return sections
def get_context(self, request, *args, **kwargs):
context = super().get_context(request, *args, **kwargs)
context['related_pages'] = self.get_related_links({})
return context
class PublicationChapterPage(
HeroMixin, ReportChildMixin, FlexibleContentMixin, PublishedDateMixin, PublicationPageSearchMixin,
UUIDMixin, FilteredDatasetMixin, ReportDownloadMixin, RelatedLinksMixin, InheritCTAMixin, Page):
class Meta:
verbose_name = 'Publication Chapter'
parent_page_types = ['PublicationPage']
subpage_types = []
chapter_number = models.PositiveIntegerField(
choices=[(i, num2words(i).title()) for i in range(1, 21)]
)
colour = models.CharField(max_length=256, choices=COLOUR_CHOICES, default=RED)
content_panels = Page.content_panels + [
FieldPanel('colour'),
hero_panels(),
MultiFieldPanel(
[
FieldPanel('chapter_number', widget=forms.Select),
],
heading='Chapter number',
description='Chapter number: this should be unique for each chapter of a report.'
),
ContentPanel(),
InlinePanel('publication_datasets', label='Datasets'),
PublishedDatePanel(),
DownloadsPanel(
heading='Downloads',
description='Downloads for this chapter.'
),
DownloadsPanel(
related_name='data_downloads',
heading='Data downloads',
description='Optional: data download for this chapter.'
),
ReportDownloadPanel(),
InlinePanel('page_notifications', label='Notifications'),
RelatedLinksPanel(),
]
@cached_property
def chapter_word(self):
return num2words(self.chapter_number)
@cached_property
def label_type(self):
return 'chapter'
@cached_property
def label(self):
return 'chapter %s' % self.chapter_word
@cached_property
def label_num(self):
return 'chapter %s' % str(self.chapter_number).zfill(1)
@cached_property
def nav_label(self):
return 'chapter %s' % self.chapter_word
@cached_property
def sections(self):
sections = []
for block in self.content:
if block.block_type == 'section_heading':
sections.append(block)
return sections
@cached_property
def publication_downloads_title(self):
return 'Downloads'
@cached_property
def publication_downloads_list(self):
return get_downloads(self)
@cached_property
def data_downloads_title(self):
return 'Data downloads'
@cached_property
def data_downloads_list(self):
return get_downloads(self, with_parent=False, data=True)
@cached_property
def page_publication_downloads(self):
return self.publication_downloads.all()
@cached_property
def page_data_downloads(self):
return self.data_downloads.all()
def get_context(self, request, *args, **kwargs):
context = super().get_context(request, *args, **kwargs)
context['related_pages'] = self.get_related_links({})
return context
class PublicationAppendixPage(
HeroMixin, ReportChildMixin, FlexibleContentMixin, PublishedDateMixin, PublicationPageSearchMixin,
UUIDMixin, FilteredDatasetMixin, ReportDownloadMixin,RelatedLinksMixin, Page):
class Meta:
verbose_name = 'Publication Appendix'
verbose_name_plural = 'Publication Appendices'
parent_page_types = ['PublicationPage']
subpage_types = []
appendix_number = models.PositiveIntegerField(
choices=[(i, num2words(i).title()) for i in range(1, 21)]
)
colour = models.CharField(max_length=256, choices=COLOUR_CHOICES, default=RED)
content_panels = Page.content_panels + [
FieldPanel('colour'),
hero_panels(),
MultiFieldPanel(
[
FieldPanel('appendix_number', widget=forms.Select),
],
heading='Appendix number',
description='Appendix number: this should be unique for each appendix of a report.'
),
ContentPanel(),
InlinePanel('publication_datasets', label='Datasets'),
PublishedDatePanel(),
DownloadsPanel(
heading='Downloads',
description='Downloads for this appendix page.'
),
DownloadsPanel(
related_name='data_downloads',
heading='Data downloads',
description='Optional: data download for this appendix page.'
),
ReportDownloadPanel(),
InlinePanel('page_notifications', label='Notifications'),
RelatedLinksPanel(),
]
@cached_property
def appendix_word(self):
return num2words(self.appendix_number)
@cached_property
def label_type(self):
return 'appendix'
@cached_property
def label(self):
return 'appendix %s' % self.appendix_word
@cached_property
def label_num(self):
return 'appendix %s' % str(self.appendix_number).zfill(1)
@cached_property
def nav_label(self):
return 'appendix %s' % self.appendix_word
@cached_property
def publication_downloads_title(self):
return 'Downloads'
@cached_property
def publication_downloads_list(self):
return get_downloads(self)
@cached_property
def data_downloads_title(self):
return 'Data downloads'
@cached_property
def data_downloads_list(self):
return get_downloads(self, with_parent=False, data=True)
@cached_property
def page_publication_downloads(self):
return self.publication_downloads.all()
@cached_property
def page_data_downloads(self):
return self.data_downloads.all()
@cached_property
def sections(self):
sections = []
for block in self.content:
if block.block_type == 'section_heading':
sections.append(block)
return sections
class LegacyPublicationPage(HeroMixin, PublishedDateMixin, PublicationPageSearchMixin, FilteredDatasetMixin, CallToActionMixin, RelatedLinksMixin, ReportDownloadMixin, Page):
class Meta:
verbose_name = 'Legacy Publication'
parent_page_types = ['PublicationIndexPage']
subpage_types = []
colour = models.CharField(max_length=256, choices=COLOUR_CHOICES, default=RED)
authors = StreamField([
('internal_author', PageChooserBlock(
required=False,
target_model='ourteam.TeamMemberPage',
icon='heroicons-user-solid',
label='Internal Author')),
('external_author', StructBlock([
('name', CharBlock(required=False)),
('title', CharBlock(required=False)),
('photograph', ImageChooserBlock(required=False)),
('page', URLBlock(required=False))
], icon='heroicons-user-solid', label='External Author'))
], blank=True, use_json_field=True)
publication_type = models.ForeignKey(
PublicationType, related_name="+", null=True, blank=False, on_delete=models.SET_NULL, verbose_name="Resource Type")
topics = ClusterTaggableManager(through=LegacyPublicationTopic, blank=True, verbose_name="Topics")
raw_content = models.TextField(null=True, blank=True)
content = RichTextField(
help_text='Content for the legacy report',
null=True, blank=True,
features=RICHTEXT_FEATURES
)
summary_image = WagtailImageField(
required=False,
help_text='Optimal minimum size 800x400px',
)
content_panels = Page.content_panels + [
FieldPanel('colour'),
hero_panels(),
FieldPanel('authors'),
call_to_action_panel(),
FieldPanel('publication_type'),
FieldPanel('topics'),
InlinePanel('page_countries', label="Countries"),
PublishedDatePanel(),
InlinePanel('publication_datasets', label='Datasets'),
DownloadsPanel(
heading='Reports',
description='Report downloads for this legacy report.'
),
DownloadsPanel(
related_name='data_downloads',
heading='Data downloads',
description='Optional: data download for this legacy report.'
),
ReportDownloadPanel(),
MultiFieldPanel(
[
FieldPanel('content'),
FieldPanel('raw_content'),
],
heading='Summary',
description='Summary for the legacy publication.'
),
InlinePanel('page_notifications', label='Notifications'),
RelatedLinksPanel(),
]
@cached_property
def publication_downloads_title(self):
return 'Downloads'
@cached_property
def publication_downloads_list(self):
return get_downloads(self)
@cached_property
def data_downloads_title(self):
return 'Data downloads'
@cached_property
def data_downloads_list(self):
return get_downloads(self, with_parent=False, data=True)
@cached_property
def page_publication_downloads(self):
return self.publication_downloads.all()
@cached_property
def page_data_downloads(self):
return self.data_downloads.all()
def get_context(self, request, *args, **kwargs):
context = super().get_context(request, *args, **kwargs)
context['related_pages'] = self.get_related_links({
'audio': AudioVisualMedia.objects,
'legacy': LegacyPublicationPage.objects,
'publication': PublicationPage.objects,
'short': ShortPublicationPage.objects
})
return context;
class ShortPublicationPage(
HeroMixin, PublishedDateMixin, FlexibleContentMixin, PublicationPageSearchMixin,
UUIDMixin, FilteredDatasetMixin, CallToActionMixin, ReportDownloadMixin, RelatedLinksMixin, Page):
class Meta:
verbose_name = 'Short Publication'
parent_page_types = ['PublicationIndexPage', 'datasection.DataSectionPage', 'home.HomePage']
subpage_types = []
colour = models.CharField(max_length=256, choices=COLOUR_CHOICES, default=RED)
authors = StreamField([
('internal_author', PageChooserBlock(
required=False,
target_model='ourteam.TeamMemberPage',
icon='heroicons-user-solid',
label='Internal Author')),
('external_author', StructBlock([
('name', CharBlock(required=False)),
('title', CharBlock(required=False)),
('photograph', ImageChooserBlock(required=False)),
('page', URLBlock(required=False))
], icon='heroicons-user-solid', label='External Author'))
], blank=True, use_json_field=True)
publication_type = models.ForeignKey(
PublicationType, related_name="+", null=True, blank=False, on_delete=models.SET_NULL, verbose_name="Resource Type")
topics = ClusterTaggableManager(through=ShortPublicationTopic, blank=True, verbose_name="Topics")
content_panels = Page.content_panels + [
FieldPanel('colour'),
hero_panels(),
FieldPanel('authors'),
call_to_action_panel(),
FieldPanel('publication_type'),
FieldPanel('topics'),
InlinePanel('page_countries', label="Countries"),
PublishedDatePanel(),
ContentPanel(),
InlinePanel('publication_datasets', label='Datasets'),
DownloadsPanel(
heading='Downloads',
description='Downloads for this chapter.'
),
DownloadsPanel(
related_name='data_downloads',
heading='Data downloads',
description='Optional: data download for this chapter.'
),
ReportDownloadPanel(),
InlinePanel('page_notifications', label='Notifications'),
RelatedLinksPanel(),
]
@cached_property
def publication_downloads_title(self):
return 'Downloads'
@cached_property
def publication_downloads_list(self):
return get_downloads(self)
@cached_property
def data_downloads_title(self):
return 'Data downloads'
@cached_property
def data_downloads_list(self):
return get_downloads(self, with_parent=False, data=True)
@cached_property
def page_publication_downloads(self):
return self.publication_downloads.all()
@cached_property
def page_data_downloads(self):
return self.data_downloads.all()
@cached_property
def chapter_number(self):
return 1
@cached_property
def chapters(self):
return [self]
@cached_property
def listing_and_appendicies(self):
return [self]
@cached_property
def chapter_word(self):
return num2words(self.chapter_number)
@cached_property
def label_type(self):
return 'publication'
@cached_property
def label(self):
return 'publication'
@cached_property
def label_num(self):
return 'publication'
@cached_property
def sections(self):
sections = []
for block in self.content:
if block.block_type == 'section_heading':
sections.append(block)
return sections
def get_context(self, request, *args, **kwargs):
context = super().get_context(request, *args, **kwargs)
context['related_pages'] = self.get_related_links({
'audio': AudioVisualMedia.objects,
'legacy': LegacyPublicationPage.objects,
'publication': PublicationPage.objects,
'short': ShortPublicationPage.objects
})
return context
def save(self, *args, **kwargs):
if not self.published_date:
self.published_date = now()
super(ShortPublicationPage, self).save(*args, **kwargs)
class AudioVisualMedia(PublishedDateMixin, TypesetBodyMixin, HeroMixin, PublicationPageSearchMixin, RelatedLinksMixin, SectionBodyMixin, CallToActionMixin, Page):
"""
Audio Visual page to be used as a child of the Resources Index Page
"""
template = 'publications/audio_visual_media.html'
publication_type = models.ForeignKey(
PublicationType, related_name="+", null=True, blank=False, on_delete=models.SET_NULL, verbose_name="Resource Type")
participants = StreamField([
('internal_participant', PageChooserBlock(
required=False,
target_model='ourteam.TeamMemberPage',
icon='heroicons-user-solid', label='Internal Participant'
)),
('external_participant', StructBlock([
('name', CharBlock(required=False)),
('title', CharBlock(required=False)),
('photograph', ImageChooserBlock(required=False)),
('page', URLBlock(required=False))
], icon='heroicons-user-solid', label='External Participant'))
], blank=True, help_text="The people involved in the podcast or webinar", use_json_field=True)
topics = ClusterTaggableManager(through=AudioVisualMediaTopic, blank=True, verbose_name="Topics")
content_panels = Page.content_panels + [
hero_panels(),
FieldPanel('participants'),
call_to_action_panel(),
FieldPanel('body'),
FieldPanel('sections'),
FieldPanel('publication_type'),
InlinePanel('page_countries', label="Countries"),
FieldPanel('topics'),
PublishedDatePanel(),
RelatedLinksPanel(),
InlinePanel('page_notifications', label='Notifications'),
]
parent_page_types = ['PublicationIndexPage']
subpage_types = []
def get_context(self, request, *args, **kwargs):
context = super().get_context(request, *args, **kwargs)
context['related_pages'] = self.get_related_links({
'audio': AudioVisualMedia.objects,
'legacy': LegacyPublicationPage.objects,
'publication': PublicationPage.objects,
'short': ShortPublicationPage.objects
})
return context
class Meta:
verbose_name = 'Audio and Visual Media Page'
class PublicationPageRelatedLink(OtherPageMixin):
page = ParentalKey(Page, related_name='publication_related_links', on_delete=models.CASCADE)
panels = [
PageChooserPanel('other_page')
]

job_title_name = staff_dataset["position"]
job_title, _ = JobTitle.objects.get_or_create(name=job_title_name)
page_check = TeamMemberPage.objects.filter(slug=slug)
if not page_check:
staff_page = TeamMemberPage(
title=staff_name,
slug=slug,
name=staff_name,
image=img,
position=job_title,
my_story=staff_dataset["body"]
)
our_team_page.add_child(instance=staff_page)
staff_page.save_revision().publish()
Redirect.objects.create(
site=staff_page.get_site(),
old_path="/post/people/{}".format(slug),
redirect_page=staff_page
)
departments = staff_dataset["department"].split()
department_names = [dept.replace("-", " ").capitalize() for dept in departments]
for department_name in department_names:
department, _ = Department.objects.get_or_create(name=department_name)
TeamMemberPageDepartment.objects.create(
page=staff_page,
department=department
)
self.stdout.write(self.style.SUCCESS('Successfully imported staff profiles.'))
if blog_index_page is not None:
with open(options['blogs_file']) as blogs_file:
blog_datasets = json.load(blogs_file)
for blog_dataset in blog_datasets:
slug = blog_dataset['url'].split('/')[-2]
blog_check = BlogArticlePage.objects.filter(slug=slug)
if not blog_check and blog_dataset['body'] != "":
blog_page = BlogArticlePage(
title=blog_dataset['title'],
slug=slug,
hero_text=blog_dataset['description'],
body=json.dumps([{'type': 'paragraph_block', 'value': blog_dataset['body']}]),
)
other_authors = []
author_names = blog_dataset["author"]
if author_names:
author_name = author_names[0]
internal_author_page_qs = TeamMemberPage.objects.filter(name=author_name)
if internal_author_page_qs:
blog_page.internal_author_page = internal_author_page_qs.first()
else:
author_obj = {"type": "external_author", "value": {"name": author_name, "title": "", "photograph": None, "page": ""}}
other_authors.append(author_obj)
if len(author_names) > 1:
for author_name in author_names[1:]:
internal_author_page_qs = TeamMemberPage.objects.filter(name=author_name)
if internal_author_page_qs:
author_obj = {"type": "internal_author", "value": internal_author_page_qs.first().pk}
else:
author_obj = {"type": "external_author", "value": {"name": author_name, "title": "", "photograph": None, "page": ""}}
other_authors.append(author_obj)
if other_authors:
blog_page.other_authors = json.dumps(other_authors)
blog_index_page.add_child(instance=blog_page)
blog_page.save_revision().publish()
blog_page.first_published_at = pytz.utc.localize(datetime.datetime.strptime(blog_dataset['date'], "%d %b %Y"))
blog_page.published_date = pytz.utc.localize(datetime.datetime.strptime(blog_dataset['date'], "%d %b %Y"))
blog_page.save_revision().publish()
Redirect.objects.create(
site=blog_page.get_site(),
old_path="/post/{}".format(slug),
redirect_page=blog_page
)
self.stdout.write(self.style.SUCCESS('Successfully imported blogs.'))
if news_index_page is not None:
with open(options['news_file']) as news_file:
news_datasets = json.load(news_file)
for news_dataset in news_datasets:
slug = news_dataset['url'].split('/')[-2].replace("%e2%88%92", "-")
news_check = NewsStoryPage.objects.filter(slug=slug)
if not news_check and news_dataset['body'] != "":
news_page = NewsStoryPage(
title=news_dataset['title'],
slug=slug,
hero_text=news_dataset['description'],
body=json.dumps([{'type': 'paragraph_block', 'value': news_dataset['body']}]),
)
news_index_page.add_child(instance=news_page)
news_page.save_revision().publish()
news_page.first_published_at = pytz.utc.localize(datetime.datetime.strptime(news_dataset['date'], "%d %b %Y"))
news_page.save_revision().publish()
try:
Redirect.objects.create(
site=news_page.get_site(),
old_path="/post/{}".format(news_dataset['url'].split('/')[-2]),
redirect_page=news_page
)
except IntegrityError:
pass # Sometimes a post was simultaneously news and a blog. In these cases retain both but don't have two redirects
self.stdout.write(self.style.SUCCESS('Successfully imported news.'))
if publication_index_page is not None:
with open(options['pubs_file']) as pubs_file:
publication_datasets = json.load(pubs_file)
for publication_dataset in publication_datasets:
publication_type, _ = PublicationType.objects.get_or_create(name=publication_dataset['format'].split(";")[0])
slug = publication_dataset['url'].split('/')[-2]
pub_check = LegacyPublicationPage.objects.filter(slug=slug)
if not pub_check and publication_dataset['body'] != "":
clean_body = re.sub(r'Modal[\s\S]*\/Modal', '', publication_dataset['body'])
clean_body = re.sub('btn btn--dark pdf-download', 'button', clean_body)
pub_page = LegacyPublicationPage(
title=publication_dataset['title'],
slug=slug,
hero_text=publication_dataset['description'],
raw_content=clean_body,
publication_type=publication_type
)
authors = []
author_names = publication_dataset["authors"]
for author_name in author_names:
internal_author_page_qs = TeamMemberPage.objects.filter(name=author_name)
if internal_author_page_qs:
author_obj = {"type": "internal_author", "value": internal_author_page_qs.first().pk}
else:
author_obj = {"type": "external_author", "value": {"name": author_name, "title": "", "photograph": None, "page": ""}}
authors.append(author_obj)
if authors:
pub_page.authors = json.dumps(authors)
publication_index_page.add_child(instance=pub_page)

pub_page.save_revision().publish()
pub_page.published_date = pytz.utc.localize(datetime.datetime.strptime(publication_dataset['date'], "%d %b %Y"))
pub_page.save_revision().publish()
Redirect.objects.create(
site=pub_page.get_site(),
old_path="/post/{}".format(slug),
redirect_page=pub_page
)
self.stdout.write(self.style.SUCCESS('Successfully imported publications.'))
if event_index_page is not None:
with open(options['events_file']) as events_file:

FieldPanel('hero_text'),
FieldPanel('body'),
MultiFieldPanel([
FieldPanel('other_pages_heading'),
InlinePanel('other_pages', label='Related pages')
], heading='Other Pages/Related Links')
]
def is_filtering(self, request):
get = request.GET.get
return get('topic', None) or get('country', None) or get('source', None) or get('report', None)
def fetch_all_data(self):
return DatasetPage.objects.live().specific()
def fetch_filtered_data(self, context):
topic = context['selected_topic']
country = context['selected_country']
source = context['selected_source']
report = context['selected_report']
if topic:
datasets = DatasetPage.objects.live().specific().filter(dataset_topics__topic__slug=topic)
else:
datasets = self.fetch_all_data()
if country:
if 'all--' in country:
try:
region = re.search('all--(.*)', country).group(1)
datasets = datasets.filter(page_countries__country__region__name=region)
except AttributeError:
pass
else:
datasets = datasets.filter(page_countries__country__slug=country)
if source:
datasets = datasets.filter(dataset_sources__source__slug=source)
if report:
pubs = Page.objects.filter(
models.Q(publicationpage__publication_datasets__item__slug=report) |
models.Q(publicationsummarypage__publication_datasets__item__slug=report) |
models.Q(publicationappendixpage__publication_datasets__item__slug=report) |
models.Q(publicationchapterpage__publication_datasets__item__slug=report) |
models.Q(legacypublicationpage__publication_datasets__item__slug=report) |
models.Q(shortpublicationpage__publication_datasets__item__slug=report)
).first()
if (pubs and pubs.specific.publication_datasets):
filtered_datasets = Page.objects.none()
for dataset in pubs.specific.publication_datasets.all():
results = datasets.filter(slug__exact=dataset.dataset.slug)
if results:
filtered_datasets = filtered_datasets | results
datasets = filtered_datasets
else:
datasets = None
return datasets
def get_active_countries(self):
active_countries = []
datasets = DatasetPage.objects.all()
for dataset in datasets:
countries = dataset.page_countries.all()
for country in countries:
active_country = Country.objects.get(id=country.country_id)
if active_country not in active_countries:
active_countries.append(active_country)
return active_countries
def get_context(self, request, *args, **kwargs):
context = super(DataSetListing, self).get_context(request, *args, **kwargs)
page = request.GET.get('page', None)
context['selected_topic'] = request.GET.get('topic', None)
context['selected_country'] = request.GET.get('country', None)
context['selected_source'] = request.GET.get('source', None)
context['selected_report'] = request.GET.get('report', None)
if not self.is_filtering(request):
datasets = self.fetch_all_data()
is_filtered = False
else:
is_filtered = True
datasets = self.fetch_filtered_data(context)
datasets = datasets.order_by('-first_published_at') if datasets else []
context['is_filtered'] = is_filtered
paginator = Paginator(datasets, MAX_PAGE_SIZE)
try:
context['datasets'] = paginator.page(page)
except PageNotAnInteger:
context['datasets'] = paginator.page(1)
except EmptyPage:
context['datasets'] = paginator.page(paginator.num_pages)
context['paginator_range'] = get_paginator_range(paginator, context['datasets'])
context['topics'] = [page_orderable.topic for page_orderable in DatasetPageTopic.objects.all().order_by('topic__name') if page_orderable.page.live]
context['countries'] = self.get_active_countries()
context['sources'] = DataSource.objects.all()
context['reports'] = Page.objects.live().filter(
models.Q(publicationpage__publication_datasets__isnull=False) |
models.Q(publicationsummarypage__publication_datasets__isnull=False) |
models.Q(publicationappendixpage__publication_datasets__isnull=False) |
models.Q(legacypublicationpage__publication_datasets__isnull=False) |
models.Q(publicationchapterpage__publication_datasets__isnull=False) |
models.Q(shortpublicationpage__publication_datasets__isnull=False)
).distinct().order_by('title')
return context

I also found the following external resources that might be helpful:

Summaries of links found in the content:

https://github.com/devinit/DIwebsite-redesign/blob/develop/di_website/publications/models.py:

The page does not contain any content or code snippets relevant to the problem.

https://github.com/devinit/DIwebsite-redesign/blob/develop/di_website/publications/mixins.py:

The page does not contain any content or code snippets relevant to the problem.


Step 2: 🧐 Snippet Analysis

From looking through the relevant snippets, I decided to make the following modifications:

File path: di_website/publications/mixins.py
Proposed changes: In the PublicationPageSearchMixin class, update the index.SearchField entries to use an analyzer that tokenizes the text into individual terms. This can be done by passing an es_extra dict to index.SearchField with the analyzer key set to the name of the new analyzer.
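For reference, one way such an analyzer might be registered is sketched below, assuming the site uses Wagtail's Elasticsearch backend and that the backend's INDEX_SETTINGS option is merged into the index settings as described in the Wagtail documentation. The analyzer name whole_word, the URL and the index name are illustrative placeholders only.

# Hypothetical settings sketch: register a simple whole-word analyzer on the index
WAGTAILSEARCH_BACKENDS = {
    'default': {
        'BACKEND': 'wagtail.search.backends.elasticsearch7',
        'URLS': ['http://localhost:9200'],  # placeholder endpoint
        'INDEX': 'diwebsite',  # placeholder index name
        'INDEX_SETTINGS': {
            'settings': {
                'analysis': {
                    'analyzer': {
                        'whole_word': {
                            'type': 'custom',
                            'tokenizer': 'standard',  # splits on word boundaries
                            'filter': ['lowercase'],
                        },
                    },
                },
            },
        },
    },
}

The search fields in PublicationPageSearchMixin would then reference it, for example index.SearchField('title', es_extra={'analyzer': 'whole_word'}).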

Step 3: 📝 Planning

I have created a plan for writing the pull request. I am now working on the plan and coding the required changes to address this issue. Here is the planned pull request:

Fix search to only match whole words
sweep/fix-search-whole-words

Description

This PR addresses the issue of the search fields defined in the PublicationPageSearchMixin class in di_website/publications/mixins.py matching partial words. The current implementation uses the default analyzer, which results in undesired matches. To fix this, the code has been modified to use a new analyzer that tokenizes the text into whole-word terms, ensuring that the search only returns results that match whole words.

Summary of Changes

  • Updated the index.SearchField in the PublicationPageSearchMixin class in di_website/publications/mixins.py to use a new analyzer.
  • Added an es_extra dict to the index.SearchField with the analyzer key set to the name of the new analyzer.
  • The new analyzer tokenizes the text into individual terms, each term corresponding to a word, ensuring that the search only matches whole words (see the verification sketch below).
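As a rough verification sketch, assuming the branch has been reindexed (for example with python manage.py update_index) and the development data includes pages whose searchable text contains "disbursement" but not "disability", something like the following could be run in a Django shell:

# Hypothetical check, run via `python manage.py shell`
from di_website.publications.models import PublicationPage

results = PublicationPage.objects.live().search("disability")
for page in results:
    # With a whole-word analyzer, no hit should be a page that only
    # matched because it contains "disbursement"
    print(page.title)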

Please review and merge this PR. Thank you!


Step 4: ⌨️ Coding

I have finished coding the issue. I am now reviewing it for completeness.


Step 5: 🔁 Code Review

Success! 🚀


To recreate the pull request, leave a comment prefixed with "sweep:" or edit the issue.


stale bot commented Oct 15, 2023

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale bot added the wontfix label on Oct 15, 2023