Correct html cleanup functions
Somehow a bunch of the cleanup functions were returning nil, probably due to
some debugging that stuck around.

This caused Wordpress responses to have no body, which is now fixed.

Resolves #31
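In Ruby a method returns the value of its last evaluated expression, and a method whose body is nothing but comments returns `nil`. That is why commenting out a scrubber's body blanks `article["body"]` instead of leaving it alone. A minimal sketch of the bug and the fix, using hypothetical method names rather than the real ones in `app/models/cms.rb`:

```ruby
# Hypothetical stand-in for a scrubber whose body was commented out:
# with no expressions left, the method returns nil.
def scrub_broken(html_string)
  # scrubbed = html_string.gsub(/^&[a-z0-9]+;/, "")
  # scrubbed
end

# The shape of the fix: keep the old logic as comments, but return the input.
def scrub_fixed(html_string)
  # scrubbed = html_string.gsub(/^&[a-z0-9]+;/, "")
  # scrubbed
  html_string
end

article = { "body" => "<p>Hello</p>" }
p scrub_broken(article["body"]) # prints nil
p scrub_fixed(article["body"])  # prints "<p>Hello</p>"
```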
cguess committed May 27, 2020
1 parent 6ffbfda commit 0b58fb9
Showing 4 changed files with 14 additions and 9 deletions.
app/controllers/articles_controller.rb: 1 change (0 additions, 1 deletion)
@@ -201,7 +201,6 @@ def clean_up_response(articles)
article["body"] = scrubScriptTagsFromHTMLString article["body"]
article["body"] = scrubJSCommentsFromHTMLString article["body"]
article["body"] = scrubSpecialCharactersFromSingleLinesInHTMLString article["body"]
article["body"] = scrubHTMLSpecialCharactersInHTMLString article["body"]
article["headline"] = HTMLEntities.new.decode(article["headline"])
end
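The author's note in `cms.rb` ("if it breaks stuff we'll get errors at least") points at a fail-loudly approach: a silently nil body is harder to trace than an exception. A minimal sketch, assuming the scrubbers can be treated as plain callables; `apply_scrubbers` and the lambdas below are hypothetical and not part of this codebase:

```ruby
# Hypothetical helper: run scrubbers in order and raise as soon as one of
# them returns nil, instead of letting the article body silently vanish.
def apply_scrubbers(body, scrubbers)
  scrubbers.each_with_index.reduce(body) do |acc, (scrubber, index)|
    result = scrubber.call(acc)
    raise ArgumentError, "scrubber ##{index} returned nil" if result.nil?

    result
  end
end

# Illustrative scrubbers only.
strip_scripts = ->(html) { html.gsub(%r{<script\b.*?</script>}m, "") }
pass_through = ->(html) { html }
returns_nil = ->(_html) { nil } # mimics a method whose body was commented out

body = "<p>Hi</p><script>alert(1)</script>"
puts apply_scrubbers(body, [strip_scripts, pass_through]) # prints <p>Hi</p>

begin
  apply_scrubbers(body, [strip_scripts, returns_nil])
rescue ArgumentError => e
  puts e.message # prints scrubber #1 returned nil
end
```

Raising here trades a blank article for a visible error, which matches the intent of the comment rather than any behavior the commit itself adds.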

app/models/cms.rb: 14 changes (10 additions, 4 deletions)
@@ -405,10 +405,12 @@ def self.scrubWordpressTagsFromHTMLString(html_string) # rubocop:disable Naming/
html_string
end

def self.scrubCDataTags(html_string) # rubocop:disable Naming/MethodName
# scrubbed = html_string.gsub("// <![CDATA[", "")
# scrubbed = scrubbed.gsub("// ]]", "")
end
# For some reason this is commented out, I'm going to comment the whole thing, and if it breaks
# stuff we'll get errors at least
# def self.scrubCDataTags(html_string) # rubocop:disable Naming/MethodName
# # scrubbed = html_string.gsub("// <![CDATA[", "")
# # scrubbed = scrubbed.gsub("// ]]", "")
# end

# \/\/.+
def self.scrubJSCommentsFromHTMLString(html_string) # rubocop:disable Naming/MethodName
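The `# \/\/.+` comment above `scrubJSCommentsFromHTMLString` suggests it strips everything from `//` to the end of a line. The method body is not shown in this diff, so the sketch below assumes that pattern; `scrub_js_comments` is a hypothetical stand-in:

```ruby
# Assumed pattern, taken from the comment above the method: "//" followed by
# the rest of the line ("." does not match a newline without the /m flag).
JS_COMMENT = %r{//.+}

def scrub_js_comments(html_string)
  html_string.gsub(JS_COMMENT, "")
end

puts scrub_js_comments("var x = 1; // counter")
puts scrub_js_comments('<a href="https://example.com">link</a>')
```

If the method does use this pattern, the second call shows a side effect worth keeping in mind: any `//` in the markup, such as the one in an absolute URL, is treated as the start of a comment, so the link is cut down to `<a href="https:`.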
@@ -421,8 +423,12 @@ def self.scrubSpecialCharactersFromSingleLinesInHTMLString(html_string) # ruboco
scrubbed
end

# For some reason this is commented out, I'm going to comment the whole thing, and if it breaks
# stuff we'll get errors at least
def self.scrubHTMLSpecialCharactersInHTMLString(html_string) # rubocop:disable Naming/MethodName
# scrubbed = html_string.gsub(/^&[a-z0-9]+;/, "")
# scrubbed
html_string
end

def self.scrubScriptTagsFromHTMLString(html_string) # rubocop:disable Naming/MethodName
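After this change `scrubHTMLSpecialCharactersInHTMLString` returns `html_string` untouched and keeps the old regex only as a comment. For reference, this is roughly what that regex would do if it were re-enabled; `scrub_leading_entities` is a hypothetical standalone name:

```ruby
# Sketch of the commented-out logic. In Ruby, ^ anchors at the start of
# every line, not only at the start of the string.
def scrub_leading_entities(html_string)
  html_string.gsub(/^&[a-z0-9]+;/, "")
end

puts scrub_leading_entities("&nbsp;Hello\n&amp;World &copy;")
# Hello
# World &copy;
```

Only entities that open a line are dropped; `&copy;` in the middle of the second line survives.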
app/models/joomla_occrp.rb: 4 changes (3 additions, 1 deletion)
@@ -172,7 +172,9 @@ def self.language_parameter(language)

def self.clean_up_for_wordpress(articles)
articles.each do |article|
article["body"] = scrubCDataTags article["body"]
# This is being commented out for archive purposes instead of deleting it. Please try
# Joomla before deleting fully
# article["body"] = scrubCDataTags article["body"]
article["body"] = scrubScriptTagsFromHTMLString article["body"]
article["body"] = scrubWordpressTagsFromHTMLString article["body"]
# article['body'] = cleanUpNewLines article['body']
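`scrubCDataTags` is now commented out entirely, and its call in the Joomla cleanup is kept only as a comment. For anyone trying Joomla before deleting it for good, here is a sketch of what the two original `gsub` lines would do if the method returned its result; `scrub_cdata_tags` and the sample input are hypothetical:

```ruby
# The original two substitutions, with the result actually returned.
def scrub_cdata_tags(html_string)
  scrubbed = html_string.gsub("// <![CDATA[", "")
  scrubbed.gsub("// ]]", "")
end

html = "<script>// <![CDATA[\nvar x = 1;\n// ]]></script>"
p scrub_cdata_tags(html) # prints "<script>\nvar x = 1;\n></script>"
```

Because the second pattern is `"// ]]"` rather than `"// ]]>"`, the closing `>` of the CDATA marker is left behind.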
app/models/wordpress.rb: 4 changes (1 addition, 3 deletions)
@@ -138,7 +138,7 @@ def self.get_url(path, language, options = {})

def self.make_request(url)
logger.debug("Making request to #{url}")
response = HTTParty.get(CGI.encode(url))
response = HTTParty.get(url)

begin
body = JSON.parse response.body
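The `make_request` change drops the `CGI.encode(url)` wrapper and passes the URL to `HTTParty.get` as-is. Percent-encoding an entire URL also encodes the scheme separator and path slashes, so the result is no longer a usable URL. The sketch uses the standard library's `CGI.escape` as a stand-in for the removed call (an assumption; the diff does not show where `CGI.encode` comes from) and a hypothetical endpoint:

```ruby
require "cgi"

# Hypothetical WordPress REST URL, not taken from the app's configuration.
url = "https://example.com/wp-json/wp/v2/posts?per_page=5"

# CGI.escape is meant for a single query-string component, so it also
# encodes ":", "/", "?" and "=".
escaped = CGI.escape(url)
puts escaped
# https%3A%2F%2Fexample.com%2Fwp-json%2Fwp%2Fv2%2Fposts%3Fper_page%3D5
```

If individual query values ever need escaping, encoding just those values before building the URL avoids mangling the scheme and path.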
@@ -202,13 +202,11 @@ def self.language_parameter(language)

def self.clean_up_for_wordpress(articles)
articles.each do |article|
article["body"] = scrubCDataTags article["body"]
article["body"] = scrubScriptTagsFromHTMLString article["body"]
article["body"] = scrubWordpressTagsFromHTMLString article["body"]
# article['body'] = cleanUpNewLines article['body']
article["body"] = scrubJSCommentsFromHTMLString article["body"]
article["body"] = scrubSpecialCharactersFromSingleLinesInHTMLString article["body"]
article["body"] = scrubHTMLSpecialCharactersInHTMLString article["body"]
article["body"] = normalizeSpacing article["body"]
article["body"] = handle_paragraph_tags article["body"]
