This repository has been archived by the owner on Oct 4, 2022. It is now read-only.
We need to build new researches that can provide data for all assessments that require keyphrase matching in the tree. This excludes all assessments that require keyphrase matching in metadata (e.g., the title, slug, and meta description), since these won't operate on the new tree structure.
The user-facing behavior of all assessments should be identical to the pre-tree behavior. For information on the current functionality, see the SEO scoring overview.
This is the list of assessments for which we need custom researches:
Keyphrase in introduction
Keyphrase density
Keyphrase in subheading
Keyphrase in image alt attributes
Keyphrase distribution
Text competing link assessment
Base keyphraseResearch providing keyphrase matches in sentences
This base keyphrase research can serve as the data source for all other keyphrase-dependent researches. It runs on given leaf nodes (e.g., paragraphs, headings) and returns sentences with found keyphrases within these leaf nodes.
Specifically, it provides the following information:
References to the sentence object
This includes the indices (required for markings).
References to the matched words
We need references to the individual words because in some cases we need to aggregate over sentences (e.g., for the keyphrase distribution research).
Percentage of the keyphrase matched in the sentence
This is required to determine whether enough words of the keyphrase were used in the sentence to constitute a match.
Example output:
Keyphrase: apple and banana
Text: "An apple an apple and a banana."
[
  {
    sentence: { Sentence Object "An apple an apple and a banana." },
    matchesKeyphrase: {
      apple: [
        { Word Object "apple" 1st instance },
        { Word Object "apple" 2nd instance },
      ],
      banana: [
        { Word Object "banana" }
      ],
      pear: []
    },
    matchesSynonyms: [
      {
        orange: [
          { Word Object "orange" 1st instance },
          { Word Object "orange" 2nd instance },
        ],
        mango: [
          { Word Object "mango" }
        ]
      },
    ],
    percentWordMatchesKeyphrase: 100 (?),
    percentWordMatchesSynonyms: [ 100, ... ] (?)
  },
  ...
]
The matching mechanism can stay the same as in the current implementation.
mergeChildrenResults can use the default strategy.
See findKeywordFormsInString.js for inspiration, e.g., for how to calculate percentWordMatches.
The research needs access to morphological forms.
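The per-sentence matching could look roughly like the following minimal sketch. It assumes a simplified data model that is not the yoastseo tree API: a sentence is an object with a text and a words array of word objects ({ text, index }, where index is the word's position in the sentence), and the morphological forms of each keyphrase content word are passed in as a map from keyphrase word to forms (function words such as "and" are assumed to be filtered out already). The function name findKeyphraseMatchesInSentences is hypothetical, matching is a plain case-insensitive comparison against the forms, and percentWordMatches is simply the rounded share of keyphrase words with at least one match.

```javascript
/**
 * Hypothetical sketch of the base keyphrase research for a single keyphrase (or synonym).
 * @param {Array<{text: string, words: Array<{text: string, index: number}>}>} sentences
 * @param {Object<string, string[]>} keyphraseForms Map from each keyphrase content word to its forms.
 * @returns {Array<{sentence: object, matches: Object<string, object[]>, percentWordMatches: number}>}
 */
function findKeyphraseMatchesInSentences( sentences, keyphraseForms ) {
	const keyphraseWords = Object.keys( keyphraseForms );

	return sentences.map( ( sentence ) => {
		const matches = {};
		for ( const keyphraseWord of keyphraseWords ) {
			// Lowercase all forms once so the comparison is case-insensitive.
			const forms = new Set( keyphraseForms[ keyphraseWord ].map( ( form ) => form.toLowerCase() ) );
			// Keep references to every word object that matches one of the forms.
			matches[ keyphraseWord ] = sentence.words.filter( ( word ) => forms.has( word.text.toLowerCase() ) );
		}

		// Share of keyphrase words that matched at least once, as a rounded percentage.
		const matchedWordCount = keyphraseWords.filter( ( word ) => matches[ word ].length > 0 ).length;
		const percentWordMatches = keyphraseWords.length === 0
			? 0
			: Math.round( ( matchedWordCount / keyphraseWords.length ) * 100 );

		return { sentence, matches, percentWordMatches };
	} );
}

// Example: "An apple an apple and a banana." split with a naive tokenizer (illustration only).
const text = "An apple an apple and a banana.";
const words = ( text.match( /[\p{L}\p{N}']+/gu ) || [] ).map( ( word, index ) => ( { text: word, index } ) );
const result = findKeyphraseMatchesInSentences( [ { text, words } ], {
	apple: [ "apple", "apples" ],
	banana: [ "banana", "bananas" ],
} );
```

Here result[ 0 ].matches.apple holds both "apple" word objects, result[ 0 ].matches.banana holds one, and percentWordMatches is 100.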
Researches for assessments operating on leaf nodes
Keyphrase in introduction
Steps
Get the base research results for the first paragraph.
Check whether there is at least one sentence with percentWordMatches: 100.
If there is no such sentence, merge the matches of all sentence objects per keyphrase word.
Check whether all keyphrase words have at least one match.
Needs to use aggregated keyphrase + synonym data.
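A minimal sketch of these steps, assuming the sentence results carry the per-keyphrase-word matches object of the base research (as in the first example output). How the per-word merge should work on aggregated keyphrase + synonym data is not specified above, so this sketch instead runs the check once per keyphrase or synonym and passes if any of them passes. Both function names are hypothetical.

```javascript
/**
 * Hypothetical check for one keyphrase (or synonym) on the first paragraph.
 * @param {Array<{matches: Object<string, object[]>, percentWordMatches: number}>} sentenceResults
 * @returns {boolean}
 */
function isPhraseInIntroduction( sentenceResults ) {
	// Step 1: a single sentence containing all keyphrase words is enough.
	if ( sentenceResults.some( ( result ) => result.percentWordMatches === 100 ) ) {
		return true;
	}
	// Step 2: merge the matches of all sentences per keyphrase word.
	const merged = {};
	for ( const result of sentenceResults ) {
		for ( const [ word, wordMatches ] of Object.entries( result.matches ) ) {
			merged[ word ] = ( merged[ word ] || [] ).concat( wordMatches );
		}
	}
	// Step 3: every keyphrase word needs at least one match somewhere in the paragraph.
	const perWord = Object.values( merged );
	return perWord.length > 0 && perWord.every( ( wordMatches ) => wordMatches.length > 0 );
}

/**
 * Passes if the keyphrase or any of the synonyms passes.
 * @param {Array<Array<object>>} resultsPerPhrase One array of sentence results per keyphrase/synonym.
 * @returns {boolean}
 */
function isKeyphraseInIntroduction( resultsPerPhrase ) {
	return resultsPerPhrase.some( isPhraseInIntroduction );
}

// "apple" and "banana" occur in different sentences of the first paragraph.
const firstParagraph = [
	{ matches: { apple: [ { text: "apple" } ], banana: [] }, percentWordMatches: 50 },
	{ matches: { apple: [], banana: [ { text: "banana" } ] }, percentWordMatches: 50 },
];
const inIntroduction = isKeyphraseInIntroduction( [ firstParagraph ] );
```

In this example inIntroduction is true, because after merging both keyphrase words have a match.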
Keyphrase density
Steps
Get the base research results for the whole text.
For each sentence, calculate how many full keyphrase occurrences there are.
An occurrence is counted when all words of the keyphrase are contained in the sentence.
A sentence can contain multiple keyphrase occurrences (e.g., "The apple potato is an apple and a potato." has two occurrences of the keyphrase "apple potato").
Return the number of occurrences.
See keywordCount.js for inspiration.
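A minimal sketch of the counting step, assuming the per-keyphrase-word matches object from the base research. As a simplification, a sentence in which every keyphrase word matched contributes as many occurrences as its least frequent keyphrase word has matches. This reproduces the "apple potato" example above (two matches each, so two occurrences), but it is an assumption rather than a transcription of keywordCount.js. The function name is hypothetical.

```javascript
/**
 * Hypothetical sketch: counts full keyphrase occurrences in base research results.
 * @param {Array<{matches: Object<string, object[]>}>} sentenceResults
 * @returns {number} Total number of full keyphrase occurrences.
 */
function countKeyphraseOccurrences( sentenceResults ) {
	return sentenceResults.reduce( ( total, result ) => {
		const counts = Object.values( result.matches ).map( ( wordMatches ) => wordMatches.length );
		// Skip sentences in which at least one keyphrase word is missing.
		if ( counts.length === 0 || counts.includes( 0 ) ) {
			return total;
		}
		// The least frequent keyphrase word limits the number of full occurrences.
		return total + Math.min( ...counts );
	}, 0 );
}

// "The apple potato is an apple and a potato." contains "apple" twice and "potato" twice;
// the second sentence lacks "potato" and does not count.
const occurrences = countKeyphraseOccurrences( [
	{ matches: { apple: [ {}, {} ], potato: [ {}, {} ] } },
	{ matches: { apple: [ {} ], potato: [] } },
] );
```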
Keyphrase in subheading
Steps
Run the base research for each subheading.
Merge results for subheadings containing multiple sentences.
Calculate percentWordMatches for each subheading.
If it's a language with function word support, return the number of subheadings with 100% matches; if it's a language without function word support, return the number of subheadings with >50% matches.
Needs to use aggregated keyphrase + synonym data.
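The subheading steps can be sketched as follows, assuming per-keyphrase-word matches from the base research and a hypothetical hasFunctionWordSupport flag for the language. The sketch handles one keyphrase or synonym; running it for the keyphrase and each synonym and combining the results is not shown. The function name is hypothetical.

```javascript
/**
 * Hypothetical sketch of the subheading research for one keyphrase or synonym.
 * @param {Array<Array<{matches: Object<string, object[]>}>>} subheadingResults
 *        Base research results per subheading (one array of sentence results per subheading).
 * @param {boolean} hasFunctionWordSupport Whether the language has function word support.
 * @returns {number} Number of subheadings that reflect the keyphrase.
 */
function countSubheadingsWithKeyphrase( subheadingResults, hasFunctionWordSupport ) {
	return subheadingResults.filter( ( sentenceResults ) => {
		// Merge the per-word matches of all sentences in this subheading.
		const merged = {};
		for ( const { matches } of sentenceResults ) {
			for ( const [ word, wordMatches ] of Object.entries( matches ) ) {
				merged[ word ] = ( merged[ word ] || [] ).concat( wordMatches );
			}
		}
		const keyphraseWords = Object.keys( merged );
		if ( keyphraseWords.length === 0 ) {
			return false;
		}
		// percentWordMatches for the subheading as a whole.
		const matched = keyphraseWords.filter( ( word ) => merged[ word ].length > 0 ).length;
		const percentWordMatches = ( matched / keyphraseWords.length ) * 100;
		// 100% with function word support, more than 50% without it.
		return hasFunctionWordSupport ? percentWordMatches === 100 : percentWordMatches > 50;
	} ).length;
}

const subheadings = [
	// Two sentences that together contain both keyphrase words.
	[ { matches: { apple: [ {} ], banana: [] } }, { matches: { apple: [], banana: [ {} ] } } ],
	// Two of three keyphrase words (about 67%).
	[ { matches: { apple: [ {} ], banana: [ {} ], pear: [] } } ],
	// Exactly half of the keyphrase words.
	[ { matches: { apple: [ {} ], banana: [] } } ],
];
const withSupport = countSubheadingsWithKeyphrase( subheadings, true );
const withoutSupport = countSubheadingsWithKeyphrase( subheadings, false );
```

With function word support only the first subheading counts; without it, the second one counts as well.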
Keyphrase distribution
Adapt the functionality in keyphraseDistribution.js so that it runs on the data returned by the new keyphrase base research. This includes:
computing a per-sentence score based on percentWordMatches
determining continuous stretches of sentences with low per-sentence scores
Needs to use aggregated keyphrase + synonym data.
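The stretch-finding part can be sketched as below. This is deliberately simpler than keyphraseDistribution.js: it uses percentWordMatches from the aggregated data directly as the per-sentence score, treats sentences below a hypothetical minPercent threshold as "low", and only reports the longest low-score stretch. The function name and threshold are illustrative assumptions.

```javascript
/**
 * Hypothetical sketch: finds the longest run of consecutive sentences whose
 * keyphrase match percentage is below a threshold (a gap in the distribution).
 * @param {Array<{percentWordMatches: number}>} sentenceResults Aggregated results in text order.
 * @param {number} minPercent Sentences scoring below this count as "low" (illustrative parameter).
 * @returns {{start: number, length: number}} Start index and length of the longest low-score stretch.
 */
function findLongestLowScoreStretch( sentenceResults, minPercent ) {
	let longest = { start: -1, length: 0 };
	let currentStart = -1;

	sentenceResults.forEach( ( result, index ) => {
		if ( result.percentWordMatches < minPercent ) {
			// Start a new stretch or extend the current one.
			if ( currentStart === -1 ) {
				currentStart = index;
			}
			const length = index - currentStart + 1;
			if ( length > longest.length ) {
				longest = { start: currentStart, length };
			}
		} else {
			// A sentence with a high enough score ends the current stretch.
			currentStart = -1;
		}
	} );
	return longest;
}

const scores = [ 100, 0, 0, 50, 0, 0, 0, 100 ].map( ( percentWordMatches ) => ( { percentWordMatches } ) );
const longestGap = findLongestLowScoreStretch( scores, 50 );
```

For these scores the longest gap starts at sentence index 4 and spans three sentences.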
Researches for individual assessments operating on formatting elements
The assessments below require similar functionality:
Get all formatting elements of a certain type.
Check whether there is at least 1 formatting element containing all the content words from the keyphrase.
The base research outlined above assumes that we always split text into sentences and that we check the keyphrase matches per sentence. For the assessments operating on non-leaf nodes, a pragmatic solution is to create a paragraph node with the contents of the formatting elements and run the base research on this paragraph node.
Note: we need to either save a reference to the newly created paragraph data on the original formatting elements, or the other way around. Maintaining a reference between the original and the converted data is necessary because the keyphrase research operates on the converted data, while the results should reference the original formatting elements.
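A minimal sketch of the conversion, assuming simplified node shapes ({ name, text } for the new paragraphs) rather than the real tree classes. Instead of writing a reference onto either node, it keeps references in both directions in two Maps, so the original tree is not mutated. The function name convertToParagraphs and the getText callback are hypothetical.

```javascript
/**
 * Hypothetical sketch: wraps the text of formatting elements (e.g., image alt text, link text)
 * in new paragraph nodes and keeps references in both directions.
 * @param {object[]} formattingElements The original formatting elements.
 * @param {function(object): string} getText Extracts the text to check from an element.
 */
function convertToParagraphs( formattingElements, getText ) {
	const originalByParagraph = new Map();
	const paragraphByOriginal = new Map();

	const paragraphs = formattingElements.map( ( element ) => {
		const paragraph = { name: "p", text: getText( element ) };
		// Store both directions so results can be mapped back to the original element.
		originalByParagraph.set( paragraph, element );
		paragraphByOriginal.set( element, paragraph );
		return paragraph;
	} );

	return { paragraphs, originalByParagraph, paragraphByOriginal };
}

const images = [ { name: "img", attributes: { alt: "An apple and a banana" } }, { name: "img", attributes: {} } ];
const { paragraphs, originalByParagraph, paragraphByOriginal } =
	convertToParagraphs( images, ( image ) => image.attributes.alt || "" );
```

After running the base research on paragraphs, originalByParagraph.get( paragraph ) returns the image the results belong to.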
Keyphrase in image alt attributes
Steps
Get all images from the image research (to-do: make an issue for the image research).
Convert the alt attributes of all images into paragraph nodes (see above).
Run the keyphrase base research on the newly created paragraph nodes.
Return all images that have an alt attribute with percentWordMatches = 100.
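These steps can be sketched as follows, assuming images shaped like { attributes: { alt } } and a runBaseResearch callback standing in for the keyphrase base research (both are assumptions). Here the reference to the original image is kept by the filter callback itself, so the function returns the original image nodes. The function name is hypothetical.

```javascript
/**
 * Hypothetical sketch of the image alt research.
 * @param {Array<{attributes: {alt?: string}}>} images Images from the image research.
 * @param {function(object): Array<{percentWordMatches: number}>} runBaseResearch
 *        Runs the keyphrase base research on a paragraph node and returns its sentence results.
 * @returns {object[]} The original image nodes whose alt text fully matches the keyphrase.
 */
function findImagesWithKeyphraseInAlt( images, runBaseResearch ) {
	return images.filter( ( image ) => {
		// Convert the alt text into a paragraph node; the closure keeps the link to the image.
		const paragraph = { name: "p", text: image.attributes.alt || "" };
		return runBaseResearch( paragraph ).some( ( result ) => result.percentWordMatches === 100 );
	} );
}

// Stub standing in for the base research: 50% per keyphrase word found (illustration only).
const runBaseResearch = ( paragraph ) => [ {
	percentWordMatches: [ "apple", "banana" ].filter( ( word ) => paragraph.text.includes( word ) ).length * 50,
} ];
const images = [
	{ attributes: { alt: "An apple and a banana" } },
	{ attributes: { alt: "An apple" } },
	{ attributes: {} },
];
const imagesWithKeyphrase = findImagesWithKeyphraseInAlt( images, runBaseResearch );
```

Only the first image is returned, since its alt text is the only one with a full match.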
Text competing link assessment
Steps
Convert the link text into paragraph nodes (see above).
Run the keyphrase base research on the newly created paragraph nodes.
Return all links with percentWordMatches = 100.
Needs to use aggregated keyphrase + synonym data.
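A sketch of the link steps including the keyphrase + synonym aggregation, assuming links shaped like { text } and one base-research runner per phrase (the keyphrase first, then each synonym). Following the aggregator rule, the highest percentWordMatches across the keyphrase and synonyms decides. The function name and callback shape are hypothetical.

```javascript
/**
 * Hypothetical sketch of the text competing link research.
 * @param {Array<{text: string}>} links Link nodes (simplified shape).
 * @param {Array<function(object): Array<{percentWordMatches: number}>>} phraseResearches
 *        One base-research runner per phrase: the keyphrase first, then each synonym.
 * @returns {object[]} Links whose text contains a full keyphrase or synonym match.
 */
function findCompetingLinks( links, phraseResearches ) {
	return links.filter( ( link ) => {
		// Convert the link text into a paragraph node.
		const paragraph = { name: "p", text: link.text };
		// Aggregate keyphrase and synonym data: keep the highest percentWordMatches.
		const best = Math.max( 0, ...phraseResearches.flatMap( ( run ) =>
			run( paragraph ).map( ( result ) => result.percentWordMatches ) ) );
		return best === 100;
	} );
}

// Stub base research: 50% per phrase word found (illustration only).
const makeStub = ( first, second ) => ( paragraph ) => [ {
	percentWordMatches: [ first, second ].filter( ( word ) => paragraph.text.includes( word ) ).length * 50,
} ];
const links = [ { text: "cat and dog" }, { text: "canine and feline" }, { text: "cat" } ];
const competingLinks = findCompetingLinks( links, [ makeStub( "cat", "dog" ), makeStub( "canine", "feline" ) ] );
```

The first link matches the keyphrase and the second matches the synonym; the third reaches only 50% and is not returned.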
Keyphrase-synonym aggregator
We run the base research separately for the keyphrase and the associated synonyms. For the individual researches that also require synonyms, we need to aggregate this data. It's not necessary to know whether a match was a keyphrase or a synonym match, since we don't make a distinction between these two kinds of matches in the assessment results.
matches can be the combination of all keyphrase and synonym matches.
For percentWordMatches, the highest value can be used.
Example output:
Keyphrase: cat and dog
Synonym: canine and feline
Text: Here's a cat and another cat and a dog and a canine.
Output base research keyphrase:
[
  {
    sentence: { Sentence Object "Here's a cat and another cat and a dog and a canine." },
    matches: [
      { Word Object "cat" 1st instance },
      { Word Object "cat" 2nd instance },
      { Word Object "dog" }
    ],
    percentWordMatches: 100
  }
]
Output base research synonym:
[
  {
    sentence: { Sentence Object "Here's a cat and another cat and a dog and a canine." },
    matches: [
      { Word Object "canine" }
    ],
    percentWordMatches: 50
  }
]
Aggregated keyphrase & synonym data:
[
  {
    sentence: { Sentence Object "Here's a cat and another cat and a dog and a canine." },
    matches: [
      { Word Object "cat" 1st instance },
      { Word Object "cat" 2nd instance },
      { Word Object "dog" },
      { Word Object "canine" }
    ],
    percentWordMatches: 100
  }
]
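The aggregation rule can be sketched as below, using the flat matches array from the example above. It assumes that the keyphrase and synonym researches ran on the same leaf nodes, so the i-th result of every array refers to the same sentence; pairing by index instead of by sentence reference is a simplifying assumption. The function name is hypothetical.

```javascript
/**
 * Hypothetical sketch of the keyphrase-synonym aggregator.
 * @param {Array<{sentence: object, matches: object[], percentWordMatches: number}>} keyphraseResults
 * @param {Array<Array<{sentence: object, matches: object[], percentWordMatches: number}>>} synonymResults
 *        One array of sentence results per synonym, in the same sentence order.
 * @returns {Array<{sentence: object, matches: object[], percentWordMatches: number}>}
 */
function aggregateKeyphraseAndSynonyms( keyphraseResults, synonymResults ) {
	return keyphraseResults.map( ( keyphraseResult, index ) => {
		// Collect the results for the same sentence; skip synonyms without a result for it.
		const all = [ keyphraseResult, ...synonymResults.map( ( results ) => results[ index ] ) ].filter( Boolean );
		return {
			sentence: keyphraseResult.sentence,
			// Combine keyphrase and synonym matches; whether a match came from a synonym is not kept.
			matches: all.flatMap( ( result ) => result.matches ),
			// The highest percentage wins.
			percentWordMatches: Math.max( ...all.map( ( result ) => result.percentWordMatches ) ),
		};
	} );
}

const sentence = { text: "Here's a cat and another cat and a dog and a canine." };
const keyphraseResults = [
	{ sentence, matches: [ { text: "cat" }, { text: "cat" }, { text: "dog" } ], percentWordMatches: 100 },
];
const synonymResults = [ [ { sentence, matches: [ { text: "canine" } ], percentWordMatches: 50 } ] ];
const aggregated = aggregateKeyphraseAndSynonyms( keyphraseResults, synonymResults );
```

This reproduces the aggregated output above: four matches (cat, cat, dog, canine) and percentWordMatches 100.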