Objective: Write an 8086 assembly program to find the similarity between two sentences.
Procedure:
- Read sentence S1 from the file S1.txt and sentence S2 from the file S2.txt.
- Remove punctuation marks from both sentences.
- Convert all characters to lowercase letters.
- Remove stop words from both sentences. Stop words are words that do not carry important information. Use the following list of stop words: [I, a, an, as, at, the, by, in, for, of, on, that]
- Remove duplicate words from both sentences, so that each word appears only once per sentence.
- Calculate the similarity as the number of words in the intersection of the two processed sentences divided by the number of words in their union:
  Similarity(S1, S2) = |S1 ∩ S2| / |S1 ∪ S2|
A value of “0” means the two sentences are completely dissimilar, “1” means they are identical after processing, and values between 0 and 1 represent a degree of similarity.
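
Before writing the 8086 version, it can help to have a small reference model to check expected results against. The sketch below is in Python, not 8086 assembly, and covers only the processing and similarity steps; reading S1.txt and S2.txt is left out. The names `STOP_WORDS`, `process`, and `similarity` are made up for this illustration. It also makes three assumptions the procedure does not state: punctuation is deleted rather than replaced by a space, the stop word "I" is matched as "i" because lowercasing happens first, and two empty sentences give a similarity of 0.

```python
import string

# Stop-word list from the procedure, stored in lowercase because the
# lowercase step runs before the stop-word step (so "I" becomes "i").
STOP_WORDS = {"i", "a", "an", "as", "at", "the", "by", "in", "for", "of", "on", "that"}


def process(sentence):
    """Strip punctuation, lowercase, drop stop words, and remove duplicates."""
    # Assumption: punctuation characters are deleted, not replaced by spaces.
    no_punct = sentence.translate(str.maketrans("", "", string.punctuation))
    words = no_punct.lower().split()
    # A set keeps each remaining word only once.
    return {w for w in words if w not in STOP_WORDS}


def similarity(s1, s2):
    """Return |intersection| / |union| of the processed word sets."""
    a, b = process(s1), process(s2)
    union = a | b
    if not union:
        # Assumption: the procedure does not define the empty case; return 0.
        return 0.0
    return len(a & b) / len(union)


if __name__ == "__main__":
    # Processed sets: {cat, sat, mat} and {cat, sat, hat}
    # Intersection has 2 words, union has 4 words.
    print(similarity("The cat sat on the mat.", "A cat sat in the hat!"))  # 0.5
```

In the 8086 program the result would likely need to be printed as a fraction or as a scaled integer (for example, intersection * 100 / union), since the 8086 itself has no floating-point instructions.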