[#1118] Remove authorship tag validation regex (#1857)
GitHub IDs must satisfy certain format requirements to be valid, but
Git imposes no such requirements on author names. This is evident in
.gitconfig files, where the user name can be set to any string the
user likes.

The current validation regex is based on GitHub's ID requirements.
However, we rely on Git author names found in .gitconfig files to
attribute code authorship. As a result, authorship may go unattributed
when a Git author name does not meet GitHub's ID requirements.

Let's remove this restriction so that any non-empty author name is
accepted and properly attributed.
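To make the motivation concrete, here is a small sketch that runs a few author names through the GitHub-username regex this commit deletes (`REGEX_AUTHOR_NAME_FORMAT`, copied verbatim from `AnnotatorAnalyzer.java`). The class `GitHubIdRegexDemo` is a hypothetical name; the sample names are Git IDs that appear in the repository's expected test output.

```java
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical demo: which author names pass the GitHub-ID check removed by this commit? */
class GitHubIdRegexDemo {
    // Copied from the REGEX_AUTHOR_NAME_FORMAT constant deleted in AnnotatorAnalyzer.java.
    static final Pattern GITHUB_ID =
            Pattern.compile("^[a-zA-Z0-9](?:[a-zA-Z0-9]|-(?=[a-zA-Z0-9])){0,38}$");

    /** Returns true if the name would have been accepted before this commit. */
    static boolean passesOldCheck(String name) {
        return GITHUB_ID.matcher(name).find();
    }

    public static void main(String[] args) {
        // Git author names are free-form, so some of these break the GitHub-ID rules
        // (spaces, a leading hyphen, underscores).
        List<String> names = List.of("harryggg", "chan-j-d", "Eugene Peh",
                "-invalidGitUsername_TreatedAsUnknownUser");
        for (String name : names) {
            System.out.println(name + " -> " + (passesOldCheck(name) ? "accepted" : "rejected"));
        }
    }
}
```

Running `main` prints `accepted` for `harryggg` and `chan-j-d` and `rejected` for the other two. `Eugene Peh` appears as a blame author in the test data, yet an `@@author Eugene Peh` tag could not have been honoured before this change.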
sikai00 authored Jan 22, 2023
1 parent 7dccf1f commit c6a116b
Showing 10 changed files with 84 additions and 57 deletions.
13 changes: 6 additions & 7 deletions docs/ug/usingAuthorTags.md
@@ -15,8 +15,8 @@
If you want to override the code authorship deduced by RepoSense (which is based on Git blame/log data), you can use `@@author` tags to specify certain code segments that should be credited to a certain author irrespective of git history. An example scenario where this is useful is when a method was originally written by one author but a second author did some minor refactoring to it; in this case, RepoSense might attribute the code to the second author while you may want to attribute the code to the first author.

There are 2 types of `@@author` tags:
-- Start Tags (format: `@@author AUTHOR_GITHUB_ID`): A start tag indicates the start of a code segment written by the author identified by the `AUTHOR_GITHUB_ID`.
-- End Tags (format: `@@author`): Optional. An end tag indicates the end of a code segment written by the author identified by the `AUTHOR_GITHUB_ID` of the start tag.
+- Start Tags (format: `@@author AUTHOR_GIT_AUTHOR_NAME`): A start tag indicates the start of a code segment written by the author identified by the `AUTHOR_GIT_AUTHOR_NAME`.
+- End Tags (format: `@@author`): Optional. An end tag indicates the end of a code segment written by the author identified by the `AUTHOR_GIT_AUTHOR_NAME` of the start tag.

<box type="info" seamless>

@@ -28,7 +28,7 @@ If an end tag is not provided, the code till the next start tag (or the end of t
If an end tag is provided without a corresponding start tag, the code until the next start tag, the next end tag, or the end of the file, will not be attributed to any author. This should only be used if the code should not be attributed to any author.
</box>

-The `@@author` tags should be enclosed within a comment, using the comment syntax of the file in concern. Below are some examples:
+The `@@author` tags should be enclosed within a single-line comment, using the comment syntax of the file in concern. Below are some examples:

![author tags](../images/add-author-tags.png)

@@ -46,11 +46,10 @@ Currently, the following comment formats are supported:

<box type="info" seamless>

-First, RepoSense checks whether the line matches the supported comment formats. If the line does not match the formats,
-RepoSense treats it as a normal line. Else, it continues to check whether the GitHub username is in valid format.
+RepoSense checks whether the line matches the supported comment formats. If the line does not match the formats,
+RepoSense treats it as a normal line.

-If the username is valid, the code till the next start tag, the end tag, or the end of file will be attributed to that author.
-Otherwise, the code will not be attributed to any author.
+The code until the next start tag, the end tag, or the end of file will be attributed to that author.
</box>

Note: Remember to **commit** the files after the changes. (reason: RepoSense can see committed code only)
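The start/end tag rules in the updated guide can be summarised as a small state machine. The sketch below is a simplified illustration under stated assumptions, not RepoSense's implementation: it only recognises `//` line comments (with a pattern built the same way `generateCommentRegex` builds one for a comment format with no end delimiter), takes per-line `git blame` authors as input, and uses `-` for unattributed lines, as the expected-output JSON in this commit does. `AuthorTagSketch` and `attribute` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of @@author start/end tag semantics (only "//" comments). */
class AuthorTagSketch {
    // Mirrors "^[\s]*" + "//" + "[\s]*" + "@@author(\s+.*)?" + "[\s]*$".
    private static final Pattern TAG_LINE = Pattern.compile("^\\s*//\\s*@@author(\\s+.*)?\\s*$");
    static final String UNKNOWN = "-";

    /** Returns the author of each line, overriding blameAuthors inside tagged segments. */
    static List<String> attribute(List<String> lines, List<String> blameAuthors) {
        List<String> result = new ArrayList<>();
        String current = null; // null means no tag is in effect, so blame decides
        for (int i = 0; i < lines.size(); i++) {
            Matcher m = TAG_LINE.matcher(lines.get(i));
            if (!m.matches()) {
                // Ordinary line; a tag inside a string literal does not count because
                // the pattern is anchored at the start of the line.
                result.add(current != null ? current : blameAuthors.get(i));
                continue;
            }
            String name = m.group(1) == null ? "" : m.group(1).trim();
            if (!name.isEmpty()) {
                // Start tag: all text after @@author is the name, spaces included.
                current = name;
                result.add(current);
            } else if (current != null) {
                // End tag closing an open segment: the tag line belongs to that segment.
                result.add(current);
                current = null;
            } else {
                // End tag without a start tag: code up to the next tag is unattributed.
                current = UNKNOWN;
                result.add(current);
            }
        }
        return result;
    }
}
```

Traced by hand against `annotationTest.java` (using the blame authors shown in the new expected output for untagged lines), this logic reproduces the updated attribution: lines 12-16 go to `-invalidGitUsername_TreatedAsUnknownUser`, lines 18-20 to `harryggg invalidAuthorLineFormat`, and lines 21-22 fall back to `fakeAuthor`.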
22 changes: 12 additions & 10 deletions src/main/java/reposense/authorship/analyzer/AnnotatorAnalyzer.java
@@ -21,17 +21,14 @@
*/
public class AnnotatorAnalyzer {
private static final String AUTHOR_TAG = "@@author";
-// GitHub username format
-private static final String REGEX_AUTHOR_NAME_FORMAT = "^[a-zA-Z0-9](?:[a-zA-Z0-9]|-(?=[a-zA-Z0-9])){0,38}$";
-private static final Pattern PATTERN_AUTHOR_NAME_FORMAT = Pattern.compile(REGEX_AUTHOR_NAME_FORMAT);
-private static final String REGEX_AUTHOR_TAG_FORMAT = "@@author(\\s+[^\\s]+)?";
+private static final String REGEX_AUTHOR_TAG_FORMAT = "@@author(\\s+.*)?";

private static final String[][] COMMENT_FORMATS = {
{"//", "\\s"},
+{"//", null},
{"/\\*", "\\*/"},
{"#", "\\s"},
+{"#", null},
{"<!--", "-->"},
{"%", "\\s"},
+{"%", null},
{"\\[.*]:\\s*#\\s*\\(", "\\)"},
{"<!---", "--->"}
};
@@ -106,18 +103,23 @@ public static Optional<String> extractAuthorName(String line) {
.map(l -> l.split(AUTHOR_TAG))
.filter(array -> array.length >= 2)
// separates by end-comment format to obtain the author's name at the zeroth index
-.map(array -> array[1].trim().split(COMMENT_FORMATS[getCommentTypeIndex(line)][1]))
+.map(array -> COMMENT_FORMATS[getCommentTypeIndex(line)][1] != null
+? array[1].trim().split(COMMENT_FORMATS[getCommentTypeIndex(line)][1])
+: new String[]{ array[1].trim() })
.filter(array -> array.length > 0)
.map(array -> array[0].trim())
-// checks if the author name is valid
-.filter(trimmedParameters -> PATTERN_AUTHOR_NAME_FORMAT.matcher(trimmedParameters).find());
+// checks if the author name is not empty
+.filter(trimmedParameters -> !trimmedParameters.isEmpty());
}

/**
* Generates regex for valid comment formats in which author tag is found, with {@code REGEX_AUTHOR_TAG_FORMAT}
* flanked by {@code commentStart} and {@code commentEnd}.
*/
private static String generateCommentRegex(String commentStart, String commentEnd) {
+if (commentEnd == null) {
+return "^[\\s]*" + commentStart + "[\\s]*" + REGEX_AUTHOR_TAG_FORMAT + "[\\s]*$";
+}
return "^[\\s]*" + commentStart + "[\\s]*" + REGEX_AUTHOR_TAG_FORMAT + "[\\s]*(" + commentEnd + ")?[\\s]*$";
}
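To see what loosening `REGEX_AUTHOR_TAG_FORMAT` changes, the sketch below rebuilds comment regexes the way `generateCommentRegex` does, once with the old tag pattern and once with the new one, and tests tag lines from the test data against each. `CommentRegexDemo`, `commentRegex`, and `isTagLine` are hypothetical names; only the regex fragments come from the diff. The old pattern is paired with the `("//", "\\s")` format, since the old code had no null end delimiter.

```java
import java.util.regex.Pattern;

/** Hypothetical demo comparing the old and new @@author tag patterns. */
class CommentRegexDemo {
    static final String OLD_TAG_FORMAT = "@@author(\\s+[^\\s]+)?"; // at most one word after the tag
    static final String NEW_TAG_FORMAT = "@@author(\\s+.*)?";      // any text after the tag

    /** Same construction as generateCommentRegex in the diff, parameterised by tag format. */
    static String commentRegex(String tagFormat, String commentStart, String commentEnd) {
        if (commentEnd == null) {
            return "^[\\s]*" + commentStart + "[\\s]*" + tagFormat + "[\\s]*$";
        }
        return "^[\\s]*" + commentStart + "[\\s]*" + tagFormat + "[\\s]*(" + commentEnd + ")?[\\s]*$";
    }

    /** True if the whole line is an author-tag comment under the given formats. */
    static boolean isTagLine(String tagFormat, String commentStart, String commentEnd, String line) {
        return Pattern.matches(commentRegex(tagFormat, commentStart, commentEnd), line);
    }

    public static void main(String[] args) {
        String line = "//@@author harryggg invalidAuthorLineFormat";
        System.out.println("old: " + isTagLine(OLD_TAG_FORMAT, "//", "\\s", line)); // false
        System.out.println("new: " + isTagLine(NEW_TAG_FORMAT, "//", null, line)); // true
    }
}
```

With the old pattern, a multi-word name made the whole line fail to match, so it was treated as ordinary code (which is why line 18 of `annotationTest.java` used to be credited to `fakeAuthor`). Note that the greedy `.*` in the new pattern can also swallow a trailing end delimiter such as `-->`.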

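Because the new tag pattern can capture a trailing end-comment delimiter along with the name, `extractAuthorName` still splits the remainder on the end-comment regex, and now skips that split for formats whose end delimiter is `null` (calling `String.split(null)` would throw a `NullPointerException`). The sketch below restates those stream steps as a standalone method. Passing `commentEndRegex` in directly is a simplification (in the diff it is looked up through `getCommentTypeIndex`), and `AuthorNameSketch` and `extractName` are hypothetical names.

```java
import java.util.Optional;

/** Hypothetical restatement of the name-extraction steps in extractAuthorName. */
class AuthorNameSketch {
    private static final String AUTHOR_TAG = "@@author";

    /**
     * Extracts the author name from a line already known to be a tag line.
     * commentEndRegex is the end-comment pattern of the line's comment format, or null if none.
     */
    static Optional<String> extractName(String line, String commentEndRegex) {
        return Optional.of(line)
                .map(l -> l.split(AUTHOR_TAG))
                .filter(parts -> parts.length >= 2)                // nothing after the tag: end tag
                .map(parts -> commentEndRegex != null
                        ? parts[1].trim().split(commentEndRegex)   // drop the closing delimiter
                        : new String[] {parts[1].trim()})          // no delimiter: keep the remainder
                .filter(parts -> parts.length > 0)                 // split may leave nothing, e.g. "*/"
                .map(parts -> parts[0].trim())
                .filter(name -> !name.isEmpty());                  // replaces the old GitHub-ID check
    }

    public static void main(String[] args) {
        System.out.println(extractName("<!-- @@author Eugene Peh -->", "-->")); // Optional[Eugene Peh]
        System.out.println(extractName("# @@author Eugene Peh", null));        // Optional[Eugene Peh]
        System.out.println(extractName("/* @@author */", "\\*/"));             // Optional.empty
    }
}
```

The last case shows why the `array.length > 0` filter matters: `"*/".split("\\*/")` returns an empty array because Java drops trailing empty strings.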
@@ -1 +1 @@
[{"path":"README.md","fileType":"md","lines":[{"lineNumber":1,"author":{"gitId":"eugenepeh"},"content":"This is a test repository for [RepoSense](https://github.com/reposense/RepoSense)."}],"authorContributionMap":{"eugenepeh":1}},{"path":"_reposense/config.json","fileType":"json","lines":[{"lineNumber":1,"author":{"gitId":"Eugene Peh"},"content":"{"},{"lineNumber":2,"author":{"gitId":"Eugene Peh"},"content":" \"ignoreGlobList\": [\"about-us/**\", \"**index.html\"],"},{"lineNumber":3,"author":{"gitId":"Eugene Peh"},"content":" \"formats\": [\"html\", \"css\"],"},{"lineNumber":4,"author":{"gitId":"FH-30"},"content":" \"ignoreCommitsList\": [\"\", \"67890def\"],"},{"lineNumber":5,"author":{"gitId":"Eugene Peh"},"content":" \"authors\":"},{"lineNumber":6,"author":{"gitId":"Eugene Peh"},"content":" ["},{"lineNumber":7,"author":{"gitId":"Eugene Peh"},"content":" {"},{"lineNumber":8,"author":{"gitId":"Eugene Peh"},"content":" \"githubId\": \"alice\","},{"lineNumber":9,"author":{"gitId":"Eugene Peh"},"content":" \"displayName\": \"Alice T.\","},{"lineNumber":10,"author":{"gitId":"Eugene Peh"},"content":" \"authorNames\": [\"AT\", \"A\"],"},{"lineNumber":11,"author":{"gitId":"Eugene Peh"},"content":" \"ignoreGlobList\": [\"**.css\"]"},{"lineNumber":12,"author":{"gitId":"Eugene Peh"},"content":" },"},{"lineNumber":13,"author":{"gitId":"Eugene Peh"},"content":" {"},{"lineNumber":14,"author":{"gitId":"Eugene Peh"},"content":" \"githubId\": \"bob\""},{"lineNumber":15,"author":{"gitId":"Eugene Peh"},"content":" }"},{"lineNumber":16,"author":{"gitId":"Eugene Peh"},"content":" ]"},{"lineNumber":17,"author":{"gitId":"Eugene Peh"},"content":"}"}],"authorContributionMap":{"FH-30":1,"Eugene Peh":16}},{"path":"annotationTest.java","fileType":"java","lines":[{"lineNumber":1,"author":{"gitId":"fakeAuthor"},"content":"fake all the lines in this file is writtened by 
fakeAuthor"},{"lineNumber":2,"author":{"gitId":"fakeAuthor"},"content":"fake"},{"lineNumber":3,"author":{"gitId":"fakeAuthor"},"content":"fake"},{"lineNumber":4,"author":{"gitId":"fakeAuthor"},"content":"fake"},{"lineNumber":5,"author":{"gitId":"harryggg"},"content":"//@@author harryggg"},{"lineNumber":6,"author":{"gitId":"harryggg"},"content":"line 1"},{"lineNumber":7,"author":{"gitId":"harryggg"},"content":"line 2"},{"lineNumber":8,"author":{"gitId":"harryggg"},"content":"line 3"},{"lineNumber":9,"author":{"gitId":"harryggg"},"content":"//@@author"},{"lineNumber":10,"author":{"gitId":"fakeAuthor"},"content":"fake"},{"lineNumber":11,"author":{"gitId":"fakeAuthor"},"content":"fake"},{"lineNumber":12,"author":{"gitId":"-"},"content":"//@@author -invalidGitUsername_TreatedAsUnknownUser"},{"lineNumber":13,"author":{"gitId":"-"},"content":"unknown"},{"lineNumber":14,"author":{"gitId":"-"},"content":"System.out.println(\"//@@author invalidAuthorLineFormat\"); unknown"},{"lineNumber":15,"author":{"gitId":"-"},"content":"unknown"},{"lineNumber":16,"author":{"gitId":"-"},"content":"//@@author"},{"lineNumber":17,"author":{"gitId":"fakeAuthor"},"content":"fake"},{"lineNumber":18,"author":{"gitId":"fakeAuthor"},"content":"//@@author harryggg invalidAuthorLineFormat"},{"lineNumber":19,"author":{"gitId":"fakeAuthor"},"content":"fake"},{"lineNumber":20,"author":{"gitId":"-"},"content":"//@@author"},{"lineNumber":21,"author":{"gitId":"-"},"content":"unknown"},{"lineNumber":22,"author":{"gitId":"-"},"content":"unknown"}],"authorContributionMap":{"fakeAuthor":9,"harryggg":5,"-":8}},{"path":"blameTest.java","fileType":"java","lines":[{"lineNumber":1,"author":{"gitId":"harryggg"},"content":"line 1"},{"lineNumber":2,"author":{"gitId":"harryggg"},"content":"line 2"},{"lineNumber":3,"author":{"gitId":"fakeAuthor"},"content":"fake"},{"lineNumber":4,"author":{"gitId":"harryggg"},"content":"line 
3"}],"authorContributionMap":{"fakeAuthor":1,"harryggg":3}},{"path":"newFile.java","fileType":"java","lines":[{"lineNumber":1,"author":{"gitId":"harryggg"},"content":"line 1"},{"lineNumber":2,"author":{"gitId":"harryggg"},"content":"line 2"}],"authorContributionMap":{"harryggg":2}},{"path":"newPos/movedFile.java","fileType":"java","lines":[{"lineNumber":1,"author":{"gitId":"harryggg"},"content":"line 1"},{"lineNumber":2,"author":{"gitId":"harryggg"},"content":"line 2"},{"lineNumber":3,"author":{"gitId":"harryggg"},"content":"line 3"},{"lineNumber":4,"author":{"gitId":"harryggg"},"content":"line 4"}],"authorContributionMap":{"harryggg":4}},{"path":"space test.txt","fileType":"txt","lines":[{"lineNumber":1,"author":{"gitId":"chan-j-d"},"content":"1"}],"authorContributionMap":{"chan-j-d":1}}]
[{"path":"README.md","fileType":"md","lines":[{"lineNumber":1,"author":{"gitId":"eugenepeh"},"content":"This is a test repository for [RepoSense](https://github.com/reposense/RepoSense)."}],"authorContributionMap":{"eugenepeh":1}},{"path":"_reposense/config.json","fileType":"json","lines":[{"lineNumber":1,"author":{"gitId":"Eugene Peh"},"content":"{"},{"lineNumber":2,"author":{"gitId":"Eugene Peh"},"content":" \"ignoreGlobList\": [\"about-us/**\", \"**index.html\"],"},{"lineNumber":3,"author":{"gitId":"Eugene Peh"},"content":" \"formats\": [\"html\", \"css\"],"},{"lineNumber":4,"author":{"gitId":"FH-30"},"content":" \"ignoreCommitsList\": [\"\", \"67890def\"],"},{"lineNumber":5,"author":{"gitId":"Eugene Peh"},"content":" \"authors\":"},{"lineNumber":6,"author":{"gitId":"Eugene Peh"},"content":" ["},{"lineNumber":7,"author":{"gitId":"Eugene Peh"},"content":" {"},{"lineNumber":8,"author":{"gitId":"Eugene Peh"},"content":" \"githubId\": \"alice\","},{"lineNumber":9,"author":{"gitId":"Eugene Peh"},"content":" \"displayName\": \"Alice T.\","},{"lineNumber":10,"author":{"gitId":"Eugene Peh"},"content":" \"authorNames\": [\"AT\", \"A\"],"},{"lineNumber":11,"author":{"gitId":"Eugene Peh"},"content":" \"ignoreGlobList\": [\"**.css\"]"},{"lineNumber":12,"author":{"gitId":"Eugene Peh"},"content":" },"},{"lineNumber":13,"author":{"gitId":"Eugene Peh"},"content":" {"},{"lineNumber":14,"author":{"gitId":"Eugene Peh"},"content":" \"githubId\": \"bob\""},{"lineNumber":15,"author":{"gitId":"Eugene Peh"},"content":" }"},{"lineNumber":16,"author":{"gitId":"Eugene Peh"},"content":" ]"},{"lineNumber":17,"author":{"gitId":"Eugene Peh"},"content":"}"}],"authorContributionMap":{"FH-30":1,"Eugene Peh":16}},{"path":"annotationTest.java","fileType":"java","lines":[{"lineNumber":1,"author":{"gitId":"fakeAuthor"},"content":"fake all the lines in this file is writtened by 
fakeAuthor"},{"lineNumber":2,"author":{"gitId":"fakeAuthor"},"content":"fake"},{"lineNumber":3,"author":{"gitId":"fakeAuthor"},"content":"fake"},{"lineNumber":4,"author":{"gitId":"fakeAuthor"},"content":"fake"},{"lineNumber":5,"author":{"gitId":"harryggg"},"content":"//@@author harryggg"},{"lineNumber":6,"author":{"gitId":"harryggg"},"content":"line 1"},{"lineNumber":7,"author":{"gitId":"harryggg"},"content":"line 2"},{"lineNumber":8,"author":{"gitId":"harryggg"},"content":"line 3"},{"lineNumber":9,"author":{"gitId":"harryggg"},"content":"//@@author"},{"lineNumber":10,"author":{"gitId":"fakeAuthor"},"content":"fake"},{"lineNumber":11,"author":{"gitId":"fakeAuthor"},"content":"fake"},{"lineNumber":12,"author":{"gitId":"-invalidGitUsername_TreatedAsUnknownUser"},"content":"//@@author -invalidGitUsername_TreatedAsUnknownUser"},{"lineNumber":13,"author":{"gitId":"-invalidGitUsername_TreatedAsUnknownUser"},"content":"unknown"},{"lineNumber":14,"author":{"gitId":"-invalidGitUsername_TreatedAsUnknownUser"},"content":"System.out.println(\"//@@author invalidAuthorLineFormat\"); unknown"},{"lineNumber":15,"author":{"gitId":"-invalidGitUsername_TreatedAsUnknownUser"},"content":"unknown"},{"lineNumber":16,"author":{"gitId":"-invalidGitUsername_TreatedAsUnknownUser"},"content":"//@@author"},{"lineNumber":17,"author":{"gitId":"fakeAuthor"},"content":"fake"},{"lineNumber":18,"author":{"gitId":"harryggg invalidAuthorLineFormat"},"content":"//@@author harryggg invalidAuthorLineFormat"},{"lineNumber":19,"author":{"gitId":"harryggg invalidAuthorLineFormat"},"content":"fake"},{"lineNumber":20,"author":{"gitId":"harryggg invalidAuthorLineFormat"},"content":"//@@author"},{"lineNumber":21,"author":{"gitId":"fakeAuthor"},"content":"unknown"},{"lineNumber":22,"author":{"gitId":"fakeAuthor"},"content":"unknown"}],"authorContributionMap":{"fakeAuthor":9,"-invalidGitUsername_TreatedAsUnknownUser":5,"harryggg":5,"harryggg 
invalidAuthorLineFormat":3}},{"path":"blameTest.java","fileType":"java","lines":[{"lineNumber":1,"author":{"gitId":"harryggg"},"content":"line 1"},{"lineNumber":2,"author":{"gitId":"harryggg"},"content":"line 2"},{"lineNumber":3,"author":{"gitId":"fakeAuthor"},"content":"fake"},{"lineNumber":4,"author":{"gitId":"harryggg"},"content":"line 3"}],"authorContributionMap":{"fakeAuthor":1,"harryggg":3}},{"path":"newFile.java","fileType":"java","lines":[{"lineNumber":1,"author":{"gitId":"harryggg"},"content":"line 1"},{"lineNumber":2,"author":{"gitId":"harryggg"},"content":"line 2"}],"authorContributionMap":{"harryggg":2}},{"path":"newPos/movedFile.java","fileType":"java","lines":[{"lineNumber":1,"author":{"gitId":"harryggg"},"content":"line 1"},{"lineNumber":2,"author":{"gitId":"harryggg"},"content":"line 2"},{"lineNumber":3,"author":{"gitId":"harryggg"},"content":"line 3"},{"lineNumber":4,"author":{"gitId":"harryggg"},"content":"line 4"}],"authorContributionMap":{"harryggg":4}},{"path":"space test.txt","fileType":"txt","lines":[{"lineNumber":1,"author":{"gitId":"chan-j-d"},"content":"1"}],"authorContributionMap":{"chan-j-d":1}}]
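In this expected output, each file's `authorContributionMap` equals the number of `lines` entries credited to each `gitId`. As a sanity check on the updated `annotationTest.java` expectation, the sketch below tallies the per-line `gitId` values of the new JSON (transcribed by hand for lines 1-22). `ContributionTally` and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: rebuilds an authorContributionMap from per-line authors. */
class ContributionTally {
    /** Counts lines per author, keeping authors in order of first appearance. */
    static Map<String, Integer> tally(List<String> lineAuthors) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String author : lineAuthors) {
            counts.merge(author, 1, Integer::sum);
        }
        return counts;
    }

    /** Per-line gitId values for annotationTest.java in the new expected output. */
    static List<String> newAnnotationTestAuthors() {
        List<String> authors = new ArrayList<>();
        authors.addAll(Collections.nCopies(4, "fakeAuthor"));                               // lines 1-4
        authors.addAll(Collections.nCopies(5, "harryggg"));                                 // lines 5-9
        authors.addAll(Collections.nCopies(2, "fakeAuthor"));                               // lines 10-11
        authors.addAll(Collections.nCopies(5, "-invalidGitUsername_TreatedAsUnknownUser")); // lines 12-16
        authors.add("fakeAuthor");                                                          // line 17
        authors.addAll(Collections.nCopies(3, "harryggg invalidAuthorLineFormat"));         // lines 18-20
        authors.addAll(Collections.nCopies(2, "fakeAuthor"));                               // lines 21-22
        return authors;
    }

    public static void main(String[] args) {
        System.out.println(tally(newAnnotationTestAuthors()));
    }
}
```

The result matches the new map `{"fakeAuthor":9,"-invalidGitUsername_TreatedAsUnknownUser":5,"harryggg":5,"harryggg invalidAuthorLineFormat":3}`.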