uniquemask(≠) tests (#52)

* add basic uniquemask tests
* add empty tests
* add variation for shape
* add shuffled data tests
* add cross datatype tests
* updates to cross data type tests
* add comparison tolerance tests
* add code coverage focused tests
* add blog style docs on how to write tests
* Update contributing.md
* refactored how-to-add-tests.md
* spelling errors
* add model function docs to the how to write tests blog
* fix formatting
* update contributing.md with howtowritetests blog url
* new model for unique mask
* use the util functions
* add code coverage focused tests
* use the same code coverage case for unique
* add decision docs with the big cc explanation
Dyalog · Feb 14, 2024 · 6f53d3e
1 parent 245950d

Showing 12 changed files with 443 additions and 41 deletions.
diff --git a/assets/unique-email.png b/assets/unique-email.png
diff --git a/contributing.md b/contributing.md
@@ -6,7 +6,7 @@ Every small change of refactoring, documentation, code commenting, questions, su
 
 ## Basic Setup
 
-To contribute, make sure your set up:
+To contribute, make sure you set up:
 
 - Your username + email
 - Your ~/.gitconfig
@@ -24,9 +24,9 @@ cd ullu
 git remote add REMOTE_NAME https://github.com/YOUR_GITHUB_USERNAME/ullu.git
 ```
 
-`REMOTE_NAME` is the name of your remote repository and could be any name you like, for example your first name.
+`REMOTE_NAME` is the name of your remote repository and could be any name you like, for example, your first name.
 
-`YOUR_GITHUB_USERNAME` is your user name on GitHub and should be part of your account path.
+`YOUR_GITHUB_USERNAME` is your username on GitHub and should be part of your account path.
 
 You can use `git remote -v` to check if the new remote is set up correctly.
 
@@ -38,8 +38,9 @@ Step 1. Create a new branch
 git checkout -b fix1
 ```
 
-Step 2. Make changes in relevant file(s)
-Step 3. Commit the changes:
+Step 2. Make changes in the relevant file(s)
+
+Step 3. Commit the changes
 
 ```
 git add FILE1 (FILE2 ...)
@@ -58,42 +59,36 @@ Step 5. Send the pull request
 git push REMOTE_NAME fix1
 ```
 
-The command will push the new branch fix1 into your remote repository REMOTE_NAME that you created earlier. Additionally, it will also display a link that you can click on to open the new pull request. After clicking on the link, write a title and a concise description then click the “Create” button.
+The command will push the new branch `fix1` into your remote repository `REMOTE_NAME` that you created earlier. It will also display a link that you can click to open the new pull request. After clicking the link, write a title and a concise description, then click the “Create” button.
 
-Yay you are now all set. ٩(ˊᗜˋ*)و
+Yay! Now, you are all set. ٩(ˊᗜˋ*)و
 
 ## Contributing Code
 
 ### Adding a primitive
 
 A primitive is a built-in function or operator which is a core part of the language. It is represented by a glyph, which it may share with another primitive. More information [here](https://aplwiki.com/wiki/Primitive)
 
-Ullu tests the primitives one-by-one covering all the code written in the sources of Dyalog APL, all possible cases including edge cases and all types of inputs it can receive. 
-
-<!-- how to initialize the test files -->
+Ullu tests the primitives one by one, covering all the code in the Dyalog APL sources, all possible cases (including edge cases), and all types of input each primitive can receive.
 
 <!-- demo for a primitive (blog) -->
-A workflow demonstration blog on how to write tests is upcomming. Progress can be tracked [here](https://github.com/Dyalog/ullu/issues/50)
-
-<!-- ### Adding a test -->
-
-<!-- types of test cases -->
+A workflow demonstration blog on how to write tests is available in [docs/how-to-add-tests.md](https://github.com/Dyalog/ullu/blob/main/docs/how-to-add-tests.md).
 
 ## Contributing Docs
 
 ### Decision docs
 
 <!-- what it is -->
-Decision Docs are records detailing key decisions, fostering transparency and aiding future collaboration by providing a structured account of the decision-making process. Documentation about why certain decisions were taken in the codebase, it basically explains the mindset of the developer writing the tests and it also documents all the anomalies in the codebase.
+Decision docs are records of key decisions. They foster transparency and aid future collaboration by providing a structured account of the decision-making process. By documenting why certain decisions were taken in the codebase, they explain the mindset of the developer writing the tests and help record any anomalies in the codebase.
 
 They can be found [here](https://github.com/Dyalog/ullu/tree/docs-revamp/docs/decision)
 
 <!-- how to write -->
-In the decision docs, you need to mention the types of test cases included in the tests, description of all the variations of the tests and all the edge cases that were faced/handled. It needs to have all the information that a person in the future would need to expand the same tests or write new related ones. All the decisions taken while writing the tests.
+In the decision docs, mention the types of test cases included in the tests, a description of all the variations of the tests, and all the edge cases that were encountered and handled. The docs should contain all the information that a future contributor would need to extend these tests or write new related ones.
 
 <!-- example -->
-Decision docs for the primitive Magnitude are a good example of this. It can be found [here](https://github.com/Dyalog/ullu/blob/docs-revamp/docs/decision/primitive-functions/scalar-monadic.md#magnitude-rydocs)
+Decision docs for the primitive Magnitude are a good example of this. They can be found [here](https://github.com/Dyalog/ullu/blob/docs-revamp/docs/decision/primitive-functions/scalar-monadic.md#magnitude-rydocs)
 
 ---
 
-Note: By submitting a PR you agree to license your contribution under the ullu’s MIT [license](https://github.com/Dyalog/ullu/blob/main/LICENSE) unless explicitly noted otherwise.
+Note: By submitting a PR, you agree to license your contribution under ullu’s MIT [license](https://github.com/Dyalog/ullu/blob/main/LICENSE) unless explicitly noted otherwise.
diff --git a/docs/decision/primitive-functions/non-scalar-selection.md b/docs/decision/primitive-functions/non-scalar-selection.md
@@ -0,0 +1,43 @@
+# Non Scalar Selection functions
+
+## Unique (`R←∪Y`)([docs](https://help.dyalog.com/latest/#Language/Primitive%20Functions/Unique.htm))
+Same as Unique Mask below.
+
+## Unique Mask (`R←≠Y`)([docs](https://help.dyalog.com/latest/#Language/Primitive%20Functions/Unique%20Mask.htm))
+
+Most of these tests are very similar to other tests, so only the parts that differ are documented here. One thing that bugged me for a very long time was a switch statement at `allos/src/same.c.html#L1311` in the function `tolerant_nubsieve(void)`, whose cases were not hit by the normal test cases.
+
+Excerpt from ROS's email:
+
+```
+This wasn’t easy to figure out ☹
+
+I think the important factors here are the leading shape (“s”) of the array, and the number of unique elements (“u”). This creates such a thing:
+
+s 2⍴?u⍴0
+
+(It assumes two identical random floats won’t be created, which is nearly but not quite safe to do.)
+
+Now, the cluster index seems to be a vector with a length s and range of values dependent on u.
+
+The grade-up index of the cluster index will also have a length s, but s unique values.
+
+ct is the element type of the cluster index, so its (squeezed) type is dependent on the number of unique values. It appears it can be Boolean or 1, 2 or 4 bytes (unsigned), encoded 1, 2, 3 or 4 (referred in big switch statement as APLBOOL, APLSINT, APLINTG and APLLONG, but that’s misleading because those are signed).
+
+gt is the element type of the grade-up index, so its squeezed type is dependent on the leading shape. It can be 1, 2 or 4 bytes (unsigned), encoded 2, 3 or 4 (APLSINT, APLINTG, APLLONG)
+
+This gives the possible combinations (ignore the colouring for now):
+```
+![unique email](../../../assets/unique-email.png)
+``` 
+However, there can’t be more unique elements than elements, so I think the red lines are impossible.
+
+Also, it appears that the generated cluster index doesn’t necessarily consist of the smallest element type – in particular, if gt is APLLONG then ct is always APLLONG too. That makes the orange lines impossible.
+
+You can get the remaining ones (those in black) by evaluating ≠s 2⍴?u⍴0, for the values u and s in the table.
+
+(You can also test the orange cases by squeezing the value you get back from cluster_index(), but that’s not something you can do in a “standard” interpreter.)
+
+Regards,
+Richard
+```
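+
+The last step in the email can be scripted. The sketch below only illustrates the expression from the email: `s` and `u` are placeholder values, not taken from the table (which is only available as the image above), so substitute the values from each black row.
+
+```APL
+⍝ s: leading shape (number of rows), u: number of random floats, as in the email
+⍝ placeholder values, not taken from the table
+s←300
+u←7
+data←s 2⍴?u⍴0    ⍝ s rows of 2 floats, cycling through u random values in (0,1)
+r←≠data          ⍝ unique mask over the rows (major cells)
+s=≢r             ⍝ 1: one mask element per row
+```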
diff --git a/docs/decision/primitive-functions/non-scalar-selector.md b/docs/decision/primitive-functions/non-scalar-selector.md
@@ -5,7 +5,7 @@
 The tests include:
 - Datatype tests: tests for found and indexed/not-found variations for all the available datatypes
 - Cross-datatype tests: tests for found and indexed/not-found across datatypes, concatenating expressions and results to find any errors.
-- Tests based on Comparision tolerance(`⎕CT` & `⎕DCT`): tests to check if `d=d+1` on larger values of double, floating and complex numbers based on comparision tolerance values(default or 0).
+- Tests based on comparison tolerance (`⎕CT` & `⎕DCT`): tests to check whether `d=d+1` for large double, decimal floating-point and complex numbers, depending on the comparison tolerance values (default or 0).
 - Tests based on Floating point representation(`⎕FR`): All the tests run with values of `⎕FR` as 645 and 1287.
 - Separate tests for boolean values: Booleans need special tests because they only have 2 elements and since `i1` and `bool` have overlapping values.
 
@@ -23,7 +23,7 @@ Code Coverage report: NA
 The tests include:
 - Datatype tests: tests for found/not-found variations for all the available datatypes.
 - Cross-datatype tests: tests for found/not-found across datatypes, concatenating expressions and results to find any errors.
-- Tests based on Comparision tolerance(`⎕CT` & `⎕DCT`): tests to check if `d=d+1` on larger values of double, floating and complex numbers based on comparision tolerance values(default or 0).
+- Tests based on comparison tolerance (`⎕CT` & `⎕DCT`): tests to check whether `d=d+1` for large double, decimal floating-point and complex numbers, depending on the comparison tolerance values (default or 0).
 - Tests based on Floating point representation(`⎕FR`): All the tests run with values of `⎕FR` as 645 and 1287.
 - Separate tests for boolean values: Booleans need special tests because they only have 2 elements and since `i1` and `bool` have overlapping values.
 

diff --git a/docs/decision/primitive-functions/scalar-dyadic-arithmetic.md b/docs/decision/primitive-functions/scalar-dyadic-arithmetic.md
@@ -5,7 +5,7 @@
 The tests include:
 - Datatype tests: tests for positive and negative for all the available numeric datatypes
 - Tests based on Floating point representation(`⎕FR`): All the tests run with values of `⎕FR` as 645 and 1287.
-- Tests based on Comparison Tollerance(`⎕CT` and `⎕DCT`): All the tests run with default and zero values.
+- Tests based on Comparison tolerance (`⎕CT` and `⎕DCT`): All the tests run with default and zero values.
 - Edge Cases:
     - Separate cases had to be added for 0, ¯1, 0J0, 0.0
     - Separate cases had to be added for residue propagated using a scan, to target certain sections of the sources.

diff --git a/docs/how-to-add-tests.md b/docs/how-to-add-tests.md
@@ -0,0 +1,174 @@
+# How to add tests
+
+### Make a namespace
+
+Make a namespace named after the primitive being tested.
+
+For example, `uniquemask`:
+
+```APL
+:Namespace uniquemask
+    ⍝...
+:EndNamespace
+```
+
+### Main function
+
+Next, make the main function, named `test_functionname`, for example `test_uniquemask`. The `test_` prefix is important because `./unittest.apln` uses it to recognise the main function of each primitive's test suite.
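+
+One possible shape for the namespace and its main function is sketched below. It is an illustration only: the right argument `dummy`, the localised system variables and the empty starting result are assumptions, so check `./unittest.apln` for the signature it actually expects.
+
+```APL
+:Namespace uniquemask
+    ∇ r←test_uniquemask dummy;⎕IO;⎕CT;⎕DCT;⎕FR
+      ⍝ hypothetical signature: check ./unittest.apln for the one it expects
+      ⍝ localising the system variables stops the settings changed by the tests from leaking out
+      r←⍬    ⍝ individual results are appended with r,← as in the examples below
+      ⍝ initialise variables, generate data and run the tests here
+    ∇
+:EndNamespace
+```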
+
+### Initialise variables
+
+Primitives can depend on ⎕CT/⎕DCT, ⎕FR and ⎕IO, so initialise the default and alternative values of these settings that the tests will use:
+
+```APL
+ct_default←1E¯14
+dct_default←1E¯28
+fr_dbl←645
+fr_decf←1287
+io_default←1
+io_0←0
+```
+
+Next, generate specific data that can be manipulated to give expected results for some test cases, so that the output can be checked logically or mathematically. This is meant as a basic fallback for when testing with model functions fails.
+
+For example, this is the data used by the [unique mask](../tests/uniquemask.apln) (`≠`) tests:
+
+```APL
+⍝ All data generated is unique
+bool←0 1                                      ⍝ 11: 1 bit Boolean type arrays
+i1←¯60+⍳120                                   ⍝ 83: 8 bits signed integer
+char1←⎕UCS (100+⍳100)                         ⍝ 80: 8 bits character
+char2←⎕UCS (1000+⍳100)                        ⍝ 160: 16 bits character
+i2←{⍵,-⍵}10000+⍳100                           ⍝ 163: 16 bits signed integer
+char3←⎕UCS (100000+⍳100)                      ⍝ 320: 32 bits character
+i3←{⍵,-⍵}100000+⍳100                          ⍝ 323: 32 bits signed integer
+ptr←(13↑⎕a) (13↓⎕a)                           ⍝ 326: Pointer (32-bit or 64-bit as appropriate)
+dbl←{⍵,-⍵}i3+0.1                              ⍝ 645: 64 bits Floating
+cmplx←{⍵,-⍵}(0J1×⍳100)+⌽⍳100                  ⍝ 1289: 128 bits Complex
+Hcmplx←{⍵,-⍵}(1E14J1E14×⍳20)                  ⍝ 1289 but larger numbers to test for CT value
+⍝ Hdbl is 645 but larger numbers to test for CT value
+⍝ intervals of 2 are chosen because, at the default CT, each number ±1
+⍝ falls within that number's region of tolerant equality
+Hdbl←{⍵,-⍵}1E14+(2×⍳50)
+
+⍝ This is needed for a case that can be hit if we have a lot of small numbers 
+⍝ which produce a hash collision
+⍝ Occurrence: same.c.html#L1153
+Sdbl←{⍵,-⍵}(⍳500)÷1000
+
+⍝ Hfl is 1287 but larger numbers to test for CT value
+⍝ far-apart intervals are chosen so that the values do not
+⍝ overlap with each other's regions of tolerant equality
+⎕FR←fr_decf
+fl←{⍵,-⍵}i3+0.01                              ⍝ 1287: 128 bits Decimal
+Hfl←{⍵,-⍵}2E29+(1E16×⍳10)
+⎕FR←fr_dbl
+```
+
+### Initialise test description
+
+The test description gives information about the `testID`, the datatypes being tested, the [test variation](#runvariations), and the values of the different settings.
+
+```APL
+testDesc←{'for ',case,{0∊⍴case2:'',⍵⋄' , ', case2,⍵},' & ⎕CT ⎕DCT:',⎕CT,⎕DCT, '& ⎕FR:', ⎕FR, '& ⎕IO:', ⎕IO}
+```
+
+### Testing functions
+
+#### `Assert`
+
+`Assert` is a function defined in `./unittest.apln`. It takes the result of a test expression that evaluates to a Boolean, checks it, and returns what is needed to pretty-print the outcome according to the user settings of the test suite.
+
+#### `RunVariations`
+
+`RunVariations` is a function defined in each test file. It takes the expressions to be evaluated and does the following:
+- tests the input in the form it comes in
+- tests a scalar element taken from the input
+- tests an empty array derived from the input
+- reshapes the input and evaluates the result
+- reshapes the input to a shape that contains a 0
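+
+To make the idea concrete, here is a rough sketch for unique mask only. It is not the repository's `RunVariations`: `variationsSketch` is a hypothetical name, it returns one Boolean per variation instead of reporting through `Assert`, it assumes `data` is a non-empty vector, and it leaves out the scalar variation.
+
+```APL
+variationsSketch←{
+    ⍝ hypothetical helper: ⍺ is the expected mask, ⍵ the data (a non-empty vector)
+    n←≢⍵
+    t1←⍺≡≠⍵                    ⍝ the data as it comes
+    t2←(0⍴⍺)≡≠0⍴⍵              ⍝ an empty array derived from the input
+    t3←⍺≡≠n 1⍴⍵                ⍝ one-column matrix: its rows are the original elements
+    t4←(1,(n-1)⍴0)≡≠n 0⍴⍵      ⍝ a 0 in the shape: all n rows are empty, so only the first is unique
+    t1,t2,t3,t4
+}
+```
+
+For example, `1 1 0 1 0 variationsSketch 3 1 3 2 1` should give `1 1 1 1`.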
+
+#### Model function
+
+A model function replicates the behavior of an existing function using alternative primitives or computational steps. Model functions are used to check outputs that are not intuitive to compute by hand. The model functions here try to use primitives that are as unrelated as possible to the primitive being tested, so that a failure can easily be pinpointed to the primitive at fault; shared code between primitives makes this difficult otherwise. Model functions look like:
+
+```APL
+    modelMagnitude←{⍵×(¯1@(∊∘0)(⍵>0))}
+```
+
+```APL
+    modelUnique←{0=≢⍵:⍵ ⋄ ↑,⊃{⍺,(∧/⍺≢¨⍵)/⍵}⍨/⌽⊂¨⊂⍤¯1⊢⍵}
+```
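+
+A model for unique mask can be written in the same spirit. The version below is a sketch, not necessarily the one used in the repository; it assumes an argument of rank 1 or more and relies on match (`≡`), which respects ⎕CT/⎕DCT, to compare each major cell with the cells before it.
+
+```APL
+    modelUniqueMask←{
+        0=≢⍵:⍬                          ⍝ no major cells: empty result
+        c←⊂⍤¯1⊢⍵                        ⍝ enclose each major cell
+        {~∨/(⍵⊃c)∘≡¨(⍵-⎕IO)↑c}¨⍳≢c     ⍝ 1 if no earlier cell matches this one
+    }
+```
+
+For example, `modelUniqueMask 3 1 3 2 1` and `≠3 1 3 2 1` should both give `1 1 0 1 0`.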
+
+### The tests
+
+Each test should run with every relevant combination of ⎕CT/⎕DCT, ⎕FR and ⎕IO values, for whichever of these settings are implicit arguments of the primitive (that is, the settings it depends on):
+
+```APL
+:For io :In io_default io_0
+    ⎕IO←io
+
+    :For ct :In 1 0 
+        (⎕CT ⎕DCT)←ct × ct_default dct_default ⍝ set comparison tolerance
+
+        :For fr :In fr_dbl fr_decf
+            ⎕FR←fr ⍝ set floating-point representation
+            ⍝ ...
+        :EndFor
+    :EndFor
+:EndFor
+```
+
+#### Types of tests
+
+The general structure followed with all tests is as follows:
+
+##### General tests
+
+General tests check properties of the result other than whether the primitive gives the correct output. Some examples from uniquemask:
+
+- the result of uniquemask cannot have more elements than the input has major cells
+    ```APL
+    r,← 'TGen1' desc Assert (≢data)≥≢≠data
+    ```
+
+- the datatype of the result is always Boolean (`⎕DR` 11)
+    ```APL
+    r,← 'TGen2' desc Assert 11≡⎕dr ≠data intertwine data ⍝ intertwine is a util function that interleaves its arguments: (1 1 1 1) intertwine (0 0 0 0) gives 1 0 1 0 1 0 1 0
+    ```
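+
+`intertwine` is one of the repository's util functions. To experiment with these snippets outside the test suite, a stand-in with the behaviour described in the comment above could look like this. It is a hypothetical re-implementation and assumes both arguments have the same length.
+
+```APL
+⍝ hypothetical stand-in for the util function; assumes (≢⍺)=≢⍵
+intertwine←{,⍉2(≢⍺)⍴⍺,⍵}        ⍝ row 1 holds ⍺, row 2 holds ⍵; transpose and ravel interleave them
+(1 1 1 1) intertwine (0 0 0 0)  ⍝ 1 0 1 0 1 0 1 0
+11≡⎕DR ≠'abc' intertwine 'abc'  ⍝ 1: the mask is Boolean
+```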
+
+##### Logical/mathematical tests
+
+These tests check the result of the primitive with a straightforward logical approach and depend on as few other primitives as possible, so that failures in those primitives cause fewer false failures. Some examples from unique mask:
+
+- all elements of `data` are unique, so the result is all 1s
+    ```APL
+    r,← 'T1' desc RunVariations (1⍨¨data) data
+    ```
+
+- each element is followed by a copy of itself, so the result is 1 0 1 0 1 0...
+    ```APL
+    r,← 'T3' desc RunVariations ((1⍨¨data) intertwine (0⍨¨data)) (data intertwine data)
+    ```
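+
+On a small character vector, the expected results of these two tests can be checked directly in the session. `'abc'` is an arbitrary example rather than part of the test data, and the constant function `1⍨` needs Dyalog 18.0 or later.
+
+```APL
+data←'abc'
+(1⍨¨data)≡≠data         ⍝ T1: every element is unique, so the mask is all 1s
+1 0 1 0 1 0≡≠'aabbcc'   ⍝ T3: each element followed by a copy of itself
+```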
+
+##### Cross datatype tests
+
+Cross-datatype tests check how the primitive handles two datatypes at a time in the same input. Each datatype should be tested with every other datatype.
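+
+For example, no number matches a character, so catenating two all-unique datasets of different types should keep every element unique. The sketch below reuses the `i1` and `char1` definitions from the data section; not every pair behaves like this, since `bool` and `i1` share values.
+
+```APL
+i1←¯60+⍳120            ⍝ 8-bit integers, as in the data section
+char1←⎕UCS 100+⍳100    ⍝ 8-bit characters, as in the data section
+mixed←i1,char1         ⍝ one input holding two datatypes
+(≢mixed)=+/≠mixed      ⍝ 1: every element is unique
+```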
+
+##### Comparison tolerance tests
+
+Comparison tolerance tests give the primitive inputs that lie within each other's tolerance range. The inputs are usually numbers slightly larger or smaller than an original number: they should be treated as equal to it at the default ⎕CT and ⎕DCT values, and as distinct when ⎕CT and ⎕DCT are zero.
+
+More information about comparison tolerance is available in the [⎕CT documentation](https://help.dyalog.com/latest/Content/Language/System%20Functions/ct.htm) and in Dyalog's paper on [tolerant comparison](https://www.dyalog.com/uploads/documents/Papers/tolerant_comparison/tolerant_comparison.htm).
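+
+As a small illustration with `⎕FR←645`, the two values below differ by 1 but lie within each other's tolerance at the default `⎕CT`, and become distinct when `⎕CT` is 0. Both 1E15 and 1E15+1 are exactly representable as doubles; with `⎕FR←1287` the analogous setting is `⎕DCT`.
+
+```APL
+⎕FR←645
+⎕CT←1E¯14
+≠1E15 (1E15+1)    ⍝ 1 0: tolerantly equal, so the second is a duplicate
+⎕CT←0
+≠1E15 (1E15+1)    ⍝ 1 1: compared exactly, so both are unique
+```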
+
+##### Independent tests
+
+Independent tests target special cases that either have optimisations in the sources or need specific values that the general datatype data cannot cover. For example:
+- A special case can be hit if the input contains two numbers a and b, where b is an 8-bit integer and a is b-⎕CT (b-⎕DCT when ⎕FR is 1287). When the loop reaches element b, it finds element a and hits the case.
+    Occurrence: same.c.html#L1152
+    ```APL
+    d←i1[?≢i1]
+    r,←'TCTI1' desc Assert (1 0)≡(≠ (d-({⎕FR=fr_decf:⎕DCT⋄⎕CT}⍬)) d)
+    ```
diff --git a/tests/floor.apln b/tests/floor.apln
@@ -95,7 +95,7 @@
         testDesc←{'for ',case,{0∊⍴case2:'',⍵⋄' , ', case2,⍵},' & ⎕CT ⎕DCT:',⎕CT,⎕DCT, '& ⎕FR:', ⎕FR}
 
         :For ct :In 0 1
-            (⎕CT ⎕DCT)←ct × ct_default dct_default ⍝ set comparision tolerance
+            (⎕CT ⎕DCT)←ct × ct_default dct_default ⍝ set comparison tolerance
             :For fr :In 2 1
                 ⎕FR←fr⊃fr_dbl fr_decf ⍝ set type of floating-point computations
                 :For case :In 'zero' 'bool' 'i1' 'i2' 'i3' 'dbl' 'fl' 'Hdbl' 'Hfl'
@@ -124,7 +124,7 @@
                         r,← 'TDbl' desc RunVariations ({(⍵-0.5),0,-0.5+⍵}(halfLen↑data)) data   ⍝ floor of dbl is removing the 0.5 from the number
                     :EndIf
 
-                    ⍝ tests with comparision tolerance
+                    ⍝ tests with comparison tolerance
                     d1←data[?≢data]
                     almostd1←d1×1-fr⊃1E¯2×ct_default dct_default ⍝ infinitesimally close to d1 but smaller
                     :If ct ⍝ tolerant
@@ -174,7 +174,7 @@
         testDesc←{'for ',case,' & ⎕CT ⎕DCT:',⎕CT,⎕DCT, '& ⎕FR:', ⎕FR}
 
         :For ct :In 0 1
-            (⎕CT ⎕DCT)←ct × ct_default dct_default ⍝ set comparision tolerance
+            (⎕CT ⎕DCT)←ct × ct_default dct_default ⍝ set comparison tolerance
             :For fr :In 1 2
                 ⎕FR←fr⊃fr_dbl fr_decf
                 :For case :In 'cmplx' 'Hcmplx'

diff --git a/tests/indexof.apln b/tests/indexof.apln
@@ -75,7 +75,7 @@
             ⎕IO←io
 
             :For ct :In 1 0 
-                (⎕CT ⎕DCT)←ct × ct_default dct_default ⍝ set comparision tolerance
+                (⎕CT ⎕DCT)←ct × ct_default dct_default ⍝ set comparison tolerance
 
                 :For fr :In fr_dbl fr_decf
                     ⎕FR←fr