Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow functions to process arrays #185

Open
JohnBrinkman opened this issue Nov 5, 2024 · 0 comments
Open

Allow functions to process arrays #185

JohnBrinkman opened this issue Nov 5, 2024 · 0 comments

Comments

@JohnBrinkman
Copy link
Collaborator

JohnBrinkman commented Nov 5, 2024

We often have cases where we want to execute a function on each element of an array.

For example, if we execute this expression: [111, 222, 333] * 1.05 we get:

[
  116.55000000000001,
  233.10000000000002,
  349.65000000000003
]

and want to round the results.

There area a couple things that a user might try that will not work:

round([111, 222, 333] * 1.05) (results in a TypeError, since round() is not expecting an array.

([111, 222, 333] * 1.05).round(@) (same TypeError).

There are two ways we can round the values:

1. use map()

map() can be used to process each element of the array: [111, 222, 333] * 1.05 | map(@, &round(@, 2)) :

[
  116.55,
  233.1,
  349.65
]

2. Use a projection

If we turn the expression into a projection, we can apply the round() function to each element in the projection:

( [111, 222, 333] * 1.05 )[].round(@, 2)) will also produce: [116.55, 233.1, 349.65]

However, both of these are advanced usage -- not easily discovered.

Allow functions to accept arrays

To make this easier, we could allow round() to accept either a number or an array of numbers as its first parameter. When given a number, will produce a number result, when given an array, will produce an array result:

round([111, 222, 333] * 1.05, 2) or ([111, 222, 333] * 1.05).round(@, 2)

will produce: [1.12, 2.35, 3.46].

Consistency with Excel

Excel allows ranges of values to be passed to functions. e.g. The first parameter to round() can be a single cell or a range of cells.
In fact, Excel allows either (or both) parameters to be arrays. e.g. Excel supports the equivalent of:

round([1.111, 2.222, 3.333], 1) => [1.1, 2.2, 3.3]
round([1.111, 2.222, 3.333], [1,2,3]) => [1.1, 2.22, 3.333]
round(1.111, [1,2,3]) => [1.1, 1.11, 1.111]

Breaking change

Note that this would be a breaking change to json-formula, since it means we could no longer coerce values for many functions.
e.g. today the expression: round("7.2") will produce 7. We coerce the string "7.2" to a number, because we know the first parameter must be a number.

However, with this change, we don't know whether to coerce to a number or array, so the call will fail. A breaking change means we'd need to bump our version number to 2.0.0

Change Coercion Rules?

If we're introducing a breaking change, we could break in a different manner and loosen the rules about function parameter coercion.

Today we refuse to attempt coercion when there are multiple allowable parameter types.
That means that this call would fail: round(["7.2", "8.1"]), since the parameter is neither a number or array of numbers.

However, in this case we should be able to coerce the array of strings to an array of numbers since there is no available coercion to turn an array of strings into a number.

Changes to coercion rules will be proposed as a separate issue.

Consistency with operators

This change would be consistent with our array operators.
e.g. today we support

[1,2,3] + [2,3,4] => [3,5,7]
[1,2,3] + 3 => [4,5,6]

i.e. we allow scalar operands to match with array operands.

In regards to array operators, the spec states:

When both operands are arrays, a new array is returned where the elements are populated by applying the operator on each element of the left operand array with the corresponding element from the right operand array

If both operands are arrays and they do not have the same size, the shorter array is padded with null values

If one operand is an array and one is a scalar value, a new array is returned where the operator is applied with the scalar against each element in the array

For operands to functions the rules need to be stated differently, since there can be more than 2 operands:

When any operand is an array then:

  • All operands will be treated as arrays
  • Any scalar operands will be converted to an array by repeating the scalar value
  • All arrays will be padded to the length of the longest array by adding null values

Note that we would not extend this to unary operators.
e.g. ![true(), false()] returns false, not [false, true]

Note that while Excel supports ranges with the not() function, we propose that json-formula doesn't -- for the reason that we do not want to change the behavior of unary not (!) and do not want unary not and the function not() to be inconsistent. Excel doesn't have this issue, because it does not have unary operators.

Proposal

Where appropriate, change function definitions to allow them to process arrays of values instead of single values.
The affected functions include:

  • abs
  • acos
  • asin
  • atan2
  • casefold
  • ceil
  • codePoint
  • cos
  • datedif
  • day
  • endsWith
  • eomonth
  • exp
  • find
  • floor
  • fround
  • hour
  • log
  • log10
  • lower
  • millisecond
  • minute
  • mod
  • month
  • power
  • proper
  • rept
  • round
  • search
  • second
  • sign
  • sin
  • split
  • sqrt
  • startsWith
  • substitute
  • tan
  • trim
  • trunc
  • upper
  • weekday
  • year

unaffected functions

We don't want to change these categories of functions:

  • Functions that already take an array as a parameter. i.e. we're not looking to process arrays of arrays (avg, min, max, stdev, stdevp, join, length, sort, unique ...)

  • Functions that allow a variable number of parameters (merge, notNull ...)

  • logic functions (and, not, if, or)

  • functions without parameters (null, true, false, now, today, random...)

  • introspection/transformation functions (value, keys, values, hasProperty, entries, fromEntries, toArray, toDate, toNumber, toString, zip...)

  • and

  • avg

  • contains

  • datetime

  • debug

  • deepScan

  • entries

  • false

  • fromEntries

  • hasProperty

  • if

  • join

  • keys

  • left

  • length

  • map

  • max

  • merge

  • mid

  • min

  • not

  • now

  • notNull

  • null

  • or

  • random

  • reduce

  • register

  • replace

  • reverse

  • right

  • sort

  • sortBy

  • stdev

  • stdevp

  • sum

  • time

  • toArray

  • toDate

  • today

  • toNumber

  • toString

  • true

  • type

  • unique

  • value

  • values

  • zip

Change fromCodePoint()

The change to allowing arrays in functions highlights a separate change we'd prefer to make to fromCodePoint().
The fromCodePoint() function currently takes a single integer parameter to produce a character. e.g. fromCodePoint(65) => "A"
However the JavaScript version allows multiple parameters to create a string. e.g. String.fromCodePoint(65, 66, 67) => "ABC"

For our updated fromCodePoint() method, we should have the array version return a string instead of an array of characters.
i.e. fromCodePoint([65, 66, 67]) => "ABC"

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant