Ned Taylor edited this page Apr 12, 2024 · 3 revisions
split(
   data=None,
   list=None,
   left_data,
   right_data,
   left_label,
   right_label,
   dim,
   left_size,
   right_size,
   shuffle=.false.,
   seed=0,
   split_list
)

The split interface splits a dataset into left and right sets. It is often used to separate a dataset into training and testing sets.

split is an interface to multiple procedures. The procedure invoked depends on the argument types and on whether both input data and label data are provided.

Arguments

  • data: An integer or real array of rank n (n=3 or 5). The input features dataset.
  • label: An integer or real array of rank 1 (shown as list in the signature above). The input dataset labels (expected output).
  • left_data: Output left split of data.
  • right_data: Output right split of data.
  • left_label: Output left split of label.
  • right_label: Output right split of label.
  • dim: Dimension along which to split data (i.e. the sample index dimension).
  • left_size: Fractional size of the left data split. WARNING: provide only one of left_size or right_size. If both are provided and do not sum to 1, right_size is readjusted to meet this criterion.
  • right_size: Fractional size of the right data split. WARNING: provide only one of left_size or right_size. If both are provided and do not sum to 1, right_size is readjusted to meet this criterion.
  • shuffle: A boolean. Whether to shuffle the dataset. Default=.false.
  • seed: An integer scalar. Random number generator seed. Default=0.
  • split_list: Optional: An integer list. An output list of length equal to the number of samples/records in the dataset. Each element is either 1 or 2, indicating whether the corresponding sample has been placed in the left_ or right_ output, respectively.
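
The relationship between left_size, right_size and split_list described above can be illustrated with a minimal, self-contained Fortran sketch. It does not call the library's split procedure and is not its implementation. The program name, the sample labels, the rounding via nint, and the rule that the first samples go to the left set when no shuffling is done are all assumptions made for illustration.

```fortran
program split_sketch
  ! Illustrative only: mimics the documented semantics of left_size,
  ! right_size and split_list without calling the library itself.
  implicit none
  integer, parameter :: num_samples = 10
  real :: left_size, right_size
  integer :: num_left, i
  integer :: label(num_samples)
  integer :: split_list(num_samples)
  integer, allocatable :: left_label(:), right_label(:)

  ! Hypothetical labels, one per sample
  label = [(10 * i, i = 1, num_samples)]

  ! Both fractions are given but do not sum to 1, so right_size is
  ! readjusted, as the WARNING in the argument list describes
  left_size = 0.8
  right_size = 0.3
  if (abs(left_size + right_size - 1.0) > 1.0e-6) then
     right_size = 1.0 - left_size
  end if

  ! ASSUMPTION: the left sample count is the nearest integer to
  ! left_size * num_samples
  num_left = nint(left_size * real(num_samples))

  ! ASSUMPTION: without shuffling, the first num_left samples go left.
  ! 1 marks a sample placed in the left set, 2 one placed in the right set
  split_list(:num_left) = 1
  split_list(num_left + 1:) = 2

  ! Use split_list as a mask to recover each side from the original labels
  left_label = pack(label, split_list == 1)
  right_label = pack(label, split_list == 2)

  print *, 'split_list  :', split_list
  print *, 'left_label  :', left_label
  print *, 'right_label :', right_label
end program split_sketch
```

With these assumed values, eight samples are marked 1 and two are marked 2. Because split_list records the assignment per sample, the same mask can be reused to partition any other per-sample array consistently with the split.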