added constructor function for transactions.
mhahsler committed May 17, 2021
1 parent 7d6cd0c commit 4567cba
Showing 10 changed files with 93 additions and 42 deletions.
3 changes: 2 additions & 1 deletion NAMESPACE
@@ -45,7 +45,8 @@ export(
"discretize",
"discretizeDF",
"addAggregate",
"filterAggregate"
"filterAggregate",
"transactions"
)

exportClasses(
1 change: 1 addition & 0 deletions NEWS.md
@@ -1,6 +1,7 @@
# arules 1.6-7.1 (xx/xx/2021)

## New Feature
* transactions now have a constructor function called transactions().
* Added new method compatible to itemMatrix to check if the item coding is compatible
between two objects.
* c() now produces a warning if two itemMatrices with different itemCoding are combined.
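As a rough illustration of the new constructor listed above, the following sketch builds the same transactions both ways. It assumes an arules version that includes this commit; the toy list `baskets` and the variable names are made up for illustration.

```r
library(arules)

# Hypothetical toy data: a named list with one character vector per transaction
baskets <- list(
  Tr1 = c("a", "b", "c"),
  Tr2 = c("a", "b"),
  Tr3 = c("c", "e")
)

# Constructor function added in this commit
trans_ctor <- transactions(baskets)

# S4 coercion, the route used before the constructor existed
trans_as <- as(baskets, "transactions")

# Compare the content of both objects after converting back to lists
same_content <- identical(as(trans_ctor, "list"), as(trans_as, "list"))
same_content
```

Comparing the list representations rather than the objects themselves keeps the check independent of internal details of the class.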
10 changes: 9 additions & 1 deletion R/transactions.R
@@ -25,8 +25,16 @@
## transaction data

##*****************************************************
## coercions
## constructor
transactions <- function(x, itemLabels = NULL, transactionInfo = NULL) {
trans <- as(x, "transactions")
if(!is.null(itemLabels)) trans <- recode(trans, itemLabels = itemLabels)
if(!is.null(transactionInfo)) transactionInfo(trans) <- transactionInfo
trans
}

##*****************************************************
## coercions
setAs("matrix", "transactions",
function(from)
new("transactions", as(from, "itemMatrix"),
2 changes: 1 addition & 1 deletion man/Adult.Rd
@@ -142,7 +142,7 @@ AdultUCI[[ "capital-loss"]] <- ordered(cut(AdultUCI[[ "capital-loss"]],
Inf)), labels = c("None", "Low", "High"))

## create transactions
Adult <- as(AdultUCI, "transactions")
Adult <- transactions(AdultUCI)
Adult

}
2 changes: 1 addition & 1 deletion man/Income.Rd
@@ -87,7 +87,7 @@ IncomeESL[["number of children"]] <- factor(
levels = 0 : 1 , labels = c("0", "1+"))

## creating transactions
Income <- as(IncomeESL, "transactions")
Income <- transactions(IncomeESL)
Income
}
\keyword{datasets}
28 changes: 23 additions & 5 deletions man/apriori.Rd
@@ -34,7 +34,7 @@ apriori(data, parameter = NULL, appearance = NULL, control = NULL)
}
\details{
\bold{Warning about automatic conversion of matrices or data.frames to transactions.}
It is preferred to coerce data to transactions manually before calling \code{apriori} to have control over item coding. This is especially important when you are working with multiple datasets or several subsets of the same dataset. To read about item coding, see
It is preferred to create transactions manually before calling \code{apriori} to have control over item coding. This is especially important when you are working with multiple datasets or several subsets of the same dataset. To read about item coding, see
\code{\link{itemCoding}}.

If a data.frame is specified as \code{x}, then the data is automatically converted
@@ -101,12 +101,30 @@ apriori(data, parameter = NULL, appearance = NULL, control = NULL)
}
\author{Michael Hahsler and Bettina Gruen}
\examples{
## Example 1: Create transaction data and mine association rules
a_list <- list(
c("a","b","c"),
c("a","b"),
c("a","b","d"),
c("c","e"),
c("a","b","d","e")
)
## Set transaction names
names(a_list) <- paste("Tr",c(1:5), sep = "")
a_list
## Use the constructor to create transactions
trans1 <- transactions(a_list)
trans1
rules <- apriori(trans1)
inspect(rules)
## Example 2: Mine association rules from an existing transactions dataset
## using different minimum support and minimum confidence thresholds
data("Adult")
## Note: Adult is already a transactions dataset. If you are using a data.frame, then
## you should coerce it first to transactions using:
## yourTrans <- as(yourData, "transactions")
## Mine association rules.
rules <- apriori(Adult,
parameter = list(supp = 0.5, conf = 0.9, target = "rules"))
summary(rules)
78 changes: 49 additions & 29 deletions man/transactions-class.Rd
@@ -61,13 +61,22 @@
}

\details{
Transactions are a direct extension of class
Transactions are a direct extension of class
\code{\linkS4class{itemMatrix}} to store a binary incidence
matrix, item labels, and optionally transaction IDs and user IDs.

Transactions can be created by coercion from lists
containing transactions, but also from matrix and data.frames.
However, you will need to prepare your data first (see coercion methods in the
Transactions can be created from a list
containing transactions, a matrix, or a data.frame
using
\itemize{
\item the constructor function \code{transactions(x, itemLabels = NULL, transactionInfo = NULL)}, or
\item S4 coercion with \code{as(x, "transactions")}.}

By default, \code{itemLabels} and
\code{transactionInfo} are created from information in \code{x} (e.g., from row and column names). In the constructor function, \code{itemLabels} can be set to a character vector of all possible item labels
or to another transactions object whose item coding should be copied (see \code{\link{itemCoding}} for details).

Note that you will need to prepare your data first (see coercion methods in the
Methods Section and the Example Section below for details on the needed format).
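The two optional constructor arguments described above can be sketched as follows. This is a minimal illustration, assuming an arules version that includes this commit; the data, the item universe `all_items`, and the `weight` column are hypothetical.

```r
library(arules)

# Hypothetical toy data: three transactions over the items a, b and c
baskets <- list(c("a", "b"), c("b", "c"), c("a"))

# Default: item labels are derived from the items present in the data
trans_default <- transactions(baskets)
n_default <- length(itemLabels(trans_default))

# Supply the full item universe, including the unobserved item "d"
all_items <- c("a", "b", "c", "d")
trans_full <- transactions(baskets, itemLabels = all_items)
n_full <- length(itemLabels(trans_full))

# Attach per-transaction information (one row per transaction)
info <- data.frame(weight = c(5, 10, 6))
trans_info <- transactions(baskets, transactionInfo = info)
transactionInfo(trans_info)
```

Passing the full set of labels is useful when several subsets of the same data are mined separately and the results need to be compared later.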

\bold{Continuous variables:} Association rule mining can only use items and does not work with continuous variables. Continuous variables need to be discretized first. An item resulting from discretization might be \emph{age>18} and the column contains only \code{TRUE} or \code{FALSE}. Alternatively it can be a factor with levels \emph{age<=18}, \emph{50=>age>18} and \emph{age>50}. These will be automatically converted into 3 items, one for each level. Have a look at the function \code{\link{discretize}} for automatic discretization.
@@ -160,7 +169,7 @@ See \code{\link{itemCoding}} to learn how to encode and recode transaction sets.
}
\author{Michael Hahsler}
\examples{
## example 1: creating transactions from a list
## Example 1: creating transactions from a list
a_list <- list(
c("a","b","c"),
c("a","b"),
@@ -169,18 +178,20 @@ a_list <- list(
c("a","b","d","e")
)

## set transaction names
## Set transaction names
names(a_list) <- paste("Tr",c(1:5), sep = "")
a_list

## coerce into transactions
trans1 <- as(a_list, "transactions")
## Use the constructor to create transactions
## Note: S4 coercion does the same: trans1 <- as(a_list, "transactions")
trans1 <- transactions(a_list)
trans1

## analyze transactions
## Analyze the transactions
summary(trans1)
image(trans1)

## example 2: creating transactions from a matrix
## Example 2: creating transactions from a matrix
a_matrix <- matrix(c(
1,1,1,0,0,
1,1,0,0,0,
@@ -189,38 +200,41 @@ a_matrix <- matrix(c(
1,1,0,1,1
), ncol = 5)

## set dim names
dimnames(a_matrix) <- list(c("a","b","c","d","e"),
paste("Tr",c(1:5), sep = ""))
## Set item names (columns) and transaction labels (rows)
colnames(a_matrix) <- c("a","b","c","d","e")
rownames(a_matrix) <- paste("Tr",c(1:5), sep = "")

a_matrix

## coerce
trans2 <- as(a_matrix, "transactions")
## Create transactions
trans2 <- transactions(a_matrix)
trans2
inspect(trans2)

## example 3: creating transactions from data.frame
## Example 3: creating transactions from data.frame
a_df <- data.frame(
age = as.factor(c(6, 8, NA, 9, 16)),
grade = as.factor(c("A", "C", "F", NA, "C")),
pass = c(TRUE, TRUE, FALSE, TRUE, TRUE))
## note: factors are translated differently to logicals and NAs are ignored
## Note: factors are translated differently to logicals and NAs are ignored
a_df

## coerce
trans3 <- as(a_df, "transactions")
## Create transactions
trans3 <- transactions(a_df)
inspect(trans3)

## Note that coercing the transactions back to a data.frame does not recreate the
## original data.frame.
as(trans3, "data.frame")

## example 4: creating transactions from a data.frame with
## Example 4: creating transactions from a data.frame with
## transaction IDs and items (by converting it into a list of transactions first)
a_df3 <- data.frame(
TID = c(1,1,2,2,2,3),
item=c("a","b","a","b","c", "b")
)
a_df3
trans4 <- as(split(a_df3[,"item"], a_df3[,"TID"]), "transactions")
trans4 <- transactions(split(a_df3[,"item"], a_df3[,"TID"]))
trans4
inspect(trans4)

@@ -233,22 +247,28 @@ trans4 <- read.transactions(tmp, format = "single",
close(tmp)
inspect(trans4)

## example 5: create transactions from a dataset with numeric variables
## Example 5: create transactions from a dataset with numeric variables
## using discretization.
data(iris)

irisDisc <- discretizeDF(iris)
head(irisDisc)
trans5 <- as(irisDisc, "transactions")

trans5 <- transactions(irisDisc)
trans5
inspect(head(trans5))

## example 6: create transactions manually (with the same items as in trans5)
trans6 <- as(encode(list(
c("Sepal.Length=[4.3,5.4)", "Species=setosa"),
c("Sepal.Length=[4.3,5.4)", "Species=setosa")
), itemLabels = itemLabels(trans5)
), "transactions")
## Note: creating transactions without discretizing numeric variables first applies the
## default discretization and produces a warning.


## Example 6: create transactions manually (with the same item coding as in trans5)
trans6 <- transactions(
list(
c("Sepal.Length=[4.3,5.4)", "Species=setosa"),
c("Sepal.Length=[4.3,5.4)", "Species=setosa")
), itemLabels = trans5)
trans6

inspect(trans6)
}
2 changes: 1 addition & 1 deletion man/weclat.Rd
@@ -87,7 +87,7 @@ trans <- list(
)

## convert list to transactions
trans <- as(trans, "transactions")
trans <- transactions(trans)

## add weight information
transactionInfo(trans) <- data.frame(weights = c(5, 10, 6, 7, 5, 1))
3 changes: 3 additions & 0 deletions tests/testthat/test-transactions.R
@@ -28,6 +28,9 @@ expect_identical(data, as(trans, "list"))
expect_identical(transactionInfo(trans)$transactionID, names(data))
expect_identical(sort(itemInfo(trans)$labels), sort(unique(unique(unlist(data)))))

## test constructor
expect_identical(transactions(data), trans)

## combine
expect_equal(c(trans, trans), as(c(data, data),"transactions"))

6 changes: 3 additions & 3 deletions vignettes/arules.Rnw
@@ -1139,6 +1139,7 @@ performing computations on the resulting associations (e.g., comparing or combin
\end{itemize}

The item coding is typically determined when data is coerced to transactions with
\code{transactions(x)} or
\code{as(x, "transactions")} and this process can lead to different item codings for
slightly different data sets. The methods \func{encode} and \func{recode} can be used to create and change the item coding to make the representation of transactions, itemsets and rules compatible. To check if two objects use the same item coding, method \func{compatible}
can be used.
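To make the point about differing item codings concrete, here is a small sketch. It assumes an arules version that provides both \func{transactions} and \func{compatible}; the two toy data sets and the shared label vector `all_items` are hypothetical.

```r
library(arules)

# Two hypothetical data sets that contain slightly different items
first  <- list(c("a", "b"), c("b", "c"))
second <- list(c("a", "d"))

# Created independently, each object only codes the items it has seen
t1 <- transactions(first)
t2 <- transactions(second)
independent_ok <- compatible(t1, t2)

# Fixing a shared item universe up front keeps the codings aligned
all_items <- c("a", "b", "c", "d")
t1_shared <- transactions(first,  itemLabels = all_items)
t2_shared <- transactions(second, itemLabels = all_items)
shared_ok <- compatible(t1_shared, t2_shared)

c(independent = independent_ok, shared = shared_ok)
```

The independently created objects are expected to be reported as incompatible, while the pair built from the same label vector should pass the check.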
@@ -1355,12 +1356,11 @@ AdultUCI[[ "capital-loss"]] <- ordered(cut(AdultUCI[[ "capital-loss"]],
labels = c("none", "low", "high"))
@

Now, the data can be coerced to
\class{transactions} resulting in
Now, transactions can be created from the data using the constructor function. This results in
a binary incidence matrix appropriate for association rule mining.

<<coerce>>=
Adult <- as(AdultUCI, "transactions")
Adult <- transactions(AdultUCI)
Adult
@
