Title: | Visualization and Clustering of Data in a Shiny App |
---|---|
Description: | Various visualisations of univariate and multivariate graphs (e.g. mosaic diagram, scatterplot matrix, Andrews curves, parallel coordinate diagram, radar diagram and Chernoff plots) as well as clustering methods (e.g. k-means, agglomerative, EM clustering and DBSCAN) are implemented as a Shiny app. The app allows interactive changes, e.g. of the order of variables. It is intended for use in teaching. |
Authors: | Sigbert Klinke [aut, cre] |
Maintainer: | Sigbert Klinke <[email protected]> |
License: | GPL-3 |
Version: | 0.2.0 |
Built: | 2025-02-02 03:26:05 UTC |
Source: | https://github.com/sigbertklinke/smvgraph |
Andrews curves for visualization of multidimensional data.
step determines the number of line segments for each curve. If ymax==NA, then the maximum y coordinate is determined from the curves. Note that the x range for type==3 differs from that of the other types.
Observations containing NA, NaN, -Inf, or +Inf are deleted before plotting.
andrews(x, type = 1, step = 100, ..., normalize = 1, ymax = NA)
x |
data frame or matrix |
type |
type of curve (default: 1) |
step |
smoothness of curves |
... |
further parameters given to graphics::plot and graphics::lines |
normalize |
integer: normalization method (default: 1) |
ymax |
numeric: maximum of y coordinate (default: NA) |
nothing
Andrews, D. F. (1972) Plots of High-Dimensional Data. Biometrics, vol. 28, no. 1, pp. 125-136.
Khattree, R., Naik, D. N. (2002) Andrews Plots for Multivariate Data: Some New Suggestions and Applications. Journal of Statistical Planning and Inference, vol. 100, no. 2, pp. 411-425.
andrews(iris[,-5], col=as.factor(iris[,5]))
andrews(iris[,-5], type=4, col=as.factor(iris[,5]), ymax=2)
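The sketch below shows what an Andrews curve computes. It evaluates the classical function from Andrews (1972), f_x(t) = x1/sqrt(2) + x2 sin(t) + x3 cos(t) + x4 sin(2t) + x5 cos(2t) + ..., on a grid of step + 1 points in [-pi, pi]. This is plain base R, not smvgraph's implementation. The following are assumptions:

- the helper name andrews_curve is hypothetical;
- this formula is taken to correspond to type = 1;
- min-max scaling stands in for normalize = 1.

```r
# Hypothetical helper (not part of smvgraph): classical Andrews (1972) curve
# f_x(t) = x1/sqrt(2) + x2*sin(t) + x3*cos(t) + x4*sin(2t) + x5*cos(2t) + ...
andrews_curve <- function(x, t) {
  x <- as.numeric(x)
  y <- rep(x[1] / sqrt(2), length(t))
  if (length(x) >= 2) {
    for (j in 2:length(x)) {
      k <- j %/% 2                                  # frequencies 1, 1, 2, 2, 3, 3, ...
      term <- if (j %% 2 == 0) sin(k * t) else cos(k * t)
      y <- y + x[j] * term
    }
  }
  y
}

step <- 100                                         # number of line segments per curve
t <- seq(-pi, pi, length.out = step + 1)
X <- as.matrix(iris[, 1:4])
keep <- apply(is.finite(X), 1, all)                 # drop rows with NA, NaN or +-Inf
X <- X[keep, , drop = FALSE]
grp <- iris$Species[keep]
# min-max scaling of each column (assumed to mirror normalize = 1)
X <- apply(X, 2, function(v) (v - min(v)) / (max(v) - min(v)))
Y <- apply(X, 1, andrews_curve, t = t)              # (step + 1) x n, one column per observation
matplot(t, Y, type = "l", lty = 1, col = as.integer(grp), ylab = "f(t)")
```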
Creates a parameter list or a function call. For a function call, fun must be given explicitly.
as_param(..., fun = NULL)
txt(x)
... |
list of named and unnamed parameters |
fun |
character: name of the function for a function call (default: NULL) |
x |
character: replaces |
a character string containing the parameter list or the function call
as_param(letters[1:5])
as_param(txt(letters[1:5]))
as_param(a=txt("a"))
as_param(txt(letters[1:5]), fun="c")
Returns a data frame with information about the plots available in smvgraph. Its columns are:
module: the internal name of the plot; you may need it if you want to call the Shiny app
label: the label used in the Shiny app
help: the R help topic for the plot
packages: the packages required to make the plot
code: whether a code block exists (should always be TRUE)
ui: whether plot-specific interactive UI elements exist
condition: the condition under which the plot is offered to the user in the Shiny app
availablePlots()
To understand condition:
nrow(analysis): the number of variables in the "Analysis" field
nrow(group): the number of variables in the "Grouping by" field
xxx$unique: the number of unique values in a variable; for elements other than unique, see the "Variable" panel of the Shiny app
The value 13 was chosen because twelve has the largest number of divisors below 20, and 43 was chosen because forty-two is the answer to the ultimate question ;)
a data frame with information about all available plots
availablePlots()
A non-ggplot2 bagplot based on mrfDepth::bagplot.
bagplot2( x, y = NULL, colorbag = NULL, colorloop = NULL, colorchull = NULL, databag = TRUE, dataloop = TRUE, plot.fence = FALSE, type = "hdepth", sizesubset = 500, extra.directions = FALSE, options = NULL, ... )
x, y |
the x and y arguments provide the x and y coordinates for the bagplot.
Any reasonable way of defining the coordinates is acceptable. See the function |
colorbag |
The color of the bag (which contains the 50% of observations with the largest depth). |
colorloop |
The color of the loop (which contains the regular observations). |
colorchull |
When the bagplot is based on halfspace depth, the depth region with maximal depth is plotted. This argument controls its color. |
databag |
Logical indicating whether data points inside the bag need to be plotted. |
dataloop |
Logical indicating whether data points inside the fence need to be plotted. |
plot.fence |
Logical indicating whether the fence should be plotted. |
type |
Determines the depth function used to construct the bagplot: |
sizesubset |
When computing the bagplot based on halfspace depth,
the size of the subset used to perform the main
computations. See Details for more information. |
extra.directions |
Logical indicating whether additional directions should
be considered in the computation of the fence for the
bagplot based on projection depth or skewness-adjusted
projection depth. If set to |
options |
A list of options to pass to the
|
... |
further parameters given to |
The bagplot has been proposed by Rousseeuw et al. (1999) as a generalisation of the boxplot to bivariate data. It is constructed based on halfspace depth and as such is invariant under affine transformations. Similar graphical representations can be obtained by means of other depth functions, as illustrated in Hubert and Van der Veeken (2008) and in Hubert et al. (2015). See mrfDepth::compBagplot for more details.
The deepest point is indicated with a "*" sign, the outlying observations with red points.
The result of the call to mrfDepth::compBagplot, returned invisibly.
Rousseeuw P.J., Ruts I., Tukey J.W. (1999). The bagplot: a bivariate boxplot. The American Statistician, 53, 382-387.
Hubert M., Van der Veeken S. (2008). Outlier detection for skewed data. Journal of Chemometrics, 22, 235-246.
Hubert M., Rousseeuw P.J., Segaert P. (2015). Rejoinder to 'Multivariate functional outlier detection'. Statistical Methods & Applications, 24, 269-277.
mrfDepth::compBagplot and mrfDepth::bagplot
bagplot2(iris$Sepal.Length, iris$Sepal.Width)
bagplot2(iris[,1:2])
bagplot2(iris[,3:4], title="Bagplot with Tukey depth", xlab="Petal.Length", ylab="Petal.Width")
# comparison with mrfDepth::bagplot
library("mrfDepth")
data("bloodfat")
result <- compBagplot(bloodfat)
bagplot(result, colorbag = rgb(0.2,0.2,0.2), colorloop = "green")
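The bag is built from halfspace (Tukey) depth. As a rough illustration only, the base-R sketch below does three things:

- it approximates the halfspace depth of each point by checking a grid of projection directions;
- it outlines the convex hull of the deeper half of the points;
- it marks the deepest point.

This is not how mrfDepth::compBagplot constructs the bag, which uses exact depth contours. The helper approx_hdepth and the number of directions are assumptions for illustration.

```r
# Hypothetical helper: approximate halfspace depth of every row of a 2-column
# matrix X, using ndir directions theta in [0, pi) (each covers two half-planes)
approx_hdepth <- function(X, ndir = 360) {
  X <- as.matrix(X)
  n <- nrow(X)
  depth <- rep(n, n)
  for (theta in seq(0, pi, length.out = ndir + 1)[-(ndir + 1)]) {
    p <- as.vector(X %*% c(cos(theta), sin(theta)))  # projections onto direction theta
    below <- rank(p, ties.method = "max")            # #{j : p_j <= p_i}
    above <- n - rank(p, ties.method = "min") + 1    # #{j : p_j >= p_i}
    depth <- pmin(depth, below, above)
  }
  depth / n
}

X <- as.matrix(iris[, 1:2])
d <- approx_hdepth(X)
inner <- X[d >= median(d), , drop = FALSE]           # roughly the deeper half of the points
plot(X, pch = 19, col = "grey")
polygon(inner[chull(inner), ], border = "blue")     # crude stand-in for the bag
points(X[which.max(d), , drop = FALSE], pch = 8, cex = 2)  # deepest point, marked "*"
```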
Bins each variable in data into the number of bins given by bins. It can return a data frame (out="data.frame"), a table with the counts (out="table"), or a table converted to a data frame with an additional variable Freq (out="binned"). The values can be either the bin mids (val="mid") or the bin numbers (val="interval"). If possible, all variables contain an attribute breaks with the breaks used.
binData( data, bins, out = c("data.frame", "table", "binned"), val = c("mid", "interval"), pretty = TRUE, numeric = TRUE )
data |
object: a data.frame or object that can be converted to a data frame with variables to bin |
bins |
integer: number of bins, will be recycled if necessary |
out |
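character: output type, either "data.frame", "table" or "binned" (default: "data.frame") |
val |
character: values for the output, either "mid" or "interval" (default: "mid") |
pretty |
logical: should base::pretty be used, or the minimum and maximum (default: TRUE) |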
numeric |
logical: return output a |
a data frame or table with the results
df <- data.frame(x=runif(25), y=runif(25))
binData(df, 5, 'data.frame')
binData(df, 5, 'table')
binData(df, 5, 'binned')
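A single variable can be binned with base::pretty breaks and findInterval. The sketch returns either bin mids or bin numbers and keeps the breaks as an attribute, similar to the description above. The helper bin_vector is hypothetical, and pretty() only yields approximately the requested number of bins.

```r
# Hypothetical helper (not smvgraph's binData): bin one numeric vector into
# about `bins` classes with base::pretty breaks, returning mids or bin numbers
bin_vector <- function(x, bins, val = c("mid", "interval")) {
  val <- match.arg(val)
  breaks <- pretty(x, n = bins)
  idx <- findInterval(x, breaks, rightmost.closed = TRUE, all.inside = TRUE)
  mids <- (breaks[-1] + breaks[-length(breaks)]) / 2
  out <- if (val == "mid") mids[idx] else idx
  attr(out, "breaks") <- breaks                     # keep the breaks used
  out
}

set.seed(1)
df <- data.frame(x = runif(25), y = runif(25))
binned <- lapply(df, bin_vector, bins = 5)          # list of binned columns
tab <- table(binned$x, binned$y)                    # counts per pair of bin mids
tab
```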
Converts a vector, matrix or data frame into a character vector, matrix or data frame. If na.action is a character, then all NAs are replaced by na.action (default: na.action="NA"). If na.action is a function, then the function is applied to the result.
character_data( x, select = NULL, out = c("data.frame", "matrix", "vector"), na.action = "NA", ..., title = NULL )
x |
vector, matrix or data frame |
select |
vector: indicating columns to select (default: NULL) |
out |
output as "data.frame", "matrix" or "vector" (default: "data.frame") |
na.action |
function or character: indicates what should happen when the data contain NAs (default: "NA") |
... |
unused |
title |
character: title attribute (default: NULL) |
the desired R object
character_data(iris)
character_data(iris, out="matrix")
character_data(iris, out="vector")
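The two kinds of na.action can be illustrated with a simplified base-R sketch. The helper char_sketch is hypothetical and only produces data frame output.

```r
# Hypothetical helper: convert all columns to character, then apply na.action
char_sketch <- function(x, na.action = "NA") {
  x <- as.data.frame(x)
  out <- as.data.frame(lapply(x, as.character), stringsAsFactors = FALSE)
  if (is.character(na.action)) {
    out[is.na(out)] <- na.action          # every NA becomes the given string
  } else if (is.function(na.action)) {
    out <- na.action(out)                 # e.g. stats::na.omit drops incomplete rows
  }
  out
}

d <- data.frame(a = c(1, NA, 3), b = factor(c("x", "y", NA)))
char_sketch(d)                            # NAs shown as the string "NA"
char_sketch(d, na.action = "")            # NAs as empty strings
char_sketch(d, na.action = stats::na.omit)
```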
Checks whether packages are installed without loading them. Returns a logical vector with TRUE or FALSE for each package checked.
checkPackages( ..., plotmodule = NULL, add = c("tools", "devtools", "formatR", "highlight", "shiny", "shinydashboard", "shinydashboardPlus", "shinyWidgets", "DT", "sortable", "base64enc"), error = FALSE )
installPackages( plotmodule = NULL, add = c("tools", "devtools", "formatR", "highlight", "shiny", "shinydashboard", "shinydashboardPlus", "shinyWidgets", "DT", "sortable", "base64enc") )
... |
character: name(s) of package |
plotmodule |
character: name(s) of plot modules to check for packages |
add |
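character: names of default packages to check (default: c("tools", "devtools", "formatR", "highlight", "shiny", "shinydashboard", "shinydashboardPlus", "shinyWidgets", "DT", "sortable", "base64enc")) |
error |
logical: should an error be thrown if one or more packages are missing? (default: FALSE) |
TRUE if successful, otherwise an error is thrown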
checkPackages("graphics", add=NULL)             # checks if 'graphics' is installed
if (interactive()) checkPackages("graphics")   # checks if 'graphics', 'shiny', ... are installed
if (interactive()) installPackages()           # installs all packages to show ALL plots
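To test whether a package is installed without attaching or loading it, base R's system.file(package = ...) can be used. It returns an empty string for packages that are not installed. Below is a hypothetical helper in that style, with an error switch similar to the error argument. It is not the smvgraph implementation.

```r
# Hypothetical helper: TRUE/FALSE per package, without attaching or loading it.
# system.file(package = p) returns "" when p is not installed.
is_installed <- function(..., error = FALSE) {
  pkgs <- c(...)
  ok <- vapply(pkgs, function(p) nzchar(system.file(package = p)), logical(1))
  if (error && !all(ok)) {
    stop("missing package(s): ", paste(pkgs[!ok], collapse = ", "))
  }
  ok
}

is_installed("graphics", "stats")
```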
Assigns a color to the data x based on the color palette colpal.
color_data(x, colpal = grDevices::hcl.colors, select = NULL, ..., title = NULL)
x |
vector, matrix, or data frame |
colpal |
color palette (default: grDevices::hcl.colors) |
select |
vector: indicating columns to select (default: NULL) |
... |
further parameters to factor_data |
title |
character: title attribute (default: NULL) |
a color vector
color_data(iris)
color_data(iris$Species)
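Because further parameters are passed to factor_data, color_data presumably reduces the data to a single group variable first and then assigns palette colors to the groups. Below is a base-R sketch of that last step. It assumes one color per factor level, taken in level order.

```r
# Map each group to one palette color (assumed behaviour, base R only)
grp  <- factor(iris$Species)
pal  <- grDevices::hcl.colors(nlevels(grp))         # one color per group level
cols <- pal[as.integer(grp)]                        # one color per observation
plot(iris$Sepal.Length, iris$Sepal.Width, col = cols, pch = 19)
```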
Determines colors for x based on a hierarchical cluster analysis with stats::hclust. Before clustering, x is normalized according to normalize.
color_hclust( x, normalize = 1, ncol = 2, colpal = grDevices::hcl.colors, dist = "euclidean", na.action = stats::na.pass, ... )
x |
a numeric matrix, data frame or "dist" object. |
normalize |
integer: normalization method (default: 1) |
ncol |
integer: maximal number of colors |
colpal |
color palette: a function which generates "ncol" colors with "colpal(ncol)" (default: grDevices::hcl.colors) |
dist |
the distance measure to be used. This must be one of "euclidean", "maximum", "manhattan", "canberra" or "binary" (default: "euclidean") |
na.action |
a function which indicates what should happen when the data contain NAs (default: stats::na.pass) |
... |
further parameters given to stats::hclust |
a color vector
color_hclust(iris[,-5], ncol=6)
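The steps named above can be sketched with base R: normalize, cluster with stats::hclust, then assign palette colors. The following are assumptions, and color_hclust may choose the groups differently:

- min-max scaling stands in for normalize = 1;
- complete linkage is used (the hclust default);
- the tree is cut into ncol = 6 groups.

```r
X <- as.matrix(iris[, -5])
# min-max scaling per column (assumed to correspond to normalize = 1)
X <- apply(X, 2, function(v) (v - min(v)) / (max(v) - min(v)))
hc   <- stats::hclust(stats::dist(X, method = "euclidean"))  # complete linkage by default
grp  <- stats::cutree(hc, k = 6)                   # at most ncol = 6 groups
cols <- grDevices::hcl.colors(6)[grp]              # colpal(ncol), indexed by group
pairs(iris[, -5], col = cols, pch = 19)
```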
Converts an input object (vector, matrix or data frame) to an output according to the format in out. Variable (column) and row names are set if possible, as well as an attribute title.
convertTo(x, coln, rown, title, out = c("data.frame", "matrix", "vector"))
x |
vector, matrix, or data frame: input |
coln |
character: column names if possible |
rown |
character: row names if possible |
title |
character: for title attribute |
out |
character: either "data.frame", "matrix" or "vector" (default: "data.frame") |
the desired output object
str(convertTo(pi, "Col1", "Row1", "Title", out='data.frame'))
str(convertTo(pi, "Col1", "Row1", "Title", out='matrix'))
str(convertTo(pi, "Col1", "Row1", "Title", out='vector'))
Creates a single group variable from the data x.
factor_data( x, select = NULL, out = c("data.frame", "matrix", "vector"), exclude = NULL, na.action = stats::na.pass, ..., title = NULL )
x |
vector, matrix, or data frame |
select |
vector: indicating columns to select (default: NULL) |
out |
output as "data.frame", "matrix" or "vector" (default: "data.frame") |
exclude |
vector: values to be excluded when forming the set of levels (default: NULL) |
na.action |
a function which indicates what should happen when the data contain NAs (default: stats::na.pass) |
... |
further parameters to character_data |
title |
character: title attribute (default: NULL) |
a one-column matrix with the merged groups
factor_data(iris$Species, out="vector")
factor_data(iris)
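Merging several columns into one grouping variable can be sketched in base R by pasting the columns row-wise and converting the result to a factor. Here missing values are first turned into the string "NA", mirroring the default of character_data. The data and the separator ":" are assumptions for illustration.

```r
df <- data.frame(sex = c("m", "f", "m", "f", "m"),
                 smoker = c("yes", "yes", "no", NA, "no"))
# convert every column to character and encode NA as "NA" (cf. character_data)
chars <- lapply(df, function(v) { v <- as.character(v); v[is.na(v)] <- "NA"; v })
# paste the columns row by row and turn the result into one factor
grp <- factor(do.call(paste, c(chars, sep = ":")))
table(grp)
```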
Formats R code as HTML for the splot app.
formatCommands(cmds)
cmds |
character: R code |
HTML code for the splot app
formatCommands('print("Hello World!")')
Returns the available modules as a list.
getModules(pattern, path = getShinyOption("smvgraph.path"))
pattern |
character: character string containing a regular expression, currently are used |
path |
character: path in which to search for the modules |
a list with the modules
library("shiny")
getModules('plot_*.R')   # get plots
getModules('color_*.R')  # get colors
Returns val if length(val)>0. Otherwise, it runs through args=list(...) until it finds an element with length(args[[i]])>0 and returns it. If everything fails, NULL is returned.
getval(val, ...)
val |
current value |
... |
sequence of alternative values |
a value
getval(NULL, 0)
getval(1, 0)
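The lookup can be written in a few lines of base R. The helper getval_sketch is hypothetical and assumes that any non-empty val (length greater than zero) is returned unchanged.

```r
# Hypothetical re-implementation of the documented lookup, assuming that any
# non-empty val (length > 0) is returned as is
getval_sketch <- function(val, ...) {
  if (length(val) > 0) return(val)
  for (alt in list(...)) {
    if (length(alt) > 0) return(alt)            # first non-empty alternative wins
  }
  NULL                                          # nothing usable found
}

getval_sketch(NULL, character(0), "default")    # skips the empty alternative
```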
Returns a data frame with one row for each variable in data.
getVariableInfo(data, n = 47)
data |
data frame: input data set |
n |
integer: character length for |
class: the base::class of the variable
missing: the number of missing values
infinite: the number of infinite values
unique: the number of unique values
valid: the number of unique valid values (see valid)
values: the values in order of decreasing frequency
a data frame with information about the variables of the input data set
getVariableInfo(iris)
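Most of these columns can be computed with a few vapply calls. The sketch below uses a hypothetical helper, var_info, covering class, missing, infinite, unique and values. The valid column is left out because its exact definition is not given here. Note that is.na also counts NaN as missing.

```r
var_info <- function(data, n = 47) {
  data.frame(
    class    = vapply(data, function(v) class(v)[1], character(1)),
    missing  = vapply(data, function(v) sum(is.na(v)), integer(1)),
    infinite = vapply(data, function(v) if (is.numeric(v)) sum(is.infinite(v)) else 0L, integer(1)),
    unique   = vapply(data, function(v) length(unique(v)), integer(1)),
    # values sorted by decreasing frequency, cut to n characters
    values   = vapply(data, function(v) {
      s <- paste(names(sort(table(v), decreasing = TRUE)), collapse = ", ")
      substr(s, 1, n)
    }, character(1))
  )
}

info <- var_info(iris)
info
```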
Extracts variable names from a data frame or matrix (column names).
getVariableNames(x, xvar = NULL, num = TRUE)
x |
data frame/matrix: data set to analyse |
xvar |
character: variable names to analyse (default: NULL) |
num |
logical: should numerical (TRUE) or non-numerical (FALSE) variables be used (default: TRUE) |
character vector with variable names
getVariableNames(iris)
getVariableNames(iris, num=FALSE)
getVariableNames(normalize(iris, 0))
getVariableNames(normalize(iris, 0), num=FALSE)
Adds a small amount of noise to a numeric vector. The result is x + runif(n, -a, a), where n <- length(x) and a <- abs(factor*amount). If amount==0, then amount is set to 1e-6 times the smallest non-zero distance between adjacent unique x values. If there are no non-zero distances, amount is set to 1e-6*(1+min(abs(x))).
Note that jitter_min delivers different results than base::jitter.
jitter_min(x, factor = 1, amount = 0)
x |
numeric: vector to which jitter should be added |
factor |
numeric: multiplier for amount (default: 1) |
amount |
numeric: amount for jittering (default: 0) |
jittered data
jitter_min(runif(6))
jitter_min(rep(0, 7))
jitter_min(rep(10000, 5))
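The rule for choosing amount can be written out directly in base R. The sketch below, a hypothetical helper named jitter_min_sketch, follows the description above. It does not handle missing values.

```r
# Hypothetical helper following the documented rule for jitter_min
jitter_min_sketch <- function(x, factor = 1, amount = 0) {
  if (amount == 0) {
    d <- diff(sort(unique(x)))               # distances between adjacent unique values
    d <- d[d > 0]
    amount <- if (length(d) > 0) 1e-6 * min(d) else 1e-6 * (1 + min(abs(x)))
  }
  a <- abs(factor * amount)
  x + runif(length(x), -a, a)
}

set.seed(1)
jitter_min_sketch(rep(0, 7))                 # noise of at most 1e-6
jitter_min_sketch(c(1, 2, 4))                # smallest gap is 1, so noise of at most 1e-6
```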
Stores the log messages, including messages, warnings and errors, in a temporary file.
loggit(log_lvl, log_msg)
read_logs()
set_logfile()
log_lvl |
character: level of log output. In practice, "DEBUG", "INFO", "WARN" and "ERROR" are common, but any string may be supplied |
log_msg |
character: Main log message |
Nothing.
if (interactive()) {
  set_logfile()                    # create a temporary file for logging
  loggit("DEBUG", "Hello world")
  read_logs()                      # get a data frame with the current messages
}
Extracts the numeric vectors from a data frame and normalizes each vector.
Note: if a variable is constant, its entries are replaced by 0.5 for method==1 (min-max) and by 0 for method==2 (standardization).
normalize(x, method = 1)
x |
data.frame or matrix |
method |
integer: normalization method (default: 1) |
numeric matrix
In package normalize
or at CRAN
normalize(iris, 2)
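Below is a base-R sketch of the two documented methods, including the rule for constant variables. normalize_sketch is a hypothetical helper, and any other method numbers that normalize may support are not covered.

```r
# Hypothetical helper: method 1 = min-max to [0, 1], method 2 = standardization
normalize_sketch <- function(x, method = 1) {
  x <- as.data.frame(x)
  X <- as.matrix(x[, vapply(x, is.numeric, logical(1)), drop = FALSE])  # numeric columns only
  scale_col <- function(v) {
    if (method == 1) {
      r <- range(v, na.rm = TRUE)
      if (r[2] == r[1]) rep(0.5, length(v)) else (v - r[1]) / (r[2] - r[1])
    } else if (method == 2) {
      s <- sd(v, na.rm = TRUE)
      if (is.na(s) || s == 0) rep(0, length(v)) else (v - mean(v, na.rm = TRUE)) / s
    } else {
      stop("only methods 1 and 2 are sketched here")
    }
  }
  apply(X, 2, scale_col)
}

normalize_sketch(data.frame(a = rep(3, 5), b = 1:5), method = 1)  # constant column a -> 0.5
```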
Converts a vector, matrix or data frame into a numeric vector, matrix or data frame.
numeric_data( x, select = NULL, out = c("data.frame", "matrix", "vector"), na.action = stats::na.pass, ..., title = NULL )
x |
vector, matrix or data frame |
select |
vector: indicating columns to select (default: NULL) |
out |
output as "data.frame", "matrix" or "vector" (default: "data.frame") |
na.action |
a function which indicates what should happen when the data contain NAs (default: stats::na.pass) |
... |
unused |
title |
character: title attribute (default: NULL) |
the desired R object
numeric_data(iris)
numeric_data(iris, out="matrix")
numeric_data(iris, out="vector")
Returns a reordering of the columns of x to visualize outliers or clusters better. If no column names are given, then V1, V2, ... are used.
order_andrews(x, method = 1)
x |
data matrix |
method |
numeric: order method (default: 1) |
order of column vectors
order_andrews(iris)
Returns a reordering of the columns of x to visualize highly correlated variable pairs, based on a cluster analysis of the correlation matrix. If no column names are given, then V1, V2, ... are used.
order_parcoord(x, method = "spearman", ...)
x |
data matrix |
method |
character: order method (default: "spearman") |
... |
further parameters given to stats::cor |
order of column vectors
order_parcoord(iris)
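One way to realize such an ordering has three steps:

1. turn the correlation matrix into the dissimilarity 1 - |r|;
2. cluster the variables hierarchically;
3. use the dendrogram's leaf order as the column order.

Strongly correlated variables then tend to become neighbours in the parallel coordinate plot. This is an assumption, not necessarily smvgraph's exact procedure.

```r
X <- iris[, vapply(iris, is.numeric, logical(1))]
r <- cor(X, method = "spearman")                   # rank correlations between variables
hc <- hclust(as.dist(1 - abs(r)))                  # strongly correlated pairs are close
ord <- colnames(X)[hc$order]                       # dendrogram leaf order as column order
ord
if (requireNamespace("MASS", quietly = TRUE)) {
  MASS::parcoord(X[, ord], col = as.integer(iris$Species))
}
```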
Draws a pyramid plot for a table with two columns.
pyramid( tab, gap = 0, left = list(col = "red"), right = list(col = "blue"), ... )
tab |
table: a table with two columns |
gap |
numeric(2): relative size of gap in |
left |
list: parameters for the left polygons (default: list(col = "red")) |
right |
list: parameters for the right polygons (default: list(col = "blue")) |
... |
further parameters to use in graphics::plot.default |
a pyramid plot
data("Boston", package="MASS")
tab <- table(data.frame(Boston$rad, Boston$chas))
pyramid(tab, main="Absolute frequencies")
pyramid(tab, gap=c(0.2, 0.2))
rtab <- tab/sum(tab)
pyramid(rtab, gap=c(0.2, 0.2), main="Relative frequencies")
ctab <- proportions(tab, 2)
pyramid(ctab, gap=c(0.2, 0.2), main="Conditional frequencies on columns")
rtab <- proportions(tab, 1)
pyramid(rtab, gap=c(0.2, 0.2), main="Conditional frequencies on rows")
# zebra striping
pyramid(tab, gap=c(0.2, 0.2), left=list(list(col="black"), list(col="white")), right=list(list(col="blue"), list(col="green")))
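A pyramid plot draws the two columns of a table as back-to-back horizontal bars. The first column extends to the left and the second to the right of a common zero line. The base-R sketch below illustrates the idea with a hypothetical helper, pyramid_sketch. It omits the gap and zebra options and uses the built-in mtcars data instead of MASS::Boston.

```r
# Hypothetical helper: back-to-back bars for a two-column table
pyramid_sketch <- function(tab, left_col = "red", right_col = "blue", ...) {
  stopifnot(length(dim(tab)) == 2, ncol(tab) == 2)
  k <- nrow(tab)
  m <- max(tab)
  plot(NA, xlim = c(-m, m), ylim = c(0, k), yaxt = "n", xlab = "", ylab = "", ...)
  rect(-as.numeric(tab[, 1]), 0:(k - 1), 0, 1:k, col = left_col)  # first column to the left
  rect(0, 0:(k - 1), as.numeric(tab[, 2]), 1:k, col = right_col)  # second column to the right
  axis(2, at = seq_len(k) - 0.5, labels = rownames(tab), las = 1)
  invisible(NULL)
}

tab <- table(mtcars$cyl, mtcars$am)
pyramid_sketch(tab, main = "Absolute frequencies")
```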
Resets the graphical parameters (par) if necessary.
resetpar(oldpar)
oldpar |
graphical parameters |
nothing
# no examples
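The usual base-R idiom for changing graphical parameters temporarily is to keep the value returned by par() and restore it on exit. The sketch shows that idiom. It is not necessarily how resetpar is implemented, and the helper plot_side_by_side is hypothetical.

```r
# Hypothetical helper: temporarily change the layout, then restore it
plot_side_by_side <- function(x, y) {
  oldpar <- par(mfrow = c(1, 2))    # par() returns the old values of the changed settings
  on.exit(par(oldpar))              # restore them even if plotting fails
  plot(x)
  plot(y)
  invisible(NULL)
}

plot_side_by_side(1:10, 10:1)
par("mfrow")                        # back to the previous layout
```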
Shiny app for creating an Andrews curve diagram with interactive variable selection.
sandrews(data, xvar = character(0), ...)
data |
matrix or data frame |
xvar |
character: names of selected variables for the plot |
... |
unused |
nothing
if (interactive()) sandrews(iris)
Shiny app for creating a Chernoff faces plot with interactive variable selection.
schernoff(data, xvar = character(0), ...)
data |
matrix or data.frame |
xvar |
character: names of selected variables for the plot |
... |
further parameters given to DescTools::PlotFaces |
nothing
if (interactive()) schernoff(normalize(iris))
Shiny app for running a cluster analysis with DBSCAN with interactive choice of variables, core distance, and minimal number of neighbours.
sdbscan(data, xvar = character(0), ...)
data |
matrix or data.frame |
xvar |
character: names of selected variables for the clustering |
... |
unused |
nothing
if (interactive()) sdbscan(iris)
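DBSCAN works as follows:

- a point is a core point if at least minPts points (itself included, as in the original algorithm) lie within distance eps of it;
- clusters grow from core points;
- border points are attached to a neighbouring cluster;
- the remaining points are noise.

The compact base-R sketch below follows that definition. It stores the full distance matrix, so it only suits small data. The helper name dbscan_sketch is hypothetical, and eps = 0.8 and minPts = 5 for the standardized iris data are arbitrary choices. In the app, the core distance and the minimal number of neighbours presumably play the roles of eps and minPts.

```r
# Hypothetical helper: plain DBSCAN; returns cluster ids, 0 = noise
dbscan_sketch <- function(X, eps, minPts) {
  X <- as.matrix(X)
  n <- nrow(X)
  D <- as.matrix(dist(X))
  neighbours <- lapply(seq_len(n), function(i) which(D[i, ] <= eps))  # includes i itself
  core <- lengths(neighbours) >= minPts
  cluster <- integer(n)
  k <- 0L
  for (i in which(core)) {
    if (cluster[i] != 0L) next            # already reached from an earlier core point
    k <- k + 1L
    cluster[i] <- k
    queue <- neighbours[[i]]
    while (length(queue) > 0) {
      j <- queue[1]
      queue <- queue[-1]
      if (cluster[j] == 0L) {
        cluster[j] <- k                   # core or border point joins cluster k
        if (core[j]) queue <- c(queue, neighbours[[j]])  # only core points expand
      }
    }
  }
  cluster
}

cl <- dbscan_sketch(scale(iris[, 1:4]), eps = 0.8, minPts = 5)
table(cluster = cl, species = iris$Species)
```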
Shiny app which shows the contribution of each variable to the distance between two observations, with interactive variable selection. The contribution of a variable is computed from the distance between the two observations in that variable:
sdistance(data, xvar = character(0), ...)
data |
matrix or data.frame |
xvar |
character: names of selected variables for the plot |
... |
unused |
Total variance: with
the variance of the
th variable
Minimum:
Manhattan:
Gower: is rescaled to
in each variable and then
Euclidean:
Manhattan:
Maximum:
nothing
if (interactive()) sdistance(iris)
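When a distance is a sum over variables, one natural way to define a variable's contribution is its share of that sum:

- for the squared Euclidean distance, the terms are the squared differences (x_ik - x_jk)^2;
- for the Manhattan distance, the terms are the absolute differences |x_ik - x_jk|.

The base-R sketch below computes these shares for two rows. The helper name contribution and this definition are assumptions for illustration and need not coincide with the formulas used in the app.

```r
# Hypothetical helper: share of each variable in the distance between rows i and j
contribution <- function(X, i, j, type = c("euclidean", "manhattan")) {
  type <- match.arg(type)
  d <- as.numeric(X[i, ]) - as.numeric(X[j, ])   # per-variable differences
  w <- if (type == "euclidean") d^2 else abs(d)  # additive terms of the (squared) distance
  setNames(w / sum(w), colnames(X))              # shares sum to 1
}

X <- scale(iris[, 1:4])                          # put variables on a common scale
round(contribution(X, 1, 150), 3)
round(contribution(X, 1, 150, type = "manhattan"), 3)
```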
Shiny app for doing a factor analysis with interactive variable selection.
sfactor(data, xvar = character(0), ...)
data |
matrix or data frame |
xvar |
character: names of selected variables for the plot |
... |
unused |
nothing
if (interactive()) sfactor(iris)
Shiny app for running a hierarchical cluster analysis with interactive choice of variables, distance, and agglomeration method.
shclust(data, xvar = character(0), ...)
data |
matrix or data.frame |
xvar |
character: names of selected variables for the clustering |
... |
unused |
nothing
if (interactive()) shclust(iris)
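Changing the distance or the agglomeration method can change the dendrogram considerably. The base-R sketch below compares several agglomeration methods of stats::hclust for one distance via the cophenetic correlation, i.e. the correlation between the original distances and those implied by the dendrogram. The standardization and the chosen methods are assumptions for illustration, not the app's defaults.

```r
X <- scale(iris[, 1:4])
d <- dist(X, method = "euclidean")                  # try "manhattan", "maximum", ...
methods <- c("single", "complete", "average", "ward.D2")
# cophenetic correlation: how well each dendrogram preserves the distances
coph <- sapply(methods, function(m) cor(d, cophenetic(hclust(d, method = m))))
round(coph, 3)
hc <- hclust(d, method = "average")
plot(hc, labels = FALSE)
rect.hclust(hc, k = 3)                              # boxes around 3 clusters
```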
Shiny app for running a k-means cluster analysis with interactive choice of variables.
skmeans(data, xvar = character(0), ...)
data |
matrix or data.frame |
xvar |
character: names of selected variables for the clustering |
... |
unused |
nothing
if (interactive()) skmeans(iris)
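k-means is available in base R as stats::kmeans. The sketch below runs it with several random starts on standardized variables. It also computes the total within-cluster sum of squares for k = 1, ..., 6, a common aid for choosing k. Whether and how the app standardizes the variables is not stated here, so the scaling is an assumption.

```r
set.seed(42)
X <- scale(iris[, 1:4])
km <- kmeans(X, centers = 3, nstart = 20)          # best of 20 random starts
table(cluster = km$cluster, species = iris$Species)
# total within-cluster sum of squares for k = 1..6 (an "elbow" plot)
wss <- sapply(1:6, function(k) kmeans(X, centers = k, nstart = 10)$tot.withinss)
plot(1:6, wss, type = "b", xlab = "k", ylab = "total within SS")
```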
Shiny app for running an EM clustering with interactive choice of variables.
smclust(data, xvar = character(0), ...)
data |
matrix or data.frame |
xvar |
character: names of selected variables for the clustering |
... |
unused |
nothing
if (interactive()) smclust(iris)
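EM clustering fits a mixture of normal distributions by alternating two steps:

- an E-step, which computes posterior membership probabilities;
- an M-step, which re-estimates the weights, means and standard deviations.

The one-dimensional, two-component sketch below shows this iteration in base R. The helper name em_gmm1d and the fixed number of iterations are assumptions. A practical implementation would also handle multivariate data, model selection and convergence checks.

```r
# Hypothetical helper: EM for a 1D Gaussian mixture with k components
em_gmm1d <- function(x, k = 2, iter = 200) {
  n <- length(x)
  mu <- quantile(x, probs = seq(0.25, 0.75, length.out = k), names = FALSE)  # spread-out starts
  sigma <- rep(sd(x), k)
  p <- rep(1 / k, k)
  for (it in seq_len(iter)) {
    # E-step: responsibility of component j for observation i (n x k matrix)
    dens <- sapply(seq_len(k), function(j) p[j] * dnorm(x, mu[j], sigma[j]))
    resp <- dens / rowSums(dens)
    # M-step: weighted proportions, means and standard deviations
    nk <- colSums(resp)
    p <- nk / n
    mu <- colSums(resp * x) / nk
    sigma <- sqrt(colSums(resp * (x - rep(mu, each = n))^2) / nk)
  }
  list(p = p, mu = mu, sigma = sigma, cluster = max.col(resp))
}

set.seed(1)
x <- c(rnorm(200, mean = 0), rnorm(200, mean = 6))
fit <- em_gmm1d(x)
fit[c("p", "mu", "sigma")]
```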
Shiny app for creating a Mosaic plot with interactive variable selection.
smosaic(data, xvar = character(0), yvar = character(0), ...)
data |
table or data.frame |
xvar |
character: names of selected variables for x-axis |
yvar |
character: names of selected variables for y-axis |
... |
further parameters given to graphics::mosaicplot |
nothing
if (interactive()) smosaic(Titanic)
dfTitanic <- toDataframe(Titanic)
if (interactive()) smosaic(dfTitanic)
Sorts and bins the rows of the data frame x according to the sorting columns in sortCol. decreasing and na.last are recycled if necessary. If equibin is TRUE and nBins==NA, then nBins is set to 100. If equibin is FALSE and nBins==NA, then the bins are returned as they come from sorting; only identical values are in one bin. If nBins is positive, then the bins are merged until nBins is reached.
Note that the number of observations per bin may vary.
sortbin( x, sortCol = 1, decreasing = FALSE, na.last = TRUE, nBins = NA, equibin = TRUE )
x |
data frame |
sortCol |
numeric/character: names or indices of variables used for sorting (default: 1) |
decreasing |
logical: should the sort order be increasing or decreasing (default: FALSE) |
na.last |
logical: for controlling the treatment of NAs (default: TRUE) |
nBins |
integer: maximal number of bins (default: NA) |
equibin |
logical: should the number of observations per bin be equal (default: TRUE) |
(non-sequential) bin numbers as integer
data("Boston", package="MASS")
tableplot(Boston, bin=sortbin(Boston))
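For a single sorting column, two binning modes can be sketched in base R: equal-sized bins along the sort order, and one bin per distinct value. This is a simplified illustration with made-up data. Unlike sortbin, the sketch supports only one sorting column and does not merge bins, and its equal-size version may split tied values across bins.

```r
x <- data.frame(a = c(3, 1, 2, 2, 5, 4, 1, 3), b = letters[1:8])
o <- order(x$a, decreasing = FALSE, na.last = TRUE)  # row order after sorting by column a
nBins <- 4
bin <- integer(nrow(x))
bin[o] <- ceiling(seq_along(o) * nBins / length(o)) # equal-sized bins in sorted order
bin
# analogue of equibin = FALSE, nBins = NA: one bin per distinct value
tiebin <- match(x$a, sort(unique(x$a)))
tiebin
```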
Shiny app for creating a scatterplot matrix with interactive variable selection.
spairs(data, xvar = character(0), ...)
data |
matrix or data.frame |
xvar |
character: names of selected variables for the plot |
... |
further parameters given to graphics::pairs |
nothing
if (interactive()) spairs(iris)
Shiny app for creating a Parallel Coordinate plot with interactive variable selection.
sparcoord(data, xvar = character(0), ...)
data |
matrix or data.frame |
xvar |
character: names of selected variables for the plot |
... |
further parameters given to MASS::parcoord |
nothing
if (interactive()) sparcoord(iris)
Shiny app for choosing a specific plot.
splot(data, xvar = character(0), path = NULL)
data |
data.frame: input data |
xvar |
character: selected variables (default: character(0)) |
path |
character: path from which to read the plot modules (default: NULL) |
nothing
if (interactive()) splot(iris)
Shiny app for creating radar charts with interactive variable selection.
sradar(data, xvar = character(0), ...)
data |
matrix or data.frame |
xvar |
character: names of selected variables for the plot |
... |
unused |
nothing
if (interactive()) sradar(iris)
A tableplot is a visualisation of multivariate data sets. Each column represents a variable and each row bin is an aggregate of a certain number of records. For numeric variables, a value box is plotted with the minimum, mean (black line) and maximum value. If a bin of a numeric variable contains any missing values, the box to the left of the value box is plotted in gray. For categorical variables, a stacked bar chart of the category proportions is drawn. Missing values are taken into account.
tableplot( x, select = NULL, subset = NULL, bin = NULL, yj = NA, IQR_bias = 5, colpal = grDevices::rainbow, color.NA_num = "gray75", color.NA = "grey75", color.num = "lightblue", color.box = "deepskyblue", color.line = "black", box.lower = NULL, box.upper = NULL, box.line = NULL, cex.main = 1, cex.legend = 1, width = 1, height = 0.15 )
x |
data frame |
select |
numeric/character: variables to show in the plot (default: NULL) |
subset |
numeric: index of observations to show (default: NULL) |
bin |
integer: bin number to which each observation belongs (default: NULL) |
yj |
numeric: Yeo-Johnson coefficient (default: NA) |
IQR_bias |
numeric: parameter that determines when a logarithmic scale is used when |
colpal |
color palette used for drawing (default: grDevices::rainbow) |
color.NA_num |
color for missing or infinite values of numeric variables (default: "gray75") |
color.NA |
color for missing values of categorical variables (default: "grey75") |
color.num |
color for the lower box of numeric variables (default: "lightblue") |
color.box |
color for the upper box of numeric variables (default: "deepskyblue") |
color.line |
color for the line in the upper box of numeric variables (default: "black") |
box.lower |
function: determines the lower border of the upper box for numeric variables (default: NULL) |
box.upper |
function: determines the upper border of the upper box for numeric variables (default: NULL) |
box.line |
function: determines the line position in the upper box for numeric variables (default: NULL) |
cex.main |
number: magnification to be used for the titles (default: 1) |
cex.legend |
number: magnification to be used for the legends (default: 1) |
width |
number: width of percentage axis (default: 1) |
height |
number: percentage of the height of the legends (default: 0.15) |
The idea and some of the code of the tableplot are taken from the tabplot package by Martijn Tennekes and Edwin de Jonge. It differs from their package in that
multicolumn sorting is possible, and
there is no support for 'ff' (out-of-memory vectors).
nothing
Tennekes, M., Jonge, E. de, Daas, P.J.H. (2013), Visualizing and Inspecting Large Datasets with Tableplots, Journal of Data Science 11 (1), 43-58.
data("Boston", package="MASS")
tableplot(Boston, bin=sortbin(Boston))
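The aggregation behind a tableplot can be sketched in base R. Rows are sorted by one variable and cut into bins of equal size. Each bin of a numeric variable is then summarized by its minimum, mean and maximum, and each bin of a categorical variable by its category proportions. The example uses mtcars, sorts by decreasing mpg and uses 4 bins; these choices are arbitrary, and drawing is omitted.

```r
x <- mtcars
# 4 bins of equal size after sorting by decreasing mpg
bin <- ceiling(rank(-x$mpg, ties.method = "first") * 4 / nrow(x))
# numeric variable: minimum, mean and maximum per bin
num_stats <- sapply(split(x$hp, bin), function(v) c(min = min(v), mean = mean(v), max = max(v)))
# categorical variable: category proportions per bin (rows sum to 1)
cat_props <- prop.table(table(bin, cyl = x$cyl), margin = 1)
num_stats
round(cat_props, 2)
```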
Each line of a code template consists of a condition based on the unnamed parameters and R code in which replacements with the named parameters are made.
template(text, ...)
text |
a code template |
... |
further parameters |
a character vector
template("
  1: 'Hello {{letter}}'
 !1: 'Good-bye {{letter}}'
",
letter=sample(LETTERS, 1),
runif(1)<0.5 #1 = first unnamed parameter
)
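The sketch below is one possible reading of the template format. It assumes that a condition is the position of an unnamed argument, optionally negated with !, and that a line is kept when that argument is TRUE. Named arguments replace {{name}} placeholders. The helper template_sketch is hypothetical and does not reproduce smvgraph's template, which may accept more general conditions.

```r
# Hypothetical sketch: each non-empty line is "<cond>: <code>"
template_sketch <- function(text, ...) {
  args <- list(...)
  nms <- names(args)
  if (is.null(nms)) nms <- rep("", length(args))
  named <- args[nms != ""]
  unnamed <- args[nms == ""]
  lines <- trimws(strsplit(text, "\n", fixed = TRUE)[[1]])
  out <- character(0)
  for (ln in lines[nzchar(lines)]) {
    pos <- regexpr(":", ln, fixed = TRUE)
    cond <- trimws(substr(ln, 1, pos - 1))
    code <- trimws(substr(ln, pos + 1, nchar(ln)))
    keep <- isTRUE(unnamed[[as.integer(sub("^!", "", cond))]])  # condition = unnamed arg no.
    if (startsWith(cond, "!")) keep <- !keep                    # "!" negates the condition
    if (keep) {
      for (nm in names(named)) {
        code <- gsub(paste0("{{", nm, "}}"), named[[nm]], code, fixed = TRUE)
      }
      out <- c(out, code)
    }
  }
  out
}

template_sketch("
  1: 'Hello {{letter}}'
 !1: 'Good-bye {{letter}}'
", letter = "A", TRUE)
```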
A data frame containing various variable types and special values.
testdata
A data frame with n=25 rows and 8 variables:
runif(n) with an NA, NaN, Inf, and -Inf
rnorm(n, 0, 2) with an NA and NaN
rep(0, n)
as.integer(rnorm(n, 0, 2)) with an NA and NaN
sample(c(0,1), size=n, replace=TRUE)
factor(as.integer(rnorm(n, 0, 2))) with an NA
ordered(as.integer(rnorm(n, 0, 2))) with a new level 10
ordered(as.integer(rnorm(n, 0, 2))) with an NA
as.character(as.integer(rnorm(n, 0, 2))) with an NA and ""
sample(c(T,F), size=n, replace=TRUE) with an NA
rep("constant", n)
sample(c(T,F), size=n, replace=TRUE)
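The special values matter because base R treats them differently: NaN counts as NA for is.na(), while NA, NaN, Inf and -Inf all fail is.finite(), and "" is a regular, non-missing string. The sketch below builds two illustrative columns in that spirit. The column names `u` and `chr` and the positions of the special values are hypothetical and do not reproduce the shipped testdata.

```r
# Hypothetical sketch: two columns with the kinds of special values listed above
set.seed(1)
n <- 25
u <- runif(n)
u[1:4] <- c(NA, NaN, Inf, -Inf)     # positions are illustrative
chr <- as.character(as.integer(rnorm(n, 0, 2)))
chr[1:2] <- c(NA, "")               # a missing value and an empty string
td <- data.frame(u = u, chr = chr)

colSums(is.na(td))                  # NaN also counts as NA
sum(!is.finite(td$u))               # NA, NaN, Inf and -Inf are all non-finite
```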
The elements in ... are coerced into one character vector. The list elements are either the text itself (method==NA) or integers starting at method.
The first letter of each list element name is capitalized.
toChoice(method = NA, ...)
method |
integer: which method is used for creating the list elements |
... |
character: choice values |
a list
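A base-R sketch of the behaviour described above: the names are the text with a capitalised first letter, and the values are either the text or an integer sequence starting at `method`. A named list of this shape is what shiny's selectInput() accepts as `choices` (names as labels, elements as values), which is presumably the intended use. `to_choice_sketch` is a hypothetical stand-in, not the package's toChoice().

```r
# Hypothetical sketch of toChoice-like behaviour (not the package's code).
to_choice_sketch <- function(method = NA, ...) {
  txt <- as.character(unlist(list(...)))   # coerce everything into one text vector
  # values: the text itself, or integers starting at `method`
  vals <- if (is.na(method)) txt else method + seq_along(txt) - 1
  out <- as.list(vals)
  # names: the text with its first letter capitalised
  names(out) <- paste0(toupper(substring(txt, 1, 1)), substring(txt, 2))
  out
}

txt <- c("the", "quick", "brown", "fox")
str(to_choice_sketch(NA, txt))
str(to_choice_sketch(0, txt))   # values 0, 1, 2, 3
```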
txt <- c("the", "quick", "brown", "fox", "jumps", "over", "the", "lazy", "dog")
toChoice(NA, txt)
toChoice(0, txt) # integer sequence starts at zero
Converts a table to a full data frame.
toDataframe(obj, name = NULL, ...)
obj |
R object ( |
name |
character: vector of variable name(s), only used for a |
... |
further parameters given to base::as.data.frame.table |
a data frame
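One plausible reading of "full data frame" is one row per case rather than one row per table cell. Under that assumption, a contingency table can be expanded in base R by repeating each row of base::as.data.frame.table output `Freq` times. This is a sketch of the idea only, not the package's implementation.

```r
# Sketch (assumption): expand a contingency table into one row per case.
tab <- as.data.frame(Titanic)               # one row per cell, counts in Freq
rows <- rep(seq_len(nrow(tab)), tab$Freq)   # repeat each cell Freq times
full <- tab[rows, setdiff(names(tab), "Freq")]
rownames(full) <- NULL
dim(full)                                   # 2201 rows, 4 columns
```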
toDataframe(Titanic)
toDataframe(austres)
Generates a layout matrix, given a (minimal) length len and the number of unused entries.
If sel<0, fewer rows and more columns are used; if sel>0, more rows and fewer columns are used.
toLayout(len, sel = 0, unused = 0)
len |
integer: minimal size of layout |
sel |
integer: select fewer or more rows (default: |
unused |
integer: number of unused entries (default: |
a matrix
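A layout for graphics::layout() can be sketched as a near-square grid with at least `len` cells, numbered row by row, with 0 marking empty cells. Here `sel` shifts the number of rows up or down. The square-root starting point is an assumption and the `unused` argument is ignored, so `layout_sketch` is a hypothetical illustration rather than the package's toLayout() algorithm.

```r
# Hypothetical sketch, not the package's algorithm: a near-square grid
# with at least `len` cells; `sel` shifts the balance between rows and columns.
layout_sketch <- function(len, sel = 0) {
  nr <- ceiling(sqrt(len)) + sel
  nr <- min(max(nr, 1), len)          # keep between 1 and len rows
  nc <- ceiling(len / nr)
  # plot numbers filled row by row; 0 = empty cell for graphics::layout()
  matrix(c(seq_len(len), rep(0L, nr * nc - len)), nrow = nr, ncol = nc, byrow = TRUE)
}

layout_sketch(13)             # 4 x 4, three empty cells
layout_sketch(13, sel = -1)   # 3 rows, 5 columns
layout_sketch(13, sel = 1)    # 5 rows, 3 columns
```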
toLayout(13)
Saves one or more data sets in RDS format to the temporary directory (tempdir()).
Data sets must be of class ts or of a class that can be converted to a data frame, e.g. matrix, table, etc.
toRDS(...)
... |
data sets to save |
the names of the created files
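The saving step can be sketched with base R only. The file name is taken from the argument expression, `ts` objects are stored as they are, and everything else goes through as.data.frame(). `to_rds_sketch` is a hypothetical helper, and the name-from-expression rule is an assumption based on the example comment (`tempdir/Titanic.rds`).

```r
# Hypothetical sketch of toRDS-like behaviour (not the package's code).
to_rds_sketch <- function(...) {
  # file names from the argument expressions, e.g. Titanic -> "Titanic"
  nms <- vapply(as.list(substitute(list(...)))[-1],
                function(e) paste(deparse(e), collapse = ""), character(1))
  objs <- list(...)
  files <- file.path(tempdir(), paste0(nms, ".rds"))
  for (i in seq_along(objs)) {
    x <- objs[[i]]
    if (!inherits(x, "ts")) x <- as.data.frame(x)   # keep ts, convert the rest
    saveRDS(x, files[i])
  }
  files
}

f <- to_rds_sketch(Titanic, austres)
basename(f)   # "Titanic.rds" "austres.rds"
```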
toRDS(Titanic) # saves to tempdir/Titanic.rds
Estimates a trend and a seasonality for a time series. Available functions:
trend_season
to generate an estimate
print
to print the estimate
summary
to summarize the estimation result
plot
to plot the time series, its estimation and the residuals
coef
to extract the coefficients if a seasonality estimation was done
residuals
to extract the residuals of the model
fitted
to extract the fitted values
trend_season(t, ...)

## Default S3 method:
trend_season(
  t,
  trend = c("constant", "linear", "exponential"),
  season = c("none", "additive", "multiplicative"),
  ...
)

## S3 method for class 'trend_season'
print(x, ...)

## S3 method for class 'trend_season'
summary(object, ...)

## S3 method for class 'trend_season'
plot(x, y, which = 1, ...)
t |
ts: time series object |
... |
unused |
trend |
character: trend method, either |
season |
character: seasonality method, either |
x, object |
trend_season: estimated time series |
y |
unused |
which |
integer: what to plot, |
trend_season returns a trend_season object with the components
call
the function call
ts
the input time series
trend
the trend estimation (ts object)
trend.residuals
the residuals of the trend estimation (ts object)
season
the trend and season estimation (ts object)
season.residuals
the residuals of the trend and season estimation (ts object)
coefficients
the coefficients used in the seasonality estimation
residuals
the residuals of the model
fitted.values
the fitted values of the model
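For trend = "linear" and season = "additive", one simple way to obtain such components is to fit a straight line in time, then estimate each seasonal effect as the mean detrended residual at that position in the cycle. The base-R sketch below follows that classical recipe. It is an assumption about the general technique, not a description of how trend_season computes its estimates, and `trend_season_sketch` is hypothetical.

```r
# Hypothetical sketch: linear trend + additive seasonality via seasonal means.
trend_season_sketch <- function(t) {
  tt <- as.numeric(time(t))
  y <- as.numeric(t)
  trend_fit <- lm(y ~ tt)                      # linear trend in time
  trend_res <- residuals(trend_fit)            # detrended series
  pos <- as.integer(cycle(t))                  # position in the cycle (e.g. quarter)
  seas <- tapply(trend_res, pos, mean)         # additive seasonal effects
  fit_vals <- unname(fitted(trend_fit)) + as.numeric(seas)[pos]
  list(
    coefficients = seas,
    fitted.values = ts(fit_vals, start = start(t), frequency = frequency(t)),
    residuals = ts(y - fit_vals, start = start(t), frequency = frequency(t))
  )
}

res <- trend_season_sketch(austres)   # austres is quarterly, so 4 seasonal effects
res$coefficients
```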
tts <- trend_season(austres, "linear")
print(tts)
summary(tts)
plot(tts)
plot(tts, which = 2)
residuals(tts)
fitted(tts)
coef(tts) # if NULL then no seasonality was estimated
General-purpose UI elements that store the last selected value for reuse:
UIplottype
plot type, defines smvgraph_type
UIpointsymbol
plot symbol for point, defines smvgraph_pch
UIpointsize
point size, defines smvgraph_cex
UIlinetype
line type, defines smvgraph_lty
UIlinewidth
line width, defines smvgraph_lwd
UItextsize
text size, defines smvgraph_tex
UIlegend
legend position, defines smvgraph_legend
UIlegendsize
legend size, defines smvgraph_lex
UIdatanormalization
whether the data should be rescaled, defines smvgraph_normalize
(no, minMax, standardization)
UIdistance
distance to use, defines smvgraph_distance
UIobservations
range of observations, defines smvgraph_obs
UImergegroups
whether a set of grouping variables should be merged into one group variable, defines smvgraph_single
The following input elements are defined in the top menu:
input$smvgraph_pch
point symbol,
input$smvgraph_cex
point size,
input$smvgraph_lty
line type,
input$smvgraph_lwd
line width,
input$smvgraph_tex
text size, and
input$smvgraph_legend
legend position.
UIdatanormalization(
  sel = getShinyOption("smvgraph.current")$smvgraph_normalize
)
UIdistance(sel = getShinyOption("smvgraph.current")$smvgraph_distance)
UIobservations(n, sel = getShinyOption("smvgraph.current")$smvgraph_obs)
UImergegroups(n, sel = getShinyOption("smvgraph.current")$smvgraph_single)
UIpointsize(n, sel = getShinyOption("smvgraph.current")$smvgraph_cex)
UIpointsymbol(n, sel = getShinyOption("smvgraph.current")$smvgraph_pch)
UIlinewidth(n, sel = getShinyOption("smvgraph.current")$smvgraph_lwd)
UIlinetype(n, sel = getShinyOption("smvgraph.current")$smvgraph_lty)
UItextsize(n, sel = getShinyOption("smvgraph.current")$smvgraph_tex)
UIlegend(n, sel = getShinyOption("smvgraph.current")$smvgraph_legend)
UIlegendsize(n, sel = getShinyOption("smvgraph.current")$smvgraph_lex)
UIplottype(n, sel = getShinyOption("smvgraph.current")$smvgraph_type)
sel |
selected element |
n |
integer: number of observations |
a UI element for Shiny
# none
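valid computes, for the entries of x or for each row or column of x, whether all values are valid, and returns a logical matrix or vector or the number of valid values. Valid values are those for which is.finite(v) holds for numeric variables and !is.na(v) holds for other types.
invalid computes, for the entries of x or for each row or column of x, whether any value is invalid, and returns a logical matrix or vector or the number of invalid values.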
valid(x, margin = 1:2, n = FALSE) invalid(x, margin = 1:2, n = FALSE)
x |
object: anything that can be coerced to a data frame, checked for valid/invalid values |
margin |
integer: a vector giving the subscripts over which valid/invalid values are looked for, e.g. |
n |
logical: should only the number of valid/invalid values be returned instead of a logical matrix/vector |
a logical data frame, a logical vector or an integer
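The validity rule can be sketched in a few lines of base R. The element-wise check uses is.finite() for numeric columns and !is.na() otherwise. A row or column is valid when all of its entries are valid, and invalid when any entry is invalid, so invalid is the complement of valid at the same margin. `valid_sketch` and `invalid_sketch` are hypothetical helpers that return a logical matrix rather than a data frame for `margin = 1:2`.

```r
# Hypothetical sketch of the validity rule described above.
valid_sketch <- function(x, margin = 1:2, n = FALSE) {
  x <- as.data.frame(x)
  # element-wise rule: is.finite() for numeric columns, !is.na() otherwise
  ok <- as.matrix(as.data.frame(lapply(x, function(v) {
    if (is.numeric(v)) is.finite(v) else !is.na(v)
  })))
  # margin 1 (rows) or 2 (columns): valid only if all entries are valid
  res <- if (length(margin) == 2) ok else apply(ok, margin, all)
  if (n) sum(res) else res
}

invalid_sketch <- function(x, margin = 1:2, n = FALSE) {
  res <- !valid_sketch(x, margin)   # any invalid entry = not all entries valid
  if (n) sum(res) else res
}

df <- data.frame(a = c(1, NA, Inf), b = c("x", "y", NA))
valid_sketch(df, 1)          # rows with only valid entries
valid_sketch(df, n = TRUE)   # number of valid entries
invalid_sketch(df, 2)        # columns with at least one invalid entry
```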
data("testdata")
valid(testdata)              # logical matrix: TRUE where an entry of x is valid
valid(testdata, n = TRUE)    # number of valid entries in x
valid(testdata, 1)           # logical vector: TRUE where a row of x has only valid entries
valid(testdata, 1, n = TRUE) # number of rows of x with only valid entries
valid(testdata$xu)
Reports progress to the user during long-running operations.
with_progress(...) set_progress(...) inc_progress(...)
... |
see shiny::withProgress |
see shiny::withProgress
## Only run examples in interactive R sessions
if (interactive()) {
  library(shiny)
  options(device.ask.default = FALSE)

  ui <- fluidPage(plotOutput("plot"))

  server <- function(input, output) {
    output$plot <- renderPlot({
      with_progress(message = 'Calculation in progress',
                    detail = 'This may take a while...', value = 0, {
        for (i in 1:15) {
          inc_progress(1/15)
          Sys.sleep(0.25)
        }
      })
      plot(cars)
    })
  }

  shinyApp(ui, server)
}
Computes the Yeo-Johnson transformation, a normalizing transformation. The code and documentation are taken from the VGAM package (see function yeo.johnson) with some slight modifications, e.g. NAs are kept and do not produce an error.
yeo.johnson( y, lambda, derivative = 0, epsilon = sqrt(.Machine$double.eps), inverse = FALSE )
y |
numeric: a vector or matrix. |
lambda |
numeric: recycled to the same length as |
derivative |
non-negative integer: the default is the ordinary function evaluation,
otherwise the derivative with respect to |
epsilon |
numeric and positive value: the tolerance given to values of |
inverse |
logical: return the inverse transformation? (default: |
The Yeo-Johnson transformation can be thought of as an extension of the Box-Cox transformation. It handles both positive and negative values, whereas the Box-Cox transformation only handles positive values. Both can be used to transform the data so as to improve normality.
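For reference, the transformation as defined by Yeo and Johnson (2000) is a piecewise Box-Cox-type power transform. Non-negative values use the power λ applied to y + 1, and negative values use the power 2 − λ applied to 1 − y, with logarithms at the limiting cases λ = 0 and λ = 2. For λ = 1 it reduces to the identity ψ(1, y) = y.

```latex
\psi(\lambda, y) =
\begin{cases}
\dfrac{(y+1)^{\lambda} - 1}{\lambda}, & y \ge 0,\ \lambda \ne 0,\\[4pt]
\log(y+1), & y \ge 0,\ \lambda = 0,\\[4pt]
-\dfrac{(1-y)^{2-\lambda} - 1}{2-\lambda}, & y < 0,\ \lambda \ne 2,\\[4pt]
-\log(1-y), & y < 0,\ \lambda = 2.
\end{cases}
```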
The Yeo-Johnson transformation of y, its inverse, or its derivatives with respect to lambda.
If inverse = TRUE, then derivative = 0 is required.
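The forward and inverse formulas of Yeo and Johnson (2000) can be written directly in base R. The sketch below follows the published piecewise definition, treats λ within `epsilon` of 0 or 2 as those limiting cases, and keeps NA inputs as NA. It is a hypothetical illustration, not the VGAM or smvgraph code. It handles vectors only, does not compute derivatives, and assumes `lambda` contains no NA.

```r
# Hypothetical sketch of the Yeo-Johnson transformation and its inverse.
yj_sketch <- function(y, lambda, epsilon = sqrt(.Machine$double.eps)) {
  lambda <- rep_len(lambda, length(y))
  out <- rep(NA_real_, length(y))      # NA inputs stay NA
  pos <- !is.na(y) & y >= 0
  neg <- !is.na(y) & y < 0
  l0 <- abs(lambda) < epsilon          # treat lambda as 0
  l2 <- abs(lambda - 2) < epsilon      # treat lambda as 2
  i <- pos & !l0
  out[i] <- ((y[i] + 1)^lambda[i] - 1) / lambda[i]
  i <- pos & l0
  out[i] <- log1p(y[i])
  i <- neg & !l2
  out[i] <- -((1 - y[i])^(2 - lambda[i]) - 1) / (2 - lambda[i])
  i <- neg & l2
  out[i] <- -log1p(-y[i])
  out
}

yj_inverse_sketch <- function(psi, lambda, epsilon = sqrt(.Machine$double.eps)) {
  lambda <- rep_len(lambda, length(psi))
  out <- rep(NA_real_, length(psi))
  pos <- !is.na(psi) & psi >= 0        # psi has the same sign as y
  neg <- !is.na(psi) & psi < 0
  l0 <- abs(lambda) < epsilon
  l2 <- abs(lambda - 2) < epsilon
  i <- pos & !l0
  out[i] <- (lambda[i] * psi[i] + 1)^(1 / lambda[i]) - 1
  i <- pos & l0
  out[i] <- expm1(psi[i])
  i <- neg & !l2
  out[i] <- 1 - (1 - (2 - lambda[i]) * psi[i])^(1 / (2 - lambda[i]))
  i <- neg & l2
  out[i] <- -expm1(-psi[i])
  out
}

yj_sketch(c(-1, 0, 1), lambda = 1)   # lambda = 1 is the identity: -1 0 1
```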
Yeo, I.-K. and Johnson, R. A. (2000). A new family of power transformations to improve normality or symmetry. Biometrika, 87, 954–959.
VGAM::yeo.johnson, boxcox.
y <- seq(-4, 4, len = (nn <- 200))
ltry <- c(0, 0.5, 1, 1.5, 2) # Try these values of lambda
lltry <- length(ltry)
psi <- matrix(as.numeric(NA), nn, lltry)
for (ii in 1:lltry)
  psi[, ii] <- yeo.johnson(y, lambda = ltry[ii])
matplot(y, psi, type = "l", ylim = c(-4, 4), lwd = 2, lty = 1:lltry,
        ylab = "Yeo-Johnson transformation", col = 1:lltry, las = 1,
        main = "Yeo-Johnson transformation with some values of lambda")
abline(v = 0, h = 0)
legend(x = 1, y = -0.5, lty = 1:lltry, legend = as.character(ltry),
       lwd = 2, col = 1:lltry)
Runs all s...
functions for testing purposes when called interactively.
zzz()
nothing
zzz()