B-Box™

Analysis Methods

Basic statistics for your data are provided, including the sum, mean, variance, coefficient of variation, skewness and kurtosis. Possible outliers in your data are also detected.
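
As an illustration of the statistics listed above, here is a minimal Python sketch. It is not B-Box™'s own implementation: the function name `basic_statistics` is hypothetical, and the choice of the n - 1 denominator for the variance and of moment-based skewness and excess kurtosis are assumptions.

```python
import math


def basic_statistics(values):
    """Return sum, mean, variance, CV, skewness and excess kurtosis."""
    n = len(values)
    total = sum(values)
    mean = total / n
    # Assumption: sample variance with the n - 1 denominator.
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    # Population central moments, used for the shape statistics.
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return {
        "sum": total,
        "mean": mean,
        "variance": variance,
        # Coefficient of variation; undefined when the mean is zero.
        "cv": math.sqrt(variance) / mean,
        "skewness": m3 / m2 ** 1.5,
        "kurtosis": m4 / m2 ** 2 - 3.0,
    }


stats = basic_statistics([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
```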

Sample Size Estimation

The sample size required for your analysis is estimated from the probability that the difference between the sample mean and the population mean will be smaller than a specified value.
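
One common way to turn that description into a number is the normal-approximation formula n = (z·σ / E)², rounded up. The sketch below assumes this formula and a known standard deviation; the function name and parameters are hypothetical, and B-Box™ may calculate the value differently.

```python
import math
from statistics import NormalDist


def required_sample_size(sigma, margin, confidence=0.95):
    """Smallest n with P(|sample mean - population mean| < margin) >= confidence.

    Assumes a known standard deviation `sigma` and a normal approximation.
    """
    # Two-sided critical value of the standard normal distribution.
    z = NormalDist().inv_cdf((1 + confidence) / 2)
    return math.ceil((z * sigma / margin) ** 2)


n = required_sample_size(sigma=15.0, margin=2.0, confidence=0.95)
```

A tighter margin or a higher confidence level both increase the required sample size.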

Correlation Coefficient

This represents the strength of the linear relationship between two variables.
You can assign a specific color to the entries that correspond to strongly correlated pairs of variables.
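
The idea of flagging strongly correlated pairs can be sketched as follows, assuming the Pearson correlation coefficient and a hypothetical cutoff of 0.8 for "strong". Both the function names and the threshold are illustrative, not taken from B-Box™.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def strongly_correlated_pairs(columns, threshold=0.8):
    """List (name_a, name_b, r) for pairs with |r| >= threshold (assumed cutoff)."""
    names = list(columns)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson_r(columns[a], columns[b])
            if abs(r) >= threshold:
                pairs.append((a, b, r))
    return pairs


data = {"x": [1, 2, 3, 4], "y": [2, 4, 6, 8], "z": [4, 1, 3, 2]}
pairs = strongly_correlated_pairs(data)
```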

Plot

Histograms, box plots, trend lines and scatter plots are provided for your data.

Interaction

You can generate new variables by multiplying together existing variables.
This is mainly used for categorical variables (sex, level of education, etc.). It is a function that makes it easier for you to manage your data.

Dummy Variable

This function generates a set of binary variables based on an existing variable.
For example, for a “months” variable with 12 categories, 11 binary variables are generated, with the remaining month serving as the reference category. It is a function that makes it easier for you to manage your data.
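
The months example can be sketched in a few lines of Python. The function name `make_dummies` is hypothetical, and using the first category in sorted order as the reference category is an assumption.

```python
def make_dummies(values, reference=None):
    """Return one 0/1 column per category, omitting the reference category."""
    categories = sorted(set(values))
    if reference is None:
        # Assumption: the first category in sorted order is the reference.
        reference = categories[0]
    kept = [c for c in categories if c != reference]
    return {c: [1 if v == c else 0 for v in values] for c in kept}


months = list(range(1, 13)) * 2           # two years of monthly observations
dummies = make_dummies(months)            # 11 columns, month 1 is the reference
```

An observation from the reference month has 0 in all 11 columns, which is why a twelfth column is unnecessary.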

Missing Values Correction

This function fills in the empty cells where there is no observed value.
An empty cell can be filled using the mean or mode of the corresponding variable, the average of its two nearest neighbors, or an estimate obtained by regression.
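
Three of the filling methods above can be sketched as follows. This is an illustration under assumptions: missing cells are represented by `None`, "nearest neighbors" is read as the closest observed values before and after the cell in row order, and the regression method is omitted. The function name is hypothetical.

```python
from statistics import mean, mode


def fill_missing(values, method="mean"):
    """Fill None entries using the mean, the mode, or the neighbor average."""
    observed = [v for v in values if v is not None]
    filled = list(values)
    for i, v in enumerate(values):
        if v is not None:
            continue
        if method == "mean":
            filled[i] = mean(observed)
        elif method == "mode":
            filled[i] = mode(observed)
        elif method == "neighbors":
            # Assumes an observed value exists on both sides of the gap.
            left = next(x for x in reversed(values[:i]) if x is not None)
            right = next(x for x in values[i + 1:] if x is not None)
            filled[i] = (left + right) / 2
        else:
            raise ValueError(f"unknown method: {method}")
    return filled


column = [1.0, None, 3.0, 3.0, 5.0]
by_mean = fill_missing(column, "mean")
by_neighbors = fill_missing(column, "neighbors")
```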

Test Data

You can select a part of your data to be the test data. This test data will then be moved to a new sheet, and can be used to test models in various analyses.
It is a function that makes it easier for you to manage your data.
If your data is large enough, it can be divided into two sets – training data, used in generating a model, and test data, used in testing the model.
Test data is used to see how well a generated model explains the actual data.
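
A random split into training and test data might look like the sketch below. The 20% test fraction, the fixed seed and the function name are assumptions made for the example; B-Box™ lets you select the test data yourself.

```python
import random


def split_test_data(rows, test_fraction=0.2, seed=0):
    """Randomly move a fraction of the rows into a separate test set."""
    rng = random.Random(seed)             # fixed seed keeps the split repeatable
    indices = list(range(len(rows)))
    rng.shuffle(indices)
    n_test = round(len(rows) * test_fraction)
    test_idx = set(indices[:n_test])
    training = [r for i, r in enumerate(rows) if i not in test_idx]
    test = [r for i, r in enumerate(rows) if i in test_idx]
    return training, test


training, test = split_test_data(list(range(10)))
```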

Regression Analysis

Linear Regression analyzes the influence of independent variables on a dependent variable (y=f(x)+e).
There are a number of features that help you to easily find the optimal model.
B-Box™ recommends the optimal model via the “all possible models” method.
You can easily explore alternative optimal models by transforming the x and y variables.
The outlier auto deletion feature identifies outliers using the criteria used in B-Box™, and generates regression models based on the data after the outliers have been removed.
You can easily review the basic assumptions of linear regression (normality, multicollinearity, linearity, homoscedasticity and autocorrelation) via relevant hypothesis tests.
You can easily check the predictive power of the model using test data, and review a number of models at the same time using a Z variable.
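
To make the idea of checking predictive power on test data concrete, the sketch below fits a one-variable least-squares line on training data and reports the root mean squared error (RMSE) on held-out test data. It is a simplified illustration with hypothetical function names; using RMSE as the measure is an assumption, and the model search, transformations and assumption checks described above are not covered.

```python
def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return my - slope * mx, slope


def rmse(intercept, slope, x, y):
    """Root mean squared prediction error on the given data."""
    squared = [(b - (intercept + slope * a)) ** 2 for a, b in zip(x, y)]
    return (sum(squared) / len(squared)) ** 0.5


# Fit on training data, then evaluate on data the model has not seen.
intercept, slope = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
test_error = rmse(intercept, slope, [5, 6], [11, 13])
```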

Spline Regression

Spline regression analysis splits the independent variable into regions, and generates a separate regression model for each region.
You can generate regression models by manually assigning the splitting points (knot points) on the graph.
The automatic splitting feature generates regression models by automatically calculating the optimal knot points.
In addition, you can estimate the dependent variable for a new value of the independent variable from the model.
Lastly, you can review a number of models at the same time using a Z variable.

Logistic Regression

This is a statistical analysis used when the response is categorical. It allows users to classify each observation and predict the probability of each response. It is similar to linear regression, the major difference being that the dependent variable can be nominal.

If users have a number of independent variables and wish to find the best combination among them, B-Box™ recommends the optimal model via the “All Possible Models” method. It considers every possible combination of the independent variables and selects, among the models whose regression coefficients are all valid, the one with the lowest AIC.
B-Box™ also calculates residuals and detects outliers, which may help users improve the accuracy of the model.

Time Series Analysis

Time Series Analysis is used to identify characteristics and predict future values from a set of data observed with the passing of time.
B-Box™ contains the moving average, exponential smoothing, decomposition and Winters models for time series analysis. Using the auto execution feature, you can run all the models together and view the results on the “All Possible” results tab, from which you can run any one of the models and view more detailed results.
For exponential smoothing and Winters models, the smoothing factors can be automatically calculated for your convenience.
In addition, you can easily check the predictive power of the model using test data, and review a number of models at the same time using a Z variable.
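
Two of the models named above, the moving average and simple exponential smoothing, can be sketched as follows, together with one possible way of choosing the smoothing factor automatically. The grid search that minimizes the one-step-ahead squared error is an assumption, not a description of how B-Box™ calculates the factor, and all function names are hypothetical. Decomposition and Winters models are not shown.

```python
def moving_average(series, window):
    """Average of each run of `window` consecutive observations."""
    return [sum(series[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(series))]


def exponential_smoothing(series, alpha):
    """Simple exponential smoothing, started from the first observation."""
    smoothed = [series[0]]
    for value in series[1:]:
        smoothed.append(alpha * value + (1 - alpha) * smoothed[-1])
    return smoothed


def best_alpha(series, candidates=None):
    """Pick the smoothing factor with the smallest one-step-ahead squared error."""
    if candidates is None:
        candidates = [i / 100 for i in range(1, 100)]

    def sse(alpha):
        level, error = series[0], 0.0
        for value in series[1:]:
            error += (value - level) ** 2   # the forecast is the previous level
            level = alpha * value + (1 - alpha) * level
        return error

    return min(candidates, key=sse)


ma = moving_average([1, 2, 3, 4, 5], window=3)
es = exponential_smoothing([10, 20, 30], alpha=0.5)
alpha = best_alpha([1, 2, 3, 4, 5, 6])
```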

Cluster Analysis

Cluster Analysis attempts to group together data points which share similar characteristics.
B-Box™ contains the PAM technique, a non-hierarchical method that is robust in dealing with outliers. For your convenience, the “All possible” feature allows you to enter a range of numbers of clusters rather than specifying a single number.
Moreover, a silhouette score is produced for each case, so you can see how the scores change as the clusters change. This makes it convenient to run simulations to find suitable clusters.
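
The silhouette score of a case compares its average distance to the other members of its own cluster (a) with its average distance to the nearest other cluster (b), giving (b - a) / max(a, b). The sketch below computes these scores for a clustering that is taken as given; it does not implement PAM itself. Euclidean distance, the score of 0 for a case that is alone in its cluster, and the function name are assumptions.

```python
def silhouette_scores(points, labels):
    """Silhouette score of each point for the given cluster labels."""
    def dist(p, q):
        # Assumption: Euclidean distance between coordinate tuples.
        return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5

    scores = []
    for i, p in enumerate(points):
        by_cluster = {}
        for j, q in enumerate(points):
            if i != j:
                by_cluster.setdefault(labels[j], []).append(dist(p, q))
        own = by_cluster.pop(labels[i], None)
        if not own or not by_cluster:
            scores.append(0.0)            # singleton cluster or a single cluster
            continue
        a = sum(own) / len(own)
        b = min(sum(d) / len(d) for d in by_cluster.values())
        denom = max(a, b)
        scores.append((b - a) / denom if denom else 0.0)
    return scores


points = [(0.0,), (1.0,), (10.0,), (11.0,)]
scores = silhouette_scores(points, [0, 0, 1, 1])
```

Scores near 1 indicate a case that sits well inside its cluster, while scores near 0 or below suggest it may belong elsewhere.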

Discriminant Analysis

Discriminant Analysis is used to determine to which category a data point belongs.
Discriminant analysis can be performed on every possible combination of variables (all possible models). You can see the apparent error rate of every possible model and run any one of the models to view more detailed results.
A screen is provided where you can categorize new data points, and the current categorization is visualized as a picture.

Factor Analysis

Factor Analysis attempts to identify hidden relationships between the variables and group them together.
You can obtain prompt results after selecting your variables and defining the number of factors.
If you choose 2 as the number of factors, the result of rotating the factors is visualized on a graph.

Principal Component Analysis

Principal Component Analysis reduces the number of variables, allowing you to explain your data in a simpler manner.
You can obtain prompt results after selecting your variables.
Scatter plots are provided for relationships between principal components, and between a principal component and an observable variable. Furthermore, to help you determine the number of principal components to consider, a scree plot is provided.

Biplot

A graph, called a biplot, is provided to visualize the relationships between your data and multidimensional variables, obtained from principal component analysis.

Structural Equation Modeling

Structural Equation Modeling identifies various causal relationships between the variables through a single model.
For your convenience in defining a model, B-Box™ contains the variable generation feature and the basic model generation feature that allow you to define variables easily. Moreover, B-Box™ also provides a process of validating the model before the calculations begin, to check whether the model as defined is mathematically solvable.
You can see your results at a glance on a path diagram, including the results of significance tests for the estimated coefficients (insignificant paths are highlighted as dotted red lines).

Outlier Diagnostics

For each variable, possible outliers are detected by checking whether each data point falls within a specified standard deviation range from the mean. Basic statistics for the variable before and after the removal of outliers are provided.
You can review a number of datasets at the same time using a Z variable.
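
The standard deviation rule described above can be sketched as follows. The function name, the use of the sample standard deviation and the multiplier `k` are assumptions for illustration.

```python
from statistics import mean, stdev


def split_outliers(values, k=3.0):
    """Separate values lying more than k standard deviations from the mean."""
    m, s = mean(values), stdev(values)    # sample standard deviation (assumed)
    inliers = [v for v in values if abs(v - m) <= k * s]
    outliers = [v for v in values if abs(v - m) > k * s]
    return inliers, outliers


data = [10, 11, 9, 10, 12, 10, 11, 9, 10, 50]
inliers, outliers = split_outliers(data, k=2.0)
mean_before, mean_after = mean(data), mean(inliers)
```

Because an extreme value inflates both the mean and the standard deviation, a narrower range (here k = 2) may be needed to flag it in a small dataset.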

Hypothesis Testing

B-Box™ provides hypothesis tests for the following:
Mean of a single variable
Variance of a single variable
Difference of the means of two variables
Ratio of the variances of two variables
Paired difference of two variables
Means of 3 or more variables.
You can test a number of datasets at the same time using a Z variable.

Data Envelopment Analysis

Data Envelopment Analysis analyzes the efficiency of each data point (called a Decision Making Unit, or DMU, in this context) by considering input and output variables.
For each DMU, not only is an efficiency score given, but you can also see how much the input variables can be reduced and how much the output variables can be raised. Negative values can be considered.
Introducing the concept of super efficiency, B-Box™ enables absolute evaluation of the efficiency of DMUs, as well as relative comparisons.
You can evaluate the efficiencies of a number of sets of DMUs at the same time using a Z variable.

Linear Programming

You can find the optimal solution of an objective function under linear constraints, using solution algorithms for linear programs.

Analytic Hierarchy Process

Analytic Hierarchy Process (AHP) determines an order of importance between the factors considered in a decision making process, and evaluates the alternatives for each factor.
Following Professor T. L. Saaty’s AHP methodology, weights are placed on each factor in a subjective decision making process.
Subjective responses undergo a consistency test, in order to ensure reliability of the results.
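
The weighting and consistency steps can be sketched as follows for a pairwise comparison matrix. This is an approximation under stated assumptions: weights are derived from the geometric mean of each row rather than an exact eigenvector calculation, the random index values are the commonly tabulated ones for 3 to 9 factors, and the function name is hypothetical.

```python
import math

# Commonly tabulated random index values, keyed by the number of factors.
RANDOM_INDEX = {3: 0.58, 4: 0.90, 5: 1.12, 6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45}


def ahp_weights(matrix):
    """Return (weights, consistency ratio) for a pairwise comparison matrix."""
    n = len(matrix)
    # Assumption: row geometric means approximate the principal eigenvector.
    geo = [math.prod(row) ** (1 / n) for row in matrix]
    total = sum(geo)
    weights = [g / total for g in geo]
    # Estimate the largest eigenvalue as the average of (A w)_i / w_i.
    aw = [sum(matrix[i][j] * weights[j] for j in range(n)) for i in range(n)]
    lambda_max = sum(aw[i] / weights[i] for i in range(n)) / n
    consistency_index = (lambda_max - n) / (n - 1)
    return weights, consistency_index / RANDOM_INDEX[n]


# A perfectly consistent set of judgments: factor 1 is twice as important as
# factor 2 and six times as important as factor 3.
comparisons = [[1, 2, 6], [1 / 2, 1, 3], [1 / 6, 1 / 3, 1]]
weights, ratio = ahp_weights(comparisons)
```

A consistency ratio below about 0.10 is the usual rule of thumb for accepting a set of judgments.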

Net Present Value

Present value is obtained by discounting future cash flows to take into account the time value of money, and the resulting net present value is used to support investment decisions.
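
The discounting step can be written in a single line of Python. The sketch below assumes one cash flow per period, with the first flow (typically the initial outlay) occurring at time 0 and a constant discount rate; the function name is hypothetical.

```python
def net_present_value(rate, cash_flows):
    """Sum of cash flows discounted at `rate`; the first flow is at time 0."""
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cash_flows))


# Invest 1000 now, receive 500 at the end of each of the next three periods.
npv = net_present_value(0.10, [-1000, 500, 500, 500])
```

A positive net present value indicates that the discounted inflows exceed the initial outlay at the chosen rate.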

Questionnaire Analysis

Analyzing a survey tends to be repetitive and time consuming. B-Box™ streamlines this work by providing various statistics, plots, reliability tests, and hypothesis tests for equal means and variances.

Decision Tree

A decision tree is a classification and prediction method that uses a tree-like graph or model. B-Box™ adopts the CART algorithm, which splits each tree node in two (binary splitting). Users can follow the pruning process on the tree plot. Using a Z variable, a number of models can be created at once, making it easier to find the optimal tree model.
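
A single binary split, the basic building block of a CART-style tree, can be sketched as follows for one numeric variable. Using Gini impurity as the splitting criterion is an assumption (it is the usual choice for CART classification trees, but the criterion used by B-Box™ is not stated), the function names are hypothetical, and tree growing and pruning are not shown.

```python
def gini(labels):
    """Gini impurity of a list of class labels (0 for a pure node)."""
    n = len(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def best_split(xs, ys):
    """Return (threshold, weighted impurity) of the best split x <= threshold."""
    best = (None, float("inf"))
    values = sorted(set(xs))
    # Candidate thresholds lie halfway between consecutive distinct values.
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(ys)
        if score < best[1]:
            best = (threshold, score)
    return best


threshold, impurity = best_split([1, 2, 3, 10, 11, 12],
                                 ["a", "a", "a", "b", "b", "b"])
```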

Word Cloud

This function converts content from MS Word, MS PowerPoint, MS Excel, PDF files and websites into a text file. A text mining technique then displays the keywords of interest to the user in the form of a cloud. Users can add or delete words, and change the cloud shape and the text font according to conditions they define.

Matching Algorithm

The purpose of the matching algorithm is to find a stable matching between two groups based on the preferences of their members. B-Box™ uses the Gale-Shapley algorithm, which produces a matching in which no two members from opposite groups would both prefer each other over their assigned partners. Two results are given, one for each choice of which group makes its selections first.
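
A minimal sketch of the Gale-Shapley procedure is shown below. It assumes two groups of equal size in which every member ranks every member of the other group; the function name and the sample preferences are hypothetical.

```python
def gale_shapley(proposer_prefs, receiver_prefs):
    """Return a stable matching as a dict {proposer: receiver}.

    Assumes equal group sizes and complete preference lists.
    """
    # rank[r][p] is the position of proposer p in receiver r's list.
    rank = {r: {p: i for i, p in enumerate(prefs)}
            for r, prefs in receiver_prefs.items()}
    next_choice = {p: 0 for p in proposer_prefs}
    engaged = {}                          # receiver -> proposer
    free = list(proposer_prefs)
    while free:
        p = free.pop()
        r = proposer_prefs[p][next_choice[p]]
        next_choice[p] += 1
        current = engaged.get(r)
        if current is None:
            engaged[r] = p                # receiver was unmatched
        elif rank[r][p] < rank[r][current]:
            engaged[r] = p                # receiver prefers the new proposer
            free.append(current)
        else:
            free.append(p)                # rejected; p tries the next choice
    return {p: r for r, p in engaged.items()}


group1 = {"A": ["X", "Y"], "B": ["Y", "X"]}
group2 = {"X": ["B", "A"], "Y": ["A", "B"]}
when_group1_proposes = gale_shapley(group1, group2)
when_group2_proposes = gale_shapley(group2, group1)
```

In this example the two runs give different stable matchings (A with X and B with Y, versus A with Y and B with X), which illustrates why a result is reported for each selection order.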