Helsinki, Finland
annie@anniepham.fi

Descriptive statistics using R

This project used a dataset of players from Sngame Company. The aim was to predict the time players spend on the game and the in-game purchases they make. I used regression, specifically two linear models.

#verify data set and variables
names(sngame)
str(sngame)

#Summary of the data set
summary(sngame)

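#Describe the data set. Hmisc::describe() reports the number of valid cases, missing values, distinct values, mean and quantiles for each variable
#Standard deviation, median absolute deviation (mad), median, min, max, range and standard error are reported by psych::describe() instead
library(Hmisc)
describe(sngame)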

  • The dataset has 300 observations in total: 154 female players and 146 male players. Ages range from 20 to 39 years.
  • The mean age is 26.79 years.
  • 4 players have an elementary school education, 100 a high school education, and 196 a university education.
  • The minimum salary of the players is 1012, the maximum is 4986, and the mean is 2973.
  • The mean of sn.conn (social network connections) is 2670, and the median is 2592.
  • The mean number of minutes that players spend on the social network is 159.54 (sn.min), while the mean number of minutes they spend playing games is 30.921 (game.min).
  • The mean amount of money that players spend on features, upgrades, power-ups, etc. in the games is 18.97 (game.purchase).

 

gender       age              edu                      salary          sn.conn
Female:154   Min.   :20.00    Elementary School:  4    Min.   :1012    Min.   :1800
Male  :146   1st Qu.:22.00    High School      :100    1st Qu.:1866    1st Qu.:2208
             Median :26.00    University       :196    Median :3028    Median :2592
             Mean   :26.79                             Mean   :2973    Mean   :2670
             3rd Qu.:31.00                             3rd Qu.:3864    3rd Qu.:3032
             Max.   :39.00                             Max.   :4986    Max.   :4028

sn.min            game.min          game.purchase
Min.   : 66.98    Min.   : 8.092    Min.   : 1.00
1st Qu.:128.46    1st Qu.:20.857    1st Qu.: 7.00
Median :150.90    Median :29.669    Median :14.00
Mean   :159.54    Mean   :30.921    Mean   :18.97
3rd Qu.:184.45    3rd Qu.:38.689    3rd Qu.:25.00
Max.   :296.08    Max.   :81.466    Max.   :88.00
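
The figures in the bullet list can also be recomputed one at a time with base R, which is handy when only a few of them are needed. The sketch below is a minimal illustration that assumes nothing about the real data: `demo` is a simulated stand-in that borrows some of the column names from sngame, and all of its values are made up.

```r
# Simulated stand-in for sngame: same column names, made-up values
set.seed(1)
n <- 300
demo <- data.frame(
  gender = factor(sample(c("Female", "Male"), n, replace = TRUE)),
  age = sample(20:39, n, replace = TRUE),
  edu = factor(sample(c("Elementary School", "High School", "University"), n,
                      replace = TRUE, prob = c(0.02, 0.33, 0.65))),
  salary = round(runif(n, 1000, 5000)),
  game.min = rgamma(n, shape = 5, rate = 0.16)
)

# Counts per category (the "154 female, 146 male" kind of figure)
gender_counts <- table(demo$gender)
edu_counts <- table(demo$edu)

# Range of one variable, then mean and median of every numeric column
age_range <- range(demo$age)
num_cols <- sapply(demo, is.numeric)
means <- sapply(demo[num_cols], mean)
medians <- sapply(demo[num_cols], median)

print(gender_counts)
print(edu_counts)
print(age_range)
print(round(rbind(mean = means, median = medians), 2))
```

Selecting the numeric columns with `is.numeric` avoids hard-coding column positions, so the same lines keep working if the column order changes.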

#create the data frame containing only numerical variables: age, salary, game.min, game.purchase
sngame_num <- sngame[c(2,4,7,8)]

#Calculates the correlation for all the pairs of variables
cor(sngame_num)

#Test the correlations between all numerical variables and get their probability (p) values, in particular for the pairs with the highest correlations found earlier (game.min & game.purchase, game.min & age, salary & game.purchase)
library(psych)
corr.test(sngame_num, use="complete")
cor.test(sngame$game.min, sngame$game.purchase)
cor.test(sngame$age, sngame$game.min)
cor.test(sngame$salary, sngame$game.purchase)

This plot shows a positive association between time spent on the game and player age (r = 0.51): older players tend to spend more time playing the game.

This plot shows a positive correlation between salary and game purchases (r = 0.51): players with higher salaries tend to buy more upgrades and extra features in the game.

This plot shows a positive correlation between game.purchase and game.min (r = 0.78): players who spend more time playing the game on the social network sites tend to spend more money on game purchases.

The correlation matrix between the variables is as follows:

                     age       salary    game.min game.purchase
age            1.0000000 -0.035873701 0.510517894     0.3063233
salary        -0.0358737  1.000000000 0.007457177     0.5062324
game.min       0.5105179  0.007457177 1.000000000     0.7839289
game.purchase  0.3063233  0.506232439 0.783928866     1.0000000
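
A correlation coefficient on its own does not say how certain the estimate is, which is why the tests above are run alongside the matrix. As a minimal sketch of what `cor.test` returns, the example below uses simulated data in which purchases are built to depend on playing time. The names `game_min` and `game_purchase` and all values are hypothetical and are not taken from sngame.

```r
# Simulated data (NOT the sngame values): purchase is built to depend on minutes
set.seed(42)
n <- 300
game_min <- rgamma(n, shape = 5, rate = 0.16)
game_purchase <- 0.6 * game_min + rnorm(n, sd = 8)

# Pearson correlation test: the null hypothesis is a true correlation of zero
ct <- cor.test(game_min, game_purchase)

r <- unname(ct$estimate)   # sample correlation coefficient
p <- ct$p.value            # two-sided p-value
ci <- ct$conf.int          # 95% confidence interval for the correlation

cat(sprintf("r = %.3f, 95%% CI [%.3f, %.3f], p = %.3g\n", r, ci[1], ci[2], p))
```

Reporting the confidence interval together with r and the p-value shows both the strength of the association and how precisely it is estimated.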

#t-test to check whether the means differ between the two gender groups
t.test(game.min~gender, data=sngame)
t.test(game.purchase~gender, data=sngame)

We conducted a t-test to check whether the mean of game.min differs between male and female players. The test showed no significant difference (p=0.85).

We also used a t-test to check whether the mean of game.purchase differs between male and female players. Again, the test showed no significant difference (p=0.64).

Therefore, we found no evidence that a player's gender affects the time or money they spend on the game.
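
To make the reading of these p-values concrete, here is a minimal sketch of the same kind of test on simulated data. The `demo` data frame is hypothetical: both gender groups are drawn from the same distribution, so the only difference between them comes from sampling noise.

```r
# Simulated stand-in (NOT the sngame data): both groups share one distribution
set.seed(7)
n <- 300
demo <- data.frame(
  gender = factor(sample(c("Female", "Male"), n, replace = TRUE)),
  game.min = rgamma(n, shape = 5, rate = 0.16)
)

# Welch two-sample t-test (R's default, var.equal = FALSE)
tt <- t.test(game.min ~ gender, data = demo)

group_means <- tt$estimate  # mean of game.min in each gender group
p_value <- tt$p.value

print(group_means)
cat(sprintf("t = %.2f, df = %.1f, p = %.3f\n", tt$statistic, tt$parameter, p_value))

# A large p-value means the data give no evidence of a difference;
# it does not prove that the two group means are equal.
```

Passing `data = demo` to the formula interface keeps the call independent of whether the data frame has been attached.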

#Graphical analysis of age, salary, game.min and game.purchase to identify the peak value of each variable
hist(sngame$age)
hist(sngame$salary)
plot(density(sngame$game.min))
plot(density(sngame$game.purchase))

In the Sngame Company dataset, the players are mainly in their twenties.

By salary, two ranges contain the most players: those who earn 1500-2000 euro and those who earn 3500-4000 euro.

According to the diagram, players who spend an average of 20 to 40 minutes playing the game form the most common group.

The diagram shows that players most commonly spend around 15-20 euro on game upgrades and extra features.
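
The peaks described above were read off the plots, but they can also be located numerically. The sketch below shows one way to do that, assuming a simulated vector `game_min` (a hypothetical name with made-up values, not the sngame column).

```r
# Simulated values, not the sngame column
set.seed(3)
game_min <- rgamma(300, shape = 5, rate = 0.16)

# Histogram without drawing: find the bin with the highest count
h <- hist(game_min, plot = FALSE)
top <- which.max(h$counts)
modal_bin <- c(h$breaks[top], h$breaks[top + 1])

# Kernel density estimate: x position of the highest point on the curve
d <- density(game_min)
peak_x <- d$x[which.max(d$y)]

cat(sprintf("Most common bin: %g to %g\n", modal_bin[1], modal_bin[2]))
cat(sprintf("Density peak at about %.1f\n", peak_x))
```

The histogram answer depends on the bin width and the density answer on the bandwidth, so both are best treated as approximate locations rather than exact values.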

 

 
