A Taste of R
R is widely regarded as a powerful yet easy-to-learn language for data analysis. Python has its own NumPy, pandas, SymPy, Matplotlib, SciPy, and other handy modules, but R can still serve as a sharp weapon for dealing with data.
This post is mainly a digest of P. Haschke's R-Course. Some other material, such as the official documentation, is included as well.
Basics
Extremely basic operations are not covered in this text. All comments start with `#`.
- To get help, type `?<anything>` or `help(<anything>)`. `?log` is equivalent to `help(log)` but cannot fully replace it, because `help()` also accepts extra arguments such as `package`.
- To find objects whose names contain a given string, use `apropos("<anything>")`, for example `apropos("mean")`. Looks familiar: Python has `dir()`.
- `install.packages("<packageName>")`, best run in the console, fetches the libraries you want from a CRAN mirror. To use anything from a package in your script or session, load it with `library("<packageName>")`.
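The three helpers above can be tried in a few lines. This is only a minimal sketch: the package name in the commented-out `install.packages()` call is an arbitrary example, and the `help()` call is guarded so that it only runs in an interactive session.

```r
# apropos() returns a character vector of object names matching a pattern.
hits <- apropos("mean")
print(head(hits))

# help() takes the topic plus optional arguments such as `package`.
# Guarded so that a non-interactive run does not dump a help page.
if (interactive()) {
  help("log", package = "base")
}

# Installing needs network access, so it is left commented out here.
# "ggplot2" is just an example package name.
# install.packages("ggplot2")

# library() attaches an installed package; "stats" ships with R.
library("stats")
```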
Introduction
- Object types are: Vectors (one-dimensional, elements of the same type), Matrices & Arrays (two or more dimensions, same type), Lists (like vectors, but different types are allowed), Data Frames (two-dimensional, table-like), Factors, and Functions (as you know).
- Mode types are: integer, numeric (real numbers), complex, character (AKA strings), and logical (AKA Boolean).
- Assignment: `<variableName> <- <toBeAssigned>`. To get the type, use `is(variableName)`; to list and remove variables, use `ls()` and `rm(<variableToBeRemoved>)`. I have to admit this part is weird: bash and lambda?
- The function `c()` concatenates elements into a vector.
- Some functions that were new to me: `sd()` (standard deviation), `var()` (variance), `cov()` (covariance), `cor()` (correlation coefficient), and `unique()`; `which()` is rather tricky. `prod()` is awesome. The weirdest thing, I believe, is how variables are named and accessed, as in `<var>.<v>` and `<var>$<v>`: the dot is just an ordinary character in a name, while `$` extracts a named component.
- `seq()` creates regular sequences and `rep()` repeats values. The indexing notation is similar to Python's but more powerful: simply writing `Vector[Vector >= 1]` can save you lots of trouble. `summary()` shows the basic information you need. `subset()` works much like the `filter()` function in Python.
- `print()`, `paste()`, and `cat()` deal with text output: `paste()` joins values of multiple modes into one string, and `cat()` writes to stdout without creating an object in active memory.
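As a rough illustration of the functions listed above, here is a small sketch on a made-up vector. The variable names (`x`, `y`, `person`, `my.var`) are invented for the example and do not come from the course.

```r
library(methods)           # provides is(); attached by default in most sessions

x <- c(3, 1, 4, 1, 5)      # c() concatenates values into a vector
print(is(x))               # the class of x and the classes it extends

u   <- unique(x)           # 3 1 4 5
idx <- which(x == 1)       # positions where the condition holds: 2 4
p   <- prod(x)             # 3 * 1 * 4 * 1 * 5 = 60
v   <- var(x)              # sample variance: 3.2
s   <- sd(x)               # standard deviation, i.e. sqrt(var(x))

y <- 2 * x + 1             # a perfectly linear function of x
r <- cor(x, y)             # so the correlation is 1 (up to floating point)

odds <- seq(1, 9, by = 2)              # 1 3 5 7 9
ab   <- rep(c("a", "b"), times = 2)    # "a" "b" "a" "b"

big1 <- x[x >= 3]          # logical indexing: 3 4 5
big2 <- subset(x, x >= 3)  # same selection via subset()

my.var <- 10                           # the dot is simply part of the name
person <- list(name = "Ada", age = 36) # a list may mix types
nm <- person$name                      # `$` extracts a named component

msg <- paste("mean is", mean(x))       # builds the string "mean is 2.8"
cat(msg, "\n")                         # writes it to stdout, returns nothing useful

rm(my.var)                 # remove a variable from the workspace
print(ls())                # list what is left
```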
Matrices
- `source()` runs your previously saved code. `save()` writes objects to a file.
- `matrix()` takes three main arguments: data (an R object), nrow (number of rows), and ncol (number of columns). Two more optional arguments are byrow and dimnames.
- `rbind()` & `cbind()` combine objects by rows and by columns. `rownames()` & `colnames()` can be used to set and get names. `diag()`, you knew it already.
- Matrices use `[]` for indexing; black magic like `Matrix[Matrix[ , 2] > 4, ]` can be astonishing. `t()` is for the transpose, `solve()` for the inverse, `det()` for the determinant, `chol()` for the Cholesky decomposition, `eigen()` (eigenvalues and eigenvectors) is perfect, `crossprod()` is for the matrix cross product (`t(x) %*% y`), and `%*%` is for matrix multiplication.
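A short sketch of these matrix helpers on a made-up 2 x 2 symmetric matrix (the values and dimension names are invented for illustration; `chol()` needs a symmetric positive-definite matrix, which this one is):

```r
# Build a 2 x 2 matrix row by row, with row and column names.
M <- matrix(c(2, 1,
              1, 3),
            nrow = 2, ncol = 2, byrow = TRUE,
            dimnames = list(c("r1", "r2"), c("c1", "c2")))

Mt   <- t(M)            # transpose
d    <- det(M)          # 2 * 3 - 1 * 1 = 5
Minv <- solve(M)        # inverse
I2   <- M %*% Minv      # matrix product: identity up to floating point
cp   <- crossprod(M)    # same as t(M) %*% M
ev   <- eigen(M)$values # eigenvalues, largest first
R    <- chol(M)         # upper-triangular R with t(R) %*% R equal to M

# Rows whose second column exceeds 2; drop = FALSE keeps the matrix shape.
sel <- M[M[, 2] > 2, , drop = FALSE]

wide <- cbind(M, c3 = c(0, 0))   # add a column
tall <- rbind(M, r3 = c(9, 9))   # add a row
print(colnames(wide))
print(rownames(tall))
```

Without `drop = FALSE`, selecting a single row would return a plain vector rather than a one-row matrix.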
Data Frames
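- `data()` lists and loads datasets: first `library()`, then `data()`. `class()` is like `is()`; `names()` is self-explanatory.
- `read.csv()` reads CSV files and `download.file()` fetches files. `DataFrame$name` extracts the column with that particular name.
- `str()` and `describe()` give summaries (`describe()` is not in base R; it comes from add-on packages such as psych or Hmisc).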
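To make the data-frame helpers concrete, here is a sketch using `mtcars`, a dataset bundled with R. The choice of dataset is mine, not the course's. `write.csv()` and `tempfile()` are used only to produce a CSV file to read back, and the URL in the commented-out `download.file()` call is a placeholder.

```r
data(mtcars)                 # load a dataset from the (already attached) datasets package
cls  <- class(mtcars)        # "data.frame"
cols <- names(mtcars)        # column names
mpg  <- mtcars$mpg           # `$` extracts one column as a vector

str(mtcars)                  # compact structure overview
print(summary(mpg))          # min, quartiles, mean, max

# Round trip through a CSV file in a temporary location.
tmp <- tempfile(fileext = ".csv")
write.csv(mtcars, tmp, row.names = FALSE)
cars2 <- read.csv(tmp)

# Fetching a remote file needs network access; the URL below is hypothetical.
# download.file("https://example.com/data.csv", destfile = "data.csv")
```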
Graphics
- ggplot2 must be installed in advance. To add more to a plot, append layers like this: `plot1 <- plot1 + geom_point(aes(x = FEhighway, y = FEcity))`, which looks pretty ugly to me. :( Other options can be found at ggplot.
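The layering idiom can be sketched end to end as follows. This assumes ggplot2 is already installed. The data frame `cars` and its values are made up for the example, with only the column names `FEhighway` and `FEcity` taken from the snippet above.

```r
library(ggplot2)

# Hypothetical fuel-economy data; the numbers are invented.
cars <- data.frame(
  FEhighway = c(30, 35, 28, 40),
  FEcity    = c(22, 27, 20, 31)
)

plot1 <- ggplot(data = cars)                                         # empty plot bound to the data
plot1 <- plot1 + geom_point(aes(x = FEhighway, y = FEcity))          # add a scatter layer
plot1 <- plot1 + labs(x = "Highway economy", y = "City economy")     # add axis labels

print(plot1)   # draws the plot on the current graphics device
```

Each `+` returns a new plot object, which is why the result is assigned back to `plot1` at every step.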
Programs
- `if` & `ifelse()`; the other control-flow constructs are C-style and can be written on one line.
- Defined functions are JavaScript-style; consider this:
```r
# Returns its argument added to itself (element-wise for vectors).
MyFunction <- function(Object) {
  Object + Object
}
```
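A small sketch of these constructs in use. `MyFunction` is repeated from above so the block runs on its own; the other names and values are invented for illustration.

```r
MyFunction <- function(Object) {
  Object + Object
}

x <- 7

# if / else on one line, C-style.
if (x > 5) size <- "big" else size <- "small"

# ifelse() is the vectorised form: one answer per element.
sizes <- ifelse(c(1, 6, 9) > 5, "big", "small")   # "small" "big" "big"

# A one-line for loop summing 1..4.
total <- 0
for (i in 1:4) total <- total + i                 # total ends at 10

# A one-line while loop counting down to zero.
n <- 3
while (n > 0) n <- n - 1

doubled_scalar <- MyFunction(21)        # 42
doubled_vector <- MyFunction(c(1, 2))   # 2 4
```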