Class06: R Functions
Background
Functions are at the heart of using R. Everything we do involves calling and using functions, from data input and analysis to results output.
All functions in R have at least 3 things:
- A name, the thing we use to call the function.
- One or more input arguments that are comma separated.
- The body, the lines of code between curly brackets {} that do the work of the function.
A first function
We are going to write a function to add some numbers:
add <- function(x) {
  x + 1
}
Let’s try it out now
add(100)
[1] 101
Will this work?
add(c(100, 200, 300))
[1] 101 201 301
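The call works because arithmetic in R is vectorized: `x + 1` is applied to every element of the input. A minimal sketch of this, using the hypothetical name `add_one()` so the class function `add()` is not overwritten:

```r
# Hypothetical helper, equivalent to the first version of add()
add_one <- function(x) {
  x + 1
}

# A single number is simply a vector of length 1
add_one(100)

# The addition is applied element-wise, so the output
# has the same length as the input vector
result <- add_one(c(100, 200, 300))
length(result)
```

No loop is needed inside the function; the `+` operator handles each element itself.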
Modify the original function to be more useful and add more than just 1:
add <- function(x, y=1) {
  x + y
}
add(100, 10)
[1] 110
log(10, base=10)
[1] 1
add(100)
[1] 101
N.B. Input arguments can be either required or optional. The latter have a fall-back default that is specified in the function definition with an equals sign.
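A short sketch of the difference between required and optional arguments. The name `add_xy()` is hypothetical (it mirrors the two-argument `add()` under a new name), and `tryCatch()` is used here only to capture the error message instead of stopping the script:

```r
# x is required (no default); y is optional (default of 1)
add_xy <- function(x, y = 1) {
  x + y
}

add_xy(100)              # y falls back to its default
add_xy(100, y = 10)      # the default is overridden by name
add_xy(y = 10, x = 100)  # named arguments can be given in any order

# Leaving out the required argument is an error;
# tryCatch() stores the error message as a string
msg <- tryCatch(add_xy(), error = function(e) conditionMessage(e))
msg
```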
# add(100, 200, 300)  # commented out: fails with "unused argument (300)"
A second function
All functions in R look like this:
name <- function(arg) {
  body
}
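The parts of this template can be inspected on any function with base R's `formals()` and `body()`. A minimal sketch, where `square()` is a hypothetical function written only for illustration:

```r
# Hypothetical example function with one required and one optional argument
square <- function(x, power = 2) {
  x^power
}

names(formals(square))  # the argument names: "x" and "power"
formals(square)$power   # the default value of the optional argument: 2
body(square)            # the code between the curly brackets
square(3)               # 9
```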
The sample() function in R randomly selects a specified number of items from a given set of values, with or without replacement.
Q. Return 12 numbers picked randomly from the input 1:10
sample(1:10, size=12, replace=TRUE)
 [1]  6  1 10  8  6  4 10  3  8  6 10  8
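The `replace = TRUE` argument matters here because 12 items are requested from a set of only 10. A minimal sketch of what happens with and without it; the seed value is arbitrary and only makes the draw reproducible:

```r
set.seed(42)  # arbitrary seed, assumed here for reproducibility

# With replacement, values may repeat, so any size is allowed
x <- sample(1:10, size = 12, replace = TRUE)
length(x)

# Without replacement we cannot draw more items than exist;
# tryCatch() captures the resulting error message
err <- tryCatch(sample(1:10, size = 12),
                error = function(e) conditionMessage(e))
err
```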
Q. Write the code to generate a 12-nucleotide-long DNA sequence.
sample(c("A", "C", "G", "T"), size=12, replace=TRUE)
 [1] "A" "C" "A" "C" "C" "C" "T" "A" "T" "C" "T" "A"
Q. Write a first version of a function called generate_dna() that generates a random DNA sequence of user-specified length n.
generate_dna <- function(n) {
  sample(c("A", "C", "G", "T"), size=n, replace=TRUE)
}
generate_dna(10)
 [1] "A" "G" "A" "A" "G" "A" "A" "C" "T" "A"
Q. Modify your function to return a FASTA-like sequence so rather than: [1] “T” “A” “G” “T” “C” “A” “T”, we want “TAGTCAT”.
generate_dna <- function(n) {
  bases <- c("A", "C", "G", "T")
  ans <- sample(bases, size=n, replace=TRUE)
  ans <- paste(ans, collapse="")
  return(ans)
}
generate_dna(10)
[1] "AAGGAATAAG"
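The key step is the `collapse` argument of `paste()`, which joins all elements of a vector into one string. A small sketch on a fixed input (the vector `bases_demo` is made up for illustration) so the result is predictable:

```r
# Fixed example vector, not random, so the outputs are known
bases_demo <- c("T", "A", "G")

joined <- paste(bases_demo, collapse = "")   # "TAG"
dashed <- paste(bases_demo, collapse = "-")  # "T-A-G"

length(bases_demo)  # 3 separate elements before collapsing
length(joined)      # 1 single string afterwards
nchar(joined)       # 3 characters in that string
```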
Q. Give the user the option to return the sequence in FASTA format or as a standard multi-element vector.
generate_dna <- function(n=6, fasta=TRUE) {
  bases <- c("A", "C", "G", "T")
  ans <- sample(bases, size=n, replace=TRUE)
  if(fasta) {
    # Collapse the vector into a single string
    ans <- paste(ans, collapse="")
    cat("Hello there...")
  } else {
    # Leave the result as a multi-element vector
    cat("General Kenobi")
  }
  return(ans)
}
generate_dna(10)
Hello there...
[1] "TAGTCACTAT"
generate_dna(10, fasta=FALSE)
General Kenobi
 [1] "T" "G" "T" "T" "T" "T" "G" "C" "G" "T"
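One quick way to confirm which format came back is to compare `length()` and `nchar()`. A minimal sketch, using a hypothetical stand-in `dna_demo()` that follows the same logic without the `cat()` messages:

```r
# Hypothetical stand-in for generate_dna(), without the printed messages
dna_demo <- function(n = 6, fasta = TRUE) {
  ans <- sample(c("A", "C", "G", "T"), size = n, replace = TRUE)
  if (fasta) {
    ans <- paste(ans, collapse = "")
  }
  ans
}

as_string <- dna_demo(10)
as_vector <- dna_demo(10, fasta = FALSE)

length(as_string)  # 1: a single string
nchar(as_string)   # 10: characters in that string
length(as_vector)  # 10: one element per base
```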
A new cool function
Q. Write a function called generate_protein() that generates a protein sequence of user-specified length in FASTA-like format.
generate_protein <- function(n, fasta = TRUE) {
  # One-letter codes for the 20 standard amino acids
  aa <- c("A","R","N","D","C","E","Q","G","H","I",
          "L","K","M","F","P","S","T","W","Y","V")
  ans <- sample(aa, size=n, replace=TRUE)
  # Collapse to a single string unless a vector is requested
  if(fasta) {
    ans <- paste(ans, collapse = "")
  }
  return(ans)
}
generate_protein(20)
[1] "KDTGMTHHTQKIQNTMEAYQ"
Q. Use your new generate_protein() function to generate sequences between 6 and 12 amino acids in length and check if any of these are unique in nature (i.e. not found in the NR database at NCBI).
# Print each sequence in FASTA format, using its length as the ID line
for(i in 6:12) {
  cat(">", i, "\n", sep="")
  cat(generate_protein(i), "\n", sep="")
}
>6
ASCCIF
>7
HQKYEQW
>8
WFMITPGI
>9
ILHSGVPDF
>10
VEMSHHRPMR
>11
EQKLQYVTCGW
>12
HMADVTTVANKF
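As an alternative to printing inside a `for` loop, the sequences can be collected in a vector with `sapply()` and formatted afterwards. This is only a sketch: `random_protein()` is a hypothetical stand-in for `generate_protein()`, and the `id.` prefix on the header lines is an assumption, not part of the class code.

```r
# Hypothetical stand-in for generate_protein(), always returning one string
random_protein <- function(n) {
  aa <- c("A","R","N","D","C","E","Q","G","H","I",
          "L","K","M","F","P","S","T","W","Y","V")
  paste(sample(aa, size = n, replace = TRUE), collapse = "")
}

lens <- 6:12

# One sequence per requested length, stored in a character vector
seqs <- sapply(lens, random_protein)

# Build FASTA-style records: a ">" header line followed by the sequence
fasta_lines <- paste0(">id.", lens, "\n", seqs)
cat(fasta_lines, sep = "\n")
```

Keeping the sequences in `seqs` means they can be reused, for example pasted into a search form, rather than existing only as printed output.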