Commit 3ef1195

Tutorial5 - Strings, Runes, and UTF-8 Encoding
1 parent 969a38d commit 3ef1195

3 files changed

Lines changed: 149 additions & 0 deletions

README.md

Lines changed: 1 addition & 0 deletions
@@ -46,3 +46,4 @@ go run tutorial1/main.go
4646
- [x] [Constants Variables and Basic Data Types](tutorial2)
4747
- [x] [Functions and Control Structures](tutorial3)
4848
- [x] [Arrays, Slices, Maps and Loops](tutorial4)
49+
- [x] [Strings, Runes, and UTF-8 Encoding](tutorial5)

tutorial5/README.md

Lines changed: 61 additions & 0 deletions
@@ -0,0 +1,61 @@
1+
# Strings, Runes, and UTF-8 Encoding in Go
2+
3+
This README will guide you through some basic concepts in Go, including strings, runes, and UTF-8 encoding. The code we'll be discussing demonstrates these concepts in a simple and understandable way.
4+
5+
## Strings and UTF-8 Encoding
6+
7+
In Go, a string is a read-only sequence of bytes. Go source files are UTF-8, so a string literal such as `"résumé"` is stored as the UTF-8 encoding of its characters, although a string value may hold arbitrary bytes. UTF-8 is a variable-width encoding that can represent every Unicode code point using one to four bytes; ASCII characters take a single byte with the same value they have in ASCII.
8+
9+
Let's take a look at this piece of code:
10+
11+
```go
var myString = "résumé"
var indexed = myString[1]
fmt.Println(indexed)                     // 195
fmt.Printf("%v %T \n", indexed, indexed) // 195 uint8
```
17+
18+
Here, `myString[1]` doesn't give you the second character 'é'. It gives you the byte at index 1, printed as the number 195 with type `uint8`. That is the first of the two bytes (`0xC3 0xA9`) that encode 'é' in UTF-8, because 'é' is a non-ASCII character.
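To make the byte layout visible, the following is a minimal sketch that dumps the bytes of the same string and decodes the character starting at byte offset 1. The helper `runeAt` is a hypothetical name introduced only for this illustration; it wraps the standard library's `utf8.DecodeRuneInString`.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// runeAt decodes the UTF-8 sequence that starts at byte offset i of s.
// It returns the decoded rune and the number of bytes it occupies.
// (Hypothetical helper for illustration; not part of the tutorial code.)
func runeAt(s string, i int) (rune, int) {
	return utf8.DecodeRuneInString(s[i:])
}

func main() {
	s := "résumé"

	// Print every byte in hex: each 'é' shows up as the pair c3 a9.
	fmt.Printf("% x\n", s) // 72 c3 a9 73 75 6d c3 a9

	// Indexing returns single bytes: the two bytes that encode 'é'.
	fmt.Println(s[1], s[2]) // 195 169

	// Decoding from offset 1 recovers the whole character and its width.
	r, size := runeAt(s, 1)
	fmt.Printf("%c %d %d\n", r, r, size) // é 233 2
}
```

Decoding only works as expected when the offset points at the start of a character; offset 2 in this string lands in the middle of 'é'.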
19+
20+
## Runes
21+
22+
A rune in Go represents a Unicode code point; the type `rune` is an alias for `int32`. Any character, ASCII or not, can be represented as a rune. A rune literal such as `'é'` is just an integer constant whose value is the character's code point.
23+
24+
Let's modify the code to use runes:
25+
26+
```go
var myString2 = []rune("résumé")
var indexed2 = myString2[1]
fmt.Println(indexed2)                      // 233
fmt.Printf("%v %T \n", indexed2, indexed2) // 233 int32
```
32+
33+
Now `myString2[1]` gives you 233, the Unicode code point of 'é' (U+00E9), because we're treating the string as a sequence of runes instead of a sequence of bytes.
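The difference between byte positions and character positions can be sketched as follows. `byteOffsets` is a hypothetical helper written for this example; it relies on the fact that `range` over a string decodes one rune per iteration and reports the byte offset where that rune starts.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// byteOffsets returns the byte offset at which each character of s starts.
// (Hypothetical helper for illustration; not part of the tutorial code.)
func byteOffsets(s string) []int {
	var offsets []int
	for i := range s { // range over a string advances one rune at a time
		offsets = append(offsets, i)
	}
	return offsets
}

func main() {
	s := "résumé"

	fmt.Println(byteOffsets(s))            // [0 1 3 4 5 6]
	fmt.Println(len(s))                    // 8 (bytes)
	fmt.Println(utf8.RuneCountInString(s)) // 6 (characters)
	fmt.Println(len([]rune(s)))            // 6 (characters)
	fmt.Println(string([]rune(s)[1]))      // é
}
```

Converting to `[]rune` allocates a new slice, so when only the character count is needed, `utf8.RuneCountInString` is usually the lighter choice.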
34+
35+
## String Concatenation
36+
37+
In Go, you can concatenate strings using the `+` operator:
38+
39+
```go
var strSlice = []string{"s", "h", "i", "k", "h", "a"}
var catStr = ""
for i := range strSlice {
	catStr += strSlice[i]
}
fmt.Printf("\nString building: %v", catStr)
```
47+
48+
However, since strings are immutable in Go, each `+=` creates a new string and copies the existing contents into it, which can be inefficient when concatenating a large number of strings. A more efficient way is to use `strings.Builder`:
49+
50+
```go
var strBuilder strings.Builder
for i := range strSlice {
	strBuilder.WriteString(strSlice[i])
}
var catStr1 = strBuilder.String()
fmt.Printf("\nString building through built in package: %v", catStr1)
```
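As a possible refinement, and not something the tutorial code does, the builder can reserve its capacity up front with `Grow` when the total length is known. For the common case of joining a slice of strings, `strings.Join` gives the same result in one call. The function name `concat` below is hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// concat joins parts with a strings.Builder, reserving the full
// capacity first so the internal buffer does not have to be regrown.
// (Hypothetical helper for illustration; not part of the tutorial code.)
func concat(parts []string) string {
	total := 0
	for _, p := range parts {
		total += len(p)
	}

	var b strings.Builder
	b.Grow(total)
	for _, p := range parts {
		b.WriteString(p)
	}
	return b.String()
}

func main() {
	parts := []string{"s", "h", "i", "k", "h", "a"}

	fmt.Println(concat(parts))           // shikha
	fmt.Println(strings.Join(parts, "")) // shikha
}
```

For a handful of short strings the plain `+` operator is fine; the builder matters mainly inside loops that run many times.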
58+
59+
### Check out the code
60+
61+
- [main.go](main.go)

tutorial5/main.go

Lines changed: 87 additions & 0 deletions
@@ -0,0 +1,87 @@
package main

import (
	"fmt"
	"strings"
)

func main() {
	var myString = "résumé"
	var indexed = myString[1]
	fmt.Println(indexed) // prints a number: the byte at index 1, not the character 'é'

	// Print the value together with its type.
	fmt.Printf("%v %T \n", indexed, indexed) // the type is uint8 (byte); 195 is the first byte of 'é'

	for i, v := range myString {
		fmt.Println(i, v)
	} // i is the byte offset where each character starts and v is its code point (rune); the offset jumps from 1 to 3 because 'é' takes two bytes in UTF-8

	fmt.Printf("\n The length of 'myString' is %v", len(myString)) // len counts bytes, not characters

	// An easier way to index and iterate by character is to convert the string to a slice of runes.

	fmt.Println("\n----------After rune------------")
	var myString2 = []rune("résumé")
	var indexed2 = myString2[1]
	fmt.Println(indexed2) // prints a number: the code point of 'é'

	// Print the value together with its type.
	fmt.Printf("%v %T \n", indexed2, indexed2) // now the type is int32 (rune)

	for i, v := range myString2 {
		fmt.Println(i, v)
	} // i is now the character position and v is the code point (rune) at that position

	fmt.Printf("\n The length of 'myString2' is %v", len(myString2)) // len counts runes here

	// We can concatenate strings using the "+" operator.

	var strSlice = []string{"s", "h", "i", "k", "h", "a"}
	var catStr = ""
	for i := range strSlice {
		catStr += strSlice[i]
	}
	fmt.Printf("\nString building: %v", catStr)

	// Strings are immutable in Go, so each += above creates a new string.
	// strings.Builder from the standard library avoids the repeated copying.

	var strBuilder strings.Builder
	for i := range strSlice {
		strBuilder.WriteString(strSlice[i])
	}
	var catStr1 = strBuilder.String()
	fmt.Printf("\nString building through built in package: %v", catStr1)

}

/*
>>>>>>>>>>>>>>>>>>>>>>>>>>>> go run tutorial5/main.go

O/P :::::::::::::::::::::

195
195 uint8
0 114
1 233
3 115
4 117
5 109
6 233

The length of 'myString' is 8
----------After rune------------
233
233 int32
0 114
1 233
2 115
3 117
4 109
5 233

The length of 'myString2' is 6
String building: shikha
String building through built in package: shikha
*/
