
Commit 22f63bd

Expand documentation for requireColumn
1 parent d5c55d7 commit 22f63bd

2 files changed

Lines changed: 88 additions & 0 deletions

File tree

  • core/src/main/kotlin/org/jetbrains/kotlinx/dataframe/api
  • docs/StardustDocs/topics

core/src/main/kotlin/org/jetbrains/kotlinx/dataframe/api/require.kt

Lines changed: 47 additions & 0 deletions
@@ -4,14 +4,61 @@ import org.jetbrains.kotlinx.dataframe.ColumnSelector
 import org.jetbrains.kotlinx.dataframe.DataFrame
 import org.jetbrains.kotlinx.dataframe.annotations.Interpretable
 import org.jetbrains.kotlinx.dataframe.annotations.Refine
+import org.jetbrains.kotlinx.dataframe.annotations.DataSchema
 import org.jetbrains.kotlinx.dataframe.impl.api.requireImpl
 import kotlin.reflect.typeOf
 
 /**
  * Resolves [column] in this [DataFrame] and checks that its runtime type is a subtype of [C].
  * Throws if the column can't be resolved or if its type doesn't match.
+ *
+ * From the compiler plugin's perspective, this operation adds the column to the compile-time schema of the returned [DataFrame].
+ *
+ * The aim is to help incrementally migrate workflows to the extension properties API.
+ *
+ * If you end up with more than a few `requireColumn` calls, consider declaring a [DataSchema] and using [cast] or [convertTo] instead.
+ *
+ * Example:
+ *
+ * ```kotlin
+ * val repos = DataFrame
+ *     .readCsv("https://raw.githubusercontent.com/Kotlin/dataframe/master/data/jetbrains_repositories.csv")
+ *
+ * repos
+ *     .filter { "stargazers_count"<Int>() > 100 }
+ *     .sortByDesc("stargazers_count")
+ *     .select("full_name", "stargazers_count")
+ * ```
+ *
+ * Notice how the string `"stargazers_count"` is repeated three times. We can refactor this code using `requireColumn`:
+ *
+ * ```kotlin
+ * val repos = DataFrame
+ *     .readCsv("https://raw.githubusercontent.com/Kotlin/dataframe/master/data/jetbrains_repositories.csv")
+ *     .requireColumn { "stargazers_count"<Int>() }
+ *
+ * repos
+ *     .filter { stargazers_count > 100 }
+ *     .sortByDesc { stargazers_count }
+ *     .select { "full_name" and stargazers_count }
+ * ```
+ *
+ * This also makes the code more robust. For example, after the column is renamed, any remaining usages of the old name become compile-time errors that are easy to spot and update:
+ * ```kotlin
+ * val repos = DataFrame
+ *     .readCsv("https://raw.githubusercontent.com/Kotlin/dataframe/master/data/jetbrains_repositories.csv")
+ *     .requireColumn { "stargazers_count"<Int>() }
+ *     .rename { stargazers_count }.into("stars")
+ *
+ * repos
+ *     .filter { stars > 100 }
+ *     .sortByDesc { stars }
+ *     .select { "full_name" and stars }
+ * ```
+ *
  */
 @Refine
 @Interpretable("Require0")
 public inline fun <T, reified C> DataFrame<T>.requireColumn(noinline column: ColumnSelector<T, C>): DataFrame<T> =
     requireImpl(column, typeOf<C>())
+

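The KDoc above says that `requireColumn` resolves the column at runtime, checks that its type is a subtype of `C`, and throws otherwise. The following sketch exercises that contract on a small in-memory frame built with `dataFrameOf`. It relies only on the runtime check, so the compiler plugin is not needed. It assumes a DataFrame version that ships `requireColumn`. The helper `succeeds` and the sample data are hypothetical. The sketch deliberately does not assert a specific exception type, because the KDoc only says that the call throws.

```kotlin
import org.jetbrains.kotlinx.dataframe.api.dataFrameOf
import org.jetbrains.kotlinx.dataframe.api.requireColumn

// Hypothetical sample data; "stars" is inferred as an Int column at runtime.
val repos = dataFrameOf("name", "stars")(
    "repo-a", 800,
    "repo-b", 45,
)

// Hypothetical helper: true if the block completes, false if it throws.
fun succeeds(block: () -> Unit): Boolean = runCatching(block).isSuccess

fun main() {
    // Int matches exactly, and per the KDoc a supertype such as Number is also accepted.
    println(succeeds { repos.requireColumn { "stars"<Int>() } })
    println(succeeds { repos.requireColumn { "stars"<Number>() } })

    // A type that is not a supertype of the runtime type should make the call throw.
    println(succeeds { repos.requireColumn { "stars"<String>() } })

    // A column that can't be resolved should also make the call throw.
    println(succeeds { repos.requireColumn { "forks"<Int>() } })
}
```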
docs/StardustDocs/topics/require.md

Lines changed: 41 additions & 0 deletions
@@ -23,3 +23,44 @@ val df = peopleDf.requireColumn { "name"["firstName"]<String>() }
 // Use extension property after `requireColumn`
 val v: String = df.name.firstName[0]
 ```
+
+### Advanced example
+
+Let's start with a pipeline that uses only String Column Accessors and String API overloads:
+
+```kotlin
+val repos = DataFrame
+    .readCsv("https://raw.githubusercontent.com/Kotlin/dataframe/master/data/jetbrains_repositories.csv")
+
+repos
+    .filter { "stargazers_count"<Int>() > 100 }
+    .sortByDesc("stargazers_count")
+    .select("full_name", "stargazers_count")
+```
+
+Notice how the column name `stargazers_count` is repeated three times. We can refactor this code using `requireColumn`:
+
+```kotlin
+val repos = DataFrame
+    .readCsv("https://raw.githubusercontent.com/Kotlin/dataframe/master/data/jetbrains_repositories.csv")
+    .requireColumn { "stargazers_count"<Int>() }
+
+repos
+    .filter { stargazers_count > 100 }
+    .sortByDesc { stargazers_count }
+    .select { "full_name" and stargazers_count }
+```
+
+This also makes the code more robust. For example, after the column is renamed, any remaining usages of the old name become compile-time errors that are easy to spot and update:
+
+```kotlin
+val repos = DataFrame
+    .readCsv("https://raw.githubusercontent.com/Kotlin/dataframe/master/data/jetbrains_repositories.csv")
+    .requireColumn { "stargazers_count"<Int>() }
+    .rename { stargazers_count }.into("stars")
+
+repos
+    .filter { stars > 100 }
+    .sortByDesc { stars }
+    .select { "full_name" and stars }
+```

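The KDoc recommends declaring a `@DataSchema` and using `cast` or `convertTo` once `requireColumn` calls start to pile up. The following is a minimal sketch of that alternative for the same repository pipeline. It assumes that extension properties are generated for the `@DataSchema` interface, for example by the Kotlin DataFrame compiler plugin or the KSP plugin. To stay offline, it uses a small in-memory frame instead of the CSV URL. The `Repo` interface, the `popularRepos` function, and the sample values are hypothetical.

```kotlin
import org.jetbrains.kotlinx.dataframe.DataFrame
import org.jetbrains.kotlinx.dataframe.annotations.DataSchema
import org.jetbrains.kotlinx.dataframe.api.convertTo
import org.jetbrains.kotlinx.dataframe.api.dataFrameOf
import org.jetbrains.kotlinx.dataframe.api.filter
import org.jetbrains.kotlinx.dataframe.api.sortByDesc

// Hypothetical schema declaring every column the pipeline needs, in one place.
// Property names mirror the CSV column names so no @ColumnName mapping is needed.
@DataSchema
interface Repo {
    val full_name: String
    val stargazers_count: Int
}

fun popularRepos(): DataFrame<Repo> {
    // Hypothetical sample data standing in for jetbrains_repositories.csv.
    val raw = dataFrameOf("full_name", "stargazers_count")(
        "org/a", 150,
        "org/b", 40,
        "org/c", 900,
    )

    // convertTo<Repo>() converts the columns to the types declared in Repo
    // and returns a frame typed as DataFrame<Repo>.
    val repos: DataFrame<Repo> = raw.convertTo<Repo>()

    // Generated extension properties replace the repeated "stargazers_count" strings.
    return repos
        .filter { stargazers_count > 100 }
        .sortByDesc { stargazers_count }
}

fun main() {
    println(popularRepos())
}
```

`cast` only reinterprets the frame's type parameter. `convertTo` also converts column values where it can, which makes it the safer choice when the source types are not guaranteed, such as columns read from CSV.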