
Commit b64b0ba

Merge pull request #21 from microsoft/dev/michalmano/lakehouse-explorer-documentation
Sample Lakehouse Explorer documentation
2 parents ad03132 + d6fa307 commit b64b0ba

7 files changed

Lines changed: 132 additions & 0 deletions

LakehouseExplorer.md

@@ -0,0 +1,132 @@
# Lakehouse Explorer Component #
In addition to FluentUI controls, the Sample Extension provides a Sample Lakehouse Explorer component under the "FluentUI Playground" tab.

The Lakehouse Explorer component lets the user pick a Lakehouse, from among all the Lakehouses they have access to, using the Datahub API. The component then fetches the table data using the Storage Services API and displays the tables available in the selected Lakehouse. The user can then interact with the component and select the desired table.

### Potential Use Cases ###
The Lakehouse Explorer and the functionality it provides can be used to access and manipulate the contents of a Lakehouse, such as viewing tables, adding tables, editing existing tables, and more.

### UI Experience ###
First, in our sample item, we navigate to the FluentUI Playground tab to see the Lakehouse Explorer component:

![An empty lakehouse explorer](./photos/lakehouse-explorer-empty.png)

We click "Add" to choose our Lakehouse, which opens the OneLake Data Hub dialog:

<img width="1000" alt="OneLake Data Hub Dialog" src="./photos/datahub-explorer-lakehouse.png">

Here, we see all the Lakehouse artifacts that we have access to. Note that the Lakehouse can be in a different workspace than your artifact.

After selecting a Lakehouse, the tables are loaded and can be selected in the UI.

<img height="500" alt="Loading Tables" src="./photos/lakehouse-explorer-load-tables.png">
<img height="500" alt="Tables Loaded" src="./photos/lakehouse-explorer-no-selection.png">
<img height="500" alt="Table Selected" src="./photos/lakehouse-explorer-table-selected.png">

The UI is updated to reflect that 'my_table' is selected.

## Frontend ##
The implementation of the Lakehouse Explorer can be found here: [Lakehouse Explorer Component](./Frontend/src/components/SampleWorkloadLakehouseExplorer)

First, we'll address the need for two different tree components: one to handle Lakehouses that have a schema and one to handle Lakehouses that don't.

**What is a schema?** A schema is an upcoming feature of the Lakehouse artifact that allows us to organize tables. New Lakehouses will, in the future, be created with a default schema, "dbo". The user will be able to add other schemas and create tables under them.

Old Lakehouses will retain their structure and will not have the schema property.

In the sample flow above, we chose an "old" Lakehouse that doesn't have a schema. For reference, this is the structure of a Lakehouse with the schema property:

![Schema Table Selected](./photos/lakehouse-explorer-table-selected-with-schema.png)


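
The choice between the two tree components can be illustrated with a small sketch. It assumes the table objects returned to the Frontend carry `name`, `path`, and `schema` fields (the lowercase names are an assumption about the serialized form of the Backend's LakehouseTable), and `groupBySchema` is a hypothetical helper, not a function from the sample:

```typescript
// Assumed Frontend view of a table; field casing is an assumption.
interface LakehouseTable {
  name: string;
  path: string;
  schema: string | null;
}

// Schema indicator: inspects only the first table, like the sample's
// `tables[0]?.schema != null` check.
function hasSchema(tables: LakehouseTable[]): boolean {
  return tables[0]?.schema != null;
}

// Hypothetical helper: groups tables by schema name so a schema-aware tree
// can render one node per schema. Tables without a schema are skipped.
function groupBySchema(tables: LakehouseTable[]): Map<string, LakehouseTable[]> {
  const groups = new Map<string, LakehouseTable[]>();
  for (const table of tables) {
    if (table.schema == null) continue;
    const bucket = groups.get(table.schema) ?? [];
    bucket.push(table);
    groups.set(table.schema, bucket);
  }
  return groups;
}
```

When `hasSchema` returns false, the flat list of tables can be rendered directly; otherwise each key of the grouped map becomes a parent node in the tree.
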
### Step 1: Open the datahub selector dialog ###

We use the Datahub API in the SampleWorkloadController to open a dialog containing the OneLake data hub with the available Lakehouses. When a Lakehouse is selected, we receive a response of type **DatahubSelectorDialogResult**.

We extract the following information about the selected Lakehouse from the response:

```typescript
{
    id: artifactObjectId,
    workspaceId: workspaceObjectId,
    type: "Lakehouse",
    displayName,
    description
}
```

This information is used in Step 2.


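
As a minimal sketch of the extraction above: the exact nesting of these fields inside DatahubSelectorDialogResult is not shown in this document, so the input type `SelectedItemFields` and the function `toLakehouseInfo` are hypothetical names introduced only for illustration.

```typescript
// Hypothetical input: the four fields read from the dialog response.
// Where they sit inside DatahubSelectorDialogResult is an assumption.
interface SelectedItemFields {
  artifactObjectId: string;
  workspaceObjectId: string;
  displayName: string;
  description: string;
}

// Shape of the extracted Lakehouse information, as listed above.
interface LakehouseInfo {
  id: string;
  workspaceId: string;
  type: "Lakehouse";
  displayName: string;
  description: string;
}

// Renames the object IDs and tags the result with its artifact type.
function toLakehouseInfo(item: SelectedItemFields): LakehouseInfo {
  const { artifactObjectId, workspaceObjectId, displayName, description } = item;
  return {
    id: artifactObjectId,
    workspaceId: workspaceObjectId,
    type: "Lakehouse",
    displayName,
    description,
  };
}
```
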
### Step 2: Send a request to the backend to fetch table data from OneLake ###
When a Lakehouse is selected, the following function is called:

```typescript
async function setTables(additionalScopesToConsent: string) {
    // Acquire a token; it is sent to the Backend in the "Authorization" header.
    const accessToken = await callAuthAcquireAccessToken(extensionClient, additionalScopesToConsent);
    const tablePath = getTablesInLakehousePath(sampleWorkloadBEUrl, selectedLakehouse.workspaceId, selectedLakehouse.id);
    // Fetch the tables of the selected Lakehouse from the Backend.
    const tables = await getTablesInLakehouse(tablePath, accessToken.token);
    setTablesInLakehouse(tables);
    // Schema indicator, based on the first table; it selects which tree component to render.
    setHasSchema(tables[0]?.schema != null);
}
```
73+
This function:
74+
1. Acquires an access token which will be passed in the "Authorization" header to the Backend, which is required for data access.
75+
2. Fetches the tables in the Lakehouse using getTablesInLakehouse.
76+
3. Sets the schema indicator and the tables to be displayed in the component. The schema indicator determines which tree component to render.
77+
78+
Please reference the [Lakehouse Explorer Controller](Frontend/src/controller/LakehouseExplorerController.ts).
79+
80+
## Backend ##
81+
82+
The following Backend code is required in order to communicate with Azure Storage services and retrieve the tables metadata. Overall, the Backend receives a client token from the Frontend, uses this to acquire a token OBO for the necessary scopes, uses the OBO token to send a Get request to OneLake to the directory containing the desired tables, and finally returns a list of table names and their corresponding paths to the Frontend for display in the Lakehouse Explorer.
83+
84+
### Step 1: Acquire Access Token On Behalf Of ###
85+
First, we'll look at the LakehouseController.
86+
87+
The request from the Frontend is handled by **[GetTablesAsync](Backend/src/Controllers/LakeHouseController.cs#L86)**.
88+
89+
This function:
90+
1. Creates an authorization context and gets a token on behalf of the user using the bearer token passed from the Frontend
91+
2. Uses the LakehouseClientService to call **GetOneLakeTables**, which returns a list of type LakehouseTable in the Lakehouse.
92+
93+
### Step 2: Fetch the paths of the desired tables with GetOneLakeTables ###
94+
Now, we'll look at the LakehouseClientService.
95+
96+
**[GetOneLakeTables](Backend/src/Services/LakeHouseClientService.cs#L247)** does the following:
97+
1. Gets a list of paths for objects in the Lakehouse (GetPathList)
98+
2. Filters out the desired paths.
99+
3. Converts each path into an object of type LakehouseTable, which includes the name of the table, the path, and whether or not the table is part of a schema:
100+
101+
```Csharp
102+
{
103+
public class LakehouseTable
104+
{
105+
public string Name { get; init; }
106+
107+
public string Path { get; init; }
108+
109+
public string Schema { get; init; }
110+
}
111+
}
112+
```
113+
114+
In order to get the list of paths, we use the following REST API provided by Storage Services : [Path API](https://learn.microsoft.com/en-us/rest/api/storageservices/datalakestoragegen2/path/list?view=rest-storageservices-datalakestoragegen2-2019-12-12)
115+
Note that for OneLake calls, the `directory` param is required, and should match the Lakehouse ID.
116+
For more information on OneLake integration with ADLS (Azure Data Lake Storage), see [OneLake Integration](https://learn.microsoft.com/en-us/fabric/onelake/onelake-api-parity#managed-onelake-folders).)
117+
118+
For more Storage Service REST APIs see: [All REST APIs](https://learn.microsoft.com/en-us/rest/api/storageservices/data-lake-storage-gen2?view=rest-storageservices-datalakestoragegen2-2019-12-12)
119+
120+
### Step 3: Return a list of LakehouseTables ###
121+
122+
We passed recursive=true to the function GetPathList. This, due to the fact that there are two different types of Lakehouses, as explained in the Frontend section.
123+
Lakehouses without schemas have the following path structure: ``` <lakehouseId>/Tables/<tableName> ```, so the table names can be found without recursion as direct descendants of the directory.
124+
125+
However, Lakehouses with schemas have the following path struture ```<lakehouseId>/Tables/<schemaName>/<tableName>``` and thus in order to find the table names, we must recursively search *each schema*.
126+
127+
The path list contains several paths per table. We select the paths that end in _delta_log, because Fabric tables are saved as delta tables, and thus each table has a '_delta_log' folder that will be represented by a path. In this way, we ensure we're selecting only paths that correspond to tables.
128+
129+
From the path, we extract the table name and the schema, if available, and return a list of LakehouseTables.
130+
131+
The names of the tables can now be used in conjunction with [Fabric Table REST APIs](https://learn.microsoft.com/en-us/rest/api/fabric/lakehouse/tables) to load the tables themselves.
132+