Commit f7e7f90

tests and video indexer
1 parent 427c332, commit f7e7f90

34 files changed (+2439 / -182 lines)

.github/workflows/ci.yml

Lines changed: 80 additions & 18 deletions
```diff
@@ -1,41 +1,103 @@
-name: CI
+name: build
 
 on:
-  pull_request:
-    branches: [ main ]
   push:
-    branches: [ main ]
+    branches:
+      - main
+      - 'release/*'
+  pull_request:
+  workflow_dispatch:
 
 env:
-  DOTNET_VERSION: '9.0.x'
+  DOTNET_VERSION: 9.0.x
+  SOLUTION_PATH: MarkItDown.slnx
+  CLI_PROJECT: src/MarkItDown.Cli/MarkItDown.Cli.csproj
+  LIB_PROJECT: src/MarkItDown/MarkItDown.csproj
 
 jobs:
-  build:
-    name: Build and Test
+  build_and_test:
     runs-on: ubuntu-latest
 
     steps:
       - name: Checkout
-        uses: actions/checkout@v5
+        uses: actions/checkout@v4
 
       - name: Setup .NET
         uses: actions/setup-dotnet@v4
         with:
           dotnet-version: ${{ env.DOTNET_VERSION }}
 
-      - name: Restore dependencies
-        run: dotnet restore
+      - name: Restore
+        run: dotnet restore ${{ env.SOLUTION_PATH }}
 
       - name: Build
-        run: dotnet build --configuration Release --no-restore
+        run: dotnet build ${{ env.SOLUTION_PATH }} --configuration Release --no-restore
 
       - name: Test
-        run: dotnet test --configuration Release --no-build --verbosity normal --collect:"XPlat Code Coverage"
+        run: dotnet test ${{ env.SOLUTION_PATH }} --configuration Release --no-build --logger trx --results-directory TestResults
+
+      - name: Upload test results
+        if: always()
+        uses: actions/upload-artifact@v4
+        with:
+          name: test-results
+          path: TestResults
+
+  package:
+    needs: build_and_test
+    runs-on: ubuntu-latest
+
+    steps:
+      - name: Checkout
+        uses: actions/checkout@v4
+
+      - name: Setup .NET
+        uses: actions/setup-dotnet@v4
+        with:
+          dotnet-version: ${{ env.DOTNET_VERSION }}
+
+      - name: Restore
+        run: dotnet restore ${{ env.SOLUTION_PATH }}
+
+      - name: Pack library
+        run: dotnet pack ${{ env.LIB_PROJECT }} --configuration Release --no-build -o artifacts/nuget
+
+      - name: Publish CLI bundles
+        run: |
+          set -euo pipefail
+          ARTIFACT_ROOT=artifacts/cli
+          mkdir -p "$ARTIFACT_ROOT"
+          rids=("win-x64" "linux-x64" "osx-arm64")
+          for rid in "${rids[@]}"; do
+            outDir="$ARTIFACT_ROOT/$rid"
+            dotnet publish ${{ env.CLI_PROJECT }} \
+              -c Release \
+              -r "$rid" \
+              --self-contained true \
+              /p:PublishSingleFile=true \
+              /p:IncludeNativeLibrariesForSelfExtract=true \
+              /p:IncludeAllContentForSelfExtract=true \
+              /p:EnableCompressionInSingleFile=true \
+              /p:DebugType=none \
+              -o "$outDir"
+
+            if [[ "$rid" == win-* ]]; then
+              (cd "$outDir" && zip -qr "../markitdown-cli-$rid.zip" .)
+            else
+              (cd "$outDir" && tar -czf "../markitdown-cli-$rid.tar.gz" .)
+            fi
+
+            rm -rf "$outDir"
+          done
+
+      - name: Upload NuGet package
+        uses: actions/upload-artifact@v4
+        with:
+          name: nuget-packages
+          path: artifacts/nuget
 
-      - name: Upload coverage reports to Codecov
-        if: success() && (github.event_name == 'push' || github.event_name == 'pull_request')
-        uses: codecov/codecov-action@v5
+      - name: Upload CLI bundles
+        uses: actions/upload-artifact@v4
         with:
-          token: ${{ secrets.CODECOV_TOKEN }}
-          files: ./**/coverage.cobertura.xml
-          fail_ci_if_error: false
+          name: cli-bundles
+          path: artifacts/cli
```

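You can check the archive rule from the "Publish CLI bundles" step locally. The sketch below mirrors the `win-*` branch of that loop and prints the bundle name for each RID. The helper `archive_name` is hypothetical and not part of the commit. It never calls `dotnet`, so it exercises only the naming logic.

If you reproduce the `package` job by hand, note one thing about its order of steps. It runs `dotnet restore` and then `dotnet pack --no-build`, with no `dotnet build` in between. On a fresh runner there is therefore likely no Release output to pack, so a local run should build first.

```bash
#!/usr/bin/env bash
# Hypothetical helper mirroring the archive naming in the "Publish CLI bundles" step:
# Windows RIDs (win-*) are zipped, every other RID becomes a gzipped tarball.
set -euo pipefail

archive_name() {
  local rid="$1"
  if [[ "$rid" == win-* ]]; then
    printf 'markitdown-cli-%s.zip\n' "$rid"
  else
    printf 'markitdown-cli-%s.tar.gz\n' "$rid"
  fi
}

# Same RID list as the workflow; print the archive each one would produce.
rids=("win-x64" "linux-x64" "osx-arm64")
for rid in "${rids[@]}"; do
  archive_name "$rid"
done
```

Running it prints `markitdown-cli-win-x64.zip`, `markitdown-cli-linux-x64.tar.gz`, and `markitdown-cli-osx-arm64.tar.gz`, one per line.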
MarkItDown.slnx

Lines changed: 2 additions & 0 deletions
```diff
@@ -5,9 +5,11 @@
     <Platform Name="x86" />
   </Configurations>
   <Folder Name="/src/">
+    <Project Path="src/MarkItDown.Cli/MarkItDown.Cli.csproj" />
     <Project Path="src\MarkItDown\MarkItDown.csproj" />
   </Folder>
   <Folder Name="/tests/">
+    <Project Path="tests/MarkItDown.Cli.Tests/MarkItDown.Cli.Tests.csproj" />
     <Project Path="tests/MarkItDown.Tests/MarkItDown.Tests.csproj" />
   </Folder>
 </Solution>
```

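The solution file mixes separator styles. The new entries use forward slashes, but the existing library entry uses a backslash (`src\MarkItDown\MarkItDown.csproj`). A plain text search for one form would miss the other, so any script that checks whether a project is listed should normalise separators first.

The helper below is a hypothetical sketch; the name `slnx_lists_project` is illustrative and not part of the repository. It assumes each project appears as a `Path="..."` attribute, as in this diff.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed if a .slnx file lists the given project path,
# treating "\" and "/" as the same path separator.
set -euo pipefail

slnx_lists_project() {
  local slnx="$1" project="$2"
  local wanted content
  # Normalise the requested path: replace every backslash with a forward slash.
  wanted="${project//\\//}"
  # Normalise the solution file the same way before searching it.
  content="$(tr '\\' '/' < "$slnx")"
  # Succeed only on an exact Path="..." attribute match.
  [[ "$content" == *"Path=\"$wanted\""* ]]
}
```

Because both sides are normalised before the comparison, `slnx_lists_project MarkItDown.slnx 'src\MarkItDown.Cli\MarkItDown.Cli.csproj'` matches the forward-slash entry added in this commit.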
README.md

Lines changed: 56 additions & 0 deletions
````diff
@@ -495,6 +495,41 @@ var markItDown = new MarkItDown();
 markItDown.RegisterConverter(new MyCustomConverter());
 ```
 
+### Interactive CLI
+
+Prefer a guided experience? Launch the Spectre.Console powered CLI and drive conversions from a rich terminal UI:
+
+```bash
+dotnet run --project src/MarkItDown.Cli
+```
+
+The CLI lets you:
+
+- Queue single files, whole directories, or URLs with animated progress reporting and ambient status updates.
+- Inspect inputs beforehand with metadata panels, markdown previews, and an extension distribution radar for large batches.
+- Configure Azure, Google Cloud, and AWS credentials (API keys, connection strings, or managed identity scenarios) without editing JSON manually.
+- Tweak segmentation preferences (inline annotations, audio slice duration) on the fly.
+- Review per-file success/failure summaries and launch the output folder directly from the shell.
+
+Settings entered during the session are kept in memory only—ideal for experimenting locally without persisting secrets to disk.
+
+Need a portable binary that runs without the .NET runtime? Publish the CLI as a self-contained single file (artifacts land in `artifacts/cli`):
+
+```bash
+dotnet publish src/MarkItDown.Cli/MarkItDown.Cli.csproj \
+  -c Release \
+  -r linux-x64 \
+  --self-contained true \
+  /p:PublishSingleFile=true \
+  /p:IncludeNativeLibrariesForSelfExtract=true \
+  /p:IncludeAllContentForSelfExtract=true \
+  /p:EnableCompressionInSingleFile=true \
+  /p:DebugType=none \
+  -o artifacts/cli/linux-x64
+```
+
+Swap `linux-x64` for `win-x64` or `osx-arm64` to target other platforms. The GitHub Actions workflow (`.github/workflows/ci.yml`) already runs these publishes and uploads zipped artifacts on every build.
+
 ## 🎯 Advanced Usage Patterns
 
 ### Custom Format Converters
@@ -833,15 +868,18 @@ The `AzureIntelligenceOptions`, `GoogleIntelligenceOptions`, and `AwsIntelligenc
     Media = new AzureMediaIntelligenceOptions
     {
         AccountId = configuration["Azure:VideoIndexer:AccountId"],
+        AccountName = configuration["Azure:VideoIndexer:AccountName"],
         Location = configuration["Azure:VideoIndexer:Location"],
         SubscriptionId = configuration["Azure:VideoIndexer:SubscriptionId"],
         ResourceGroup = configuration["Azure:VideoIndexer:ResourceGroup"],
+        ResourceId = configuration["Azure:VideoIndexer:ResourceId"],
         ArmAccessToken = configuration.GetConnectionString("AzureVideoIndexerArmToken")
     }
 };
 ```
 
 - **Managed identity**: omit the `ApiKey`/`ArmAccessToken` properties and the providers automatically fall back to `DefaultAzureCredential`. Assign the managed identity the *Cognitive Services User* role for Document Intelligence and Vision, and follow the [Video Indexer managed identity instructions](https://learn.microsoft.com/azure/azure-video-indexer/video-indexer-use-azure-ad) to authorize uploads.
+- **Video Indexer tips**: Video uploads require both the Video Indexer account (ID + region) and either the full resource ID or the trio of subscription id/resource group/account name, plus an ARM token or Azure AD identity with `Contributor` access on the Video Indexer resource. The interactive CLI exposes dedicated prompts for these values under “Configure cloud providers”.
 
 ```csharp
 var azureOptions = new AzureIntelligenceOptions
@@ -857,6 +895,7 @@ The `AzureIntelligenceOptions`, `GoogleIntelligenceOptions`, and `AwsIntelligenc
     Media = new AzureMediaIntelligenceOptions
     {
         AccountId = "<video-indexer-account-id>",
+        AccountName = "<video-indexer-account-name>",
         Location = "trial"
     }
 };
@@ -925,6 +964,23 @@ The `AzureIntelligenceOptions`, `GoogleIntelligenceOptions`, and `AwsIntelligenc
 
 - **IAM roles / AWS managed identity**: leave the credential fields null to use the default AWS credential chain (environment variables, shared credentials file, EC2/ECS/EKS IAM roles, or AWS SSO). Ensure the execution role has permissions for `textract:AnalyzeDocument`, `rekognition:DetectLabels`, `rekognition:DetectText`, `transcribe:StartTranscriptionJob`, and S3 access for the specified buckets.
 
+#### YouTube metadata & captions
+
+- **Docs**: [YoutubeExplode](https://github.com/Tyrrrz/YoutubeExplode) (used under the hood).
+- **Out of the box**: `YouTubeUrlConverter` now enriches Markdown with title, channel, stats, thumbnails, and (when available) auto-generated captions laid out as timecoded segments.
+- **Custom provider**: supply `MarkItDownOptions.YouTubeMetadataProvider` to disable network access, inject caching, or swap to an alternative implementation.
+
+```csharp
+var options = new MarkItDownOptions
+{
+    YouTubeMetadataProvider = new YoutubeExplodeMetadataProvider(), // default
+    // You can plug in a stub or caching decorator instead:
+    // YouTubeMetadataProvider = new MyCachedYouTubeProvider(inner: new YoutubeExplodeMetadataProvider())
+};
+```
+
+When a provider returns `null` the converter falls back to URL-derived metadata, so YouTube support remains fully optional.
+
 For LLM-style post-processing, assign `MarkItDownOptions.AiModels` with an `IAiModelProvider`. The built-in `StaticAiModelProvider` accepts `Microsoft.Extensions.AI` clients (chat models, speech-to-text, etc.), enabling you to share application-wide model builders.
 
 ### Converter Priority & Detection
````

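The README snippet reads the Video Indexer settings from two places:

- `IConfiguration` keys such as `Azure:VideoIndexer:AccountId`
- the connection string `AzureVideoIndexerArmToken`

Below is a minimal shell sketch for supplying these settings in a local session. It rests on three assumptions:

- The host application registers the environment-variable configuration provider. The default .NET host builders do.
- That provider maps `__` in a variable name to `:` in a key, and exposes `ConnectionStrings__<name>` through `GetConnectionString("<name>")`.
- `ResourceId` expects the standard ARM resource ID of a `Microsoft.VideoIndexer/accounts` resource.

The helper `vi_resource_id` and all values are hypothetical placeholders.

```bash
#!/usr/bin/env bash
# Sketch: export the Video Indexer settings read by the README snippet.
# ASSUMPTION: the app loads environment variables into IConfiguration, where
# "__" in a variable name becomes ":" in the configuration key.
set -euo pipefail

# Hypothetical helper: build an ARM resource ID from the subscription /
# resource group / account name trio.
# ASSUMPTION: ResourceId expects this standard Microsoft.VideoIndexer/accounts format.
vi_resource_id() {
  local subscription="$1" resource_group="$2" account_name="$3"
  printf '/subscriptions/%s/resourceGroups/%s/providers/Microsoft.VideoIndexer/accounts/%s\n' \
    "$subscription" "$resource_group" "$account_name"
}

# Hypothetical placeholder values; replace them with your own account details.
export Azure__VideoIndexer__AccountId="00000000-0000-0000-0000-000000000001"
export Azure__VideoIndexer__AccountName="my-video-indexer"
export Azure__VideoIndexer__Location="westus2"
export Azure__VideoIndexer__SubscriptionId="00000000-0000-0000-0000-000000000000"
export Azure__VideoIndexer__ResourceGroup="my-resource-group"

# The README accepts either the full resource ID or the trio above; this sets both.
Azure__VideoIndexer__ResourceId="$(vi_resource_id \
  "$Azure__VideoIndexer__SubscriptionId" \
  "$Azure__VideoIndexer__ResourceGroup" \
  "$Azure__VideoIndexer__AccountName")"
export Azure__VideoIndexer__ResourceId
```

For the ARM token, one option when testing locally is to export the output of `az account get-access-token --resource https://management.azure.com/ --query accessToken -o tsv` as `ConnectionStrings__AzureVideoIndexerArmToken`. Such tokens are short-lived. If you leave the token unset, the README says the providers fall back to `DefaultAzureCredential`.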
src/MarkItDown.Cli/AssemblyInfo.cs

Lines changed: 3 additions & 0 deletions
```diff
@@ -0,0 +1,3 @@
+using System.Runtime.CompilerServices;
+
+[assembly: InternalsVisibleTo("MarkItDown.Cli.Tests")]
```
