109 changes: 80 additions & 29 deletions .github/workflows/docker-release.yml
@@ -17,10 +17,25 @@

name: Docker release - tika-server and tika-grpc

# Auto-trigger on tag push is disabled (TIKA-4725). The official tika-docker
# images on Docker Hub (apache/tika) are published from the apache/tika-docker
# repository using its own Dockerfiles and tagging conventions. When this
# workflow ran on the 4.0.0-alpha-1 source tag it pushed an image built from
# the stale Dockerfiles under tika-server/docker-build/ to
# apache/tika:4.0.0-alpha-1, which collided with the tika-docker-managed tag
# and ran with the pre-4.x bare-jar entrypoint (broken plugin loading). Re-enable
# only after the in-repo Dockerfiles are kept in sync with (or replaced by a
# pointer to) apache/tika-docker.
on:
push:
tags:
- '[0-9]+.[0-9]+.[0-9]+*'
workflow_dispatch:
inputs:
tag:
description: 'Tika release tag (e.g. 4.0.0-alpha-1). Must already exist as a git tag.'
required: true
build_number:
description: 'Docker build number for this Tika tag (1 for first build, increment on rebuilds).'
required: true
default: '1'

jobs:
release-tika-server:
@@ -29,12 +44,42 @@ jobs:

steps:
- uses: actions/checkout@v6

- name: Extract version from tag
id: version
with:
ref: ${{ inputs.tag }}

# Compute the tag set for each image. Three tags per image at minimum:
# apache/tika:<tag> (mutable; moves on each rebuild)
# apache/tika:<tag>-<build> (immutable; one per rebuild)
# apache/tika:latest (only for non-prerelease tags)
# The grpc image always pushes :latest (no 3.x incumbent to protect).
- name: Compute tags
id: tags
run: |
TAG_NAME="${GITHUB_REF#refs/tags/}"
echo "tag=${TAG_NAME}" >> "$GITHUB_OUTPUT"
tag='${{ inputs.tag }}'
build='${{ inputs.build_number }}'
minimal="apache/tika:${tag}
apache/tika:${tag}-${build}"
full="apache/tika:${tag}-full
apache/tika:${tag}-${build}-full"
grpc="apache/tika-grpc:${tag}
apache/tika-grpc:${tag}-${build}
apache/tika-grpc:latest"
case "$tag" in
*-alpha*|*-BETA*|*-RC*)
echo "Prerelease tag $tag — skipping :latest for apache/tika."
;;
*)
minimal="${minimal}
apache/tika:latest"
full="${full}
apache/tika:latest-full"
;;
esac
{
echo "minimal<<EOF"; echo "$minimal"; echo "EOF"
echo "full<<EOF"; echo "$full"; echo "EOF"
echo "grpc<<EOF"; echo "$grpc"; echo "EOF"
} >> "$GITHUB_OUTPUT"

- name: Set up Docker Buildx
uses: docker/setup-buildx-action@4d04d5d9486b7bd6fa91e7baf45bbb4f8b9deedd # v4.0.0
@@ -52,38 +97,30 @@ jobs:
uses: docker/build-push-action@10e90e3645eae34f1e60eeb005ba3a3d33f178e8 # v6.19.2
with:
file: tika-server/docker-build/minimal/Dockerfile
platforms: linux/amd64,linux/arm64,linux/arm/v7,linux/s390x
platforms: linux/amd64,linux/arm64,linux/s390x
push: true
build-args: |
TIKA_VERSION=${{ steps.version.outputs.tag }}
tags: |
apache/tika:${{ steps.version.outputs.tag }}
apache/tika:latest
TIKA_VERSION=${{ inputs.tag }}
tags: ${{ steps.tags.outputs.minimal }}

- name: Build and push tika-server full
uses: docker/build-push-action@10e90e3645eae34f1e60eeb005ba3a3d33f178e8 # v6.19.2
with:
file: tika-server/docker-build/full/Dockerfile
platforms: linux/amd64,linux/arm64,linux/arm/v7,linux/s390x
platforms: linux/amd64,linux/arm64,linux/s390x
push: true
build-args: |
TIKA_VERSION=${{ steps.version.outputs.tag }}
tags: |
apache/tika:${{ steps.version.outputs.tag }}-full
apache/tika:latest-full
TIKA_VERSION=${{ inputs.tag }}
tags: ${{ steps.tags.outputs.full }}

release-tika-grpc:
runs-on: ubuntu-latest
timeout-minutes: 120

steps:
- uses: actions/checkout@v6

- name: Extract version from tag
id: version
run: |
TAG_NAME="${GITHUB_REF#refs/tags/}"
echo "tag=${TAG_NAME}" >> "$GITHUB_OUTPUT"
with:
ref: ${{ inputs.tag }}

- name: Set up JDK 17
uses: actions/setup-java@v5
@@ -107,9 +144,22 @@ jobs:
username: ${{ secrets.DOCKERHUB_USER }}
password: ${{ secrets.DOCKERHUB_TOKEN }}

- name: Compute grpc tags
id: grpc_tags
run: |
tag='${{ inputs.tag }}'
build='${{ inputs.build_number }}'
{
echo "tags<<EOF"
echo "apache/tika-grpc:${tag}"
echo "apache/tika-grpc:${tag}-${build}"
echo "apache/tika-grpc:latest"
echo "EOF"
} >> "$GITHUB_OUTPUT"

- name: Prepare tika-grpc Docker build context
run: |
TIKA_VERSION="${{ steps.version.outputs.tag }}"
TIKA_VERSION='${{ inputs.tag }}'
OUT_DIR=target/tika-grpc-docker

mkdir -p "${OUT_DIR}/libs/tika-grpc" "${OUT_DIR}/plugins" "${OUT_DIR}/config" "${OUT_DIR}/bin"
@@ -151,7 +201,8 @@ jobs:
platforms: linux/amd64,linux/arm64
push: true
build-args: |
VERSION=${{ steps.version.outputs.tag }}
tags: |
apache/tika-grpc:${{ steps.version.outputs.tag }}
apache/tika-grpc:latest
VERSION=${{ inputs.tag }}
# apache/tika-grpc is new in 4.x with no prior `:latest` to protect, so
# we track latest from the start (unlike apache/tika the server image,
# whose :latest stays on 3.x until 4.0.0 GA).
tags: ${{ steps.grpc_tags.outputs.tags }}
4 changes: 2 additions & 2 deletions .github/workflows/docker-snapshot.yml
@@ -105,7 +105,7 @@ jobs:
uses: docker/build-push-action@10e90e3645eae34f1e60eeb005ba3a3d33f178e8 # v6.19.2
with:
context: target/tika-server-minimal-docker
platforms: linux/amd64,linux/arm64,linux/arm/v7,linux/s390x
platforms: linux/amd64,linux/arm64,linux/s390x
push: true
build-args: |
TIKA_VERSION=${{ steps.version.outputs.tika_version }}
@@ -157,7 +157,7 @@ jobs:
uses: docker/build-push-action@10e90e3645eae34f1e60eeb005ba3a3d33f178e8 # v6.19.2
with:
context: target/tika-server-full-docker
platforms: linux/amd64,linux/arm64,linux/arm/v7,linux/s390x
platforms: linux/amd64,linux/arm64,linux/s390x
push: true
build-args: |
TIKA_VERSION=${{ steps.version.outputs.tika_version }}
1 change: 1 addition & 0 deletions docs/modules/ROOT/nav.adoc
@@ -55,6 +55,7 @@
** xref:advanced/spooling.adoc[Spooling]
** xref:advanced/embedded-documents.adoc[Embedded Document Metadata]
** xref:advanced/local-vlm-server.adoc[Running a Local VLM Server]
** xref:advanced/integration-testing/run-uat-script.adoc[Tika-Server REST UAT Script]
* xref:developers/index.adoc[Developers]
** xref:developers/serialization.adoc[Serialization and Configuration]
* xref:faq.adoc[FAQ]
@@ -0,0 +1,124 @@
//
// Licensed to the Apache Software Foundation (ASF) under one or more
// contributor license agreements. See the NOTICE file distributed with
// this work for additional information regarding copyright ownership.
// The ASF licenses this file to You under the Apache License, Version 2.0
// (the "License"); you may not use this file except in compliance with
// the License. You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
//

= Tika-Server REST UAT Script

`run-uat.sh` is a portable shell script that exercises the tika-server REST
surface against an already-running server. The same script serves as the
Docker image smoke test, the e2e integration test, and part of the
source-release verification.

== Where it lives

[source]
----
release-tools/uat/
├── run-uat.sh # the script
└── test-files/
├── testPDF.pdf
├── testHTML.html
└── test_recursive_embedded.docx
----

== What it covers

Roughly 25 REST endpoint checks across the default-mode endpoints, header
behavior, and error handling. This is the same surface enumerated in the manual
walkthrough at xref:advanced/integration-testing/tika-server.adoc[Tika-Server
Integration Testing], translated to bash + curl assertions.

Coverage includes:

* `/version`, `/parsers`, `/detectors`, `/mime-types` (introspection)
* `/detect/stream` (mime detection)
* `/tika`, `/tika/text`, `/tika/xml`, `/tika/json` (parse)
* `/meta`, `/meta/{field}` (metadata)
* `/rmeta`, `/rmeta/text` (recursive metadata)
* `/unpack/all` (embedded extraction; verifies the response is a valid zip)
* `/language/stream`
* `/meta/form`, `/rmeta/form` (multipart variants)
* `enableUnsecureFeatures=false` gating: `/meta/config`, `/rmeta/config`,
`/tika/config` all return 403
* `X-Tika-OCRskipOcr` header, `Content-Disposition` filename
* 404 / 405 error handling
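
The zip validation mentioned for `/unpack/all` could look roughly like the following. This is a minimal sketch under assumptions, not the code in `run-uat.sh`: the `is_valid_zip` helper name and the `/tmp/unpack.zip` path are hypothetical, and it relies only on the standard `unzip -t` (test archive) and `-q` (quiet) options.

```shell
#!/bin/sh
# Hypothetical helper; the real script may validate the archive differently.
# is_valid_zip FILE
# Succeeds only if unzip can read FILE and every entry passes its integrity test.
is_valid_zip() {
    unzip -tq "$1" >/dev/null 2>&1
}

# Intended use (assumes a server on localhost:9998 and the bundled test file):
#   curl -fsS -T test-files/test_recursive_embedded.docx \
#       http://localhost:9998/unpack/all -o /tmp/unpack.zip
#   is_valid_zip /tmp/unpack.zip || echo "FAIL /unpack/all: not a valid zip"
```

Testing the archive rather than only checking the HTTP status catches the case where the server answers 200 with a truncated or non-zip body.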

Two checks (T18d, T27) are currently disabled. Inline comments in the script
point at the tika-core behavior anomalies that need fixing; re-enable the
checks when those fixes land.

== Running it

The script takes one optional argument: the base URL of a running tika-server.
It does *not* start or stop the server itself.

[source,bash]
----
release-tools/uat/run-uat.sh [host]
# default host: http://localhost:9998
----

Exit code: `0` on all-pass, `1` on any failure. Failed checks print the
expected pattern and a truncated response body.
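
The pass/fail reporting described above could be structured along these lines. This is a minimal sketch, not the actual contents of `run-uat.sh`: the `check` function, the `FAILURES` counter, the canned response bodies, and the 200-character truncation are all hypothetical choices made for illustration.

```shell
#!/bin/sh
# Hypothetical sketch of a pass/fail helper; not copied from run-uat.sh.
FAILURES=0

# check NAME PATTERN BODY
# Passes when BODY contains a match for the extended regex PATTERN.
check() {
    name=$1
    pattern=$2
    body=$3
    if printf '%s\n' "$body" | grep -Eq -- "$pattern"; then
        echo "PASS $name"
    else
        echo "FAIL $name"
        echo "  expected pattern: $pattern"
        # Truncate each line of the body so a large response does not flood the log.
        echo "  response (truncated): $(printf '%s' "$body" | cut -c1-200)"
        FAILURES=$((FAILURES + 1))
    fi
}

# Example with canned bodies standing in for live curl output:
check "version" "Apache Tika" "Apache Tika 4.0.0"
check "detect" "application/pdf" "text/plain"

# A real script would finish with: [ "$FAILURES" -eq 0 ] || exit 1
echo "failures: $FAILURES"
```

Counting failures instead of exiting on the first one lets a single run report every broken endpoint, which matches the "`1` on any failure" exit behavior.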

=== Against the unpacked bin.zip distribution

[source,bash]
----
unzip tika-server-standard-<VERSION>-bin.zip -d /tmp/tika-server-dist
cd /tmp/tika-server-dist
java -jar tika-server.jar -p 9998 -h localhost &
sleep 12
~/path/to/tika/release-tools/uat/run-uat.sh
----
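
The fixed `sleep 12` above is a guess at startup time. One alternative, sketched here under assumptions, is to poll `/version` until it answers. The `wait_until` helper and its attempt and delay arguments are hypothetical and are not part of `run-uat.sh` or `docker-tool.sh`.

```shell
#!/bin/sh
# Hypothetical readiness helper; not part of the Tika tooling.
# wait_until ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Re-runs COMMAND until it succeeds or ATTEMPTS runs have failed.
wait_until() {
    attempts=$1
    delay=$2
    shift 2
    i=1
    while :; do
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        if [ "$i" -ge "$attempts" ]; then
            return 1
        fi
        i=$((i + 1))
        sleep "$delay"
    done
}

# Intended use (assumes the server is starting on localhost:9998):
#   wait_until 30 1 curl -fsS http://localhost:9998/version || exit 1
```

Polling returns as soon as the server is ready and fails loudly if it never comes up, instead of running the UAT against a server that is still starting.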

=== Against the Docker image

The `docker-tool.sh test-uat` subcommand wraps starting the container, waiting
for `/version`, running the UAT, and stopping the container:

[source,bash]
----
cd tika-server/docker-build
./docker-tool.sh test-uat <DOCKER_VERSION>
----

=== As part of the e2e tests (CI)

The Maven module `tika-e2e-tests/tika-server` unpacks the bin.zip, forks
`java -jar tika-server.jar`, and invokes this script via
`org.apache.tika.server.e2e.RunUatSmokeTest`. The CI workflow
`.github/workflows/main-jdk17-build.yml` runs this automatically on every PR
via `mvn -pl tika-e2e-tests -am clean verify -Pe2e`.

== When to use it

* *Pre-vote release verification.* Unpack
`tika-server-standard-<VERSION>-bin.zip` from `dist/dev` and run the UAT
against it. Catches packaging regressions before the vote thread starts.
* *Pre-publish Docker verification.* Run via `docker-tool.sh test-uat` after
building a new image and before tagging it for release.
* *Local development sanity check.* When changing anything in
`tika-server-core` or the bin.zip assembly descriptor, run the UAT against
the build output to confirm you didn't regress endpoint behavior.
* *Adding new endpoints.* When a new REST endpoint lands, add a corresponding
check to the script so future regressions get caught.

== Platform notes

The script needs only bash, curl, and unzip, with no other external
dependencies. On Linux/macOS it runs as-is. On Windows the e2e test skips it
automatically (no bash).
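
A preflight check for those three tools could be sketched as follows. The `missing_tools` helper is hypothetical and is not taken from `run-uat.sh`; it uses only the POSIX `command -v` builtin.

```shell
#!/bin/sh
# Hypothetical preflight helper; not part of run-uat.sh.
# missing_tools TOOL...
# Prints the name of each TOOL that is not found on PATH, one per line.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

missing=$(missing_tools bash curl unzip)
if [ -n "$missing" ]; then
    # Unquoted on purpose so the names print on a single line.
    echo "missing required tools:" $missing >&2
    # A real script would stop here, e.g. with: exit 1
fi
```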