Commit 3497281
Matches prefix to verify presence of DOCX,PPTX,XLSX files instead of standard file names (#3959)
Instead of looking for presence of `word/document.xml` ,
`ppt/presentation.xml` and `xl/workbook.xml` to identify DOCX,PPTX and
XLSX files, we look for prefix `word/document*.xml`,
`ppt/presentation*.xml` and `xl/workbook*.xml` as certain files
generated from office365 has files with different names.
Fixes #3937
---------
Co-authored-by: Yao You <theyaoyou@gmail.com>1 parent 0fa5174 commit 3497281
5 files changed
Lines changed: 23 additions & 4 deletions
File tree
- test_unstructured
- file_utils
- testfiles/file_type
- unstructured
- file_utils
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
| 1 | + | |
| 2 | + | |
| 3 | + | |
| 4 | + | |
| 5 | + | |
| 6 | + | |
| 7 | + | |
| 8 | + | |
| 9 | + | |
1 | 10 | | |
2 | 11 | | |
3 | 12 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
15 | 15 | | |
16 | 16 | | |
17 | 17 | | |
| 18 | + | |
18 | 19 | | |
19 | 20 | | |
20 | 21 | | |
| |||
30 | 31 | | |
31 | 32 | | |
32 | 33 | | |
| 34 | + | |
33 | 35 | | |
34 | 36 | | |
35 | 37 | | |
| |||
987 | 989 | | |
988 | 990 | | |
989 | 991 | | |
| 992 | + | |
| 993 | + | |
| 994 | + | |
| 995 | + | |
| 996 | + | |
| 997 | + | |
| 998 | + | |
| 999 | + | |
Binary file not shown.
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
1 | | - | |
| 1 | + | |
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
747 | 747 | | |
748 | 748 | | |
749 | 749 | | |
750 | | - | |
| 750 | + | |
751 | 751 | | |
752 | 752 | | |
753 | | - | |
| 753 | + | |
754 | 754 | | |
755 | 755 | | |
756 | | - | |
| 756 | + | |
757 | 757 | | |
758 | 758 | | |
759 | 759 | | |
| |||
0 commit comments