Commit 662428d

Better describe what up here with Import and Export
1 parent f6d36b8 commit 662428d

5 files changed

Lines changed: 107 additions & 20 deletions

Lines changed: 32 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -1,12 +1,40 @@
1-
"""
2-
Import/Export File Formats
1+
r"""Import/Export File Formats, Importers and Exporters
2+
3+
The data in files on a filesystem or retrieved from the Internet is often structured \
4+
according to specific structures and rules. For example, consider the different kinds of \
5+
structuring used in a JSON file, versus an HTML file, or a compressed GZIP file.
6+
7+
In some cases, such as archive files, e.g., ZIP, TAR, and JAR, the file contains component parts, \
8+
which in WMA terminology are called "members". Members are part of the broader set of metadata items \
9+
called "elements".
310
4-
There various file formats can be used by 'Import' and 'Export' and related functions, \
5-
e.g. 'ImportString'.
11+
A MIME type is typically associated with each kind of format. \Mathics3, following WMA, \
12+
uses a shortened name for this MIME type. For example, \Mathics3 uses "HTML" as a shorthand \
13+
for the MIME type "text/html".
14+
15+
Below is a list of supported file types for which we have built-in importers or exporters written \
16+
in Python. Other importers, however, are written in \Mathics3.
17+
18+
Variable <url>
19+
:\$ExportFormats:
20+
/doc/reference-of-built-in-symbols/inputoutput-files-and-filesystem/importing-and-exporting/\$exportformats</url> \
21+
contains a list of file formats that are supported by <url>
22+
:Export:
23+
/doc/reference-of-built-in-symbols/inputoutput-files-and-filesystem/importing-and-exporting/export</url>, \
24+
while <url>
25+
:\$ImportFormats:
26+
/doc/reference-of-built-in-symbols/inputoutput-files-and-filesystem/importing-and-exporting/\$importformats</url> \
27+
does the corresponding thing for <url>
28+
:Import:
29+
/doc/reference-of-built-in-symbols/inputoutput-files-and-filesystem/importing-and-exporting/import</url>.
630
731
Many Import/Export functions are registered in SystemFiles/Formats/*.wl which is \
832
autoloaded on startup.
933
1034
The Built-in Functions are defined in a separate context.
1135
For example, HTML` or Compress`. This is done to not pollute the System` namespace.
1236
"""
37+
38+
# This tells documentation how to sort this module
39+
# Here we are also hiding "file_io" since this can erroneously appear at the top level.
40+
sort_order = "mathics.builtin.importing-export-file-formats"
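
Reviewer note: the docstring above says that a short name such as "HTML" stands in for a MIME type such as "text/html". A minimal sketch of that idea, using only Python's standard `mimetypes` module, might look as follows. The table `MIME_TO_SHORT_NAME` and the function `short_format_name` are hypothetical names made up for this illustration; they are not the mapping this commit or Mathics3 actually uses.

```python
import mimetypes
from typing import Optional

# Hypothetical table for this sketch only: MIME type -> short format name.
MIME_TO_SHORT_NAME = {
    "text/html": "HTML",
    "application/json": "JSON",
    "application/zip": "ZIP",
}


def short_format_name(path: str) -> Optional[str]:
    """Return a short format name for `path`, or None if it is not in the table."""
    # guess_type() looks only at the file name extension; it does not open the file.
    mime_type, _encoding = mimetypes.guess_type(path)
    return MIME_TO_SHORT_NAME.get(mime_type)


html_name = short_format_name("page.html")
```

The lookup is driven purely by the file extension, so a real implementation would likely need a fallback for files whose extension is missing or misleading.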

mathics/builtin/import_export/importexport.py

Lines changed: 42 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -1,8 +1,33 @@
11
# -*- coding: utf-8 -*-
22

3-
"""
3+
r"""
44
Import and Export Functions and Variables
55
6+
Many kinds of data formats can be read into or written from \Mathics3.
7+
8+
In contrast to reading or writing a file, <i>importing</i> and <i>exporting</i> imply some sort of \
9+
data restructuring: the data is not treated as \
10+
just a stream of bytes, but instead also carries additional metadata and requires reorganization \
11+
when it is read into \Mathics3 or stored in a filesystem.
12+
13+
See <url>
14+
:Import/Export File Formats:
15+
/doc/reference-of-built-in-symbols/fileformats/</url> for documentation \
16+
on the specific kinds of File Formats \Mathics3 supports.
17+
18+
19+
Variable <url>
20+
:\$ExportFormats:
21+
/doc/reference-of-built-in-symbols/inputoutput-files-and-filesystem/importing-and-exporting/\$exportformats</url> \
22+
contains a list of file formats that are supported by <url>
23+
:Export:
24+
/doc/reference-of-built-in-symbols/inputoutput-files-and-filesystem/importing-and-exporting/export</url>, \
25+
while <url>
26+
:\$ImportFormats:
27+
/doc/reference-of-built-in-symbols/inputoutput-files-and-filesystem/importing-and-exporting/\$importformats</url> \
28+
does the corresponding thing for <url>
29+
:Import:
30+
/doc/reference-of-built-in-symbols/inputoutput-files-and-filesystem/importing-and-exporting/import</url>.
631
"""
732

833
import base64
@@ -434,7 +459,13 @@ class Import(Builtin):
434459
"$OptionSyntax": "System`Ignore",
435460
}
436461

437-
summary_text = "import elements from a file"
462+
rules = {
463+
"Import[filename_]": "Import[filename, {}]",
464+
}
465+
466+
summary_text = (
467+
r"read and convert to \Mathics3 some or all elements of a structured file"
468+
)
438469

439470
def eval_elements_query(self, source, evaluation, options={}):
440471
"""Import[source_String, "Elements", OptionsPattern[]]"""
@@ -537,7 +568,9 @@ class ImportString(Builtin):
537568
"$OptionSyntax": "System`Ignore",
538569
}
539570

540-
summary_text = "import data or elements of data from a string"
571+
summary_text = (
572+
r"read and convert to \Mathics3 some or all elements of a structured string"
573+
)
541574

542575
def eval_data_only(self, data, evaluation, options={}):
543576
"ImportString[data_, OptionsPattern[]]"
@@ -618,7 +651,9 @@ class Export(Builtin):
618651
"$OptionSyntax": "System`Ignore",
619652
}
620653

621-
summary_text = "export elements to a file"
654+
summary_text = (
655+
r"convert from \Mathics3 and write some or all elements to a structured file"
656+
)
622657

623658
def eval(self, dest, expr, evaluation, options={}):
624659
"Export[dest_, expr_, OptionsPattern[Export]]"
@@ -767,7 +802,9 @@ class ExportString(Builtin):
767802
rules = {
768803
"ExportString[expr_, elems_?NotListQ]": ("ExportString[expr, {elems}]"),
769804
}
770-
summary_text = "export elements to a string"
805+
summary_text = (
806+
r"convert from \Mathics3 and write some or all elements to a structured string"
807+
)
771808

772809
def eval_element(self, expr, element: String, evaluation: Evaluation, **options):
773810
"ExportString[expr_, element_String, OptionsPattern[ExportString]]"

mathics/eval/fileformats/__init__.py

Whitespace-only changes.

mathics/eval/fileformats/compression.py

Lines changed: 10 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1,3 +1,8 @@
1+
"""
2+
Evaluation routines for handling data in some sort of archive format,
3+
e.g. ZIP, TAR, etc.
4+
"""
5+
16
import zipfile
27
from typing import Optional
38

@@ -22,7 +27,7 @@ def eval_ImportZIP(
2227
"""If `members` is empty, this function takes a ZIP file path and returns a
2328
list of file names/paths contained inside.
2429
25-
"If `members` is given, then extract those members from the ZIP file.
30+
If `members` is given, then extract those members (or files) from the ZIP file.
2631
"""
2732

2833
zip_path, is_temporary_file = resolve_file(zip_name, "r", evaluation)
@@ -36,6 +41,10 @@ def eval_ImportZIP(
3641
if members is None:
3742
filenames = archive.namelist()
3843
mathics_filenames = to_mathics_list(*filenames)
44+
45+
# Wrap metadata or "elements" of the ZIP file into a
46+
# list of Rule. The caller can then use
47+
# rules to pick out specific elements desired.
3948
exprs = [
4049
Expression(
4150
SymbolRule,
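
Reviewer note: the docstring of `eval_ImportZIP` above describes two behaviors: with no `members`, list the names inside the archive; with `members`, pull out just those. The sketch below illustrates that split using only the standard `zipfile` module. The function name `list_or_read_members` is hypothetical, it returns plain Python values rather than Mathics3 expressions, and it reads member bytes into memory instead of extracting to disk, which is a simplification for the example.

```python
import io
import zipfile
from typing import Dict, List, Optional, Union


def list_or_read_members(
    zip_source, members: Optional[List[str]] = None
) -> Union[List[str], Dict[str, bytes]]:
    """With no `members`, return the member names of the ZIP archive;
    otherwise return a dict mapping each requested member name to its bytes."""
    with zipfile.ZipFile(zip_source, "r") as archive:
        if members is None:
            return archive.namelist()
        return {name: archive.read(name) for name in members}


# Build a small archive in memory so the sketch needs no files on disk.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("a.txt", "alpha")
    archive.writestr("b.txt", "beta")

names = list_or_read_members(buffer)
selected = list_or_read_members(buffer, ["b.txt"])
```

Here the dict of name-to-bytes pairs plays roughly the role that the list of `Rule` expressions plays in the diff: the caller can pick out the entries it wants by key.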

mathics/eval/import_export/importexport.py

Lines changed: 23 additions & 10 deletions
Original file line numberDiff line numberDiff line change
@@ -1,6 +1,8 @@
11
"""
22
Functions for figuring out a filetype or MIME type of a given
33
file path.
4+
5+
Following WMA, we use WMA's custom short name for a MIME type.
46
"""
57

68
import mimetypes
@@ -37,6 +39,7 @@
3739
# convert these mismatches
3840
MIME_SHORTNAME_TO_WMA: Final[Dict[str, str]] = {"JPG": "JPEG", "TXT": "Text"}
3941

42+
# FIXME: elements of the below dict should be a dataclass.
4043
IMPORTERS = {}
4144

4245
# TODO: This hard-coded dictionary should be
@@ -254,7 +257,7 @@ def eval_Import_general(
254257

255258
elements = [el.value for el in elements]
256259

257-
# Determine file format
260+
# Determine the WMA short name for the MIME type.
258261
file_format = None
259262
for el in elements.copy():
260263
if el.upper() in IMPORTERS.keys():
@@ -270,7 +273,8 @@ def eval_Import_general(
270273
evaluation.predetermined_out = current_predetermined_out
271274
return SymbolFailed
272275

273-
# Load the importer
276+
# Extract information about the loader used for this MIME type.
277+
# FIXME: turn into dataclass
274278
conditionals, import_function_symbol, posts, importer_options = IMPORTERS[
275279
file_format
276280
]
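
Reviewer note: the two FIXME comments in this file suggest turning each `IMPORTERS` entry into a dataclass. A minimal sketch of what that could look like is below. The class name `ImporterEntry` and the registry `SKETCH_IMPORTERS` are hypothetical; the field names are taken from the tuple unpacking shown above, but their types are not visible in this diff, so `Any` is used as a placeholder throughout.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class ImporterEntry:
    """Hypothetical record standing in for the 4-tuple stored per format."""

    # Field types are assumptions; the diff does not show them.
    conditionals: Any
    import_function_symbol: Any
    posts: Any
    importer_options: Dict[str, Any] = field(default_factory=dict)


# Stand-in registry for this sketch; not the real IMPORTERS table.
SKETCH_IMPORTERS: Dict[str, ImporterEntry] = {}
SKETCH_IMPORTERS["ZIP"] = ImporterEntry(
    conditionals={},
    import_function_symbol="ImportZIP",  # placeholder value for illustration
    posts={},
)

# Callers read named fields instead of unpacking a tuple by position.
entry = SKETCH_IMPORTERS["ZIP"]
function_symbol = entry.import_function_symbol
```

Named fields would let a new attribute be added to an entry without touching every place that currently unpacks the tuple.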
@@ -419,11 +423,11 @@ def eval_Import_general(
419423

420424

421425
def eval_Import_Elements(file_format: str, evaluation):
422-
"""
423-
Basic implementation behind Import[fileformat, Elements].
426+
"""Basic implementation behind Import[fileformat, Elements].
427+
424428
This returns the element names that can be used for a specific
425-
file_format type. We get this from the AvailableElements field
426-
mentioned when registering an importer.
429+
file_format type. We get this from the
430+
AvailableElements field mentioned when registering an importer.
427431
"""
428432
filetype = MIME_SHORTNAME_TO_WMA.get(file_format, file_format).upper()
429433

@@ -450,14 +454,23 @@ def perform_import(
450454
data: Optional[str],
451455
elements: Optional[list] = None,
452456
):
453-
"""
454-
This routine does the data import.
457+
"""This routine does the import. "Import" here means reading a \
458+
file or string which has been structured according to a format belonging to a MIME type.
459+
455460
"findfile", if not "None", is the path of a file where the unimported data resides.
456-
If findfile is empty, then "data" will have the string data for that file, and
461+
If "findfile" is empty, then "data" will have the string data for that file, and
457462
this routine will create a temporary file containing the data. The actual importer
458463
then uses this file.
459464
460-
"elements" when given contains the parts or kinds of things that should be extracted.
465+
"elements", when given, contains the parts or kinds of things that should be extracted.
466+
Usually, there are custom routines for retrieving an element.
467+
468+
It is also possible that, when a custom element extraction does not
469+
exist, the caller will do the filtering after retrieving all of the information.
470+
471+
This is not advisable when the information inside an element is small compared
472+
to the information of the entire importable file. For example, consider asking
473+
about the member names or contents of a tar file compared to reading the entire tar file.
461474
"""
462475
current_predetermined_out = evaluation.predetermined_out
463476
if function_channels == ListExpression(String("FileNames")):
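
Reviewer note: the `perform_import` docstring above contrasts a custom element extraction with having the caller filter after retrieving everything, using tar member names as the example. The sketch below illustrates that contrast with the standard `tarfile` module. All three function names are hypothetical and nothing here is Mathics3 code; it only shows why the targeted route does less work.

```python
import io
import tarfile
from typing import Dict, List


def make_sample_tar() -> bytes:
    """Build a small in-memory tar archive with two members."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as archive:
        for name, payload in (("a.txt", b"alpha"), ("b.txt", b"beta")):
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)
            archive.addfile(info, io.BytesIO(payload))
    return buffer.getvalue()


def member_names(tar_bytes: bytes) -> List[str]:
    """Targeted extraction: scan member headers without reading file contents."""
    with tarfile.open(fileobj=io.BytesIO(tar_bytes), mode="r") as archive:
        return archive.getnames()


def member_names_by_filtering(tar_bytes: bytes) -> List[str]:
    """Caller-side filtering: read every member's contents, then keep only the names."""
    with tarfile.open(fileobj=io.BytesIO(tar_bytes), mode="r") as archive:
        everything: Dict[str, bytes] = {}
        for member in archive.getmembers():
            extracted = archive.extractfile(member)
            everything[member.name] = extracted.read() if extracted else b""
    return list(everything)


sample = make_sample_tar()
direct = member_names(sample)
filtered = member_names_by_filtering(sample)
```

Both routes return the same names, but the second one holds every member's contents in memory first, which is the cost the docstring warns about for large archives.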
