AMFOrA: A digital toolkit for archaeological ceramic analysis #307

@aleciaco

Submitting Author: Name (@aleciaco)
Package Name: AMFOrA_public
One-Line Description of Package: An open-source toolkit, in development since 2018, that lets field surveyors extract macroscopic data from archaeological ceramic sherds. I hope to publish in JOSS, but its updated submission terms require that the GitHub repository has been active for longer than 6 months, and I am hoping that review through pyOpenSci will allow the package to be fast-tracked for JOSS submission.
Repository Link (if existing): https://github.com/aleciaco/AMFOrA_public
EiC: TBD


Code of Conduct & Commitment to Maintain Package

Description

  • AMFOrA is an open-source Python toolkit that turns flatbed scans of ceramic cross-sections into quantitative archaeological data through a fully automated batch-processing pipeline. A single call to its full_analysis() function extracts up to 92 variables per sherd. These cover sherd geometry, dual-method inclusion and void detection (blob and contour), size distributions, per-inclusion angularity and morphological classification (deliberate temper, natural inclusion, or weathered), sherd-corrected fabric orientation with circular statistics for inferring forming technique, paste and inclusion color in CIE L*a*b* space, and three-zone core-periphery firing atmosphere profiles. Beyond the high-level pipeline, individual functions can be called independently with full parameter customization. A built-in CeramicStatisticalAnalyzer class provides PCA, hierarchical/k-means/DBSCAN clustering, and correlation analysis, and the companion CeramicVisualization class generates Plotly-based dendrograms, biplots, scree plots, heatmaps, and dashboards. The full pipeline runs in under three seconds per sherd on a sub-$400 hardware setup.

Community Partnerships

We partner with communities to support peer review with an additional layer of
checks that satisfy community requirements. If your package fits into an
existing community please check below:

Scope

  • Please indicate which category or categories this package falls under:

    • Data retrieval
    • Data extraction
    • Data processing/munging
    • Data deposition
    • Data validation and testing
    • Data visualization
    • Workflow automation
    • Citation management and bibliometrics
    • Scientific software wrappers
    • Database interoperability

Domain Specific

  • Geospatial
  • Education

  • Explain how and why the package falls under these categories (briefly, 1-2 sentences). For community partnerships, check also their specific guidelines as documented in the links above. Please note any areas you are unsure of:

  • Data extraction: AMFOrA pulls quantitative measurements out of raw flatbed scans by applying computer vision algorithms (foreground segmentation, blob detection, contour detection, distance transforms, and k-means color clustering) to isolate sherds from their backgrounds and identify inclusions, voids, color zones, and orientations as structured numerical features.

  • Data processing (munging): The toolkit converts raw detections into analysis-ready variables by correcting inclusion angles for arbitrary scan rotation, transforming circular orientation data into PCA-compatible linear metrics via sine/cosine projections and von Mises statistics, and aggregating per-feature measurements into 92 sherd-level columns exported as a tidy pandas DataFrame/CSV.

  • Data validation/testing: AMFOrA validates its own defaults through systematic parameter sweeps that produce cumulative detection curves across 16 diverse fabrics, and validates its output through dual-method convergence checks (blob vs. contour correlations) and hierarchical cluster analysis benchmarked against expert archaeological judgment using cophenetic correlation and silhouette scores.

  • Data visualization: Through its CeramicVisualization class, AMFOrA generates interactive Plotly-based outputs including PCA biplots and 3D scatter plots, scree plots, dendrograms, correlation heatmaps, cluster comparison box plots, and an archaeological summary dashboard, alongside annotated images of detection results for individual sherds.
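
To illustrate the orientation handling mentioned under data processing, here is a minimal standard-library sketch of how inclusion angles could be corrected for scan rotation and then reduced to linear, PCA-friendly features through sine/cosine projections. It is not AMFOrA's actual code: the function names, the dictionary keys, and the choice to treat orientations as axial data (doubling angles so that 0° and 180° coincide) are assumptions made for illustration, and von Mises parameter estimation is not shown.

```python
import math


def correct_for_scan_rotation(inclusion_angles_deg, sherd_axis_deg):
    """Express each inclusion angle relative to the sherd's own axis.

    Hypothetical helper. Orientations are treated as axial (a line at
    0 degrees is the same line at 180 degrees), so results wrap to [0, 180).
    """
    return [(a - sherd_axis_deg) % 180.0 for a in inclusion_angles_deg]


def axial_orientation_features(angles_deg):
    """Summarize axial angles as linear features (hypothetical helper)."""
    # Double the angles so that 0 and 180 degrees map to the same point.
    doubled = [math.radians(2.0 * a) for a in angles_deg]
    n = len(doubled)
    mean_cos = sum(math.cos(t) for t in doubled) / n
    mean_sin = sum(math.sin(t) for t in doubled) / n
    # Mean resultant length: near 0 means no preferred orientation,
    # near 1 means strongly aligned inclusions.
    resultant_length = math.hypot(mean_cos, mean_sin)
    # Halve the mean direction to return to the original axial scale.
    mean_direction = (math.degrees(math.atan2(mean_sin, mean_cos)) / 2.0) % 180.0
    return {
        "mean_cos_2theta": mean_cos,
        "mean_sin_2theta": mean_sin,
        "resultant_length": resultant_length,
        "mean_direction_deg": mean_direction,
    }


# Example: a sherd scanned 10 degrees off-axis with well-aligned inclusions.
raw_angles = [100.0, 105.0, 95.0, 110.0, 90.0]
corrected = correct_for_scan_rotation(raw_angles, sherd_axis_deg=10.0)
features = axial_orientation_features(corrected)
print(corrected)
print(features)
```

The sine and cosine means can be used directly as columns in a PCA, which avoids the discontinuity that raw angles have at the wrap-around point, while the resultant length gives a single alignment strength value per sherd.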

  • Who is the target audience and what are the scientific applications of this package?

  • The target audience for this package is archaeologists working on surveys, where large numbers of non-diagnostic fragments are underused relative to the data they could provide. These fragments are often just counted and weighed when they could be used for much more.

  • Are there other Python packages that accomplish similar things? If so, how does yours differ?

  • Not to my knowledge

  • Any other questions or issues we should be aware of:

P.S. Have feedback/comments about our review process? Leave a comment on our GitHub Discussions

Metadata

Status: pre-submission