
(aws-glue-alpha): (struct schema produces unsupported inputStrings) #26935

@nihakue

Description


Describe the bug

Regarding this line: https://github.com/aws/aws-cdk/blame/main/packages/%40aws-cdk/aws-glue-alpha/lib/schema.ts#L209

As far as I can tell, this will happily create invalid inputStrings for nested structs:

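import { Column, Schema } from '@aws-cdk/aws-glue-alpha';

const nested = Schema.struct([
  {
    name: "name",
    comment: "The name of the thing",
    type: Schema.STRING
  },
  {
    name: "url",
    type: Schema.STRING
  }
]);

// Top-level column whose type is the nested struct
const column: Column = {
  name: "some_nested_struct",
  type: nested
};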

This generates the following inputString for the nested struct:

struct<name:string COMMENT 'The name of the thing',url:string>
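For illustration, here is a simplified, hypothetical re-creation of the serialization step (not the construct's actual source). It assumes, as the output above suggests, that each struct field is written as `name:type` and that ` COMMENT '...'` is appended whenever the field has a comment. `TypeLike`, `ColumnLike`, `STRING` and `structInputString` are illustrative names only.

```typescript
// Hypothetical stand-ins for the construct's Type/Column shapes.
interface TypeLike {
  inputString: string;
}

interface ColumnLike {
  name: string;
  type: TypeLike;
  comment?: string;
}

const STRING: TypeLike = { inputString: "string" };

// Writes each field as `name:type`, appending ` COMMENT '...'` when the
// field has a comment (the behavior reported in this issue).
function structInputString(columns: ColumnLike[]): string {
  const fields = columns.map((column) =>
    column.comment === undefined
      ? `${column.name}:${column.type.inputString}`
      : `${column.name}:${column.type.inputString} COMMENT '${column.comment}'`,
  );
  return `struct<${fields.join(",")}>`;
}

const generated = structInputString([
  { name: "name", comment: "The name of the thing", type: STRING },
  { name: "url", type: STRING },
]);

console.log(generated);
// struct<name:string COMMENT 'The name of the thing',url:string>
```

The COMMENT clause ends up inside `struct<...>`, which is exactly the part Athena rejects.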

If you create a Glue table with this in its schema, Athena throws an error whenever you query the table:

HIVE_INVALID_METADATA: Glue table 'db.table' column 'some_nested_struct' has invalid data type: struct<name:string COMMENT 'The name of the thing',url:string>
...

From what I can tell, 'COMMENT' is not supported in nested structs. If I manually create a schema in a fresh Glue table, adding 'COMMENT' to a field in the inputString of a nested struct causes Glue to treat that field's type as 'unknown'.
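To catch this before deploying, one option is to check a column's type string for a COMMENT clause nested inside angle brackets. The sketch below is a hypothetical helper (`hasNestedComment` is not part of the CDK). It assumes comment text is single-quoted with no escaped quotes, and that the keyword is always preceded by whitespace, as in the generated string above.

```typescript
// Hypothetical helper: reports whether a Glue type string contains a
// COMMENT clause inside angle brackets (e.g. within struct<...>).
// Assumes comments are single-quoted and contain no escaped quotes.
function hasNestedComment(typeString: string): boolean {
  let depth = 0; // current <...> nesting level
  let inQuote = false; // true while inside '...' comment text
  for (let i = 0; i < typeString.length; i++) {
    const ch = typeString[i];
    if (inQuote) {
      if (ch === "'") inQuote = false;
      continue;
    }
    if (ch === "'") {
      inQuote = true;
    } else if (ch === "<") {
      depth++;
    } else if (ch === ">") {
      depth--;
    } else if (depth > 0 && /^\s+COMMENT\s/i.test(typeString.slice(i))) {
      // Requiring leading whitespace avoids flagging field names
      // such as "my_comment".
      return true;
    }
  }
  return false;
}

console.log(hasNestedComment("struct<name:string COMMENT 'The name of the thing',url:string>")); // true
console.log(hasNestedComment("struct<name:string,url:string>")); // false
console.log(hasNestedComment("struct<my_comment:string>")); // false
```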

For example, before the COMMENT I can inspect the schema and see its type:

{
  "some_nested_struct": {
    "name": "string",
    "url": "string"
  }
}

But if I add the comment and inspect the type of the column I see:

{
  "some_nested_struct": {
    "name": {
      "unknown": "STRUCT <\n  name: STRING COMMENT 'some comment',\n  url: STRING\n>"
    },
    "url": "string"
  }
}

Expected Behavior

Ideally Glue would support comments in nested structs (or at worst ignore them), but at minimum the CDK construct should not generate inputStrings that are guaranteed to fail.
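Until the construct changes, a possible user-side workaround is to drop `comment` from struct fields before they reach `Schema.struct`. The sketch below is a hypothetical, generic helper (`withoutComments` is not a CDK API); in a CDK app it would be used as `Schema.struct(withoutComments([...]))`. The trade-off is that the field-level comments are simply discarded.

```typescript
// Hypothetical workaround: remove `comment` from each struct field so no
// COMMENT clause is emitted inside the generated struct<...> string.
// Generic so it accepts any column shape with an optional comment.
function withoutComments<T extends { comment?: string }>(columns: T[]): Omit<T, "comment">[] {
  return columns.map(({ comment: _dropped, ...rest }) => rest);
}

const fields = withoutComments([
  { name: "name", comment: "The name of the thing", type: { inputString: "string" } },
  { name: "url", type: { inputString: "string" } },
]);

console.log(JSON.stringify(fields));
// [{"name":"name","type":{"inputString":"string"}},{"name":"url","type":{"inputString":"string"}}]
```

Top-level column comments are not affected by this issue, since only struct fields are serialized into the inputString.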

Current Behavior

See description

Reproduction Steps

See description

Possible Solution

See expected behavior

Additional Information/Context

No response

CDK CLI Version

2.87.0 (build 9fca790)

Framework Version

No response

Node.js Version

v18.16.0

OS

AL2

Language

Typescript

Language Version

5.1.3

Other information

No response

Metadata


Assignees

No one assigned

    Labels

    @aws-cdk/aws-glue (Related to AWS Glue), bug (This issue is a bug.), effort/medium (Medium work item – several days of effort), good first issue (Related to contributions. See CONTRIBUTING.md), p2

