Skip to content

[BUG] PPL multisearch on index with aliased @timestamp field fails with class org.apache.calcite.plan.RelCompositeTrait cannot be cast to class org.apache.calcite.rel.RelCollation #5533

Description

@VladNastase

What is the bug?
When doing a multisearch with two (or more) subsearches, the multisearch fails if the @timestamp field is an alias to another field.

The examples below use the OpenSearch Dashboards sample data (specifically the indexes opensearch_dashboards_sample_data_logs and opensearch_dashboards_sample_data_flights).

The mappings for the `opensearch_dashboards_sample_data_logs` are: (expandable section)
{
  "opensearch_dashboards_sample_data_logs": {
    "aliases": {},
    "mappings": {
      "properties": {
        "@timestamp": {
          "type": "alias",
          "path": "timestamp"
        },
        "agent": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "bytes": {
          "type": "long"
        },
        "clientip": {
          "type": "ip"
        },
        "event": {
          "properties": {
            "dataset": {
              "type": "keyword"
            }
          }
        },
        "extension": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "geo": {
          "properties": {
            "coordinates": {
              "type": "geo_point"
            },
            "dest": {
              "type": "keyword"
            },
            "src": {
              "type": "keyword"
            },
            "srcdest": {
              "type": "keyword"
            }
          }
        },
        "host": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "index": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "ip": {
          "type": "ip"
        },
        "machine": {
          "properties": {
            "os": {
              "type": "text",
              "fields": {
                "keyword": {
                  "type": "keyword",
                  "ignore_above": 256
                }
              }
            },
            "ram": {
              "type": "long"
            }
          }
        },
        "memory": {
          "type": "double"
        },
        "message": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "phpmemory": {
          "type": "long"
        },
        "referer": {
          "type": "keyword"
        },
        "request": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "response": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "tags": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "timestamp": {
          "type": "date"
        },
        "url": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "utc_time": {
          "type": "date"
        }
      }
    }
  }
}
The mappings for the `opensearch_dashboards_sample_data_flights` are: (expandable section)
{
  "opensearch_dashboards_sample_data_flights": {
    "aliases": {},
    "mappings": {
      "properties": {
        "AvgTicketPrice": {
          "type": "float"
        },
        "Cancelled": {
          "type": "boolean"
        },
        "Carrier": {
          "type": "keyword"
        },
        "Dest": {
          "type": "keyword"
        },
        "DestAirportID": {
          "type": "keyword"
        },
        "DestCityName": {
          "type": "keyword"
        },
        "DestCountry": {
          "type": "keyword"
        },
        "DestLocation": {
          "type": "geo_point"
        },
        "DestRegion": {
          "type": "keyword"
        },
        "DestWeather": {
          "type": "keyword"
        },
        "DistanceKilometers": {
          "type": "float"
        },
        "DistanceMiles": {
          "type": "float"
        },
        "FlightDelay": {
          "type": "boolean"
        },
        "FlightDelayMin": {
          "type": "integer"
        },
        "FlightDelayType": {
          "type": "keyword"
        },
        "FlightNum": {
          "type": "keyword"
        },
        "FlightTimeHour": {
          "type": "keyword"
        },
        "FlightTimeMin": {
          "type": "float"
        },
        "Origin": {
          "type": "keyword"
        },
        "OriginAirportID": {
          "type": "keyword"
        },
        "OriginCityName": {
          "type": "keyword"
        },
        "OriginCountry": {
          "type": "keyword"
        },
        "OriginLocation": {
          "type": "geo_point"
        },
        "OriginRegion": {
          "type": "keyword"
        },
        "OriginWeather": {
          "type": "keyword"
        },
        "dayOfWeek": {
          "type": "integer"
        },
        "timestamp": {
          "type": "date"
        }
      }
    }
  }
}

The following query:

| multisearch 
    [search source = opensearch_dashboards_sample_data_logs | where geo.src='IT']
    [search source = opensearch_dashboards_sample_data_logs | where geo.src='US']

fails with the following error:

{
  "error": {
    "context": {
      "stage": "executing",
      "plan": "LogicalSystemLimit(sort0=[$20], dir0=[DESC], fetch=[10000], type=[QUERY_SIZE_LIMIT])\
  LogicalProject(referer=[$0], request=[$1], agent=[$2], extension=[$3], memory=[$4], ip=[$5], index=[$6], message=[$7], url=[$8], tags=[$9], geo=[$10], utc_time=[$15], bytes=[$16], machine=[$17], response=[$20], clientip=[$21], host=[$22], event=[$23], phpmemory=[$25], timestamp=[$26], @timestamp=[$27])\
    LogicalSort(sort0=[$27], dir0=[DESC])\
      LogicalUnion(all=[true])\
        LogicalProject(referer=[$0], request=[$1], agent=[$2], extension=[$3], memory=[$4], ip=[$5], index=[$6], message=[$7], url=[$8], tags=[$9], geo=[$10], geo.coordinates=[$11], geo.dest=[$12], geo.src=[$13], geo.srcdest=[$14], utc_time=[$15], bytes=[$16], machine=[$17], machine.os=[$18], machine.ram=[$19], response=[$20], clientip=[$21], host=[$22], event=[$23], event.dataset=[$24], phpmemory=[$25], timestamp=[$26], @timestamp=[$33])\
          LogicalFilter(condition=[=($13, 'IT')])\
            LogicalProject(referer=[$0], request=[$1], agent=[$2], extension=[$3], memory=[$4], ip=[$5], index=[$6], message=[$7], url=[$8], tags=[$9], geo=[$10], geo.coordinates=[$11], geo.dest=[$12], geo.src=[$13], geo.srcdest=[$14], utc_time=[$15], bytes=[$16], machine=[$17], machine.os=[$18], machine.ram=[$19], response=[$20], clientip=[$21], host=[$22], event=[$23], event.dataset=[$24], phpmemory=[$25], timestamp=[$26], _id=[$27], _index=[$28], _score=[$29], _maxscore=[$30], _sort=[$31], _routing=[$32], @timestamp=[$26])\
              CalciteLogicalIndexScan(table=[[OpenSearch, opensearch_dashboards_sample_data_logs]])\
        LogicalProject(referer=[$0], request=[$1], agent=[$2], extension=[$3], memory=[$4], ip=[$5], index=[$6], message=[$7], url=[$8], tags=[$9], geo=[$10], geo.coordinates=[$11], geo.dest=[$12], geo.src=[$13], geo.srcdest=[$14], utc_time=[$15], bytes=[$16], machine=[$17], machine.os=[$18], machine.ram=[$19], response=[$20], clientip=[$21], host=[$22], event=[$23], event.dataset=[$24], phpmemory=[$25], timestamp=[$26], @timestamp=[$33])\
          LogicalFilter(condition=[=($13, 'US')])\
            LogicalProject(referer=[$0], request=[$1], agent=[$2], extension=[$3], memory=[$4], ip=[$5], index=[$6], message=[$7], url=[$8], tags=[$9], geo=[$10], geo.coordinates=[$11], geo.dest=[$12], geo.src=[$13], geo.srcdest=[$14], utc_time=[$15], bytes=[$16], machine=[$17], machine.os=[$18], machine.ram=[$19], response=[$20], clientip=[$21], host=[$22], event=[$23], event.dataset=[$24], phpmemory=[$25], timestamp=[$26], _id=[$27], _index=[$28], _score=[$29], _maxscore=[$30], _sort=[$31], _routing=[$32], @timestamp=[$26])\
              CalciteLogicalIndexScan(table=[[OpenSearch, opensearch_dashboards_sample_data_logs]])",
      "stage_description": "Running the query"
    },
    "reason": "Error while preparing plan [LogicalSystemLimit(sort0=[$20], dir0=[DESC], fetch=[10000], type=[QUERY_SIZE_LIMIT])\
  LogicalProject(referer=[$0], request=[$1], agent=[$2], extension=[$3], memory=[$4], ip=[$5], index=[$6], message=[$7], url=[$8], tags=[$9], geo=[$10], utc_time=[$15], bytes=[$16], machine=[$17], response=[$20], clientip=[$21], host=[$22], event=[$23], phpmemory=[$25], timestamp=[$26], @timestamp=[$27])\
    LogicalSort(sort0=[$27], dir0=[DESC])\
      LogicalUnion(all=[true])\
        LogicalProject(referer=[$0], request=[$1], agent=[$2], extension=[$3], memory=[$4], ip=[$5], index=[$6], message=[$7], url=[$8], tags=[$9], geo=[$10], geo.coordinates=[$11], geo.dest=[$12], geo.src=[$13], geo.srcdest=[$14], utc_time=[$15], bytes=[$16], machine=[$17], machine.os=[$18], machine.ram=[$19], response=[$20], clientip=[$21], host=[$22], event=[$23], event.dataset=[$24], phpmemory=[$25], timestamp=[$26], @timestamp=[$33])\
          LogicalFilter(condition=[=($13, 'IT')])\
            LogicalProject(referer=[$0], request=[$1], agent=[$2], extension=[$3], memory=[$4], ip=[$5], index=[$6], message=[$7], url=[$8], tags=[$9], geo=[$10], geo.coordinates=[$11], geo.dest=[$12], geo.src=[$13], geo.srcdest=[$14], utc_time=[$15], bytes=[$16], machine=[$17], machine.os=[$18], machine.ram=[$19], response=[$20], clientip=[$21], host=[$22], event=[$23], event.dataset=[$24], phpmemory=[$25], timestamp=[$26], _id=[$27], _index=[$28], _score=[$29], _maxscore=[$30], _sort=[$31], _routing=[$32], @timestamp=[$26])\
              CalciteLogicalIndexScan(table=[[OpenSearch, opensearch_dashboards_sample_data_logs]])\
        LogicalProject(referer=[$0], request=[$1], agent=[$2], extension=[$3], memory=[$4], ip=[$5], index=[$6], message=[$7], url=[$8], tags=[$9], geo=[$10], geo.coordinates=[$11], geo.dest=[$12], geo.src=[$13], geo.srcdest=[$14], utc_time=[$15], bytes=[$16], machine=[$17], machine.os=[$18], machine.ram=[$19], response=[$20], clientip=[$21], host=[$22], event=[$23], event.dataset=[$24], phpmemory=[$25], timestamp=[$26], @timestamp=[$33])\
          LogicalFilter(condition=[=($13, 'US')])\
            LogicalProject(referer=[$0], request=[$1], agent=[$2], extension=[$3], memory=[$4], ip=[$5], index=[$6], message=[$7], url=[$8], tags=[$9], geo=[$10], geo.coordinates=[$11], geo.dest=[$12], geo.src=[$13], geo.srcdest=[$14], utc_time=[$15], bytes=[$16], machine=[$17], machine.os=[$18], machine.ram=[$19], response=[$20], clientip=[$21], host=[$22], event=[$23], event.dataset=[$24], phpmemory=[$25], timestamp=[$26], _id=[$27], _index=[$28], _score=[$29], _maxscore=[$30], _sort=[$31], _routing=[$32], @timestamp=[$26])\
              CalciteLogicalIndexScan(table=[[OpenSearch, opensearch_dashboards_sample_data_logs]])\
]",
    "details": "class org.apache.calcite.plan.RelCompositeTrait cannot be cast to class org.apache.calcite.rel.RelCollation (org.apache.calcite.plan.RelCompositeTrait and org.apache.calcite.rel.RelCollation are in unnamed module of loader java.net.URLClassLoader @2ee18aca)",
    "location": [
      "while running the query",
      "while compiling the optimized query plan for physical execution"
    ],
    "code": "PLANNING_ERROR",
    "type": "SQLException"
  },
  "status": 500
}

Selecting the timestamp field (with the fields command) like in the following query will fail in the same way.

| multisearch 
    [search source = opensearch_dashboards_sample_data_logs | where geo.src='IT']
    [search source = opensearch_dashboards_sample_data_logs | where geo.src='US']
| fields timestamp

IMPORTANT! Weirdly enough, selecting the @timestamp field will make the query succeed.

The opensearch_dashboards_sample_data_flights index does not have this problem. The following query succeeds:

| multisearch 
    [search source = opensearch_dashboards_sample_data_flights | where DestWeather='Sunny']
    [search source = opensearch_dashboards_sample_data_flights | where DestWeather='Rain']

Furthermore, doing a multisearch that includes both the opensearch_dashboards_sample_data_logs and opensearch_dashboards_sample_data_flights indexes will succeed, as seen below:

| multisearch 
    [search source = opensearch_dashboards_sample_data_logs | where geo.src='IT'] 
    [search source = opensearch_dashboards_sample_data_logs | where geo.src='US'] 
    [search source=opensearch_dashboards_sample_data_flights]
The result for the query above: (expandable section)
{
  "schema": [
    {
      "name": "referer",
      "type": "string"
    },
    {
      "name": "request",
      "type": "string"
    },
    {
      "name": "agent",
      "type": "string"
    },
    {
      "name": "extension",
      "type": "string"
    },
    {
      "name": "memory",
      "type": "double"
    },
    {
      "name": "ip",
      "type": "ip"
    },
    {
      "name": "index",
      "type": "string"
    },
    {
      "name": "message",
      "type": "string"
    },
    {
      "name": "url",
      "type": "string"
    },
    {
      "name": "tags",
      "type": "string"
    },
    {
      "name": "geo",
      "type": "struct"
    },
    {
      "name": "utc_time",
      "type": "timestamp"
    },
    {
      "name": "bytes",
      "type": "bigint"
    },
    {
      "name": "machine",
      "type": "struct"
    },
    {
      "name": "response",
      "type": "string"
    },
    {
      "name": "clientip",
      "type": "ip"
    },
    {
      "name": "host",
      "type": "string"
    },
    {
      "name": "event",
      "type": "struct"
    },
    {
      "name": "phpmemory",
      "type": "bigint"
    },
    {
      "name": "timestamp",
      "type": "timestamp"
    },
    {
      "name": "@timestamp",
      "type": "timestamp"
    },
    {
      "name": "FlightNum",
      "type": "string"
    },
    {
      "name": "Origin",
      "type": "string"
    },
    {
      "name": "OriginLocation",
      "type": "geo_point"
    },
    {
      "name": "DestLocation",
      "type": "geo_point"
    },
    {
      "name": "FlightDelay",
      "type": "boolean"
    },
    {
      "name": "DistanceMiles",
      "type": "float"
    },
    {
      "name": "FlightTimeMin",
      "type": "float"
    },
    {
      "name": "OriginWeather",
      "type": "string"
    },
    {
      "name": "dayOfWeek",
      "type": "int"
    },
    {
      "name": "AvgTicketPrice",
      "type": "float"
    },
    {
      "name": "Carrier",
      "type": "string"
    },
    {
      "name": "FlightDelayMin",
      "type": "int"
    },
    {
      "name": "OriginRegion",
      "type": "string"
    },
    {
      "name": "DestAirportID",
      "type": "string"
    },
    {
      "name": "FlightDelayType",
      "type": "string"
    },
    {
      "name": "Dest",
      "type": "string"
    },
    {
      "name": "FlightTimeHour",
      "type": "string"
    },
    {
      "name": "Cancelled",
      "type": "boolean"
    },
    {
      "name": "DistanceKilometers",
      "type": "float"
    },
    {
      "name": "OriginCityName",
      "type": "string"
    },
    {
      "name": "DestWeather",
      "type": "string"
    },
    {
      "name": "OriginCountry",
      "type": "string"
    },
    {
      "name": "DestCountry",
      "type": "string"
    },
    {
      "name": "DestRegion",
      "type": "string"
    },
    {
      "name": "DestCityName",
      "type": "string"
    },
    {
      "name": "OriginAirportID",
      "type": "string"
    }
  ],
  "datarows": [ <TRUNCATED> ],
  "total": 10000,
  "size": 10000
}

How can one reproduce the bug?
Steps to reproduce the behavior:

  1. On a fresh instalation of OpenSearch + OS Dashboards 3.6.0/3.7.0 go to the /app/home#/tutorial_directory URL path and add the "Sample web logs" and "Sample flight data".
  2. Use the queries provided in the issue either in the Dev Console or the Query Workbench.

What is the expected behavior?
The query should not fail with a Java class cast exception and return the union of all the rows in all subsearches, together with all their fields.

What is your host/environment?

  • OS: N/A
  • Version: Tested on both OpenSearch 3.6.0 and OpenSearch 3.7.0
  • Plugins: N/A

Do you have any screenshots?
N/A

Do you have any additional context?
A few more examples of queries that work:

| multisearch 
    [search source = opensearch_dashboards_sample_data_logs | where geo.src='IT' | head 5] 
    [search source = opensearch_dashboards_sample_data_logs | where geo.src='US' | head 5]  
    [search source=opensearch_dashboards_sample_data_flights]

A few more examples of queries that do not work:

| multisearch 
    [search source = opensearch_dashboards_sample_data_logs | where geo.src='IT' | head 5] 
    [search source = opensearch_dashboards_sample_data_logs | where geo.src='US' | head 5]  
    [search source=opensearch_dashboards_sample_data_flights | head 5]

Metadata

Metadata

Assignees

No one assigned

    Labels

    PPLPiped processing languagebugSomething isn't workingcalcitecalcite migration releated

    Type

    No type
    No fields configured for issues without a type.

    Projects

    Status
    Done

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions