
lambda binary responses #19

@dazza-codes

Notes or references on lambda binary responses

https://docs.aws.amazon.com/apigateway/latest/developerguide/lambda-proxy-binary-media.html

Use the Content-Type header in the response so the client can parse the data correctly.

Open question: does reading binary stream data via async aiohttp already parse the data differently?
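A minimal sketch of Content-Type based parsing, assuming the raw body bytes and the Content-Type header value are already in hand. With aiohttp, these would come from `await resp.read()` and `resp.content_type`. The helper name `parse_body` and the choice of which types are treated as text are hypothetical.

```python
import json


def parse_body(content_type: str, body: bytes):
    """Decode a response body according to its Content-Type header value.

    Hypothetical helper: JSON is parsed, text/* is decoded to str,
    and anything else (e.g. image/png) is returned as raw bytes.
    """
    # Split off parameters such as "; charset=utf-8" and normalise case
    mime, _, params = content_type.partition(";")
    mime = mime.strip().lower()
    charset = "utf-8"
    for param in params.split(";"):
        key, _, value = param.strip().partition("=")
        if key.lower() == "charset" and value:
            charset = value.strip('"')
    if mime == "application/json":
        return json.loads(body.decode(charset))
    if mime.startswith("text/"):
        return body.decode(charset)
    # Binary media types: leave the bytes untouched
    return body


print(parse_body("text/html; charset=utf-8", b"<h1>This is text</h1>"))
print(type(parse_body("image/png", b"\x89PNG\r\n\x1a\n")))
```

Returning bytes by default keeps unknown media types lossless; only types known to be textual get decoded.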

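import base64
import random

import boto3

s3 = boto3.client("s3")


def lambda_handler(event, context):
    # Randomly return either a binary (PNG) or a text (HTML) response
    number = random.randint(0, 1)
    if number == 1:
        response = s3.get_object(
            Bucket='bucket-name',
            Key='image.png',
        )
        image = response['Body'].read()
        # Binary bodies must be returned as base64 text; isBase64Encoded lets
        # API Gateway decode them back to bytes (if the media type is configured as binary)
        return {
            'headers': {"Content-Type": "image/png"},
            'statusCode': 200,
            'body': base64.b64encode(image).decode('utf-8'),
            'isBase64Encoded': True,
        }
    else:
        return {
            'headers': {"Content-Type": "text/html"},
            'statusCode': 200,
            'body': "<h1>This is text</h1>",
            'isBase64Encoded': False,
        }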
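To check the round trip locally, the sketch below mimics what API Gateway is expected to do with a proxy response like the ones above: base64-decode `body` when `isBase64Encoded` is true. `decode_proxy_body` is a hypothetical test helper, not an AWS API, and the PNG signature bytes stand in for the S3 object.

```python
import base64


def decode_proxy_body(proxy_response: dict) -> bytes:
    """Hypothetical helper: recover the bytes a client would receive."""
    body = proxy_response.get("body", "")
    if proxy_response.get("isBase64Encoded", False):
        # Binary payloads travel through the proxy as base64 text
        return base64.b64decode(body)
    return body.encode("utf-8")


# Stand-in bytes instead of an S3 object (the 8-byte PNG signature)
png_bytes = b"\x89PNG\r\n\x1a\n"
binary_response = {
    "headers": {"Content-Type": "image/png"},
    "statusCode": 200,
    "body": base64.b64encode(png_bytes).decode("utf-8"),
    "isBase64Encoded": True,
}
text_response = {
    "headers": {"Content-Type": "text/html"},
    "statusCode": 200,
    "body": "<h1>This is text</h1>",
}

assert decode_proxy_body(binary_response) == png_bytes
assert decode_proxy_body(text_response) == b"<h1>This is text</h1>"
```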

The request may also need to ask for a binary media type, e.g.:
Accept: application/octet-stream
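A minimal sketch of setting that header with the standard library's `urllib.request`. The URL is a hypothetical placeholder, and the assumption is that the requested type must also be listed in the API's binary media types for API Gateway to return raw bytes. The request is only built here, not sent.

```python
import urllib.request

# Hypothetical endpoint; substitute the real API Gateway invoke URL
URL = "https://example.execute-api.us-east-1.amazonaws.com/prod/image"

request = urllib.request.Request(
    URL,
    headers={"Accept": "application/octet-stream"},
)
print(request.get_header("Accept"))

# Sending it would look like this (not executed here):
# with urllib.request.urlopen(request) as resp:
#     content_type = resp.headers.get("Content-Type")
#     body = resp.read()  # raw bytes, ready for Content-Type based parsing
```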

https://pypi.org/project/pbjson/

https://github.com/mapbox/geobuf
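geobuf saves space largely by storing coordinates as integers instead of JSON text: roughly, coordinates are quantized to a fixed precision, delta-encoded, and written as protobuf zigzag varints. The stdlib sketch below illustrates that idea on a short line string. It is not geobuf's actual encoder or API; the precision of 1e6, the helper names and the sample coordinates are assumptions.

```python
import json


def zigzag(n: int) -> int:
    # Map signed ints to unsigned so small negatives stay small: 0,-1,1,-2 -> 0,1,2,3
    return (n << 1) if n >= 0 else ((-n << 1) - 1)


def varint(n: int) -> bytes:
    # Base-128 varint: low 7 bits first, high bit set on all but the last byte
    out = bytearray()
    while n > 0x7F:
        out.append((n & 0x7F) | 0x80)
        n >>= 7
    out.append(n)
    return bytes(out)


def encode_coords(coords, precision=1e6) -> bytes:
    """Quantize, delta-encode and varint-pack a list of (lon, lat) pairs."""
    out = bytearray()
    prev = [0, 0]
    for point in coords:
        for i, value in enumerate(point):
            q = round(value * precision)
            # Store only the difference from the previous point's coordinate
            out += varint(zigzag(q - prev[i]))
            prev[i] = q
    return bytes(out)


line = [[-122.419416, 37.774929], [-122.419300, 37.775100], [-122.419100, 37.775300]]
packed = encode_coords(line)
print(len(json.dumps(line)), len(packed))
```

After the first point, neighbouring coordinates differ by small amounts, so each delta fits in one or two bytes instead of a 10+ character decimal string.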

https://www.compose.com/articles/faster-operations-with-the-jsonb-data-type-in-postgresql/

According to that article, jsonb has some immediate benefits:

greater efficiency,
significantly faster to process,
supports indexing (which can be a significant advantage),
simpler schema designs (replacing entity-attribute-value (EAV) tables with jsonb columns, which can be queried, indexed and joined, allowing for performance improvements of up to 1000X!)
And some drawbacks:

slightly slower input (due to added conversion overhead),
it may take more disk space than plain JSON due to a larger table footprint, though not always,
certain queries (especially aggregate ones) may be slower due to the lack of statistics.

The reason behind this last issue is that, for any given column, PostgreSQL saves descriptive statistics such as the number of distinct and most common values, the fraction of NULL entries, and (for ordered types) a histogram of the data distribution. None of this is available for values stored inside JSON fields, so you can suffer a heavy performance penalty, especially when aggregating data (COUNT, AVG, SUM, etc.) across many JSON fields.

To avoid this, consider storing data that you may later aggregate in regular columns.

https://www.bizety.com/2018/11/12/protocol-buffers-vs-json/
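Much of the size difference comes from the protobuf wire format: field names are replaced by small numeric tags and integers are written as varints. The sketch below hand-encodes a hypothetical message `{id: 150, name: "testing"}` (field 1 as a varint, field 2 as a length-delimited string) and compares it with the equivalent JSON. Real code would use classes generated by the `protobuf` package; this stdlib version only illustrates the encoding.

```python
import json


def varint(n: int) -> bytes:
    # Base-128 varint: low 7 bits first, high bit marks "more bytes follow"
    out = bytearray()
    while n > 0x7F:
        out.append((n & 0x7F) | 0x80)
        n >>= 7
    out.append(n)
    return bytes(out)


def key(field_number: int, wire_type: int) -> bytes:
    # Each field is prefixed by (field_number << 3) | wire_type
    return varint((field_number << 3) | wire_type)


def encode_message(id_: int, name: str) -> bytes:
    """Hypothetical message: field 1 = id (varint), field 2 = name (string)."""
    name_bytes = name.encode("utf-8")
    return (
        key(1, 0) + varint(id_)                              # wire type 0: varint
        + key(2, 2) + varint(len(name_bytes)) + name_bytes  # wire type 2: length-delimited
    )


wire = encode_message(150, "testing")
print(wire.hex(" "))  # 08 96 01 12 07 74 65 73 74 69 6e 67
print(len(wire), len(json.dumps({"id": 150, "name": "testing"})))
```

The trade-off is that the bytes are meaningless without the schema: the reader must know that field 1 is `id` and field 2 is `name`.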
