Deploying bats-ai with

- Latest commit: 37e1802
- Status: ✅ Deploy successful!
- Preview URL: https://d012e543.bats-ai.pages.dev
- Branch Preview URL: https://species-suggestion-backend.bats-ai.pages.dev
bats_ai/core/views/species.py (Outdated)

    def get_species(
        request: HttpRequest,
        grts_cell_id: int | None = Query(None),
        sample_frame_id: int = Query(14),
Could we use a constant for 14 to encode some sense of meaning for future developers? Maybe CONUS_SAMPLE_FRAME_ID = 14 somewhere?
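For illustration, a minimal sketch of that suggestion (the constant's module location is an assumption, and plain Python defaults stand in for django-ninja's Query(...) so the sketch is self-contained):

```python
# Hypothetical constants module, e.g. bats_ai/core/constants.py.
# 14 is the NABat sample frame id covering the continental US (CONUS).
CONUS_SAMPLE_FRAME_ID = 14


# Simplified view signature using the named constant instead of a bare 14.
def get_species(grts_cell_id=None, sample_frame_id=CONUS_SAMPLE_FRAME_ID):
    return {"grts_cell_id": grts_cell_id, "sample_frame_id": sample_frame_id}
```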
More generally, why isn't this just set as a model default, so it's always defined in the database?
Is there a special meaning to when it's NULL (and defaults in code to 14) vs when it's explicitly set to 14 in the database?
This was something that wasn't fully understood until recently. For the NABat project, USGS laid a 10x10 km grid across the areas on the map, and that grid provides these grts_cell_ids. What wasn't known is that there are different sample_frame_ids for different countries/regions: the continental US is 14, Canada and Alaska are a different number, and Mexico is another number.
The GRTS cell IDs, when imported, should have this sample_frame_id in the shapefile, and it should be set.
Now, when we import bat file recordings, they have metadata attached to them in multiple ways. One is an embedded format called Guano metadata: metadata fields attached to the WAV file itself that can be read using the Python guano package. The other is the file naming scheme itself, which can contain things like the date/time, grts_cell_id, and possibly the sample_frame_id. The sample_frame_id isn't guaranteed to be populated or found via Guano or the file naming scheme, so it can be unknown. In those instances I'm assuming it's the continental U.S., which is something that wasn't done before.
So we either can:
- update all recordings to swap to a sample_frame_id of 14 if it is none
- When looking for a grts_cell_id and a sample_frame_id isn't provided, we can interpret it as being 14
There may be times when I want to use this endpoint without a recording but with the GRTS cell Id and optional sample_frame_id and return species suggestions.
It's kind of a complicated decision based on what data USGS is providing us, trying to make reasonable defaults and determining what may be given in the future.
Swapped to a label constant in 14c969e
This may change in the future if we get some more information about the GRTS_Cell_Id and the sample_frame_id and better ways to determine from the wav file what the sample_frame_id is.
Thanks for the detailed explanation!
Ok, it makes more sense now that we want to preserve the recording metadata (or lack thereof) as-is, especially since it might be possible to better fill gaps in the future.
Not sure if you have used django-click at all, but if you prefer writing CLIs with click, we already include it as a dependency. I wouldn't say you have to rewrite this command, but keep it in mind for the future.
    DEFAULT_GEOJSON = Path(__file__).resolve().parents[2] / "data" / "species.geojson"


    class Command(BaseCommand):
Would it make sense to make this a data migration, so it's loaded automatically when the database is bootstrapped?
I went back and forth on including it because I didn't know if I wanted to make it a requirement for the application. I think for ease of deployment it may make sense to have it, and to keep the management command as a way to update if I happen to get a new species.geojson with updated ranges/species.
Updated the path locations and added a Python step to the migration to load it in 7dfd479
Yeah, I think the data migration is a great ergonomic improvement.
Is it reasonable to declare that if species.geojson is updated in the future, we'll just be required to also add a new data migration to keep the database updated accordingly? That way, the state of these rows in the database should always match the state of the species.geojson file in the repo.
If you want to do that approach, we should be able to delete this management command entirely now. If / when a new species.geojson data migration needs to be written, it might be nice to share some of the code, so it's not copy-pasted every time, but I think it's reasonable to defer that until we actually get a second data migration.
If you want to keep this manage.py command, I think it'd be much more maintainable to find a way to deduplicate the species.geojson deserialization / parsing logic. Maybe it'd be good to have it in a common utility, then have both the manage.py command and the data migration import and use it.
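A rough sketch of the shared-utility shape being suggested (the module path, function name, and the species_code property are assumptions about the geojson structure):

```python
# Hypothetical shared parser, e.g. bats_ai/core/species_ingest.py, that both
# the manage.py command and the data migration could import and use.
import json


def iter_species_features(geojson_text):
    """Yield (species_code, properties, geometry_json) for each feature."""
    data = json.loads(geojson_text)
    for feature in data.get("features", []):
        props = feature.get("properties", {})
        yield props.get("species_code"), props, feature.get("geometry")
```

The data migration's RunPython function and the management command's handle() would each loop over this generator and do their own database writes.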
I don't know what the future holds, but we could have different deployments that have different species suggestions. For now it made sense for it to be in the migration for the USGS vetting process that is occurring, but we may have instances in which we decide to utilize a species suggestion list that is trained (from annotations) instead of curated.
I did consider de-duplicating or abstracting out the species ingestion code. I internally decided against that (and this may be thinking too far ahead), but if the species suggestion structures change in the future, it would be easier for me to modify and add additional optional fields to the management command than to rely on an abstracted-out bit of code that would either have to be copied or maintained to stay historically accurate with the migration. I can also understand the YAGNI arguments for it.
Makes sense. On the one hand, with the removal of the error guards, these two code blocks are looking quite similar. On the other hand, I totally see your point that future ingest updates wouldn't necessarily want to modify the historic migration code.
When that future comes, we'll need to figure out how to handle the species-ranges.geojson file changes. Do we edit the file in-place and add a new migration that attempts to reconcile any differences from the old one? Do we add a second file and a second migration to just re-do all the ingest work? Regardless, it's something we can worry about later.
Co-authored-by: Brian Helba <brian.helba@kitware.com>
Force-pushed from 202aa00 to 1c6cae4
bats_ai/core/views/species.py (Outdated)

        geom__intersects=cell_geom,
    )
    qs = Species.objects.annotate(
        pk=F("id"),
Is this particular line necessary?
Updated to remove the unnecessary pk: a95324b
Also noticed that I should set the in_range default to False if a cell_geom exists, so the species list is clear that:

- in_range: null - couldn't be determined because there was no valid grts_cell_id geom
- in_range: true - the cell geom from the grts_cell_id intersects the species range
- in_range: false - has a valid cell geom but doesn't intersect the species range
Since in_range is something we're making up just for this output, it's worth adding at least a code comment to the SpeciesSchema.in_range definition to explain these semantics. That's very helpful when first trying to interpret the code.
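Something like the following comment is what's being suggested (the class here is a stand-in; the real SpeciesSchema is a django-ninja Schema):

```python
from typing import Optional


class SpeciesSchema:  # stand-in for the real django-ninja Schema
    # in_range is computed only for this endpoint's output; tri-state:
    #   None  - unknown: no valid grts_cell_id geometry was available
    #   True  - the GRTS cell geometry intersects this species' range
    #   False - a valid cell geometry exists but does not intersect the range
    in_range: Optional[bool] = None
```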
bats_ai/core/migrations/0036_rename_grts_cell_recording_sample_frame_id_and_more.py
Outdated
    fid = props.get("id")
    source_feature_id = str(fid) if fid is not None else ""
    SpeciesRange.objects.update_or_create(
Is it normal for a single file to contain duplicate SpeciesRange? If that's not expected, I think this should be a .create and appropriate error if a duplicate is detected.
I'd expect this operation to always run on an empty SpeciesRange table, since it was only created as part of this migration.
I don't know for sure whether this could have duplicate feature IDs and species types in the future. This was provided by USGS and didn't come with a specification for what exactly the data would be. I realize update_or_create would keep the latest entry in the geojson feature order, so maybe I should use create and fail if there happen to be duplicates.
Up to you. If you decide to tolerate duplicates and use some policy like "always prefer the first / last entry", then it'd be worth a code comment to note the intent.
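A sketch of the fail-on-duplicate policy (the key fields and the ValueError are assumptions; the real loop would call SpeciesRange.objects.create instead of the commented line):

```python
def ingest_ranges(features):
    """Create one range per (source_feature_id, species); fail on repeats."""
    seen = set()
    for feat in features:
        key = (feat["source_feature_id"], feat["species"])
        if key in seen:
            # With update_or_create this would silently keep the last entry;
            # failing loudly surfaces unexpected duplicates in the USGS file.
            raise ValueError(f"duplicate SpeciesRange feature: {key}")
        seen.add(key)
        # SpeciesRange.objects.create(**feat)  # real write goes here
    return len(seen)
```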
        )
    )
    for msg in errors:
        self.stdout.write(self.style.ERROR(msg))
Why not just write the errors as they occur?
Mostly habit from other ingestion/management scripts where I may have thousands of things being processed and logged, and I want a concise list of the failed items (if they are non-raising/non-stopping) at the end to see what was missed. If they were stopping errors I would log them immediately.
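The pattern described here — accumulate non-fatal failures during the loop and summarize them at the end — in miniature (all names are illustrative):

```python
def process_batch(items, process):
    """Run process() over all items; return a summary of non-fatal errors."""
    errors = []
    for i, item in enumerate(items):
        try:
            process(item)
        except ValueError as e:  # non-stopping errors only; others propagate
            errors.append(f"Item {i}: {e}")
    return errors  # printed as one concise failure list at the end
```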
        geom = GEOSGeometry(json.dumps(geom_json))
    except Exception as e:
        errors.append(f"Feature {i} ({code}): invalid geometry: {e}")
        continue
I notice that in the migration, this operation doesn't fail through to continue. I think that's totally reasonable, given that invalid geometry is a huge red flag and perhaps should be fatal (rather than proceeding with a suspect file).
Do we want to be more lenient in this script for some reason?
Same as below, I'm removing it so it will properly fail. I think initially during testing I was being cautious: I just wanted to visualize the data in case there were problems with the geojson I got, so I was overly cautious about getting data into the DB to see its structure.
    try:
        geom = GEOSGeometry(json.dumps(geom_json))
    except Exception as e:
Can you make this more specific than Exception? We wouldn't want to accidentally catch something like a file read permission error.
Maybe you just want json.JSONDecodeError? I'm not sure if GEOSGeometry raises any exceptions that you'd want to silently let pass?
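For illustration, a narrower catch might look like this; json round-tripping stands in for GEOSGeometry so the sketch runs without GeoDjango, and in the real code you'd catch GEOSException from django.contrib.gis.geos instead:

```python
import json

# from django.contrib.gis.geos import GEOSGeometry, GEOSException


def parse_geometry(geom_json):
    """Serialize a geometry dict; raise ValueError on malformed input."""
    try:
        text = json.dumps(geom_json)
        json.loads(text)  # stand-in for GEOSGeometry(text)
    except (TypeError, ValueError) as e:
        # TypeError: non-serializable input; ValueError covers JSONDecodeError.
        # Anything else (e.g. OSError from file reads) propagates unchanged.
        raise ValueError(f"invalid geometry: {e}") from e
    return text
```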
I'll remove it and let the ingestion fail if there happens to be any problems.
Co-authored-by: Brian Helba <brian.helba@kitware.com>
@BryonLewis Ok, this is looking really good to me. I think there's just 2 places that warrant code comments, and we're set?

I've added comments about the create_or_update and the in_range, and also added a section to the README.md describing what is happening with species-ranges.geojson and how it can be updated in the future. Thanks for taking a look at this. I'd like to get it merged sometime today so I can update the front-end PR and possibly give some USGS members a link to the suggested species interface preview for feedback.
Large code change number is because of the 4 MB geojson species boundaries.

In prep for a future change I've modified grts_cell to sample_frame_id for the GRTS cell IDs. This is because the sample_frame_id is an additional number used to distinguish between CONUS (14), Canada/Alaska, Mexico, and Offshore.

- Updates most references from grts_cell to sample_frame_id, and also the creation points, to allow it to be populated from either Guano or filename metadata.
- Adds a species.geojson for SpeciesRanges that contains GeoJSON MultiPolygons and a species name.
- Created a SpeciesRange model that stores the geometry data with a reference to the Species in the DB.
- Migrations are made for adding this new table as well as renaming grts_cell to sample_frame_id.
- Adds a load_species_geojson Django management command to load the geojson into the database.
- The Species return value has a field called in_range that determines whether the current RecordingId, or the current GRTS_Cell_Id and sample_frame_id, are inside the species range.
- The Species endpoint now has query parameters:

This will be followed up with the UI section in #468.