````diff
@@ -732,10 +732,10 @@ And HDF5 stores the metadata associated with the table, including
 column names, row labels, and data types (like FITS).
 
 Finally, HDF5 is a cross-language standard, so if you write an HDF5
-file with Pandas, you can read it back with many other software tools
+file with pandas, you can read it back with many other software tools
 (more than FITS).
 
-We can write a Pandas `DataFrame` to an HDF5 file like this:
+We can write a pandas `DataFrame` to an HDF5 file like this:
 
 ```python
 filename = 'gd1_data.hdf'
````
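The hunk above stops just as the lesson's code block begins. Here is a minimal sketch of writing a `DataFrame` to HDF5 and reading it back. It assumes pandas with the PyTables backend (the `tables` package) is installed. The toy columns, the file name `example_data.hdf`, and the key `example_df` are hypothetical placeholders, not the lesson's actual code.

```python
import pandas as pd

# Hypothetical toy data standing in for a real results DataFrame.
df = pd.DataFrame({'phi1': [-50.0, -40.0, -30.0],
                   'phi2': [-1.0, 0.5, 1.5]})

filename = 'example_data.hdf'  # placeholder file name

# `key` names the dataset inside the file.
# mode='w' creates a new file, overwriting any existing file of that name.
df.to_hdf(filename, key='example_df', mode='w')

# Read the dataset back by key to check that column names, the row index,
# and dtypes survive the round trip.
df2 = pd.read_hdf(filename, key='example_df')
print(df2.equals(df))  # True
```

`pd.read_hdf` can omit `key` only when the file holds a single pandas object. Once a file contains several datasets, the key is required.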
````diff
@@ -759,19 +759,19 @@ file if it already exists rather than append another dataset to it.
 In this episode, we re-loaded the Gaia data we saved from a previous query.
 
 We transformed the coordinates and proper motion from ICRS to a frame
-aligned with the orbit of GD-1, stored the results in a Pandas
+aligned with the orbit of GD-1, stored the results in a pandas
 `DataFrame`, and visualized them.
 
-We combined all of these steps into a single function that we can reuse in the future to go straight from the output of a query with object coordinates in the ICRS reference frame directly to a Pandas DataFrame that includes object coordinates in the GD-1 reference frame.
+We combined all of these steps into a single function that we can reuse in the future to go straight from the output of a query with object coordinates in the ICRS reference frame directly to a pandas DataFrame that includes object coordinates in the GD-1 reference frame.
 
 We saved our results to an HDF5 file which we can use to restart the analysis from this stage or verify our results at some future time.
 - When you make a scatter plot, adjust the size of the markers and their transparency so the figure is not overplotted; otherwise it can misrepresent the data badly.
 - For simple scatter plots in Matplotlib, `plot` is faster than `scatter`.
-- An Astropy `Table` and a Pandas `DataFrame` are similar in many ways and they provide many of the same functions. They have pros and cons, but for many projects, either one would be a reasonable choice.
-- To store data from a Pandas `DataFrame`, a good option is an HDF5 file, which can contain multiple Datasets (we'll dig in more in the Join lesson).
+- An Astropy `Table` and a pandas `DataFrame` are similar in many ways and they provide many of the same functions. They have pros and cons, but for many projects, either one would be a reasonable choice.
+- To store data from a pandas `DataFrame`, a good option is an HDF5 file, which can contain multiple Datasets (we'll dig in more in the Join lesson).
````
````diff
 Now that we have identified the bounds of the cluster in proper motion,
 we will use it to select rows from `results_df`.
 
-We will use the following function, which uses Pandas operators to make
+We will use the following function, which uses pandas operators to make
 a mask that selects rows where `series` falls between `low` and
 `high`.
 
````
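The function itself falls outside this hunk. Here is a minimal sketch of a function matching that description. The name `between`, the use of strict inequalities at both ends, and the toy values are assumptions for illustration and are not taken from the diff.

```python
import pandas as pd

def between(series, low, high):
    """Return a Boolean Series that is True where low < series < high.

    Hypothetical helper: strict inequalities are an assumption here.
    """
    # Each comparison produces a Boolean Series; & combines them element-wise.
    # The parentheses are required because & binds more tightly than > and <.
    return (series > low) & (series < high)

# Toy DataFrame with made-up proper-motion-like values.
df = pd.DataFrame({'pm': [-8.0, -6.5, -5.0, -3.0]})

mask = between(df['pm'], -7.0, -4.0)
print(mask.tolist())  # [False, True, True, False]

# Indexing a DataFrame with the mask keeps only the rows where it is True.
selected = df[mask]
print(len(selected))  # 2
```

pandas also provides a built-in `Series.between(left, right)`. Unlike the sketch above, it includes both endpoints by default.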
````diff
@@ -563,7 +563,7 @@ Recall that we chose HDF5 because it is a binary format producing small files th
 
 Additionally, HDF5 files can contain more than one dataset and can store metadata associated with each dataset (such as column names or observatory information, like a FITS header).
 
-We can add to our existing Pandas `DataFrame` to an HDF5 file by omitting the `mode='w'` keyword like this:
+We can add to our existing pandas `DataFrame` to an HDF5 file by omitting the `mode='w'` keyword like this:
 
 ```python
 filename = 'gd1_data.hdf'
````
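The hunk above describes adding a dataset to an HDF5 file by leaving out `mode='w'`. Here is a minimal sketch of that pattern. It assumes pandas with PyTables installed. The file name `example_multi.hdf`, the keys, and the toy DataFrames are hypothetical.

```python
import pandas as pd

filename = 'example_multi.hdf'  # placeholder file name

# Two hypothetical toy DataFrames to store in one file.
first = pd.DataFrame({'x': [1, 2]})
second = pd.DataFrame({'y': [3.0, 4.0, 5.0]})

# mode='w' starts a fresh file containing only the first dataset.
first.to_hdf(filename, key='first', mode='w')

# With mode omitted, to_hdf uses its default mode='a', which opens the
# existing file and adds a second dataset under a different key.
second.to_hdf(filename, key='second')

# List the datasets now stored in the file.
with pd.HDFStore(filename, mode='r') as store:
    keys = sorted(store.keys())
print(keys)  # ['/first', '/second']
```

Each dataset is then read back separately by its key, for example `pd.read_hdf(filename, key='second')`.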
````diff
@@ -662,7 +662,7 @@ the proper motion limits we identified in this lesson, which will allow us to ex
 - A workflow is often prototyped on a small set of data which can be explored more easily and used to identify ways to limit a dataset to exactly the data you want.
-- To store data from a Pandas `DataFrame`, a good option is an HDF5 file, which can contain multiple Datasets.
+- To store data from a pandas `DataFrame`, a good option is an HDF5 file, which can contain multiple Datasets.
````
index.md: 2 additions, 2 deletions
````diff
@@ -3,7 +3,7 @@ permalink: index.html
 site: sandpaper::sandpaper_site
 ---
 
-The Foundations of Astronomical Data Science curriculum covers a range of core concepts necessary to efficiently study the ever-growing datasets developed in modern astronomy. In particular, this curriculum teaches learners to perform database operations (SQL queries, joins, filtering) and to create publication-quality data visualisations. Learners will use software packages common to the general and astronomy-specific data science communities ([Pandas](https://pandas.pydata.org), [Astropy](https://www.astropy.org), [Astroquery](https://astroquery.readthedocs.io/en/latest/) combined with two astronomical datasets: the large, all-sky, multi-dimensional dataset from the [Gaia satellite](https://sci.esa.int/web/gaia), which measures the positions, motions, and distances of approximately a billion stars in our Milky Way galaxy with unprecedented accuracy and precision; and the [Pan-STARRS photometric survey](https://panstarrs.stsci.edu/), which precisely measures light output and distribution from many stars. Together, the software and datasets are used to reproduce part of the analysis from the article ["Off the beaten path: Gaia reveals GD-1 stars outside of the main stream"](https://arxiv.org/abs/1805.00425) by Drs. Adrian M. Price-Whelan and Ana Bonaca. This lesson shows how to identify and visualize the GD-1 stellar stream, which is a globular cluster that has been tidally stretched by the Milky Way.
+The Foundations of Astronomical Data Science curriculum covers a range of core concepts necessary to efficiently study the ever-growing datasets developed in modern astronomy. In particular, this curriculum teaches learners to perform database operations (SQL queries, joins, filtering) and to create publication-quality data visualisations. Learners will use software packages common to the general and astronomy-specific data science communities ([pandas](https://pandas.pydata.org), [Astropy](https://www.astropy.org), [Astroquery](https://astroquery.readthedocs.io/en/latest/) combined with two astronomical datasets: the large, all-sky, multi-dimensional dataset from the [Gaia satellite](https://sci.esa.int/web/gaia), which measures the positions, motions, and distances of approximately a billion stars in our Milky Way galaxy with unprecedented accuracy and precision; and the [Pan-STARRS photometric survey](https://panstarrs.stsci.edu/), which precisely measures light output and distribution from many stars. Together, the software and datasets are used to reproduce part of the analysis from the article ["Off the beaten path: Gaia reveals GD-1 stars outside of the main stream"](https://arxiv.org/abs/1805.00425) by Drs. Adrian M. Price-Whelan and Ana Bonaca. This lesson shows how to identify and visualize the GD-1 stellar stream, which is a globular cluster that has been tidally stretched by the Milky Way.
 
 GD-1 is a stellar stream around the Milky Way. This means it is a collection of stars that we believe was once part of a bound clump, but the gravitational influence of the Milky Way has torn it apart and spread it over an arc that traces out its orbit on the sky. This is interesting, because if the original bound clump was a dwarf galaxy, understanding its orbit with sufficient precision allows us to measure the mass of the Milky Way, which is very important for understanding the future and past of the Milky Way as a whole. But that is much easier to do if we have a coordinate system aligned with the stream because that makes fitting the location of the stars much easier mathematically - it becomes more linear instead of some complicated curve. Additionally, this stream is especially interesting because it has "gaps", which have a natural interpretation as being caused by the influence of small clumps of dark matter passing near the stream. Knowing the typical rate of these gaps tells you about the typical size and density of these clumps, which turns out to be one of the best probes we have of the fine structure of dark matter.
 
````
````diff
@@ -13,7 +13,7 @@ This lesson can be taught in approximately 10 hours and covers the following top
 - Using Astroquery to query a remote server in Python.
 - Transforming coordinates between common coordinate systems using Astropy units and coordinates.
 - Working with common astronomical file formats, including FITS, HDF5, and CSV.
-- Managing your data with Pandas DataFrames and Astropy Tables.
+- Managing your data with pandas DataFrames and Astropy Tables.
 - Writing functions to make your work less error-prone and more reproducible.
 - Creating a reproducible workflow that brings the computation to the data.
 - Customising all elements of a plot and creating complex, multi-panel, publication-quality graphics.
````