Commit 3a46545

Merge pull request #272 from zonca/fix-271-lowercase-pandas
Fix: lowercase 'pandas' per branding guidelines
2 parents 568a1fc + 6c8584d commit 3a46545

11 files changed

Lines changed: 48 additions & 48 deletions
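A rename like this can be approximated with a short script. The following is a minimal sketch under stated assumptions, not the method used for this commit: the helper name `lowercase_pandas` and the whole-word regular expression are hypothetical, and any automated pass would still need manual review (for example, to leave quoted titles untouched).

```python
import re

# Hypothetical pattern: match the capitalised name only as a whole word.
PANDAS_WORD = re.compile(r"\bPandas\b")


def lowercase_pandas(text):
    """Return text with every standalone 'Pandas' replaced by 'pandas'."""
    return PANDAS_WORD.sub("pandas", text)


line = "- Make a Pandas `DataFrame` and use a Boolean `Series` to select rows."
print(lowercase_pandas(line))
```

Because the pattern only matches the capitalised word, lines such as `import pandas as pd` pass through unchanged.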

episodes/03-transform.md

Lines changed: 23 additions & 23 deletions
Original file line number | Diff line number | Diff line change
@@ -9,15 +9,15 @@ exercises: 5
99
- Select rows and columns from an Astropy `Table`.
1010
- Use Matplotlib to make a scatter plot.
1111
- Use Gala to transform coordinates.
12-
- Make a Pandas `DataFrame` and use a Boolean `Series` to select rows.
12+
- Make a pandas `DataFrame` and use a Boolean `Series` to select rows.
1313
- Save a `DataFrame` in an HDF5 file.
1414

1515
::::::::::::::::::::::::::::::::::::::::::::::::::
1616

1717
:::::::::::::::::::::::::::::::::::::::: questions
1818

1919
- How do we make scatter plots in Matplotlib?
20-
- How do we store data in a Pandas `DataFrame`?
20+
- How do we store data in a pandas `DataFrame`?
2121

2222
::::::::::::::::::::::::::::::::::::::::::::::::::
2323

@@ -40,7 +40,7 @@ analysis, identifying stars with the proper motion we expect for GD-1.
4040
2. Then we will transform the coordinates and proper motion data from
4141
ICRS back to the coordinate frame of GD-1.
4242

43-
3. We will put those results into a Pandas `DataFrame`.
43+
3. We will put those results into a pandas `DataFrame`.
4444

4545

4646
::::::::::::::::::::::::::::::::::::::::::::::::::
@@ -471,7 +471,7 @@ We started with a rectangle in the GD-1 frame. When
471471
transformed to the ICRS frame, it is a non-rectangular region. Now,
472472
transformed back to the GD-1 frame, it is a rectangle again.
473473

474-
## Pandas DataFrame
474+
## pandas DataFrame
475475

476476
At this point we have two objects containing different sets of the
477477
data relating to identifying stars in GD-1. `polygon_results` is the Astropy `Table` we downloaded from Gaia.
@@ -506,33 +506,33 @@ transformed coordinates.
506506

507507
::::::::::::::::::::::::::::::::::::::::: callout
508508

509-
## Pandas `DataFrame`s versus Astropy `Table`s
509+
## pandas `DataFrame`s versus Astropy `Table`s
510510

511-
Two common choices are the Pandas `DataFrame` and Astropy `Table`.
512-
Pandas `DataFrame`s and Astropy `Table`s share many of the same characteristics
511+
Two common choices are the pandas `DataFrame` and Astropy `Table`.
512+
pandas `DataFrame`s and Astropy `Table`s share many of the same characteristics
513513
and most of the manipulations that we do can be done with either. As you become
514514
more familiar with each, you will develop a sense of which one you prefer for
515515
different tasks. For instance you may choose to use Astropy `Table`s to read
516-
in data, especially astronomy specific data formats, but Pandas `DataFrame`s to
516+
in data, especially astronomy specific data formats, but pandas `DataFrame`s to
517517
inspect the data. Fortunately, Astropy makes it easy to convert between the
518-
two data types. We will choose to use Pandas `DataFrame`, for two reasons:
518+
two data types. We will choose to use pandas `DataFrame`, for two reasons:
519519

520520
1. It provides capabilities that are (almost) a superset of the other data
521521
structures, so it is the all-in-one solution.
522522

523-
2. Pandas is a general-purpose tool that is useful in many domains,
523+
2. pandas is a general-purpose tool that is useful in many domains,
524524
especially data science. If you are going to develop expertise in one
525-
tool, Pandas is a good choice.
525+
tool, pandas is a good choice.
526526

527-
However, compared to an Astropy `Table`, Pandas has one big drawback:
527+
However, compared to an Astropy `Table`, pandas has one big drawback:
528528
it does not keep the metadata associated with the table, including the
529529
units for the columns. Nevertheless, we think it's a useful data type
530530
to be familiar with.
531531

532532

533533
::::::::::::::::::::::::::::::::::::::::::::::::::
534534

535-
It is straightforward to convert an Astropy `Table` to a Pandas `DataFrame`.
535+
It is straightforward to convert an Astropy `Table` to a pandas `DataFrame`.
536536

537537
```python
538538
import pandas as pd
@@ -641,7 +641,7 @@ and consolidate them into a single function that we can use to take the
641641
coordinates and proper motion that we get as an Astropy `Table` from our
642642
Gaia query, add columns representing the reflex corrected
643643
GD-1 coordinates and proper motions, and transform it into a
644-
Pandas `DataFrame`.
644+
pandas `DataFrame`.
645645
This is a general function that we will use multiple times as we build different
646646
queries so we want to write it once and then call the function rather than having
647647
to copy and paste the code over and over again.
@@ -652,7 +652,7 @@ def make_dataframe(table):
652652
653653
table: Astropy Table
654654
655-
returns: Pandas DataFrame
655+
returns: pandas DataFrame
656656
"""
657657
#Create a SkyCoord object with the coordinates and proper motions
658658
# in the input table
@@ -695,7 +695,7 @@ results_df = make_dataframe(polygon_results)
695695

696696
At this point we have run a successful query and combined the results into a single `DataFrame`. This is a good time to save the data.
697697

698-
To save a Pandas `DataFrame`, one option is to convert it to an
698+
To save a pandas `DataFrame`, one option is to convert it to an
699699
Astropy `Table`, like this:
700700

701701
```python
@@ -712,7 +712,7 @@ astropy.table.table.Table
712712
Then we could write the `Table` to a FITS file, as we did in the
713713
previous lesson.
714714

715-
But, like Astropy, Pandas provides functions to write DataFrames in other formats; to
715+
But, like Astropy, pandas provides functions to write DataFrames in other formats; to
716716
see what they are [find the functions here that begin with
717717
`to_`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html).
718718
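As a complement to the documentation link in the hunk above, the `to_` writer methods can also be listed from Python. This is a small sketch rather than part of the lesson; the variable name `writers` is an assumption, and the exact list depends on the installed pandas version.

```python
import pandas as pd

# Collect the DataFrame attributes whose names start with "to_".
writers = sorted(name for name in dir(pd.DataFrame) if name.startswith("to_"))
print(writers)
```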

@@ -732,10 +732,10 @@ And HDF5 stores the metadata associated with the table, including
732732
column names, row labels, and data types (like FITS).
733733

734734
Finally, HDF5 is a cross-language standard, so if you write an HDF5
735-
file with Pandas, you can read it back with many other software tools
735+
file with pandas, you can read it back with many other software tools
736736
(more than FITS).
737737

738-
We can write a Pandas `DataFrame` to an HDF5 file like this:
738+
We can write a pandas `DataFrame` to an HDF5 file like this:
739739

740740
```python
741741
filename = 'gd1_data.hdf'
@@ -759,19 +759,19 @@ file if it already exists rather than append another dataset to it.
759759
In this episode, we re-loaded the Gaia data we saved from a previous query.
760760

761761
We transformed the coordinates and proper motion from ICRS to a frame
762-
aligned with the orbit of GD-1, stored the results in a Pandas
762+
aligned with the orbit of GD-1, stored the results in a pandas
763763
`DataFrame`, and visualized them.
764764

765-
We combined all of these steps into a single function that we can reuse in the future to go straight from the output of a query with object coordinates in the ICRS reference frame directly to a Pandas DataFrame that includes object coordinates in the GD-1 reference frame.
765+
We combined all of these steps into a single function that we can reuse in the future to go straight from the output of a query with object coordinates in the ICRS reference frame directly to a pandas DataFrame that includes object coordinates in the GD-1 reference frame.
766766

767767
We saved our results to an HDF5 file which we can use to restart the analysis from this stage or verify our results at some future time.
768768

769769
:::::::::::::::::::::::::::::::::::::::: keypoints
770770

771771
- When you make a scatter plot, adjust the size of the markers and their transparency so the figure is not overplotted; otherwise it can misrepresent the data badly.
772772
- For simple scatter plots in Matplotlib, `plot` is faster than `scatter`.
773-
- An Astropy `Table` and a Pandas `DataFrame` are similar in many ways and they provide many of the same functions. They have pros and cons, but for many projects, either one would be a reasonable choice.
774-
- To store data from a Pandas `DataFrame`, a good option is an HDF5 file, which can contain multiple Datasets (we'll dig in more in the Join lesson).
773+
- An Astropy `Table` and a pandas `DataFrame` are similar in many ways and they provide many of the same functions. They have pros and cons, but for many projects, either one would be a reasonable choice.
774+
- To store data from a pandas `DataFrame`, a good option is an HDF5 file, which can contain multiple Datasets (we'll dig in more in the Join lesson).
775775

776776
::::::::::::::::::::::::::::::::::::::::::::::::::
777777

episodes/04-motion.md

Lines changed: 9 additions & 9 deletions
@@ -1,12 +1,12 @@
11
---
2-
title: Plotting and Pandas
2+
title: Plotting and pandas
33
teaching: 50
44
exercises: 15
55
---
66

77
::::::::::::::::::::::::::::::::::::::: objectives
88

9-
- Use a Boolean Pandas `Series` to select rows in a `DataFrame`.
9+
- Use a Boolean pandas `Series` to select rows in a `DataFrame`.
1010
- Save multiple `DataFrame`s in an HDF5 file.
1111

1212
::::::::::::::::::::::::::::::::::::::::::::::::::
@@ -30,7 +30,7 @@ analysis, identifying stars with the proper motion we expect for GD-1.
3030

3131
## Outline
3232

33-
1. We will put those results into a Pandas `DataFrame`, which we will use
33+
1. We will put those results into a pandas `DataFrame`, which we will use
3434
to select stars near the centerline of GD-1.
3535

3636
2. Plotting the proper motion of those stars, we will identify a region
@@ -88,7 +88,7 @@ results_df = pd.read_hdf(filename, 'results_df')
8888

8989
## Exploring data
9090

91-
One benefit of using Pandas is that it provides functions for
91+
One benefit of using pandas is that it provides functions for
9292
exploring the data and checking for problems.
9393
One of the most useful of these functions is `describe`, which
9494
computes summary statistics for each column.
@@ -236,7 +236,7 @@ type(phi2)
236236
pandas.core.series.Series
237237
```
238238

239-
The result is a `Series`, which is the structure Pandas uses to
239+
The result is a `Series`, which is the structure pandas uses to
240240
represent columns.
241241

242242
We can use a comparison operator, `>`, to compare the values in a
@@ -282,7 +282,7 @@ mask = (phi2 > phi2_min) & (phi2 < phi2_max)
282282
## Logical operators
283283

284284
Python's logical operators (`and`, `or`, and `not`)
285-
do not work with NumPy or Pandas. Both libraries use the bitwise
285+
do not work with NumPy or pandas. Both libraries use the bitwise
286286
operators (`&`, `|`, and `~`) to do elementwise logical operations
287287
([explanation here](https://stackoverflow.com/questions/21415661/logical-operators-for-boolean-indexing-in-pandas)).
288288
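The point about logical operators in the hunk above can be illustrated with a toy `Series`. This is a sketch with made-up values standing in for the `phi2` column; only the `mask` expression comes from the lesson text.

```python
import pandas as pd

# Toy data standing in for the phi2 column; values and limits are made up.
phi2 = pd.Series([-2.0, -0.5, 0.3, 1.8])
phi2_min, phi2_max = -1.0, 1.0

# Parentheses are needed because & binds more tightly than < and >.
mask = (phi2 > phi2_min) & (phi2 < phi2_max)
print(mask.tolist())

# `and` asks for the truth value of a whole Series, which pandas refuses
# to guess, so this raises ValueError.
try:
    (phi2 > phi2_min) and (phi2 < phi2_max)
except ValueError as err:
    print("ValueError:", err)
```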

@@ -433,7 +433,7 @@ plt.plot(pm1_rect, pm2_rect, '-')
433433
Now that we have identified the bounds of the cluster in proper motion,
434434
we will use it to select rows from `results_df`.
435435

436-
We will use the following function, which uses Pandas operators to make
436+
We will use the following function, which uses pandas operators to make
437437
a mask that selects rows where `series` falls between `low` and
438438
`high`.
439439
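The function itself is not visible in this hunk. A minimal sketch of a mask-building helper of the kind described might look like the following; the name `between`, the strict inequalities, and the sample values are assumptions made for illustration.

```python
import pandas as pd


def between(series, low, high):
    """Return a Boolean Series that is True where low < series < high."""
    return (series > low) & (series < high)


# Made-up proper motion values, used only to exercise the function.
pm1 = pd.Series([-9.5, -8.0, -6.0, -3.0])
selected = pm1[between(pm1, -8.9, -4.3)]
print(selected.tolist())
```

Indexing a `Series` or `DataFrame` with the returned mask keeps only the rows where the mask is `True`.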

@@ -563,7 +563,7 @@ Recall that we chose HDF5 because it is a binary format producing small files th
563563

564564
Additionally, HDF5 files can contain more than one dataset and can store metadata associated with each dataset (such as column names or observatory information, like a FITS header).
565565

566-
We can add to our existing Pandas `DataFrame` to an HDF5 file by omitting the `mode='w'` keyword like this:
566+
We can add to our existing pandas `DataFrame` to an HDF5 file by omitting the `mode='w'` keyword like this:
567567

568568
```python
569569
filename = 'gd1_data.hdf'
@@ -662,7 +662,7 @@ the proper motion limits we identified in this lesson, which will allow us to ex
662662
:::::::::::::::::::::::::::::::::::::::: keypoints
663663

664664
- A workflow is often prototyped on a small set of data which can be explored more easily and used to identify ways to limit a dataset to exactly the data you want.
665-
- To store data from a Pandas `DataFrame`, a good option is an HDF5 file, which can contain multiple Datasets.
665+
- To store data from a pandas `DataFrame`, a good option is an HDF5 file, which can contain multiple Datasets.
666666

667667
::::::::::::::::::::::::::::::::::::::::::::::::::
668668

episodes/05-select.md

Lines changed: 1 addition & 1 deletion
@@ -435,7 +435,7 @@ our analysis at a later date we should save this information to a file.
435435
There are several ways we could do that, but since we are already
436436
storing data in an HDF5 file, we will do the same with these variables.
437437

438-
To save them to an HDF5 file we first need to put them in a Pandas object.
438+
To save them to an HDF5 file we first need to put them in a pandas object.
439439
We have seen how to create a `Series` from a column in a `DataFrame`.
440440
Now we will build a `Series` from scratch.
441441
We do not need the full `DataFrame` format with multiple rows and columns

episodes/06-join.md

Lines changed: 3 additions & 3 deletions
@@ -882,7 +882,7 @@ that for each candidate star we have identified exactly one source in
882882
Pan-STARRS that is likely to be the same star.
883883

884884
To check whether there are any values other than `1`, we can convert
885-
this column to a Pandas `Series` and use `describe`, which we saw
885+
this column to a pandas `Series` and use `describe`, which we saw
886886
in episode 3.
887887

888888
```python
@@ -979,7 +979,7 @@ getsize(filename) / MB
979979

980980
## Another file format - CSV
981981

982-
Pandas can write a variety of other formats, [which you can read about
982+
pandas can write a variety of other formats, [which you can read about
983983
here](https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html).
984984
We won't cover all of them, but one other important one is
985985
[CSV](https://en.wikipedia.org/wiki/Comma-separated_values), which
@@ -1064,7 +1064,7 @@ the CSV file also does not.
10641064
However, even if we had written a CSV file from an astropy `Table`, which does contain data type,
10651065
data type would not appear in the CSV file, highlighting a limitation of this format.
10661066
Additionally, notice that the index in `candidate_df` has become an unnamed column
1067-
in `read_back_csv` and a new index has been created. The Pandas functions for writing and reading CSV
1067+
in `read_back_csv` and a new index has been created. The pandas functions for writing and reading CSV
10681068
files provide options to avoid that problem, but this is an example of
10691069
the kind of thing that can go wrong with CSV files.
10701070
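The index behaviour described in the hunk above, and two of the options that avoid it, can be reproduced in memory. This is a sketch with a made-up two-row stand-in for `candidate_df`; it writes to a string rather than to a file.

```python
from io import StringIO

import pandas as pd

# Small stand-in for candidate_df with a non-default index (values are made up).
candidate_df = pd.DataFrame({"g_mean_psf_mag": [17.9, 18.2]}, index=[10, 20])

# With no path argument, to_csv returns the CSV text as a string.
csv_text = candidate_df.to_csv()

# Default read: the old index comes back as an unnamed column.
read_back_csv = pd.read_csv(StringIO(csv_text))
print(list(read_back_csv.columns))

# Option 1: treat the first column as the index when reading.
restored = pd.read_csv(StringIO(csv_text), index_col=0)
print(list(restored.index))

# Option 2: leave the index out when writing.
no_index_text = candidate_df.to_csv(index=False)
```

Which option fits depends on whether the index carries information worth keeping.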

episodes/07-photo.md

Lines changed: 1 addition & 1 deletion
@@ -185,7 +185,7 @@ stars. But the main sequence of GD-1 appears as an overdense region in the lowe
185185

186186
We want to be able to make this plot again, with any selection of PanSTARRs photometry,
187187
so this is a natural time to put it into a function that accepts as input
188-
an Astropy `Table` or Pandas `DataFrame`, as long as
188+
an Astropy `Table` or pandas `DataFrame`, as long as
189189
it has columns named `g_mean_psf_mag` and `i_mean_psf_mag`. To do this we will change
190190
our variable name from `candidate_df` to the more generic `dataframe`.
191191

index.md

Lines changed: 2 additions & 2 deletions
@@ -3,7 +3,7 @@ permalink: index.html
33
site: sandpaper::sandpaper_site
44
---
55

6-
The Foundations of Astronomical Data Science curriculum covers a range of core concepts necessary to efficiently study the ever-growing datasets developed in modern astronomy. In particular, this curriculum teaches learners to perform database operations (SQL queries, joins, filtering) and to create publication-quality data visualisations. Learners will use software packages common to the general and astronomy-specific data science communities ([Pandas](https://pandas.pydata.org), [Astropy](https://www.astropy.org), [Astroquery](https://astroquery.readthedocs.io/en/latest/) combined with two astronomical datasets: the large, all-sky, multi-dimensional dataset from the [Gaia satellite](https://sci.esa.int/web/gaia), which measures the positions, motions, and distances of approximately a billion stars in our Milky Way galaxy with unprecedented accuracy and precision; and the [Pan-STARRS photometric survey](https://panstarrs.stsci.edu/), which precisely measures light output and distribution from many stars. Together, the software and datasets are used to reproduce part of the analysis from the article ["Off the beaten path: Gaia reveals GD-1 stars outside of the main stream"](https://arxiv.org/abs/1805.00425) by Drs. Adrian M. Price-Whelan and Ana Bonaca. This lesson shows how to identify and visualize the GD-1 stellar stream, which is a globular cluster that has been tidally stretched by the Milky Way.
6+
The Foundations of Astronomical Data Science curriculum covers a range of core concepts necessary to efficiently study the ever-growing datasets developed in modern astronomy. In particular, this curriculum teaches learners to perform database operations (SQL queries, joins, filtering) and to create publication-quality data visualisations. Learners will use software packages common to the general and astronomy-specific data science communities ([pandas](https://pandas.pydata.org), [Astropy](https://www.astropy.org), [Astroquery](https://astroquery.readthedocs.io/en/latest/) combined with two astronomical datasets: the large, all-sky, multi-dimensional dataset from the [Gaia satellite](https://sci.esa.int/web/gaia), which measures the positions, motions, and distances of approximately a billion stars in our Milky Way galaxy with unprecedented accuracy and precision; and the [Pan-STARRS photometric survey](https://panstarrs.stsci.edu/), which precisely measures light output and distribution from many stars. Together, the software and datasets are used to reproduce part of the analysis from the article ["Off the beaten path: Gaia reveals GD-1 stars outside of the main stream"](https://arxiv.org/abs/1805.00425) by Drs. Adrian M. Price-Whelan and Ana Bonaca. This lesson shows how to identify and visualize the GD-1 stellar stream, which is a globular cluster that has been tidally stretched by the Milky Way.
77

88
GD-1 is a stellar stream around the Milky Way. This means it is a collection of stars that we believe was once part of a bound clump, but the gravitational influence of the Milky Way has torn it apart and spread it over an arc that traces out its orbit on the sky. This is interesting, because if the original bound clump was a dwarf galaxy, understanding its orbit with sufficient precision allows us to measure the mass of the Milky Way, which is very important for understanding the future and past of the Milky Way as a whole. But that is much easier to do if we have a coordinate system aligned with the stream because that makes fitting the location of the stars much easier mathematically - it becomes more linear instead of some complicated curve. Additionally, this stream is especially interesting because it has "gaps", which have a natural interpretation as being caused by the influence of small clumps of dark matter passing near the stream. Knowing the typical rate of these gaps tells you about the typical size and density of these clumps, which turns out to be one of the best probes we have of the fine structure of dark matter.
99

@@ -13,7 +13,7 @@ This lesson can be taught in approximately 10 hours and covers the following top
1313
- Using Astroquery to query a remote server in Python.
1414
- Transforming coordinates between common coordinate systems using Astropy units and coordinates.
1515
- Working with common astronomical file formats, including FITS, HDF5, and CSV.
16-
- Managing your data with Pandas DataFrames and Astropy Tables.
16+
- Managing your data with pandas DataFrames and Astropy Tables.
1717
- Writing functions to make your work less error-prone and more reproducible.
1818
- Creating a reproducible workflow that brings the computation to the data.
1919
- Customising all elements of a plot and creating complex, multi-panel, publication-quality graphics.

instructors/calculating_MIST_isochrone.md

Lines changed: 1 addition & 1 deletion
@@ -185,7 +185,7 @@ expect to find stars in GD-1.
185185
We will save this result so we can reload it later without repeating the
186186
steps in this section.
187187

188-
So we can save the data in an HDF5 file, we will put it in a Pandas
188+
So we can save the data in an HDF5 file, we will put it in a pandas
189189
`DataFrame` first:
190190

191191
```python
