**Long-Term Overall Goal:** To construct a model that accurately predicts the age of abalone based on their physical characteristics.
Intermediate Goals:
- To create a model that achieves an accuracy score above 80%.
- To minimize or avoid possible group attribution bias.
- To complete the project by
Part 1:
1. Download the data set.
2. Turn the data set into a data frame.
3. Perform exploratory data analysis (repair data, find missing values, etc.).
4. Define the target column as the number of rings, which determines the age we want to predict.
5. Define the feature column(s) from the remaining columns.
6. Find non-numerical columns and decide whether to one-hot encode or remove them.
7. Split the data into training and test sets (70% train, 30% test).
8. Create a Gaussian Naïve Bayes classifier model.
9. Fit the model to the training data.
10. Get predictions for the test set target values.
11. Compute accuracy from the test set predictions.

Part 2:
1. Start with the state of the data frame after part 1, step 3.
2. Remove features and/or change targets for other models.
3. Repeat part 1, steps 6 through 11.
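The modeling steps listed above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the team's final code: the column names assume a UCI-style abalone file with no header row, the function names `load_abalone` and `run_gaussian_nb` are hypothetical, the file path is a placeholder, and dropping rows with missing values stands in for a fuller exploratory analysis.

```python
import pandas as pd
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Assumed column layout (UCI-style abalone file); adjust to the actual data set.
COLUMNS = ["Sex", "Length", "Diameter", "Height", "Whole weight",
           "Shucked weight", "Viscera weight", "Shell weight", "Rings"]


def load_abalone(path):
    """Read the downloaded file into a data frame (path is a placeholder)."""
    return pd.read_csv(path, header=None, names=COLUMNS)


def run_gaussian_nb(df, target="Rings", drop_features=(), seed=0):
    """Encode, split 70/30, fit Gaussian Naive Bayes, and return test accuracy."""
    # Minimal stand-in for data repair: drop rows with missing values.
    df = df.dropna()

    # Target is the ring count; features are the remaining columns,
    # minus any the caller chooses to remove.
    y = df[target]
    X = df.drop(columns=[target, *drop_features])

    # One-hot encode non-numerical columns (for example, Sex).
    X = pd.get_dummies(X, dtype=float)

    # 70% train, 30% test; the seed is fixed only for repeatability.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=seed
    )

    # Create the classifier and fit it to the training data.
    model = GaussianNB().fit(X_train, y_train)

    # Predict the test targets and score the predictions.
    predictions = model.predict(X_test)
    return accuracy_score(y_test, predictions)
```

For the later runs that remove features, the same function could be called again on the cleaned data frame with a different `drop_features` list, for example `run_gaussian_nb(df, drop_features=["Sex"])`.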
Kenneth: Program Leader
Schedule to start: Throughout the capstone
Isaiah: Lead Code Designer
Schedule to start: Throughout the code development process
Erika: Note Taker
Schedule to start: Throughout the capstone
Adam: Auditor
Schedule to start: Once the code has been completed
Bobbi: Presentation Designer
Schedule to start: Once we have all our findings from implementation of the model/code
We want to find out how we can accurately predict the age of abalone based on their physical traits. Traditionally, their age is determined by cutting and staining the shell and then counting its rings. The challenge we face is finding which other characteristics can be used to determine the age of an abalone. We want not only to find another method of determining age, but also to build a more efficient model that is at least as accurate as previous methods.
Motivating Questions
- What physical traits in abalone change over time?
- What other external factors could introduce bias, and how can we mitigate them?
- What model best suits the data provided?
- How can our findings improve future research?