@@ -0,0 +1,176 @@
# Real-Time Bitcoin Price Analysis with InfluxDB and PyFlink

This project demonstrates a real-time Bitcoin analytics pipeline that fetches live prices, stores them in InfluxDB, and uses NeuralProphet to forecast future prices.

---

## Folder Structure

```
bitcoin-analytics-project/
│
├── Dockerfile              # Dockerfile to create the container environment for the app
├── docker-compose.yml      # Defines and runs multi-container Docker apps (InfluxDB + App)
├── .env                    # Stores sensitive env variables like InfluxDB token
│
├── bitcoin_utils.py        # Core API module: fetches live BTC price and computes metrics
│
├── bitcoin.API.ipynb       # Interactive notebook demonstrating usage of the API class
├── bitcoin.Fetch.ipynb     # Notebook showcasing the full pipeline: streaming + forecast
│
├── bitcoin.API.md          # Markdown documentation for the API class and its usage
├── bitcoin.fetch.md        # Full end-to-end markdown doc showing real-time + forecast demo
│
└── README.md               # Instructions to build, run, and test the entire project
```

---

## Prerequisites

- Install **Docker** and **Docker Compose**.
- Ensure the following ports are free:
  - `8086` (for InfluxDB)
  - `8888` (for Jupyter Notebook)
- Clone the repository:

```bash
git clone https://github.com/yourusername/bitcoin-price-analysis.git
cd bitcoin-price-analysis
```
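
Before starting the containers, the port requirement above can be checked from the host. The following is a minimal sketch using only the Python standard library; the helper name `port_is_free` is hypothetical and not part of this project. It treats a port as free when nothing accepts a TCP connection on it locally.

```python
import socket


def port_is_free(port, host="127.0.0.1", timeout=1.0):
    """Return True if no process accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 only when a listener accepted the connection.
        return sock.connect_ex((host, port)) != 0


def busy_ports(ports=(8086, 8888)):
    """Return the subset of the required ports that are already in use."""
    return [port for port in ports if not port_is_free(port)]
```

Calling `busy_ports()` returns an empty list when both `8086` and `8888` are available.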

---

## 🔧 Step 1: Start InfluxDB (Initial Setup Only)

```bash
docker-compose up influxdb
```

Then, open your browser and go to:
http://localhost:8086

Complete the setup form using the following:

- **Username**: `admin`
- **Password**: `admin123`
- **Organization**: `crypto`
- **Bucket**: `bitcoin_prices`

### Generate Token

1. Go to the **"Data"** section in the left sidebar.
2. Navigate to the **"Tokens"** tab.
3. Click **"Generate Token" → "All-Access Token"**.
4. **Copy the generated token**.

---

## Step 2: Save the Token

Paste the token inside your `.env` file:

```
INFLUXDB_TOKEN=<YOUR_GENERATED_TOKEN_HERE>
```

> Do not commit `.env` to GitHub.

---

## 🔄 Step 3: Restart with Full Application

First, shut down InfluxDB by pressing `Ctrl + C` in the terminal where it is running.

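Then clean up and restart everything. Note that `-v` also removes the project's named and anonymous Docker volumes; if InfluxDB stores its data in one of those rather than a bind mount, the setup and token from Step 1 are wiped, so omit `-v` in that case: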

```bash
docker-compose down -v
docker-compose up --build
```

---

## 📓 Access Jupyter Notebook

Once the containers are up, visit:
http://localhost:8888

Open and run the following notebooks:

- `bitcoin.API.ipynb`
- `bitcoin.Fetch.ipynb`

---

## Notebook Summaries

### `bitcoin.API.ipynb`

**Purpose**: Demonstrates how to use the `BitcoinPriceSource` API class.

**What it does**:
- Initializes the price source.
- Iterates over 10 fetches from the CoinGecko API.
- Prints:
  - Timestamped Bitcoin prices
  - Moving Average (MA)
  - Standard Deviation
  - Exponential Moving Average (EMA)
  - Trend and Cumulative Return
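
The metrics listed above can be sketched in plain Python. This is not the code from `bitcoin_utils.py`, which is not shown here; `window_metrics` is a hypothetical helper. It assumes a population standard deviation, a trend defined as the sign of last price minus first price, and an EMA smoothing factor of `2 / (n + 1)`, so the EMA in particular may differ from what `BitcoinPriceSource` reports. The sample window is the ten prices from the notebook's logged output.

```python
import statistics


def window_metrics(prices):
    """Summarize one window of prices, ordered oldest first."""
    if not prices:
        raise ValueError("prices must not be empty")

    # Assumed EMA smoothing factor: the common 2 / (n + 1) convention.
    alpha = 2 / (len(prices) + 1)
    ema = float(prices[0])
    for price in prices[1:]:
        ema = alpha * price + (1 - alpha) * ema

    first, last = prices[0], prices[-1]
    return {
        "ma": statistics.fmean(prices),
        "std": statistics.pstdev(prices),  # population standard deviation
        "ema": ema,
        "max": max(prices),
        "min": min(prices),
        # Assumed definition: +1 if the window ends higher, -1 if lower, 0 if flat.
        "trend": (last > first) - (last < first),
        "cumulative_return_pct": (last - first) / first * 100,
    }


window = [103324, 103319, 103319, 103313, 103313,
          103302, 103302, 103302, 103297, 103297]
metrics = window_metrics(window)
print(f"MA: ${metrics['ma']:.2f}, StdDev: {metrics['std']:.2f}, "
      f"Trend: {metrics['trend']}, "
      f"Cumulative Return: {metrics['cumulative_return_pct']:.2f}%")
```

With these assumptions, the moving average, standard deviation, trend, and cumulative return agree with the values in the notebook's sample output for the same window.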

---

### `bitcoin.Fetch.ipynb`

**Purpose**: Complete forecast pipeline.

**What it does**:
- Uses PyFlink, the Python API for Apache Flink, to stream real-time price data and print it.
- Fetches historical BTC data using `yfinance`.
- Trains a `NeuralProphet` model.
- Forecasts prices for the next 365 days.
- Visualizes:
  - Forecasted prices
  - Trend, weekly, and yearly seasonality components
- Prints forecast for the next 7 days.
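
The forecasting steps above could look roughly like the sketch below. It is an illustration rather than the notebook's actual code: the helper names, the `BTC-USD` ticker, and the `5y` history period are assumptions, and it presumes `pandas`, `yfinance`, and `neuralprophet` are installed in the environment. NeuralProphet expects a frame with a `ds` date column and a `y` value column, which is what `to_prophet_rows` prepares.

```python
from datetime import datetime, timezone


def to_prophet_rows(closes):
    """Convert (datetime, close) pairs into sorted {"ds", "y"} rows.

    Timezone-aware timestamps are converted to naive UTC.
    """
    rows = []
    for ts, close in closes:
        if ts.tzinfo is not None:
            ts = ts.astimezone(timezone.utc).replace(tzinfo=None)
        rows.append({"ds": ts, "y": float(close)})
    rows.sort(key=lambda row: row["ds"])
    return rows


def forecast_btc(horizon_days=365, period="5y"):
    """Fit NeuralProphet on daily BTC closes and forecast horizon_days ahead."""
    # Third-party imports are kept local so the helper above stays dependency-free.
    import pandas as pd
    import yfinance as yf
    from neuralprophet import NeuralProphet

    history = yf.Ticker("BTC-USD").history(period=period)  # assumed ticker and period
    pairs = zip(history.index.to_pydatetime(), history["Close"])
    df = pd.DataFrame(to_prophet_rows(pairs))

    model = NeuralProphet()
    model.fit(df, freq="D")
    future = model.make_future_dataframe(df, periods=horizon_days)
    return model.predict(future)
```

Taking the first seven rows of the returned frame corresponds to the 7-day forecast printout mentioned above.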


**Why Two Docker Containers**

The project uses two Docker containers for a clean separation of concerns:

- **influxdb_container**: Runs the InfluxDB service to store time-series data.
- **umd_data605_app**: Runs the application (Python + Jupyter + PyFlink) that fetches Bitcoin prices and sends metrics to InfluxDB.

Keeping them separate ensures:

- Each container has a single responsibility.
- Easier debugging, scaling, and maintenance.
- Flexibility to replace or upgrade one service without touching the other.


**Why Docker Network**

Docker containers are isolated by default. To let them communicate (e.g., the app pushing data into InfluxDB), we connect them using a custom bridge network (`flink_influx_network`). This ensures that:

- The app can reach InfluxDB at http://influxdb_container:8086 (container name acts like a hostname).
- Both services remain discoverable to each other but isolated from the host unless explicitly exposed.
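
One way to confirm that the app container can actually reach InfluxDB over this network is to query InfluxDB 2.x's `/health` endpoint, which reports `"status": "pass"` when the service is ready. The sketch below uses only the standard library; `influx_is_healthy` is a hypothetical helper name, not something defined in this project.

```python
import json
import urllib.request


def influx_is_healthy(base_url, timeout=3.0):
    """Return True if GET <base_url>/health reports status "pass"."""
    url = f"{base_url.rstrip('/')}/health"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            body = json.load(response)
    except (OSError, ValueError):
        # Connection refused, DNS failure, timeout, bad URL, or invalid JSON.
        return False
    return isinstance(body, dict) and body.get("status") == "pass"
```

From a notebook inside the app container, `influx_is_healthy("http://influxdb_container:8086")` should return `True` once both services are up; from the host, the equivalent check would use `http://localhost:8086`.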



**Why Set Up InfluxDB and Generate Tokens**

- InfluxDB 2.x uses token-based authentication for secure access.
- On first-time setup, we must:
  - Run only the InfluxDB container.
  - Open http://localhost:8086 and manually:
    - Create an admin user, org, and bucket.
    - Generate an All-Access Token.
- This token is needed so the app container can authenticate and write metrics to the InfluxDB service securely.
- Once the token is created:
  - We store it in a `.env` file.
  - It is automatically injected into the app via `docker-compose.yml`.
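
To illustrate how the injected token might be consumed, here is a hedged sketch using the `influxdb-client` package that the Dockerfile installs. The function names, the `INFLUXDB_URL` override variable, and the `bitcoin_price` measurement name are assumptions for illustration; the org, bucket, and container URL are the values used in this README.

```python
import os


def load_influx_config(env=None):
    """Build connection settings from environment variables, failing fast without a token."""
    env = os.environ if env is None else env
    token = env.get("INFLUXDB_TOKEN")
    if not token:
        raise RuntimeError("INFLUXDB_TOKEN is not set; check your .env file")
    return {
        # INFLUXDB_URL is a hypothetical override; the default is the container hostname.
        "url": env.get("INFLUXDB_URL", "http://influxdb_container:8086"),
        "token": token,
        "org": "crypto",
        "bucket": "bitcoin_prices",
    }


def write_price(config, timestamp_ms, price):
    """Write one price point, timestamped in epoch milliseconds."""
    # Imported locally so load_influx_config works without the client installed.
    from influxdb_client import InfluxDBClient, Point, WritePrecision
    from influxdb_client.client.write_api import SYNCHRONOUS

    with InfluxDBClient(url=config["url"], token=config["token"],
                        org=config["org"]) as client:
        write_api = client.write_api(write_options=SYNCHRONOUS)
        point = (
            Point("bitcoin_price")  # assumed measurement name
            .field("price", float(price))
            .time(timestamp_ms, WritePrecision.MS)
        )
        write_api.write(bucket=config["bucket"], record=point)
```

Raising early when the token is missing makes a misconfigured `.env` file visible at startup instead of surfacing later as an authentication error on the first write.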
@@ -0,0 +1,50 @@
FROM ubuntu:20.04

# Set environment variables to avoid interactive prompts
ENV DEBIAN_FRONTEND=noninteractive
ENV TZ=Etc/UTC

# Install system dependencies
RUN apt-get update && apt-get upgrade -y && \
apt-get install -y --no-install-recommends \
openjdk-11-jdk \
python3 \
python3-pip \
python3-dev \
curl \
git \
build-essential \
tzdata && \
apt-get clean && rm -rf /var/lib/apt/lists/*

# Set JAVA path for Flink (this path is ARM64-specific; on x86_64 hosts use java-11-openjdk-amd64)
ENV JAVA_HOME=/usr/lib/jvm/java-11-openjdk-arm64
ENV PATH=$JAVA_HOME/bin:$PATH

# Upgrade pip and install base Python packages (excluding prophet/pystan)
RUN python3 -m pip install --upgrade pip setuptools wheel && \
pip install \
apache-flink==1.17.1 \
ipython \
tornado==6.1 \
jupyter-client==7.3.2 \
jupyter-contrib-core \
jupyter-contrib-nbextensions \
notebook \
influxdb-client \
requests \
psycopg2-binary \
yapf \
numpy \
pandas \
matplotlib \
yfinance

# Create working directory
WORKDIR /data

# Expose Jupyter port
EXPOSE 8888

# Start Jupyter on container boot
CMD ["jupyter", "notebook", "--ip=0.0.0.0", "--port=8888", "--allow-root", "--NotebookApp.token=''"]
@@ -0,0 +1,107 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "469b4c2b",
"metadata": {},
"source": [
"### Real-Time Bitcoin Price Streaming: Sample API Usage"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "03626841",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[2025-05-18 01:28:25.250286] Price: $103324.00\n",
"[1] Timestamp: 1747531705250, Price: $103324.00\n",
"[2025-05-18 01:28:55.354586] Price: $103319.00\n",
"[2] Timestamp: 1747531735354, Price: $103319.00\n",
"[2025-05-18 01:29:25.476993] Price: $103319.00\n",
"[3] Timestamp: 1747531765476, Price: $103319.00\n",
"[2025-05-18 01:29:55.613666] Price: $103313.00\n",
"[4] Timestamp: 1747531795613, Price: $103313.00\n",
"[2025-05-18 01:30:25.956787] Price: $103313.00\n",
"[5] Timestamp: 1747531825956, Price: $103313.00\n",
"[2025-05-18 01:30:56.115765] Price: $103302.00\n",
"[6] Timestamp: 1747531856115, Price: $103302.00\n",
"[2025-05-18 01:31:26.224400] Price: $103302.00\n",
"[7] Timestamp: 1747531886224, Price: $103302.00\n",
"[2025-05-18 01:31:56.332820] Price: $103302.00\n",
"[8] Timestamp: 1747531916332, Price: $103302.00\n",
"[2025-05-18 01:32:26.440963] Price: $103297.00\n",
"[9] Timestamp: 1747531946440, Price: $103297.00\n",
"[2025-05-18 01:32:56.540052] Price: $103297.00\n",
"-----> MA: $103308.80, StdDev: 9.44, EMA: $103297.00, Max: $103324.00, Min: $103297.00\n",
" Trend: -1, Cumulative Return: -0.03%, 24h Change: 0.35%\n",
"\n",
"[10] Timestamp: 1747531976540, Price: $103297.00\n"
]
}
],
"source": [
"import warnings\n",
"warnings.filterwarnings('ignore')\n",
"\n",
"from bitcoin_utils import BitcoinPriceSource\n",
"import itertools\n",
"\n",
"# Initialize the price source with its default window\n",
"source = BitcoinPriceSource()\n",
"\n",
"# Fetch 10 prices from the source for demonstration\n",
"for i, (timestamp, price) in zip(range(10), source):\n",
" print(f\"[{i+1}] Timestamp: {timestamp}, Price: ${price:.2f}\")\n"
]
},
{
"cell_type": "markdown",
"id": "1a8b3ccd",
"metadata": {},
"source": [
"The output shows Bitcoin price fetches every 30 seconds, along with computed statistics once enough data points are collected.\n",
"\n",
"## Price Fetch Logs\n",
"Each fetch prints the current UTC timestamp and Bitcoin price. The loop then prints the fetch number, the timestamp in epoch milliseconds, and the price.\n",
"\n",
"## Computed Metrics\n",
"After collecting 10 price points (the window size), the script calculates and displays:\n",
"\n",
"- **Moving Average (MA)** of prices in the window\n",
"- **Standard Deviation (StdDev)**, measuring price volatility\n",
"- **Exponential Moving Average (EMA)**, which weights recent prices more\n",
"- **Maximum and Minimum** prices in the window\n",
"- **Trend indicator**: -1 means the price is trending down, 1 means up\n",
"- **Cumulative Return**: percentage price change over the window\n",
"- **24h Change**: percentage price change over the last 24 hours, as reported by the API\n",
"\n",
"## Summary\n",
"The script fetches prices repeatedly, and after every 10 samples, it summarizes recent price behavior to help analyze short-term market trends and volatility.\n"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.10"
}
},
"nbformat": 4,
"nbformat_minor": 5
}