Skip to content

昨日のデータを取得、過去分のデータを取得するように修正#53

Open
lamrongol wants to merge 2 commits into
boatraceopenapi:gh-pagesfrom
lamrongol:get_past
Open

昨日のデータを取得、過去分のデータを取得するように修正#53
lamrongol wants to merge 2 commits into
boatraceopenapi:gh-pagesfrom
lamrongol:get_past

Conversation

@lamrongol

@lamrongol lamrongol commented Jun 24, 2026

Copy link
Copy Markdown

boatraceopenapi/results#25
と同じく、「本日」分のデータ取得を行おうとすると結果のWebサイト反映までに時間がかかり一部データ(時刻が遅いレースのデータ)を取得できない可能性があるため、一日過ぎてからデータを取得

Summary by CodeRabbit

  • New Features
    • スクレイピング結果の自動更新用ワークフローを追加し、定期実行と手動実行の両方に対応しました。
  • Bug Fixes
    • 取得対象日を前日基準に切り替え、日付管理を安定化しました。
    • 記録済みの日付情報を更新し、対象JSONの保存タイミングを見直しました。
  • Chores
    • 定期実行のスケジュールを変更し、実行環境のPHPバージョンを更新しました。

@coderabbitai

coderabbitai Bot commented Jun 24, 2026

Copy link
Copy Markdown

Review Change Stack

Walkthrough

scraper.php の日付基準を昨日へ変更し、指定日モードを追加しました。あわせて過去取得用の get_past.yml を新設し、cron.yml の実行時刻と PHP バージョン指定を更新しました。docs/v3/recorded_day.json も更新されています。

Changes

過去データ取得フロー追加

Layer / File(s) Summary
scraper.php の日付判定と保存分岐
scraper.php
$day_specified をコマンドライン引数で判定し、東京時間の基準日を本日から昨日へ変更。指定時は recorded_day.json を読み込んで1日前へ進め、2020-01-01 より前なら終了する。保存先は年別 JSON を維持しつつ、指定時は recorded_day.json、非指定時は yesterday.json に分岐した。
get_past.yml ワークフロー新設
.github/workflows/get_past.yml, docs/v3/recorded_day.json
2時間ごとの cron と手動実行で run_for_past ジョブを開始し、PHP 8.5 で依存を再構成した後に php scraper.php v3 True を実行する。docs/v3 の JSON 変更がある場合のみ再同期してコミットし、push 失敗時は rebase+autostash 後に再試行する。docs/v3/recorded_day.json の記録日も更新された。
cron.yml スケジュールと PHP 指定
.github/workflows/cron.yml
cron の実行時刻を 0,30 * * * * から 30 5 * * * に変更した。deploy-v3 を先頭に配置し、Setup PHPphp-version8.4 から 8.5 に更新した。

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related issues

Poem

🐰 きのうの粒を集めてぴょん、
朝五時半に cron がこつこつ。
get_past.yml が道をひらき、
recorded_day が静かに更新。
ぴかぴか JSON、森に並ぶよ 🌿

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed 主な変更点である「昨日以降のデータ取得・過去分取得への切り替え」を適切に表しており、内容も関連しています。
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.github/workflows/get_past.yml:
- Around line 26-29: The checkout step currently leaves Git credentials
available while later Composer steps may execute untrusted dependency code.
Update the Checkout Code action in the workflow to disable credential
persistence via the actions/checkout configuration, and ensure any push-capable
remote authentication is applied only immediately before the push logic that
follows the Composer install/update path. Use the existing Checkout Code step
and the downstream composer/push steps as the anchors for this change.

In `@scraper.php`:
- Line 20: The recorded_day.json handling in scraper.php is treating a .json
file as raw text, which makes the generated file invalid JSON. Update the
read/write flow around the Carbon date parsing and the corresponding save logic
to use json_decode/json_encode consistently, or rename the file if it is meant
to be plain text. Make sure the affected logic around recorded_day.json is
aligned so both the existing read path and the write path produce valid JSON.
- Around line 20-23: The threshold comparison in scraper.php uses
Carbon::parse('2020-01-01') without an explicit timezone, so the isBefore check
can behave differently in UTC-based environments. Update the comparison in the
scraper logic to parse the threshold date with Asia/Tokyo (JST) so it matches
the timezone already applied to $date, and keep the fix localized around the
Carbon::parse and isBefore usage.
- Around line 47-50: The save call in scraper.php is using an undefined
variable, so the data passed to PreviewSaver::save is wrong. In the branch that
writes docs/{$version}/yesterday.json, replace the use of $odds with the
previews variable obtained earlier in the scraper flow (the one returned before
this conditional), and keep the storage path unchanged. Check the conditional
block around file_put_contents and $storage->save so the saved payload matches
the array expected by PreviewSaver::save.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 4f24769f-6db1-48c2-bce1-240e82f1789f

📥 Commits

Reviewing files that changed from the base of the PR and between e5c7aa8 and ed9f5a5.

📒 Files selected for processing (5)
  • .github/workflows/cron.yml
  • .github/workflows/get_past.yml
  • docs/v3/recorded_day.json
  • docs/v3/today.json
  • scraper.php

Comment on lines +26 to +29
- name: Checkout Code
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd
with:
fetch-depth: 0

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔒 Security & Privacy | 🟠 Major | ⚡ Quick win

checkout の認証情報を Composer 実行前に残さないでください。

Line 26-29 は checkout した資格情報をそのまま保持した状態で、後続の composer install/update で外部依存コードを実行しています。これだと supply-chain 側から push 権限付きトークンを読まれる余地があります。persist-credentials: false にして、push 直前だけ remote URL にトークンを設定する形へ分離した方が安全です。

修正例
       - name: Checkout Code
         uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd
         with:
           fetch-depth: 0
+          persist-credentials: false
...
       - name: Deploy JSON for scraper v3
         run: |
           git config --global user.name "github-actions[bot]"
           git config --global user.email "github-actions[bot]`@users.noreply.github.com`"
+          git remote set-url origin "https://x-access-token:${GITHUB_TOKEN}`@github.com/`${GITHUB_REPOSITORY}.git"
🧰 Tools
🪛 zizmor (1.26.1)

[warning] 26-29: credential persistence through GitHub Actions artifacts (artipacked): does not set persist-credentials: false

(artipacked)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/workflows/get_past.yml around lines 26 - 29, The checkout step
currently leaves Git credentials available while later Composer steps may
execute untrusted dependency code. Update the Checkout Code action in the
workflow to disable credential persistence via the actions/checkout
configuration, and ensure any push-capable remote authentication is applied only
immediately before the push logic that follows the Composer install/update path.
Use the existing Checkout Code step and the downstream composer/push steps as
the anchors for this change.

Source: Linters/SAST tools

Comment thread scraper.php
// 昨日の日付を東京時間で取得
$date = Carbon::yesterday('Asia/Tokyo');
if ($day_specified) {
$date = new Carbon(file_get_contents("docs/{$version}/recorded_day.json"))->timezone('Asia/Tokyo');

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick win

recorded_day.json を JSON として扱っていません。

Line 20/48 は .json ファイルを生文字列で read/write しており、追加された docs/v3/recorded_day.json は JSON として不正です。現に静的解析も parse error になっています。JSON を使うなら json_encode/json_decode に揃えるか、プレーンテキスト用途なら拡張子を変えた方が安全です。

JSON に揃える例
-    $date = new Carbon(file_get_contents("docs/{$version}/recorded_day.json"))->timezone('Asia/Tokyo');
+    $recordedDay = json_decode((string) file_get_contents("docs/{$version}/recorded_day.json"), true, 512, JSON_THROW_ON_ERROR);
+    $date = new Carbon($recordedDay['date'])->timezone('Asia/Tokyo');
...
-    file_put_contents("docs/{$version}/recorded_day.json", $date->toDateString());
+    file_put_contents(
+        "docs/{$version}/recorded_day.json",
+        json_encode(['date' => $date->toDateString()], JSON_THROW_ON_ERROR)
+    );

Also applies to: 48-48

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scraper.php` at line 20, The recorded_day.json handling in scraper.php is
treating a .json file as raw text, which makes the generated file invalid JSON.
Update the read/write flow around the Carbon date parsing and the corresponding
save logic to use json_decode/json_encode consistently, or rename the file if it
is meant to be plain text. Make sure the affected logic around recorded_day.json
is aligned so both the existing read path and the write path produce valid JSON.

Source: Linters/SAST tools

Comment thread scraper.php
Comment on lines +20 to +23
$date = new Carbon(file_get_contents("docs/{$version}/recorded_day.json"))->timezone('Asia/Tokyo');
$date = $date->subDay();
if ($date->isBefore(Carbon::parse('2020-01-01'))) {
exit;

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

閾値日の比較に JST を明示してください。

Line 22 は比較対象だけタイムゾーン未指定です。GitHub Actions のようにデフォルトが UTC の環境だと、2020-01-01 00:00 JST が「閾値より前」と判定されて、2020-01-01 分を1日早く打ち切ります。

修正例
-    if ($date->isBefore(Carbon::parse('2020-01-01'))) {
+    if ($date->isBefore(Carbon::parse('2020-01-01', 'Asia/Tokyo'))) {
         exit;
     }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
$date = new Carbon(file_get_contents("docs/{$version}/recorded_day.json"))->timezone('Asia/Tokyo');
$date = $date->subDay();
if ($date->isBefore(Carbon::parse('2020-01-01'))) {
exit;
$date = new Carbon(file_get_contents("docs/{$version}/recorded_day.json"))->timezone('Asia/Tokyo');
$date = $date->subDay();
if ($date->isBefore(Carbon::parse('2020-01-01', 'Asia/Tokyo'))) {
exit;
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scraper.php` around lines 20 - 23, The threshold comparison in scraper.php
uses Carbon::parse('2020-01-01') without an explicit timezone, so the isBefore
check can behave differently in UTC-based environments. Update the comparison in
the scraper logic to parse the threshold date with Asia/Tokyo (JST) so it
matches the timezone already applied to $date, and keep the fix localized around
the Carbon::parse and isBefore usage.

Comment thread scraper.php Outdated

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (2)
scraper.php (2)

22-23: 🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

閾値日の比較に JST を明示してください。

比較側の Carbon::parse('2020-01-01') がTZ未指定のため、JST境界で早期打ち切りが起きます。Asia/Tokyo を明示してください。

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scraper.php` around lines 22 - 23, The threshold date comparison in the
early-exit check is using Carbon::parse without an explicit timezone, which can
shift the boundary at JST. Update the comparison in the scraper logic so the
Carbon parse call explicitly uses Asia/Tokyo, and keep the check aligned with
the existing $date->isBefore flow to ensure the cutoff is evaluated in JST.

20-20: 🗄️ Data Integrity & Integration | 🟠 Major | ⚡ Quick win

recorded_day.json の read/write が JSON 契約と不整合です。

.json を生文字列で扱っており、ファイル形式契約が崩れています。json_decode/json_encode に統一するか、プレーンテキスト用途に拡張子を変更してください。

Also applies to: 48-48

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scraper.php` at line 20, The recorded_day.json access in scraper.php is
treating a JSON file as plain text, so align the read/write contract in the code
path that builds the Carbon date (and the related write at the other referenced
location). Update the logic around the file_get_contents/Carbon usage to parse
via json_decode when reading and json_encode when persisting, or rename the file
if it is meant to store plain text, so the contract is consistent across the
scraper flow.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Duplicate comments:
In `@scraper.php`:
- Around line 22-23: The threshold date comparison in the early-exit check is
using Carbon::parse without an explicit timezone, which can shift the boundary
at JST. Update the comparison in the scraper logic so the Carbon parse call
explicitly uses Asia/Tokyo, and keep the check aligned with the existing
$date->isBefore flow to ensure the cutoff is evaluated in JST.
- Line 20: The recorded_day.json access in scraper.php is treating a JSON file
as plain text, so align the read/write contract in the code path that builds the
Carbon date (and the related write at the other referenced location). Update the
logic around the file_get_contents/Carbon usage to parse via json_decode when
reading and json_encode when persisting, or rename the file if it is meant to
store plain text, so the contract is consistent across the scraper flow.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: af775919-0bc5-4fba-a6ac-453da84d65e6

📥 Commits

Reviewing files that changed from the base of the PR and between ed9f5a5 and 4d02fdd.

📒 Files selected for processing (2)
  • .github/workflows/cron.yml
  • scraper.php

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant