DataFrame provides several methods for retrieving processed data. These methods are trigger operations that execute the entire pipeline.
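To picture what "trigger operation" means, here is a minimal plain-PHP sketch of a lazy pipeline built from generators. It does not use Flow's classes: `readNumbers()` and `doubleEach()` are hypothetical helpers invented for this illustration. Building the pipeline runs nothing; the steps execute only once the result is iterated.

```php
<?php

declare(strict_types=1);

// Hypothetical source step: yields values lazily and records when it actually runs.
function readNumbers(array &$log): Generator
{
    $log[] = 'read started';
    foreach ([1, 2, 3] as $n) {
        yield $n;
    }
}

// Hypothetical transformation step: wraps another iterable without consuming it up front.
function doubleEach(iterable $source): Generator
{
    foreach ($source as $n) {
        yield $n * 2;
    }
}

$log = [];
$pipeline = doubleEach(readNumbers($log));

// Building the pipeline has not executed any step yet.
$stepsBeforeIteration = count($log);

// Iterating is the trigger: only now do the steps run.
$result = [];
foreach ($pipeline as $value) {
    $result[] = $value;
}
```

Flow's internals are more involved than this, but the same principle applies to the retrieval methods described here: the work happens when results are requested.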
The following methods return generators, so memory usage stays constant regardless of dataset size:
<?php

use function Flow\ETL\DSL\{data_frame, from_array};

$dataFrame = data_frame()->read(from_array($largeDataset));

foreach ($dataFrame->get() as $rows) {
    echo "Processing batch of " . $rows->count() . " rows\n";
    // Process each batch
    foreach ($rows as $row) {
        // Process individual row
    }
}

<?php
foreach ($dataFrame->getEach() as $row) {
    echo "ID: " . $row->get('id')->value() . "\n";
    echo "Name: " . $row->get('name')->value() . "\n";
}

<?php
foreach ($dataFrame->getAsArray() as $rowsArray) {
    // $rowsArray is an array of arrays
    foreach ($rowsArray as $rowArray) {
        echo "ID: " . $rowArray['id'] . "\n";
    }
}

<?php
foreach ($dataFrame->getEachAsArray() as $rowArray) {
    echo "ID: " . $rowArray['id'] . "\n";
    echo "Name: " . $rowArray['name'] . "\n";
}

<?php
// Fetch limited results (safe)
$firstTen = $dataFrame->fetch(10);
foreach ($firstTen as $row) {
    // Process row
}

// Fetch all results (dangerous for large datasets!)
$allRows = $dataFrame->fetch(); // Can cause memory exhaustion
⚠️ Memory Warning: The fetch() method loads all requested rows into memory at once. Without a limit parameter, it attempts to load the entire dataset, which can exhaust available memory. Always pass a reasonable limit, or prefer the generator-based methods.
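When only the first few rows are needed and holding a large result in memory is a concern, one alternative is to iterate a generator and stop early. The sketch below is plain PHP, not Flow's API: `take()` is a hypothetical helper, and `manyRows()` stands in for a generator such as the one returned by `getEachAsArray()`.

```php
<?php

declare(strict_types=1);

// Hypothetical helper: yields at most $limit items, then stops pulling from the source.
function take(iterable $rows, int $limit): Generator
{
    if ($limit <= 0) {
        return;
    }
    $seen = 0;
    foreach ($rows as $row) {
        yield $row;
        if (++$seen >= $limit) {
            return;
        }
    }
}

// Stand-in for a large, lazily produced dataset; $produced tracks how many rows were generated.
function manyRows(int &$produced): Generator
{
    for ($id = 1; $id <= 1000000; $id++) {
        $produced++;
        yield ['id' => $id];
    }
}

$produced = 0;
$ids = [];
foreach (take(manyRows($produced), 3) as $row) {
    $ids[] = $row['id'];
}
```

In this sketch only three rows are ever produced, because the source generator is never advanced past the limit.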
<?php
$totalCount = $dataFrame->count();
echo "Total rows: $totalCount\n";
⚠️ Performance Warning: The count() method must process the entire dataset to return the total, which can be expensive for large datasets. Consider whether you actually need the exact count or whether an approximation would suffice.
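If the rows are going to be iterated anyway, keeping a running tally during that pass yields the total without a separate counting pass. This is a plain-PHP sketch under that assumption: `processAndCount()` is a hypothetical helper, not part of Flow, and the inline generator stands in for something like `getEachAsArray()`.

```php
<?php

declare(strict_types=1);

// Hypothetical helper: applies $process to every row and returns how many rows were seen.
function processAndCount(iterable $rows, callable $process): int
{
    $count = 0;
    foreach ($rows as $row) {
        $process($row);
        $count++;
    }

    return $count;
}

// Stand-in for a lazily produced row stream.
$rows = (function (): Generator {
    yield ['id' => 1, 'name' => 'a'];
    yield ['id' => 2, 'name' => 'b'];
    yield ['id' => 3, 'name' => 'c'];
})();

$names = [];
$total = processAndCount($rows, function (array $row) use (&$names): void {
    $names[] = $row['name'];
});
```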
<?php

use Flow\ETL\Rows;

$dataFrame->forEach(function (Rows $rows): void {
    echo "Processing batch of " . $rows->count() . " rows\n";
    // Custom processing logic
});
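The difference between batch-level and row-level access can be pictured with a plain-PHP sketch. `inBatches()` is a hypothetical helper that groups a row stream into fixed-size arrays while holding only one batch in memory at a time. Flow determines its own batch boundaries, so treat this as an illustration of the idea rather than a description of the library's behavior.

```php
<?php

declare(strict_types=1);

// Hypothetical helper: groups a stream of rows into arrays of at most $size items.
// Because this is a generator, the size check runs when iteration starts, not at call time.
function inBatches(iterable $rows, int $size): Generator
{
    if ($size < 1) {
        throw new InvalidArgumentException('Batch size must be at least 1.');
    }
    $batch = [];
    foreach ($rows as $row) {
        $batch[] = $row;
        if (count($batch) === $size) {
            yield $batch;
            $batch = [];
        }
    }
    // Emit the final, possibly smaller, batch.
    if ($batch !== []) {
        yield $batch;
    }
}

// Seven rows in batches of three: the callback-style consumer sees three batches.
$batchSizes = [];
foreach (inBatches(range(1, 7), 3) as $batch) {
    $batchSizes[] = count($batch);
}
```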