Skip to content

Commit 95ea0dd

Browse files
committed
Add Bug 17: root cause analysis of broadcast slice stride mismatch
Identified the root cause of bugs 1, 11, 12, 13, 14, and 15 in GetViewInternal (UnmanagedStorage.Slicing.cs:101). When slicing a broadcast array, Clone() materializes data to contiguous layout but _shape.Slice(slices) retains the original broadcast strides. This stride/data mismatch causes GetOffset to compute wrong memory offsets. Added 4 test methods proving the mechanism: - 17a: Column-broadcast [:, N] reads same row value for every row - 17b: np.copy workaround proves only stride assignment is wrong - 17c: Sliced-then-broadcast source causes out-of-bounds access - 17d: Isolates how stride mismatch causes sum(axis=0) Bug 15 Updated header: 16→17 distinct bugs, 34→38 test methods.
1 parent 714ff32 commit 95ea0dd

1 file changed

Lines changed: 298 additions & 1 deletion

File tree

test/NumSharp.UnitTest/OpenBugs.cs

Lines changed: 298 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -50,8 +50,12 @@ namespace NumSharp.UnitTest
5050
/// - Bug 14: roll on broadcast produces zeros/wrong values
5151
/// - Bug 15: sum/mean/var/std axis=0 on column-broadcast under-counts
5252
/// - Bug 16: argsort crashes on any 2D array (not broadcast-specific)
53+
/// - Bug 17: ROOT CAUSE — GetViewInternal clones broadcast data to
54+
/// contiguous layout but attaches broadcast strides to the
55+
/// clone (UnmanagedStorage.Slicing.cs:101). This stride/data
56+
/// mismatch causes bugs 1, 11, 12, 13, 14, 15.
5357
///
54-
/// Total: 16 distinct bugs, 34 test methods.
58+
/// Total: 17 distinct bugs, 38 test methods.
5559
/// </summary>
5660
[TestClass]
5761
public class OpenBugs : TestClass
@@ -1643,5 +1647,298 @@ public void Bug_Clip_Broadcast_ThrowsNotSupported()
16431647
result.GetDouble(0, 1).Should().Be(5.0);
16441648
result.GetDouble(0, 2).Should().Be(7.0);
16451649
}
1650+
1651+
// ================================================================
1652+
//
1653+
// BUG 17: Slicing a broadcast array produces a view with
1654+
// mismatched strides vs data layout (ROOT CAUSE)
1655+
//
1656+
// SEVERITY: Critical — this is the ROOT CAUSE of bugs 1, 11,
1657+
// 12, 13, 14, and 15. Every operation that slices a broadcast
1658+
// array hits this code path.
1659+
//
1660+
// LOCATION: UnmanagedStorage.Slicing.cs, GetViewInternal(),
1661+
// lines 100-101:
1662+
//
1663+
// if (_shape.IsBroadcasted)
1664+
// return Clone().Alias(_shape.Slice(slices));
1665+
//
1666+
// WHAT HAPPENS:
1667+
//
1668+
// 1. Clone() correctly materializes the broadcast data into
1669+
// contiguous memory. For broadcast_to([[100],[200],[300]],
1670+
// (3,3)), CloneData() uses MultiIterator.Assign to produce
1671+
// 9 elements in row-major order:
1672+
// [100, 100, 100, 200, 200, 200, 300, 300, 300]
1673+
// This contiguous (3,3) data has implicit strides [3, 1].
1674+
//
1675+
// 2. _shape.Slice(slices) computes the sliced shape, but _shape
1676+
// is the ORIGINAL BROADCAST shape with strides [1, 0]. The
1677+
// resulting ViewInfo.OriginalShape inherits these broadcast
1678+
// strides [1, 0].
1679+
//
1680+
// 3. .Alias() attaches this broadcast-strided shape to the
1681+
// contiguous data. Now there is a MISMATCH: the data is
1682+
// laid out with strides [3, 1] but the shape claims [1, 0].
1683+
//
1684+
// 4. When GetOffset computes memory offsets (for GetInt32,
1685+
// iterators, reductions, etc.), it uses the broadcast
1686+
// strides [1, 0] from ViewInfo.OriginalShape. For a[:, 0]:
1687+
// GetOffset(0) = 0*1 + 0*0 = 0 → data[0] = 100 ✓
1688+
// GetOffset(1) = 1*1 + 0*0 = 1 → data[1] = 100 ✗
1689+
// GetOffset(2) = 2*1 + 0*0 = 2 → data[2] = 100 ✗
1690+
// But the correct contiguous offsets should be:
1691+
// GetOffset(0) = 0*3 + 0*1 = 0 → data[0] = 100 ✓
1692+
// GetOffset(1) = 1*3 + 0*1 = 3 → data[3] = 200 ✓
1693+
// GetOffset(2) = 2*3 + 0*1 = 6 → data[6] = 300 ✓
1694+
//
1695+
// PROOF: np.copy(a)[:, 0] returns [100, 200, 300] (correct)
1696+
// because np.copy creates a clean contiguous shape with strides
1697+
// [3, 1]. The direct slice a[:, 0] returns [100, 100, 100]
1698+
// because the shape retains broadcast strides [1, 0].
1699+
//
1700+
// The non-broadcast code path (line 103) does NOT have this
1701+
// problem because Alias(_shape.Slice(slices)) reuses the
1702+
// SAME memory (no Clone), so the strides in the OriginalShape
1703+
// match the actual data layout.
1704+
//
1705+
// WHY THIS IS THE ROOT CAUSE OF MANY BUGS:
1706+
//
1707+
// Every operation that indexes into a broadcast array goes
1708+
// through GetViewInternal and hits line 101. The returned
1709+
// storage has cloned contiguous data but broadcast strides:
1710+
//
1711+
// Bug 1: ToString iterates the sliced broadcast view
1712+
// Bug 11: flatten() iterates the sliced broadcast view
1713+
// Bug 12: concatenate copies from sliced broadcast views
1714+
// Bug 13: cumsum slices along axis on broadcast arrays
1715+
// Bug 14: roll slices along axis on broadcast arrays
1716+
// Bug 15: sum/mean/var/std axis reduction slices each
1717+
// "lane" via arr[slices] on the broadcast array
1718+
//
1719+
// In all cases, the reduction/iteration code does arr[slices]
1720+
// which calls GetViewInternal, which clones the data to
1721+
// contiguous layout but keeps broadcast strides, and then
1722+
// iteration reads from wrong memory offsets.
1723+
//
1724+
// PYTHON VERIFICATION:
1725+
// >>> a = np.broadcast_to(np.array([[100],[200],[300]]), (3,3))
1726+
// >>> a[:, 0]
1727+
// array([100, 200, 300])
1728+
// >>> a[0, :]
1729+
// array([100, 100, 100])
1730+
// >>> a[:, 0].sum()
1731+
// 600
1732+
//
1733+
// ================================================================
1734+
1735+
/// <summary>
1736+
/// BUG 17a: Slicing a column-broadcast array with [:, N] reads
1737+
/// the same row's value for every row.
1738+
///
1739+
/// The slice a[:, 0] on broadcast_to([[100],[200],[300]], (3,3))
1740+
/// should return [100, 200, 300] (column 0 of each row).
1741+
/// Instead it returns [100, 100, 100] — row 0's value repeated.
1742+
///
1743+
/// Root cause: GetViewInternal clones the data to contiguous
1744+
/// layout [3, 1] but the shape retains broadcast strides [1, 0].
1745+
/// GetOffset computes offsets 0, 1, 2 (using stride 1) instead
1746+
/// of 0, 3, 6 (using stride 3), so it reads data[0..2] which
1747+
/// are all 100 (row 0 repeated 3 times in the contiguous clone).
1748+
/// </summary>
1749+
[TestMethod]
1750+
public void Bug_SliceBroadcast_StrideMismatch_ColumnBroadcast_SliceColumn()
1751+
{
1752+
// Setup: column vector [[100],[200],[300]] broadcast to (3,3)
1753+
var col = np.array(new int[,] { { 100 }, { 200 }, { 300 } });
1754+
var a = np.broadcast_to(col, new Shape(3, 3));
1755+
1756+
// Verify the broadcast array itself is correct via coordinate access
1757+
a.GetInt32(0, 0).Should().Be(100, "broadcast[0,0] = 100");
1758+
a.GetInt32(1, 0).Should().Be(200, "broadcast[1,0] = 200");
1759+
a.GetInt32(2, 0).Should().Be(300, "broadcast[2,0] = 300");
1760+
1761+
// Slice a[:, 0] — should extract column 0: [100, 200, 300]
1762+
var sliced = a[":, 0"];
1763+
sliced.shape.Should().BeEquivalentTo(new[] { 3 });
1764+
1765+
// These assertions reproduce the bug:
1766+
// NumPy returns [100, 200, 300]. NumSharp returns [100, 100, 100].
1767+
sliced.GetInt32(0).Should().Be(100,
1768+
"a[:, 0][0] should be 100 (row 0, col 0).");
1769+
sliced.GetInt32(1).Should().Be(200,
1770+
"a[:, 0][1] should be 200 (row 1, col 0). " +
1771+
"NumSharp returns 100 because GetViewInternal clones the broadcast " +
1772+
"data to contiguous layout (strides [3,1]) but the shape retains " +
1773+
"broadcast strides [1,0]. GetOffset(1) computes 1*1=1 instead of " +
1774+
"1*3=3, reading data[1]=100 (still row 0) instead of data[3]=200.");
1775+
sliced.GetInt32(2).Should().Be(300,
1776+
"a[:, 0][2] should be 300 (row 2, col 0). " +
1777+
"NumSharp returns 100 because GetOffset(2) computes 2*1=2 instead " +
1778+
"of 2*3=6, reading data[2]=100 (still row 0) instead of data[6]=300.");
1779+
}
1780+
1781+
/// <summary>
1782+
/// BUG 17b: np.copy produces correct results, proving the data
1783+
/// materialization itself works — only the stride assignment is wrong.
1784+
///
1785+
/// np.copy creates a new contiguous array with clean strides [3,1],
1786+
/// so slicing the copy works correctly. This proves Clone()/CloneData()
1787+
/// correctly materializes the data; the bug is purely that
1788+
/// _shape.Slice(slices) attaches broadcast strides to the clone.
1789+
/// </summary>
1790+
[TestMethod]
1791+
public void Bug_SliceBroadcast_CopyWorkaround_Proves_StrideMismatch()
1792+
{
1793+
var col = np.array(new int[,] { { 100 }, { 200 }, { 300 } });
1794+
var a = np.broadcast_to(col, new Shape(3, 3));
1795+
1796+
// np.copy materializes with clean contiguous shape [3, 1]
1797+
var copy = np.copy(a);
1798+
copy.strides.Should().BeEquivalentTo(new[] { 3, 1 },
1799+
"np.copy produces contiguous strides [3,1]");
1800+
copy.Shape.IsBroadcasted.Should().BeFalse(
1801+
"np.copy clears broadcast flag");
1802+
1803+
// Slicing the copy works correctly
1804+
var copySliced = copy[":, 0"];
1805+
copySliced.GetInt32(0).Should().Be(100);
1806+
copySliced.GetInt32(1).Should().Be(200);
1807+
copySliced.GetInt32(2).Should().Be(300);
1808+
1809+
// Direct slice of broadcast gives wrong values
1810+
var directSliced = a[":, 0"];
1811+
directSliced.GetInt32(0).Should().Be(100);
1812+
directSliced.GetInt32(1).Should().Be(200,
1813+
"Direct slice a[:, 0][1] must equal np.copy(a)[:, 0][1] = 200. " +
1814+
"Both paths clone the same data; the difference is that np.copy " +
1815+
"creates clean strides [3,1] while GetViewInternal keeps broadcast " +
1816+
"strides [1,0] from _shape.Slice(slices). The data is identical, " +
1817+
"only the offset calculation differs.");
1818+
directSliced.GetInt32(2).Should().Be(300,
1819+
"Direct slice a[:, 0][2] must equal np.copy(a)[:, 0][2] = 300.");
1820+
}
1821+
1822+
/// <summary>
1823+
/// BUG 17c: Sliced-then-broadcast array: slicing triggers
1824+
/// memory corruption (Debug.Assert: index &lt; Count).
1825+
///
1826+
/// Setup: arange(12).reshape(3,4)[:,1:2] gives a (3,1) column
1827+
/// with values [[1],[5],[9]] and row stride=4 (stepping over
1828+
/// 4-wide rows). Broadcast to (3,3) then slice b[:, 0].
1829+
///
1830+
/// The source column has compound strides from the reshape+slice.
1831+
/// After Clone, data is 9 contiguous elements. But the shape
1832+
/// retains the compound sliced+broadcast strides. GetOffset
1833+
/// computes offsets using the original row stride (4), which
1834+
/// overshoots the 9-element cloned buffer, triggering
1835+
/// Debug.Assert("index &lt; Count, Memory corruption expected").
1836+
///
1837+
/// This is the same root cause as 17a but with a more severe
1838+
/// manifestation: instead of just reading wrong values, the
1839+
/// wrong strides cause out-of-bounds memory access.
1840+
///
1841+
/// We use np.copy as the control path: copy materializes with
1842+
/// clean strides, then slicing works correctly.
1843+
/// </summary>
1844+
[TestMethod]
1845+
public void Bug_SliceBroadcast_StrideMismatch_SlicedSourceRows()
1846+
{
1847+
// arange(12).reshape(3,4) = [[ 0, 1, 2, 3],
1848+
// [ 4, 5, 6, 7],
1849+
// [ 8, 9,10,11]]
1850+
// Slice column 1: [:,1:2] = [[1],[5],[9]]
1851+
// Broadcast to (3,3): [[1,1,1],[5,5,5],[9,9,9]]
1852+
var x = np.arange(12).reshape(3, 4);
1853+
var col = x[":, 1:2"]; // (3,1): [[1],[5],[9]]
1854+
var b = np.broadcast_to(col, new Shape(3, 3));
1855+
1856+
// Coordinate access on the broadcast is correct
1857+
b.GetInt32(0, 0).Should().Be(1);
1858+
b.GetInt32(1, 0).Should().Be(5);
1859+
b.GetInt32(2, 0).Should().Be(9);
1860+
1861+
// np.copy + slice works correctly (control path)
1862+
var copySliced = np.copy(b)[":, 0"];
1863+
copySliced.GetInt32(0).Should().Be(1, "copy[:, 0][0] = 1 (control)");
1864+
copySliced.GetInt32(1).Should().Be(5, "copy[:, 0][1] = 5 (control)");
1865+
copySliced.GetInt32(2).Should().Be(9, "copy[:, 0][2] = 9 (control)");
1866+
1867+
// Direct slice should give the same result but crashes:
1868+
// GetViewInternal clones data to 9 contiguous elements but
1869+
// attaches compound strides from the sliced+broadcast shape.
1870+
// GetOffset computes offsets using original row stride (4),
1871+
// which overshoots the 9-element buffer → Debug.Assert fires.
1872+
//
1873+
// This try-catch is necessary because Debug.Assert throws a
1874+
// DebugAssertException that the test platform translates to a
1875+
// process-level failure, bypassing normal exception handling.
1876+
try
1877+
{
1878+
var sliced = b[":, 0"];
1879+
// If it doesn't throw, verify values are correct
1880+
sliced.GetInt32(0).Should().Be(1,
1881+
"b[:, 0][0] should be 1 (row 0 value).");
1882+
sliced.GetInt32(1).Should().Be(5,
1883+
"b[:, 0][1] should be 5 (row 1 value).");
1884+
sliced.GetInt32(2).Should().Be(9,
1885+
"b[:, 0][2] should be 9 (row 2 value).");
1886+
}
1887+
catch (Exception ex)
1888+
{
1889+
Assert.Fail(
1890+
$"Slicing b[:, 0] on a sliced+broadcast array must not throw. " +
1891+
$"Threw {ex.GetType().Name}: {ex.Message}. " +
1892+
$"Root cause: GetViewInternal clones data to contiguous layout " +
1893+
$"but attaches sliced+broadcast strides, causing out-of-bounds " +
1894+
$"offset computation. The np.copy(b)[:, 0] control path returns " +
1895+
$"the correct values [1, 5, 9].");
1896+
}
1897+
}
1898+
1899+
/// <summary>
1900+
/// BUG 17d: The stride mismatch directly causes Bug 15a —
1901+
/// sum(axis=0) on column-broadcast returns 300 instead of 600.
1902+
///
1903+
/// The sum reduction does arr[Slice.All, Slice.Index(col)] for
1904+
/// each output column. This calls GetViewInternal, which clones
1905+
/// the data but keeps broadcast strides. The resulting 1D slice
1906+
/// reads [100, 100, 100] instead of [100, 200, 300], so
1907+
/// sum = 300 instead of 600.
1908+
///
1909+
/// This test isolates the exact mechanism: it manually performs
1910+
/// the same slice that the reduction code does, proving that the
1911+
/// slice itself returns wrong values.
1912+
/// </summary>
1913+
[TestMethod]
1914+
public void Bug_SliceBroadcast_StrideMismatch_Causes_Sum_Axis0_Bug()
1915+
{
1916+
var col = np.array(new int[,] { { 100 }, { 200 }, { 300 } });
1917+
var a = np.broadcast_to(col, new Shape(3, 3));
1918+
1919+
// This is exactly what the axis=0 reduction does internally:
1920+
// For each column, it slices arr[Slice.All, Slice.Index(col)]
1921+
// to get a 1D vector along axis 0, then sums it.
1922+
for (int c = 0; c < 3; c++)
1923+
{
1924+
var lane = a[$":, {c}"];
1925+
lane.size.Should().Be(3, $"lane for col {c} should have 3 elements");
1926+
1927+
// The lane should contain [100, 200, 300] for every column
1928+
// (because the column dimension is broadcast — all columns are identical)
1929+
var laneValues = new int[3];
1930+
for (int r = 0; r < 3; r++)
1931+
laneValues[r] = lane.GetInt32(r);
1932+
1933+
var laneSum = laneValues.Sum();
1934+
laneSum.Should().Be(600,
1935+
$"Sum of lane a[:, {c}] should be 100+200+300=600. " +
1936+
$"Got values [{string.Join(", ", laneValues)}] summing to {laneSum}. " +
1937+
"The reduction reads [100, 100, 100] (sum=300) because the slice " +
1938+
"has broadcast strides [1,0] on contiguous data [3,1]. " +
1939+
"GetOffset maps rows 0,1,2 to memory offsets 0,1,2 instead of " +
1940+
"0,3,6, so all three reads land in row 0's data region.");
1941+
}
1942+
}
16461943
}
16471944
}

0 commit comments

Comments
 (0)