Make concurrent iteration over pairwise, combinations, permutations, cwr, product, etc. from itertools safe under free-threading

# Bug report

### Bug description:


Several iterators in the C implementation of the `itertools` module are not yet safe to use concurrently under the free-threading build. This issue lists the problems to be addressed. They are discussed for `itertools.product`, but similar problems apply to the other classes.

- When iterating over `product`, the result tuple is re-used when its reference count is 1. We can use the new [`_PyObject_IsUniquelyReferenced`](https://github.com/python/cpython/blob/58ce131037ecb34d506a613f21993cde2056f628/Include/internal/pycore_object.h#L165) function to check whether the tuple can be re-used. (This issue was also reported in https://github.com/python/cpython/issues/121464.)
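The observable contract that the re-use check has to protect can be sketched from Python: a tuple that the caller still holds must never be overwritten by a later `next()` call. The snippet below is only an illustration of that invariant in the single-threaded case. Whether a tuple is actually re-used is an implementation detail that it does not test.

```python
from itertools import product

it = product("ab", repeat=2)

# Keep every result alive, so each tuple is referenced from outside the
# iterator and is therefore not eligible for re-use.
kept = []
for _ in range(4):
    kept.append(next(it))

# Each held tuple keeps its own value and its own identity.
distinct_ids = len({id(t) for t in kept})
```

Under concurrent iteration the same invariant has to hold even when another thread acquires a reference between the uniqueness check and the in-place update.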

- On the first invocation of `product` a new result is constructed.

https://github.com/python/cpython/blob/58ce131037ecb34d506a613f21993cde2056f628/Modules/itertoolsmodule.c#L2038-L2044

This is not thread-safe, as multiple threads could see `result == NULL` evaluate to true. We could move the construction of `productobject.result` to the constructor of `product`. This does mean that `product` will use more memory before the first invocation of `next`, which seems acceptable, as constructing a `product` without iterating over it seems rare in practice.
The tuple also needs to be filled with data. For `product` it seems safe to do this in the constructor, as the data comes
from `productobject->pools`, which is a tuple of tuples. But for `pairwise` the data comes from an iterable

https://github.com/python/cpython/blob/58ce131037ecb34d506a613f21993cde2056f628/Modules/itertoolsmodule.c#L337-L343

which could be a generator. Reading data from the iterator before the first invocation of `pairwise_next` seems like a behavior change we do not want to make.

An alternative is to use some kind of locking inside `product_next`, but the locking should not add any overhead in the common path, otherwise single-threaded performance will suffer.
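One way to keep the common path free of locking is to take the lock only while the result is still uninitialized. The following is a Python model of that idea, not CPython code: the class `LazyResult`, its `init_count` attribute and the use of `threading.Lock` are assumptions made for the sketch, and a C implementation would use different primitives.

```python
import threading

class LazyResult:
    """Hypothetical model of first-call initialization of a shared result."""

    def __init__(self, pools):
        self.pools = pools
        self.result = None
        self.init_count = 0          # how many times initialization ran
        self._lock = threading.Lock()

    def get_result(self):
        # Fast path: once initialized, no lock is taken.
        result = self.result
        if result is None:
            with self._lock:
                # Re-check under the lock: another thread may have won.
                if self.result is None:
                    self.init_count += 1
                    self.result = [pool[0] for pool in self.pools]
                result = self.result
        return result

model = LazyResult([(1, 2), ("a", "b")])
threads = [threading.Thread(target=model.get_result) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Only the first call pays for the lock. This covers the initialization race only and says nothing about the later in-place updates of the result.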
 
- When an iterable is exhausted, some cleanup is done. For example, in `pairwise_next` at

https://github.com/python/cpython/blob/58ce131037ecb34d506a613f21993cde2056f628/Modules/itertoolsmodule.c#L352-L356

This cleanup is not safe under concurrent iteration. Instead, we can defer the cleanup until the object itself is deallocated (this approach was used for `reversed`, see https://github.com/python/cpython/pull/120971/files#r1653313765).

- Actually constructing the new result requires some care as well. Even if we are fine with having funny results under concurrent iteration (see the discussion in https://github.com/python/cpython/issues/120496), concurrent iteration should not corrupt the interpreter. For example, this code is not safe:

https://github.com/python/cpython/blob/58ce131037ecb34d506a613f21993cde2056f628/Modules/itertoolsmodule.c#L2077-L2088

If two threads both increment `indices[i]`, the check on line 2078 may never be true, and we end up indexing `pool` with `PyTuple_GET_ITEM` out of bounds on line 2088. Here we could change the check to `indices[i] >= PyTuple_GET_SIZE(pool)`. That is equivalent in the single-threaded case, but does not lead to out-of-bounds indexing in the multi-threaded case (although it does lead to funny results!).
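The difference between the two checks can be modeled deterministically in Python by applying both racing increments before either check runs. The helper `step` is hypothetical, and the `IndexError` stands in for what would be an unchecked out-of-bounds read in C.

```python
import operator

def step(index, pool, rolled_over):
    """Model the rollover check for one pool position (hypothetical helper)."""
    if rolled_over(index, len(pool)):
        # Roll over to the start of the pool.
        return 0, pool[0]
    # No rollover: read the element at the current index.
    return index, pool[index]

pool = ("a", "b", "c")
index = 2    # last valid position
index += 1   # increment by thread 1
index += 1   # racing increment by thread 2, before either thread checks

# With `==` the index has already skipped past len(pool).
try:
    step(index, pool, operator.eq)
    eq_outcome = "in bounds"
except IndexError:
    eq_outcome = "out of bounds"

# With `>=` the overshoot is still caught and the position rolls over.
ge_index, ge_elem = step(index, pool, operator.ge)
```

For a single increment (`index == len(pool)`) both checks take the rollover branch, which matches the single-threaded equivalence noted above.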

@rhettinger @colesbury Any input on the points above would be welcome.


### CPython versions tested on:

CPython main branch

### Operating systems tested on:

_No response_


### Linked PRs
* gh-123848
* gh-125417
* gh-129416
* gh-131212
* gh-131247
* gh-132814
* gh-135689
* gh-144402
* gh-144486
* gh-144489
* gh-144528
* gh-146021
* gh-146033
* gh-148348

`product_next`, `Modules/itertoolsmodule.c` lines 2038-2044:

    if (result == NULL) {
        /* On the first pass, return an initial tuple filled with the
           first element from each pool. */
        result = PyTuple_New(npools);
        if (result == NULL)
            goto empty;
        lz->result = result;

`pairwise_next`, `Modules/itertoolsmodule.c` lines 337-343:

    if (old == NULL) {
        old = (*Py_TYPE(it)->tp_iternext)(it);
        Py_XSETREF(po->old, old);
        if (old == NULL) {
            Py_CLEAR(po->it);
            return NULL;
        }

`pairwise_next`, `Modules/itertoolsmodule.c` lines 352-356:

    if (new == NULL) {
        Py_CLEAR(po->it);
        Py_CLEAR(po->old);
        Py_DECREF(old);
        return NULL;

`product_next`, `Modules/itertoolsmodule.c` lines 2077-2088:

    indices[i]++;
    if (indices[i] == PyTuple_GET_SIZE(pool)) {
        /* Roll-over and advance to next pool */
        indices[i] = 0;
        elem = PyTuple_GET_ITEM(pool, 0);
        Py_INCREF(elem);
        oldelem = PyTuple_GET_ITEM(result, i);
        PyTuple_SET_ITEM(result, i, elem);
        Py_DECREF(oldelem);
    } else {
        /* No rollover. Just increment and stop here. */
        elem = PyTuple_GET_ITEM(pool, indices[i]);
