Commit e51fc18

Merge pull request #1466 from Unidata/fix_issue1464
change default encoding for stringtochar/chartostring
2 parents a332de8 + aab4fb3 commit e51fc18

5 files changed

Lines changed: 38 additions & 21 deletions

Changelog

Lines changed: 5 additions & 0 deletions
@@ -1,3 +1,8 @@
+version 1.7.4.1 (tag v1.7.4.1rel)
+=================================
+* Change default encoding for stringtochar/chartostring functions from 'utf-8' to 'utf-8'/'ascii' for dtype.kind='U'/'S'
+(issue #1464).
+
 version 1.7.4 (tag v1.7.4rel)
 ================================
 * Make sure automatic conversion of character arrays <--> string arrays works for Unicode strings (issue #1440).
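
The default-resolution rule described in the 1.7.4.1 entry can be sketched in plain NumPy, without netCDF4 installed. The helper name `resolve_default_encoding` is hypothetical and exists only for this illustration; it mirrors the branch this commit adds to `stringtochar`/`chartostring` rather than calling the library itself.

```python
import numpy as np


def resolve_default_encoding(arr, encoding=None):
    """Hypothetical helper: pick the encoding the way this commit's new branch does."""
    kind = arr.dtype.kind
    if kind not in ("S", "U"):
        raise ValueError("type must be string or unicode ('S' or 'U')")
    if encoding is None:
        # Byte-string arrays default to ascii, unicode arrays to utf-8.
        encoding = "ascii" if kind == "S" else "utf-8"
    return encoding


byte_strings = np.array([b"foo", b"bar"])    # dtype kind 'S'
unicode_strings = np.array(["foo", "bar"])   # dtype kind 'U'

print(resolve_default_encoding(byte_strings))
print(resolve_default_encoding(unicode_strings))
# An explicit encoding is passed through unchanged.
print(resolve_default_encoding(byte_strings, encoding="utf-8"))
```

Callers that relied on the old `'utf-8'` default for byte-string arrays would presumably need to pass `encoding='utf-8'` explicitly after this change.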

docs/index.html

Lines changed: 9 additions & 9 deletions
@@ -1226,7 +1226,7 @@ <h2 id="support-for-complex-numbers">Support for complex numbers</h2>
 <h2 class="section-title" id="header-functions">Functions</h2>
 <dl>
 <dt id="netCDF4.chartostring"><code class="name flex">
-<span>def <span class="ident">chartostring</span></span>(<span>b, encoding='utf-8')</span>
+<span>def <span class="ident">chartostring</span></span>(<span>b, encoding=None)</span>
 </code></dt>
 <dd>
 <div class="desc"><p><strong><code>chartostring(b,encoding='utf-8')</code></strong></p>
@@ -1236,8 +1236,8 @@ <h2 class="section-title" id="header-functions">Functions</h2>
 Will be converted to a array of strings, where each string has a fixed
 length of <code>b.shape[-1]</code> characters.</p>
 <p>optional kwarg <code>encoding</code> can be used to specify character encoding (default
-<code>utf-8</code>). If <code>encoding</code> is 'none' or 'bytes', a <code>numpy.string_</code> byte array is
-returned.</p>
+<code>utf-8</code> for dtype=<code>'UN'</code> or <code>ascii</code> for dtype=<code>'SN'</code>). If <code>encoding</code> is 'none' or 'bytes',
+a <code>numpy.string_</code> byte array is returned.</p>
 <p>returns a numpy string array with datatype <code>'UN'</code> (or <code>'SN'</code>) and shape
 <code>b.shape[:-1]</code> where where <code>N=b.shape[-1]</code>.</p></div>
 </dd>
@@ -1254,7 +1254,7 @@ <h2 class="section-title" id="header-functions">Functions</h2>
 <p><strong>calendar</strong>: describes the calendar to be used in the time calculations.
 All the values currently defined in the
 <code>CF metadata convention &lt;http://cfconventions.org/cf-conventions/cf-conventions#calendar&gt;</code>__ are supported.
-Valid calendars <strong>'standard', 'gregorian', 'proleptic_gregorian'
+Valid calendars <strong>'standard', 'gregorian', 'proleptic_gregorian', 'tai',
 'noleap', '365_day', '360_day', 'julian', 'all_leap', '366_day'</strong>.
 Default is <code>None</code> which means the calendar associated with the first
 input datetime instance will be used.</p>
@@ -1305,7 +1305,7 @@ <h2 class="section-title" id="header-functions">Functions</h2>
 <p><strong>calendar</strong>: describes the calendar to be used in the time calculations.
 All the values currently defined in the
 <code>CF metadata convention &lt;http://cfconventions.org/cf-conventions/cf-conventions#calendar&gt;</code>__ are supported.
-Valid calendars <strong>'standard', 'gregorian', 'proleptic_gregorian'
+Valid calendars <strong>'standard', 'gregorian', 'proleptic_gregorian', 'tai',
 'noleap', '365_day', '360_day', 'julian', 'all_leap', '366_day'</strong>.
 Default is <code>None</code> which means the calendar associated with the first
 input datetime instance will be used.</p>
@@ -1381,7 +1381,7 @@ <h2 class="section-title" id="header-functions">Functions</h2>
 <p><strong>calendar</strong>: describes the calendar used in the time calculations.
 All the values currently defined in the
 <code>CF metadata convention &lt;http://cfconventions.org/cf-conventions/cf-conventions#calendar&gt;</code>__ are supported.
-Valid calendars <strong>'standard', 'gregorian', 'proleptic_gregorian'
+Valid calendars <strong>'standard', 'gregorian', 'proleptic_gregorian', 'tai',
 'noleap', '365_day', '360_day', 'julian', 'all_leap', '366_day'</strong>.
 Default is <strong>'standard'</strong>, which is a mixed Julian/Gregorian calendar.</p>
 <p><strong>only_use_cftime_datetimes</strong>: if False, python datetime.datetime
@@ -1476,7 +1476,7 @@ <h2 class="section-title" id="header-functions">Functions</h2>
 (default) or <code>'U1'</code> (if dtype=<code>'U'</code>)</p></div>
 </dd>
 <dt id="netCDF4.stringtochar"><code class="name flex">
-<span>def <span class="ident">stringtochar</span></span>(<span>a, encoding='utf-8', n_strlen=None)</span>
+<span>def <span class="ident">stringtochar</span></span>(<span>a, encoding=None, n_strlen=None)</span>
 </code></dt>
 <dd>
 <div class="desc"><p><strong><code>stringtochar(a,encoding='utf-8',n_strlen=None)</code></strong></p>
@@ -1487,8 +1487,8 @@ <h2 class="section-title" id="header-functions">Functions</h2>
 Will be converted to
 an array of characters (datatype <code>'S1'</code> or <code>'U1'</code>) of shape <code>a.shape + (N,)</code>.</p>
 <p>optional kwarg <code>encoding</code> can be used to specify character encoding (default
-<code>utf-8</code>). If <code>encoding</code> is 'none' or 'bytes', a <code>numpy.string_</code> the input array
-is treated a raw byte strings (<code>numpy.string_</code>).</p>
+<code>utf-8</code> for dtype=<code>'UN'</code> or <code>ascii</code> for dtype=<code>'SN'</code>). If <code>encoding</code> is 'none' or 'bytes',
+a <code>numpy.string_</code> the input array is treated a raw byte strings (<code>numpy.string_</code>).</p>
 <p>optional kwarg <code>n_strlen</code> is the number of characters in each string.
 Default
 is None, which means <code>n_strlen</code> will be set to a.itemsize (the number of bytes
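
The `n_strlen` default in the hunk above is taken from `itemsize`, which counts bytes rather than characters. A small NumPy check, independent of netCDF4, shows how the two differ between byte-string and unicode arrays (NumPy stores unicode strings as 4 bytes per character). The array names are illustrative only.

```python
import numpy as np

s_arr = np.array([b"foo", b"barbaz"])   # byte strings, longest has 6 characters
u_arr = np.array(["foo", "barbaz"])     # unicode strings, longest has 6 characters

# For kind 'S' the itemsize equals the string width in characters.
print(s_arr.dtype.kind, s_arr.dtype.itemsize)
# For kind 'U' the itemsize is 4 bytes per character.
print(u_arr.dtype.kind, u_arr.dtype.itemsize)
```

For unicode input it may therefore be worth passing `n_strlen` explicitly rather than relying on the default.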

examples/tutorial.py

Lines changed: 1 addition & 0 deletions
@@ -163,6 +163,7 @@ def walktree(top):
 datac2.imag = datain['imag']
 print(datac.dtype,datac)
 print(datac2.dtype,datac2)
+nc.close()
 
 # more complex compound type example.
 nc = Dataset('compound_example.nc','w') # create a new dataset.

src/netCDF4/__init__.pyi

Lines changed: 3 additions & 2 deletions
@@ -704,7 +704,8 @@ def stringtochar(
 @overload
 def stringtochar(
     a: npt.NDArray[np.character],
-    encoding: str = ...,
+    encoding: str | None = None,
+    n_strlen: int | None = None,
 ) -> npt.NDArray[np.str_] | npt.NDArray[np.bytes_]: ...
 @overload
 def chartostring(
@@ -714,7 +715,7 @@ def chartostring(
 @overload
 def chartostring(
     b: npt.NDArray[np.character],
-    encoding: str = ...,
+    encoding: str | None = None,
 ) -> npt.NDArray[np.str_] | npt.NDArray[np.bytes_]: ...
 def getlibversion() -> str: ...
 def rc_get(key: str) -> str | None: ...

src/netCDF4/_netCDF4.pyx

Lines changed: 20 additions & 10 deletions
@@ -1,4 +1,4 @@
-"""Version 1.7.4
+"""Version 1.7.4.1
 -------------
 
 # Introduction
@@ -1283,7 +1283,7 @@ import sys
 import functools
 from typing import Union
 
-__version__ = "1.7.4"
+__version__ = "1.7.4.1"
 
 # Initialize numpy
 import posixpath
@@ -6788,7 +6788,7 @@ returns a rank 1 numpy character array of length NUMCHARS with datatype `'S1'`
     arr[0:len(string)] = tuple(string)
     return arr
 
-def stringtochar(a,encoding='utf-8',n_strlen=None):
+def stringtochar(a,encoding=None,n_strlen=None):
     """
 **`stringtochar(a,encoding='utf-8',n_strlen=None)`**
 
@@ -6799,8 +6799,8 @@ is the number of characters in each string. Will be converted to
 an array of characters (datatype `'S1'` or `'U1'`) of shape `a.shape + (N,)`.
 
 optional kwarg `encoding` can be used to specify character encoding (default
-`utf-8`). If `encoding` is 'none' or 'bytes', a `numpy.string_` the input array
-is treated a raw byte strings (`numpy.string_`).
+`utf-8` for dtype=`'UN'` or `ascii` for dtype=`'SN'`). If `encoding` is 'none' or 'bytes',
+a `numpy.string_` the input array is treated a raw byte strings (`numpy.string_`).
 
 optional kwarg `n_strlen` is the number of characters in each string. Default
 is None, which means `n_strlen` will be set to a.itemsize (the number of bytes
@@ -6809,10 +6809,15 @@ used to represent each string in the input array).
 returns a numpy character array with datatype `'S1'` or `'U1'`
 and shape `a.shape + (N,)`, where N is the length of each string in a."""
     dtype = a.dtype.kind
-    if n_strlen is None:
-        n_strlen = a.dtype.itemsize
     if dtype not in ["S","U"]:
         raise ValueError("type must string or unicode ('S' or 'U')")
+    if encoding is None:
+        if dtype == 'S':
+            encoding = 'ascii'
+        else:
+            encoding = 'utf-8'
+    if n_strlen is None:
+        n_strlen = a.dtype.itemsize
     if encoding in ['none','None','bytes']:
         b = numpy.array(tuple(a.tobytes()),'S1')
     elif encoding == 'ascii':
@@ -6827,7 +6832,7 @@ and shape `a.shape + (N,)`, where N is the length of each string in a."""
         b = numpy.array([[bb[i:i+1] for i in range(n_strlen)] for bb in bbytes])
     return b
 
-def chartostring(b,encoding='utf-8'):
+def chartostring(b,encoding=None):
     """
 **`chartostring(b,encoding='utf-8')`**
 
@@ -6838,14 +6843,19 @@ Will be converted to a array of strings, where each string has a fixed
 length of `b.shape[-1]` characters.
 
 optional kwarg `encoding` can be used to specify character encoding (default
-`utf-8`). If `encoding` is 'none' or 'bytes', a `numpy.string_` byte array is
-returned.
+`utf-8` for dtype=`'UN'` or `ascii` for dtype=`'SN'`). If `encoding` is 'none' or 'bytes',
+a `numpy.string_` byte array is returned.
 
 returns a numpy string array with datatype `'UN'` (or `'SN'`) and shape
 `b.shape[:-1]` where where `N=b.shape[-1]`."""
     dtype = b.dtype.kind
     if dtype not in ["S","U"]:
         raise ValueError("type must be string or unicode ('S' or 'U')")
+    if encoding is None:
+        if dtype == 'S':
+            encoding = 'ascii'
+        else:
+            encoding = 'utf-8'
     bs = b.tobytes()
     slen = int(b.shape[-1])
     if encoding in ['none','None','bytes']:
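
The shape contract stated in the two docstrings (`a.shape + (N,)` going one way, `b.shape[:-1]` coming back) can be illustrated with a NumPy-only sketch. This is not the library's implementation: the helpers `to_chars` and `to_strings` are hypothetical, handle only byte-string (`'S'`) arrays, and do no encoding or decoding at all.

```python
import numpy as np


def to_chars(a):
    """Hypothetical: split 'SN' byte strings into an 'S1' array of shape a.shape + (N,)."""
    a = np.ascontiguousarray(a)
    n = a.dtype.itemsize
    # Reinterpret each N-byte string as N one-byte elements, then add the new axis.
    return a.view("S1").reshape(a.shape + (n,))


def to_strings(b):
    """Hypothetical: join an 'S1' array back into 'SN' strings, with N = b.shape[-1]."""
    b = np.ascontiguousarray(b)
    n = b.shape[-1]
    # Reinterpret each run of N one-byte elements as a single N-byte string.
    return b.view(f"S{n}").reshape(b.shape[:-1])


names = np.array([b"foo", b"bar"], dtype="S3")
chars = to_chars(names)
restored = to_strings(chars)

print(chars.shape, chars.dtype)
print(restored)
```

Because the sketch only reinterprets bytes, it corresponds most closely to the `'none'`/`'bytes'` branch in the diff; the `ascii` and `utf-8` paths additionally encode or decode text.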
