improve some Sphinx markup

Commit 6722061c authored Jan 25, 2014 by Stefan Behnel

Showing with 40 additions and 38 deletions

docs/src/tutorial/strings.rst +40 -38

--- a/docs/src/tutorial/strings.rst
+++ b/docs/src/tutorial/strings.rst
@@ -16,46 +16,46 @@ implicitly insert these encoding/decoding steps.
 Python string types in Cython code
 ----------------------------------

-Cython supports four Python string types: ``bytes``, ``str``,
-``unicode`` and ``basestring``.  The ``bytes`` and ``unicode`` types
-are the specific types known from normal Python 2.x (named ``bytes``
-and ``str`` in Python 3).  Additionally, Cython also supports the
-``bytearray`` type starting with Python 2.6.  It behaves like the
-``bytes`` type, except that it is mutable.
-
-The ``str`` type is special in that it is the byte string in Python 2
+Cython supports four Python string types: :obj:`bytes`, :obj:`str`,
+:obj:`unicode` and :obj:`basestring`.  The :obj:`bytes` and :obj:`unicode` types
+are the specific types known from normal Python 2.x (named :obj:`bytes`
+and :obj:`str` in Python 3).  Additionally, Cython also supports the
+:obj:`bytearray` type starting with Python 2.6.  It behaves like the
+:obj:`bytes` type, except that it is mutable.
+
+The :obj:`str` type is special in that it is the byte string in Python 2
 and the Unicode string in Python 3 (for Cython code compiled with
 language level 2, i.e. the default).  Meaning, it always corresponds
-exactly with the type that the Python runtime itself calls ``str``.
-Thus, in Python 2, both ``bytes`` and ``str`` represent the byte string
-type, whereas in Python 3, both ``str`` and ``unicode`` represent the
+exactly with the type that the Python runtime itself calls :obj:`str`.
+Thus, in Python 2, both :obj:`bytes` and :obj:`str` represent the byte string
+type, whereas in Python 3, both :obj:`str` and :obj:`unicode` represent the
 Python Unicode string type.  The switch is made at C compile time, the
 Python version that is used to run Cython is not relevant.

-When compiling Cython code with language level 3, the ``str`` type is
+When compiling Cython code with language level 3, the :obj:`str` type is
 identified with exactly the Unicode string type at Cython compile time,
-i.e. it does not identify with ``bytes`` when running in Python 2.
+i.e. it does not identify with :obj:`bytes` when running in Python 2.

-Note that the ``str`` type is not compatible with the ``unicode``
+Note that the :obj:`str` type is not compatible with the :obj:`unicode`
 type in Python 2, i.e. you cannot assign a Unicode string to a variable
-or argument that is typed ``str``.  The attempt will result in either
-a compile time error (if detectable) or a ``TypeError`` exception at
+or argument that is typed :obj:`str`.  The attempt will result in either
+a compile time error (if detectable) or a :obj:`TypeError` exception at
 runtime.  You should therefore be careful when you statically type a
 string variable in code that must be compatible with Python 2, as this
 Python version allows a mix of byte strings and unicode strings for data
 and users normally expect code to be able to work with both.  Code that
 only targets Python 3 can safely type variables and arguments as either
-``bytes`` or ``unicode``.
+:obj:`bytes` or :obj:`unicode`.

-The ``basestring`` type represents both the types ``str`` and ``unicode``,
+The :obj:`basestring` type represents both the types :obj:`str` and :obj:`unicode`,
 i.e. all Python text string types in Python 2 and Python 3.  This can be
 used for typing text variables that normally contain Unicode text (at
-least in Python 3) but must additionally accept the ``str`` type in
+least in Python 3) but must additionally accept the :obj:`str` type in
 Python 2 for backwards compatibility reasons.  It is not compatible with
-the ``bytes`` type.  Its usage should be rare in normal Cython code as
-the generic ``object`` type (i.e. untyped code) will normally be good
+the :obj:`bytes` type.  Its usage should be rare in normal Cython code as
+the generic :obj:`object` type (i.e. untyped code) will normally be good
 enough and has the additional advantage of supporting the assignment of
-string subtypes.  Support for the ``basestring`` type is new in Cython
+string subtypes.  Support for the :obj:`basestring` type is new in Cython
 0.20.


@@ -100,7 +100,7 @@ Python variable::
    cdef char* c_string = c_call_returning_a_c_string()
    cdef bytes py_string = c_string

-A type cast to ``object`` or ``bytes`` will do the same thing::
+A type cast to :obj:`object` or :obj:`bytes` will do the same thing::

    py_string = <bytes> c_string

@@ -163,8 +163,8 @@ however, when the C function stores the pointer for later use.  Apart
 from keeping a Python reference to the string object, no manual memory
 management is required.

-Starting with Cython 0.20, the ``bytearray`` type is supported and
-coerces in the same way as the ``bytes`` type.  However, when using it
+Starting with Cython 0.20, the :obj:`bytearray` type is supported and
+coerces in the same way as the :obj:`bytes` type.  However, when using it
 in a C context, special care must be taken not to grow or shrink the
 object buffer after converting it to a C string pointer.  These
 modifications can change the internal buffer address, which will make
@@ -224,6 +224,7 @@ In Cython 0.18, these standard declarations have been changed to
 use the correct ``const`` modifier, so your code will automatically
 benefit from the new ``const`` support if it uses them.

+
 Decoding bytes to text
 ----------------------

@@ -234,7 +235,7 @@ the C byte strings to Python Unicode strings on reception, and to
 encode Python Unicode strings to C byte strings on the way out.

 With a Python byte string object, you would normally just call the
-``.decode()`` method to decode it into a Unicode string::
+``bytes.decode()`` method to decode it into a Unicode string::

    ustring = byte_string.decode('UTF-8')

@@ -318,6 +319,7 @@ assignment.  Later access to the invalidated pointer will read invalid
 memory and likely result in a segfault.  Cython will therefore refuse
 to compile this code.

+
 C++ strings
 -----------

@@ -375,7 +377,7 @@ There are two use cases where this is inconvenient.  First, if all
 C strings that are being processed (or the large majority) contain
 text, automatic encoding and decoding from and to Python unicode
 objects can reduce the code overhead a little.  In this case, you
-can set the ``c_string_type`` directive in your module to ``unicode``
+can set the ``c_string_type`` directive in your module to :obj:`unicode`
 and the ``c_string_encoding`` to the encoding that your C code uses,
 for example::

@@ -393,7 +395,7 @@ The second use case is when all C strings that are being processed
 only contain ASCII encodable characters (e.g. numbers) and you want
 your code to use the native legacy string type in Python 2 for them,
 instead of always using Unicode. In this case, you can set the
-string type to ``str``::
+string type to :obj:`str`::

    # cython: c_string_type=str, c_string_encoding=ascii

@@ -472,15 +474,15 @@ whereas the following ``ISO-8859-15`` encoded source file will print

 Note that the unicode literal ``u'abcö'`` is a correctly decoded four
 character Unicode string in both cases, whereas the unprefixed Python
-``str`` literal ``'abcö'`` will become a byte string in Python 2 (thus
+:obj:`str` literal ``'abcö'`` will become a byte string in Python 2 (thus
 having length 4 or 5 in the examples above), and a 4 character Unicode
 string in Python 3.  If you are not familiar with encodings, this may
 not appear obvious at first read.  See `CEP 108`_ for details.

-As a rule of thumb, it is best to avoid unprefixed non-ASCII ``str``
+As a rule of thumb, it is best to avoid unprefixed non-ASCII :obj:`str`
 literals and to use unicode string literals for all text.  Cython also
 supports the ``__future__`` import ``unicode_literals`` that instructs
-the parser to read all unprefixed ``str`` literals in a source file as
+the parser to read all unprefixed :obj:`str` literals in a source file as
 unicode string literals, just like Python 3.

 .. _`CEP 108`: http://wiki.cython.org/enhancements/stringliterals
@@ -522,7 +524,7 @@ explicitly, and the following will print ``A`` (or ``b'A'`` in Python

 The explicit coercion works for any C integer type.  Values outside of
 the range of a :c:type:`char` or :c:type:`unsigned char` will raise an
-``OverflowError`` at runtime.  Coercion will also happen automatically
+:obj:`OverflowError` at runtime.  Coercion will also happen automatically
 when assigning to a typed variable, e.g.::

    cdef bytes py_byte_string
@@ -544,10 +546,10 @@ The following will print 65::
    cdef Py_UCS4 uchar_val = u'A'
    print( <long>uchar_val )

-Note that casting to a C ``long`` (or ``unsigned long``) will work
+Note that casting to a C :c:type:`long` (or :c:type:`unsigned long`) will work
 just fine, as the maximum code point value that a Unicode character
 can have is 1114111 (``0x10FFFF``).  On platforms with 32bit or more,
-``int`` is just as good.
+:c:type:`int` is just as good.


 Narrow Unicode builds
@@ -682,15 +684,15 @@ zero-terminated UTF-16 encoded :c:type:`wchar_t*` strings, so called
 "wide strings".

 By default, Windows builds of CPython define :c:type:`Py_UNICODE` as
-a synonym for :c:type:`wchar_t`. This makes internal ``unicode``
+a synonym for :c:type:`wchar_t`. This makes internal :obj:`unicode`
 representation compatible with UTF-16 and allows for efficient zero-copy
 conversions. This also means that Windows builds are always
 `Narrow Unicode builds`_ with all the caveats.

 To aid interoperation with Windows APIs, Cython 0.19 supports wide
 strings (in the form of :c:type:`Py_UNICODE*`) and implicitly converts
-them to and from ``unicode`` string objects.  These conversions behave the
-same way as they do for :c:type:`char*` and ``bytes`` as described in
+them to and from :obj:`unicode` string objects.  These conversions behave the
+same way as they do for :c:type:`char*` and :obj:`bytes` as described in
 `Passing byte strings`_.

 In addition to automatic conversion, unicode literals that appear
@@ -722,7 +724,7 @@ Here is an example of how one would call a Unicode API on Windows::
    APIs deprecated and inefficient.

 One consequence of CPython 3.3 changes is that :py:func:`len` of
-``unicode`` strings is always measured in *code points* ("characters"),
+:obj:`unicode` strings is always measured in *code points* ("characters"),
 while Windows API expect the number of UTF-16 *code units*
 (where each surrogate is counted individually). To always get the number
 of code units, call :c:func:`PyUnicode_GetSize` directly.
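
Reviewer's note on the last hunk: the difference between code points and UTF-16 code units that the paragraph describes can be illustrated in plain Python 3.3 or later. This is only a minimal sketch, not part of the commit; the helper name ``utf16_code_units`` is hypothetical, and it counts code units by encoding rather than by calling the C API mentioned in the text.

```python
def utf16_code_units(text: str) -> int:
    """Return the number of UTF-16 code units needed to encode ``text``.

    Hypothetical helper for illustration only.
    """
    # 'utf-16-le' writes no byte order mark, and every code unit is 2 bytes.
    return len(text.encode("utf-16-le")) // 2


bmp_text = "abc\u00f6"        # four code points, all inside the BMP
astral_text = "a\U0001F600"   # 'a' plus one code point above U+FFFF

# len() counts code points; the helper counts UTF-16 code units.
print(len(bmp_text), utf16_code_units(bmp_text))
print(len(astral_text), utf16_code_units(astral_text))
```

For text inside the Basic Multilingual Plane the two counts agree. They differ only when a character needs a surrogate pair, which is the case a Windows API length argument has to account for.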