Enhanced C#
Language of your choice: library documentation
UString is a slice of a string. It is a wrapper around string that provides an IBRange{T} of 21-bit UCS-4 characters. "U" stands for "Unicode", as in UCS-4, as opposed to a normal string, which is UTF-16.
UString is a slice type: it represents either an entire string, or a region of characters in a string. .NET strings are converted implicitly to UString.
It has been suggested that Java and .NET's reliance on 16-bit "unicode" characters was a mistake, because it turned out that 16 bits was not enough to represent all the world's characters.
Instead it has been suggested that we should use UTF-8 everywhere. To scan UTF-8 data instead of UTF-16 while still supporting non-English characters (or "ĉĥáràĉtérŝ", as I like to say), it is useful to have a bidirectional iterator that scans characters one code point at a time. UString provides that functionality for .NET, and code written against it is portable: a UTF-8 environment only needs an equivalent implementation of UString for UTF-8. Eventually I want Loyc to target native environments, where UTF-8 is common, and UString can provide a common data type for both UTF-8 and UTF-16 environments.
UString is a bidirectional range of "uchar", which is an alias for int (uchar means "Unicode" or "UCS-4", rather than "unsigned").
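To make the "one code point at a time" idea concrete, here is a minimal sketch of a forward scan over a UTF-16 string that yields int code points. It uses only the .NET base class library, not Loyc; the names CodePointScan and CodePoints are hypothetical, and passing an unpaired surrogate through unchanged is an assumption of this sketch, not documented UString behavior.

```csharp
using System;
using System.Collections.Generic;

static class CodePointScan
{
    // Yields one int per code point. A valid surrogate pair is combined
    // into a single value; any other code unit is yielded as-is
    // (assumption: UString may treat unpaired surrogates differently).
    public static IEnumerable<int> CodePoints(string s)
    {
        for (int i = 0; i < s.Length; i++)
        {
            char c = s[i];
            if (char.IsHighSurrogate(c) && i + 1 < s.Length
                && char.IsLowSurrogate(s[i + 1]))
            {
                yield return char.ConvertToUtf32(c, s[i + 1]);
                i++; // skip the low surrogate that was just consumed
            }
            else
            {
                yield return c;
            }
        }
    }
}
```

A string holding one supplementary-plane character between two ASCII letters has four code units, but this scan yields three values.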
The difference between StringSlice and UString is that StringSlice is a random-access range of char, while UString is a bidirectional range of uchar (int). Since UString implements IListSource{Char}, its Slice method must return a range of char, so it relies on StringSlice for that.
UString has a DecodeAt(int) method that tries to decode the UTF sequence starting at a particular index into a UCS code point.
Since UString and StringSlice are just slightly different views of the same data, you can implicitly cast between them.
Unfortunately, it's not possible for UString to compare equal to its equivalent string, for two reasons: (1) System.String.Equals cannot be changed, and (2) UString.GetHashCode cannot return the same value as String.GetHashCode without actually generating a String object, which would be inefficient. (String.GetHashCode cannot be emulated because it changes between versions of the .NET framework and even between 32- and 64-bit builds.)
TODO: add Right, Normalize, EndsWith, FindLast, ReplaceAll, etc.
Public fields | |
int | _count |
Public static fields | |
static readonly UString | Null = default(UString) |
static readonly UString | Empty = new UString("") |
Properties | |
string | InternalString [get] |
Returns the original string. | |
int | InternalStart [get] |
int | InternalStop [get] |
int | Length [get] |
int | Count [get] |
bool | IsEmpty [get] |
uchar | Front [get] |
uchar | Back [get] |
char | this[int index] [get] |
Returns the code unit (16-bit value) at the specified index. | |
char | this[int index, char defaultValue] [get] |
Returns the code unit (16-bit value) at the specified index, or a default value if the specified index was out of range. | |
int | this[int index, int defaultValue] [get] |
Returns the code point (21-bit value) at the specified index, or a default value if the specified index was out of range. | |
Public Member Functions | |
UString (string str, int start, int count=int.MaxValue) | |
Initializes a UString slice. | |
UString (string str) | |
uchar | PopFront (out bool fail) |
uchar | PopBack (out bool fail) |
IFRange< uchar > ICloneable < IFRange< uchar > >. | Clone () |
IBRange< uchar > ICloneable < IBRange< uchar > >. | Clone () |
UString | Clone () |
IEnumerator< uchar > IEnumerable< uchar >. | GetEnumerator () |
IEnumerator< char > IEnumerable< char >. | GetEnumerator () |
System.Collections.IEnumerator System.Collections.IEnumerable. | GetEnumerator () |
RangeEnumerator< UString, uchar > | GetEnumerator () |
uchar | TryDecodeAt (int index) |
Returns the UCS code point that starts at the specified index. | |
uchar | DecodeAt (int index) |
Returns the UCS code point that starts at the specified index. | |
void | ThrowIndexOutOfRange (int i) |
char | TryGet (int index, out bool fail) |
Gets the item at the specified index, and does not throw an exception on failure. | |
IRange< char > IListSource< char >. | Slice (int start, int count) |
Returns a substring from the character source. If some of the requested characters are past the end of the stream, the string is truncated to the available number of characters. | |
StringSlice | Slice (int start, int count=int.MaxValue) |
Returns a substring from the character source. If some of the requested characters are past the end of the stream, the string is truncated to the available number of characters. | |
override int | GetHashCode () |
override bool | Equals (object obj) |
bool | Equals (UString other) |
bool | Equals (UString other, bool ignoreCase) |
override string | ToString () |
UString | Substring (int start, int count) |
Synonym for Slice() | |
UString | Substring (int start) |
UString | Find (uchar what, bool ignoreCase=false) |
Finds the specified UCS-4 character. | |
UString | Find (UString what, bool ignoreCase=false) |
Finds the specified string within this string. | |
UString | ShedExcessMemory (int maxExtra) |
This method makes a copy of the string if this is a sufficiently small slice of a larger string. | |
UString | ToUpper () |
Converts the string to uppercase using the 'invariant' culture. | |
bool | StartsWith (UString what, bool ignoreCase=false) |
Determines whether this string starts with the specified other string. | |
UString | Replace (UString what, UString replacement, bool ignoreCase=false, int maxReplacements=int.MaxValue) |
Returns a new string in which all occurrences (or a specified number of occurrences) of a specified string in the current instance are replaced with another specified string. | |
UString | ReplaceOne (UString what, UString replacement, bool ignoreCase=false) |
int | IndexOf (char find, bool ignoreCase=false) |
int | IndexOf (UString find, bool ignoreCase=false) |
Pair< UString, UString > | SplitAt (char delimiter, bool ignoreCase=false) |
Pair< UString, UString > | SplitAt (UString delimiter) |
Static Public Member Functions | |
static bool | operator== (UString x, UString y) |
static bool | operator!= (UString x, UString y) |
static | operator string (UString s) |
static implicit | operator UString (string s) |
static implicit | operator UString (StringSlice s) |
static bool | SubstringEqualHelper (string _str, int _start, UString what, bool ignoreCase=false) |
UString (string str, int start, int count=int.MaxValue) [inline]
Initializes a UString slice.
ArgumentException | The start index was below zero. |
The (start, count) range is allowed to be invalid, as long as 'start' is zero or above. If the range extends past the end of the string, the count is reduced to list.Length - start.
Referenced by Loyc.UString.Find(), and Loyc.UString.Substring().
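The range rule for the constructor can be sketched as a small helper that computes the effective count. This is an illustration only, not Loyc's code: SliceRange and ClampCount are hypothetical names, and returning an empty range for a negative count or a start beyond the end is an assumption of the sketch.

```csharp
using System;

static class SliceRange
{
    // Returns the effective count for a slice (start, count) taken from
    // a string of the given length.
    public static int ClampCount(int length, int start, int count)
    {
        if (start < 0)
            throw new ArgumentException("The start index was below zero.");
        if (count < 0 || start >= length)
            return 0; // assumption: such ranges produce an empty slice
        // Reduce the count so the slice ends at the end of the string.
        // Written this way, count == int.MaxValue cannot overflow.
        return Math.Min(count, length - start);
    }
}
```

With the default count of int.MaxValue, the result is simply everything from start to the end of the string.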
uchar DecodeAt (int index) [inline]
Returns the UCS code point that starts at the specified index.
index | Code unit index at which to decode. |
IndexOutOfRangeException | The index was out of range. |
If decoding fails, either because the index points to the "middle" of a multi-code-unit sequence or because the string contains an invalid UTF sequence, this method returns a negative value (the bitwise 'not' of the invalid char). If the index itself is invalid, this method throws; TryDecodeAt(int) returns -1 in that case instead.
References Loyc.UString.TryDecodeAt().
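The negative-value convention can be illustrated with a standalone decoder for UTF-16 strings. This is a sketch under stated assumptions, written against plain System.String rather than UString; DecodeSketch and TryDecodeAtSketch are hypothetical names, and the real implementation may differ in detail.

```csharp
using System;

static class DecodeSketch
{
    // Returns the code point that starts at 'index', the bitwise 'not'
    // of the code unit if it cannot be decoded there, or -1 if 'index'
    // lies outside the string.
    public static int TryDecodeAtSketch(string s, int index)
    {
        if ((uint)index >= (uint)s.Length)
            return -1;
        char c = s[index];
        if (!char.IsSurrogate(c))
            return c; // a single code unit is a complete code point
        if (char.IsHighSurrogate(c) && index + 1 < s.Length
            && char.IsLowSurrogate(s[index + 1]))
            return char.ConvertToUtf32(c, s[index + 1]);
        // Lone high surrogate, or a low surrogate, which is the "middle"
        // of a pair: report failure as a negative value.
        return ~c;
    }
}
```

A caller can recover the offending code unit by applying the bitwise 'not' again. Only surrogates ever take the failure path, so the failure value cannot collide with -1.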
UString Find (uchar what, bool ignoreCase=false) [inline]
Finds the specified UCS-4 character.
References Loyc.UString.UString().
Referenced by Loyc.UString.Find(), and Loyc.UString.Replace().
UString Find (UString what, bool ignoreCase=false) [inline]
Finds the specified string within this string.
References Loyc.UString.Find(), and Loyc.UString.UString().
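UString Replace (UString what, UString replacement, bool ignoreCase=false, int maxReplacements=int.MaxValue) [inline]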
Returns a new string in which all occurrences (or a specified number of occurrences) of a specified string in the current instance are replaced with another specified string.
what | |
replacement | |
ignoreCase | |
maxReplacements |
References Loyc.UString.Find().
UString ShedExcessMemory (int maxExtra) [inline]
This method makes a copy of the string if this is a sufficiently small slice of a larger string.
Returns a new copy of this slice if InternalString.Length - Length > maxExtra, otherwise this.
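The condition above can be sketched with a stand-in slice type. MiniSlice is hypothetical and far simpler than UString; the point is only to show when a copy is made and why it lets the larger string be garbage-collected.

```csharp
using System;

// Hypothetical stand-in for a string slice: (string, start, count).
struct MiniSlice
{
    public readonly string Str;
    public readonly int Start, Count;

    public MiniSlice(string str, int start, int count)
    {
        Str = str; Start = start; Count = count;
    }

    // Copies the slice into its own string when the backing string
    // carries more than maxExtra characters that are not in the slice.
    public MiniSlice ShedExcessMemory(int maxExtra)
    {
        if (Str.Length - Count > maxExtra)
            return new MiniSlice(Str.Substring(Start, Count), 0, Count);
        return this; // cheap enough to keep sharing the original
    }
}
```

After shedding, the slice no longer references the large backing string, so that string can be collected once nothing else refers to it.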
IRange< char > IListSource< char >.Slice (int start, int count) [inline]
Returns a substring from the character source. If some of the requested characters are past the end of the stream, the string is truncated to the available number of characters.
startIndex | Index of first character to return. If startIndex >= Count, an empty string is returned. |
length | Number of characters desired. |
ArgumentException | Thrown if startIndex or length are negative. |
Implements Loyc.Collections.ICharSource.
References Loyc.UString.Slice().
Referenced by Loyc.UString.Slice().
StringSlice Slice (int start, int count=int.MaxValue) [inline]
Returns a substring from the character source. If some of the requested characters are past the end of the stream, the string is truncated to the available number of characters.
startIndex | Index of first character to return. If startIndex >= Count, an empty string is returned. |
length | Number of characters desired. |
ArgumentException | Thrown if startIndex or length are negative. |
Implements Loyc.Collections.ICharSource.
bool StartsWith (UString what, bool ignoreCase=false) [inline]
Determines whether this string starts with the specified other string.
UString Substring (int start, int count) [inline]
Synonym for Slice()
References Loyc.UString.UString().
UString ToUpper () [inline]
Converts the string to uppercase using the 'invariant' culture.
uchar TryDecodeAt (int index) [inline]
Returns the UCS code point that starts at the specified index.
Works the same way as DecodeAt(int) except that if the index is invalid, this method returns -1 rather than throwing.
Referenced by Loyc.UString.DecodeAt().
char TryGet (int index, out bool fail) [inline]
Gets the item at the specified index, and does not throw an exception on failure.
index | An index in the range 0 to Count-1. |
fail | A flag that is set on failure. |
In my original design, the caller could provide a value to return on failure, but this would not allow T to be marked as "out" in C# 4. For the same reason, we cannot have a ref/out T parameter. Instead, extension methods are provided for those calling patterns.
Implements Loyc.Collections.IListSource< out T >.
string InternalString [get]
Returns the original string.
Ideally, the string would be kept private, so that there would be no way to access its contents beyond the boundaries of the slice. However, the reality in .NET today is that many methods accept "slices" in the form of a triple (string, start index, count). In order to call such an old-style API using a slice, one must be able to extract the internal string and start index values.
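As an illustration of such an old-style API, the sketch below calls the String.IndexOf(char, int, int) overload with a (string, start index, count) triple. TripleApiDemo and IndexOfInSlice are hypothetical names; for a UString, the three values would presumably come from InternalString, InternalStart and Length.

```csharp
using System;

static class TripleApiDemo
{
    // Searches for 'c' inside the slice (str, start, count) and returns
    // an index relative to the start of the slice, or -1 if not found.
    public static int IndexOfInSlice(string str, int start, int count, char c)
    {
        // The BCL overload takes the whole string plus a start index and
        // a count, and returns an index into the whole string.
        int i = str.IndexOf(c, start, count);
        return i < 0 ? -1 : i - start;
    }
}
```

The subtraction at the end matters: the BCL reports positions in the original string, while a slice user expects positions relative to the slice.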
char this[int index, char defaultValue] [get]
Returns the code unit (16-bit value) at the specified index, or a default value if the specified index was out of range.
int this[int index, int defaultValue] [get]
Returns the code point (21-bit value) at the specified index, or a default value if the specified index was out of range.
char this[int index] [get]
Returns the code unit (16-bit value) at the specified index.
IndexOutOfRangeException | The index was out of range. |