Enhanced C#
Language of your choice: library documentation

Documentation moved to ecsharp.net

Loyc.UString Struct Reference

UString is a slice of a string. It is a wrapper around string that provides an IBRange{T} of 21-bit UCS-4 characters. "U" stands for "Unicode", as in UCS-4, as opposed to a normal string that is UTF-16.


Inheritance diagram for Loyc.UString:
Loyc.Collections.ICharSource, Loyc.ICloneable< out T >, Loyc.Collections.IListSource< out T >

Remarks

UString is a slice of a string. It is a wrapper around string that provides an IBRange{T} of 21-bit UCS-4 characters. "U" stands for "Unicode", as in UCS-4, as opposed to a normal string that is UTF-16.

UString is a slice type: it represents either an entire string, or a region of characters in a string. .NET strings are converted implicitly to UString.

It has been suggested that Java and .NET's reliance on 16-bit "unicode" characters was a mistake, because it turned out that 16 bits was not enough to represent all the world's characters.

Instead it has been suggested that we should use UTF-8 everywhere. To scan UTF-8 data instead of UTF-16 while still supporting non-English characters (or "ĉĥáràĉtérŝ", as I like to say), it is useful to have a bidirectional iterator that scans characters one codepoint at a time. UString provides that functionality for .NET, and the nice thing about UString is that it's portable to UTF-8 environments. That is, by using UString, your code is portable to a UTF-8 environment that uses an equivalent implementation of UString for UTF-8. Eventually I want Loyc to target native environments, where UTF-8 is common, and UString can provide a common data type for both UTF-8 and UTF-16 environments.

UString is a bidirectional range of "uchar", which is an alias for int (uchar means "Unicode" or "UCS-4", rather than "unsigned").
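To illustrate what scanning "one codepoint at a time" means over UTF-16 data, here is a minimal standalone sketch that uses only System APIs. It does not use UString itself; the names CodePointDemo and CodePoints are hypothetical, and the sketch only walks forward and assumes the input is well-formed (unpaired surrogates are simply yielded unchanged).

```csharp
using System;
using System.Collections.Generic;

static class CodePointDemo
{
    // Yields each code point of 's' as an int (the same representation
    // as "uchar"), scanning from front to back.
    public static IEnumerable<int> CodePoints(string s)
    {
        for (int i = 0; i < s.Length; i++)
        {
            if (char.IsHighSurrogate(s[i]) && i + 1 < s.Length
                && char.IsLowSurrogate(s[i + 1]))
            {
                // Two 16-bit code units combine into one code point.
                yield return char.ConvertToUtf32(s[i], s[i + 1]);
                i++; // skip the low surrogate that was just consumed
            }
            else
            {
                yield return s[i];
            }
        }
    }

    public static void Main()
    {
        // 'a', U+1F600 (a surrogate pair in UTF-16), 'b':
        // four code units, but only three code points.
        string text = "a\U0001F600b";
        foreach (int cp in CodePoints(text))
            Console.WriteLine("U+{0:X4}", cp);
    }
}
```

The loop index advances by one or two code units per step, which is why a range like this is naturally bidirectional rather than random-access.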

The difference between StringSlice and UString is that StringSlice is a random-access range of char, while UString is a bidirectional range of uchar (int). Since UString implements IListSource{Char}, it requires StringSlice in order to support the Slice method.

UString has a DecodeAt(int) method that tries to decode a UTF character to UCS at a particular index.

Since UString and StringSlice are just slightly different views of the same data, you can implicitly cast between them.

Unfortunately, it's not possible for UString to compare equal to its equivalent string, for two reasons: (1) System.String.Equals cannot be changed, and (2) UString.GetHashCode cannot return the same value as String.GetHashCode without actually generating a String object, which would be inefficient. (String.GetHashCode cannot be emulated because it changes between versions of the .NET framework and even between 32- and 64-bit builds.)

TODO: add Right, Normalize, EndsWith, FindLast, ReplaceAll, etc.

Public fields

int _count
 

Public static fields

static readonly UString Null = default(UString)
 
static readonly UString Empty = new UString("")
 

Properties

string InternalString [get]
 Returns the original string.
 
int InternalStart [get]
 
int InternalStop [get]
 
int Length [get]
 
int Count [get]
 
bool IsEmpty [get]
 
uchar Front [get]
 
uchar Back [get]
 
char this[int index] [get]
 Returns the code unit (16-bit value) at the specified index.
 
char this[int index, char defaultValue] [get]
 Returns the code unit (16-bit value) at the specified index, or a default value if the specified index was out of range.
 
int this[int index, int defaultValue] [get]
 Returns the code point (21-bit value) at the specified index, or a default value if the specified index was out of range.
 

Public Member Functions

 UString (string str, int start, int count=int.MaxValue)
 Initializes a UString slice.
 
 UString (string str)
 
uchar PopFront (out bool fail)
 
uchar PopBack (out bool fail)
 
IFRange< uchar > ICloneable< IFRange< uchar > >.Clone ()
 
IBRange< uchar > ICloneable< IBRange< uchar > >.Clone ()
 
UString Clone ()
 
IEnumerator< uchar > IEnumerable< uchar >.GetEnumerator ()
 
IEnumerator< char > IEnumerable< char >.GetEnumerator ()
 
System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator ()
 
RangeEnumerator< UString, uchar > GetEnumerator ()
 
uchar TryDecodeAt (int index)
 Returns the UCS code point that starts at the specified index.
 
uchar DecodeAt (int index)
 Returns the UCS code point that starts at the specified index.
 
void ThrowIndexOutOfRange (int i)
 
char TryGet (int index, out bool fail)
 Gets the item at the specified index, and does not throw an exception on failure.
 
IRange< char > IListSource< char >.Slice (int start, int count)
 Returns a substring from the character source. If some of the requested characters are past the end of the stream, the string is truncated to the available number of characters.
 
StringSlice Slice (int start, int count=int.MaxValue)
 Returns a substring from the character source. If some of the requested characters are past the end of the stream, the string is truncated to the available number of characters.
 
override int GetHashCode ()
 
override bool Equals (object obj)
 
bool Equals (UString other)
 
bool Equals (UString other, bool ignoreCase)
 
override string ToString ()
 
UString Substring (int start, int count)
 Synonym for Slice().
 
UString Substring (int start)
 
UString Find (uchar what, bool ignoreCase=false)
 Finds the specified UCS-4 character.
 
UString Find (UString what, bool ignoreCase=false)
 Finds the specified string within this string.
 
UString ShedExcessMemory (int maxExtra)
 This method makes a copy of the string if this is a sufficiently small slice of a larger string.
 
UString ToUpper ()
 Converts the string to uppercase using the 'invariant' culture.
 
bool StartsWith (UString what, bool ignoreCase=false)
 Determines whether this string starts with the specified other string.
 
UString Replace (UString what, UString replacement, bool ignoreCase=false, int maxReplacements=int.MaxValue)
 Returns a new string in which all occurrences (or a specified number of occurrences) of a specified string in the current instance are replaced with another specified string.
 
UString ReplaceOne (UString what, UString replacement, bool ignoreCase=false)
 
int IndexOf (char find, bool ignoreCase=false)
 
int IndexOf (UString find, bool ignoreCase=false)
 
Pair< UString, UString > SplitAt (char delimiter, bool ignoreCase=false)
 
Pair< UString, UString > SplitAt (UString delimiter)
 

Static Public Member Functions

static bool operator== (UString x, UString y)
 
static bool operator!= (UString x, UString y)
 
static operator string (UString s)
 
static implicit operator UString (string s)
 
static implicit operator UString (StringSlice s)
 
static bool SubstringEqualHelper (string _str, int _start, UString what, bool ignoreCase=false)
 

Constructor & Destructor Documentation

Loyc.UString.UString (string str, int start, int count = int.MaxValue) [inline]

Initializes a UString slice.

Exceptions
ArgumentException: The start index was below zero.

The (start, count) range is allowed to be invalid, as long as 'start' is zero or above.

  • If 'count' is below zero, or if 'start' is above the original Length, the Count of the new slice is set to zero.
  • If (start + count) is above the original Length, the Count of the new slice is reduced to str.Length - start.
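The clamping rules above can be sketched as a small standalone function. This is only an illustration of the documented behavior, not the library's actual code; SliceBounds and ClampCount are hypothetical names.

```csharp
using System;

static class SliceBounds
{
    // Returns the Count that a new slice would have under the rules above,
    // given the Length of the original string and the requested range.
    public static int ClampCount(int originalLength, int start, int count)
    {
        if (start < 0)
            throw new ArgumentException("The start index was below zero.");
        if (count < 0 || start > originalLength)
            return 0;
        // Compare by subtraction so that start + count cannot overflow
        // when count is int.MaxValue (the default).
        if (count > originalLength - start)
            return originalLength - start;
        return count;
    }

    public static void Main()
    {
        Console.WriteLine(ClampCount(5, 2, int.MaxValue)); // 3
        Console.WriteLine(ClampCount(5, 9, 4));            // 0
        Console.WriteLine(ClampCount(5, 1, -7));           // 0
        Console.WriteLine(ClampCount(5, 1, 2));            // 2
    }
}
```

Only a negative start is treated as an error; every other out-of-range request is silently narrowed.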

Referenced by Loyc.UString.Find(), and Loyc.UString.Substring().

Member Function Documentation

uchar Loyc.UString.DecodeAt (int index) [inline]

Returns the UCS code point that starts at the specified index.

Parameters
index: Code unit index at which to decode.
Returns
The code point starting at this index, or a negative number.
Exceptions
IndexOutOfRangeException: The index was out of range.

If decoding fails, either because the index points to the "middle" of a multi-code-unit sequence or because the string contains an invalid UTF sequence, this method returns a negative value (the bitwise 'not' of the invalid char). If the index itself is invalid, this method throws IndexOutOfRangeException; TryDecodeAt(int) returns -1 in that case instead.
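The return-value convention can be shown with a standalone sketch for UTF-16 input. It is written against a plain string, the name TryDecodeAtSketch is hypothetical, and it follows the non-throwing convention described for TryDecodeAt (an out-of-range index yields -1); the library's real implementation may differ in detail.

```csharp
using System;

static class DecodeSketch
{
    // Returns the code point that starts at s[index].
    // On a decoding failure, returns the bitwise 'not' of the offending
    // code unit, which is always negative. Returns -1 for a bad index.
    public static int TryDecodeAtSketch(string s, int index)
    {
        if ((uint)index >= (uint)s.Length)
            return -1;
        char c = s[index];
        if (!char.IsSurrogate(c))
            return c; // a single code unit is the whole code point
        if (char.IsHighSurrogate(c) && index + 1 < s.Length
            && char.IsLowSurrogate(s[index + 1]))
            return char.ConvertToUtf32(c, s[index + 1]);
        return ~c; // lone surrogate, or index points at the "middle" of a pair
    }

    public static void Main()
    {
        string text = "a\U0001F600"; // 'a' followed by one surrogate pair
        Console.WriteLine(TryDecodeAtSketch(text, 0)); // 97
        Console.WriteLine(TryDecodeAtSketch(text, 1)); // 128512 (0x1F600)
        Console.WriteLine(TryDecodeAtSketch(text, 2)); // -56833 (~0xDE00)
        Console.WriteLine(TryDecodeAtSketch(text, 3)); // -1
    }
}
```

A caller can recover the offending code unit from a negative result by applying the bitwise 'not' again.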

References Loyc.UString.TryDecodeAt().

UString Loyc.UString.Find (uchar what, bool ignoreCase = false) [inline]

Finds the specified UCS-4 character.

Returns
A range from the first occurrence of 'what' to the original end of this UString. If the character is not found, an empty string (slicing the end of this range) is returned.

References Loyc.UString.UString().

Referenced by Loyc.UString.Find(), and Loyc.UString.Replace().

UString Loyc.UString.Find (UString what, bool ignoreCase = false) [inline]

Finds the specified string within this string.

Returns
Returns a range from the first occurrence of 'what' to the original end of this UString. If 'what' is not found, an empty string (slicing the end of this range) is returned.
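The shape of Find's result, a range that runs from the match to the original end or an empty range positioned at the end, can be illustrated with plain (start, count) pairs. This is a hedged sketch rather than the library's code: FindSketch and its Find method are hypothetical, they search for a single char only, and they assume the incoming (start, count) range is already valid.

```csharp
using System;

static class FindSketch
{
    // Searches the region [start, start + count) of 'str' for 'what'.
    // Found:     returns the range from the match to the end of the region.
    // Not found: returns an empty range located at the end of the region.
    public static (int Start, int Count) Find(string str, int start, int count, char what)
    {
        int stop = start + count;
        int i = str.IndexOf(what, start, count);
        return i < 0 ? (stop, 0) : (i, stop - i);
    }

    public static void Main()
    {
        string s = "key=value";
        var hit = Find(s, 0, s.Length, '=');
        Console.WriteLine(s.Substring(hit.Start, hit.Count)); // =value
        var miss = Find(s, 0, s.Length, '#');
        Console.WriteLine(miss.Start + ", " + miss.Count);    // 9, 0
    }
}
```

Because a failed search still returns a range anchored at the end, the caller can test for emptiness instead of checking for a sentinel index.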

References Loyc.UString.Find(), and Loyc.UString.UString().

UString Loyc.UString.Replace (UString what, UString replacement, bool ignoreCase = false, int maxReplacements = int.MaxValue) [inline]

Returns a new string in which all occurrences (or a specified number of occurrences) of a specified string in the current instance are replaced with another specified string.

Parameters
what
replacement
ignoreCase
maxReplacements
Returns
Returns a new string with replacements made, or the same string if no replacements occurred.

References Loyc.UString.Find().

UString Loyc.UString.ShedExcessMemory (int maxExtra) [inline]

This method makes a copy of the string if this is a sufficiently small slice of a larger string.

Returns
Returns ToString() if InternalString.Length - Length > maxExtra, otherwise this.

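The decision rule above can be sketched with a plain (string, start) pair standing in for a slice. ShedSketch and Shed are hypothetical names, and this is an illustration of the stated condition rather than the library's implementation.

```csharp
using System;

static class ShedSketch
{
    // If the backing string holds more than maxExtra characters beyond
    // the slice, copy just the slice; otherwise keep sharing the original.
    public static (string Str, int Start) Shed(string internalString, int start, int length, int maxExtra)
    {
        if (internalString.Length - length > maxExtra)
            return (internalString.Substring(start, length), 0);
        return (internalString, start);
    }

    public static void Main()
    {
        string big = new string('x', 1000) + "tail";
        var copied = Shed(big, 1000, 4, 16);
        Console.WriteLine(copied.Str.Length); // 4: the large string can now be collected

        var shared = Shed("hello", 1, 3, 16);
        Console.WriteLine(ReferenceEquals(shared.Str, "hello")); // True: no copy was made
    }
}
```

The trade-off is one small allocation now in exchange for not keeping a large, mostly unused string alive.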
IRange<char> IListSource<char>.Loyc.UString.Slice (int startIndex, int length) [inline]

Returns a substring from the character source. If some of the requested characters are past the end of the stream, the string is truncated to the available number of characters.

Parameters
startIndex: Index of first character to return. If startIndex >= Count, an empty string is returned.
length: Number of characters desired.
Exceptions
ArgumentException: Thrown if startIndex or length is negative.

Implements Loyc.Collections.ICharSource.

References Loyc.UString.Slice().

Referenced by Loyc.UString.Slice().

StringSlice Loyc.UString.Slice (int startIndex, int length = int.MaxValue) [inline]

Returns a substring from the character source. If some of the requested characters are past the end of the stream, the string is truncated to the available number of characters.

Parameters
startIndex: Index of first character to return. If startIndex >= Count, an empty string is returned.
length: Number of characters desired.
Exceptions
ArgumentException: Thrown if startIndex or length is negative.

Implements Loyc.Collections.ICharSource.

bool Loyc.UString.StartsWith (UString what, bool ignoreCase = false) [inline]

Determines whether this string starts with the specified other string.

Returns
true if this string starts with the contents of 'what'.

UString Loyc.UString.Substring (int start, int count) [inline]

Synonym for Slice().

References Loyc.UString.UString().

UString Loyc.UString.ToUpper () [inline]

Converts the string to uppercase using the 'invariant' culture.

uchar Loyc.UString.TryDecodeAt (int index) [inline]

Returns the UCS code point that starts at the specified index.

Works the same way as DecodeAt(int) except that if the index is invalid, this method returns -1 rather than throwing.

Referenced by Loyc.UString.DecodeAt().

char Loyc.UString.TryGet (int index, out bool fail) [inline]

Gets the item at the specified index, and does not throw an exception on failure.

Parameters
index: An index in the range 0 to Count-1.
fail: A flag that is set on failure.
Returns
The element at the specified index, or default(T) if the index is not valid.

In my original design, the caller could provide a value to return on failure, but this would not allow T to be marked as "out" in C# 4. For the same reason, we cannot have a ref/out T parameter. Instead, the following extension methods are provided:

bool TryGet(int index, ref T value);
T TryGet(int index, T defaultValue);

Implements Loyc.Collections.IListSource< out T >.

Property Documentation

string Loyc.UString.InternalString [get]

Returns the original string.

Ideally, the string would be kept private, so that there would be no way to access its contents beyond the boundaries of the slice. However, the reality in .NET today is that many methods accept "slices" in the form of a triple (string, start index, count). In order to call such an old-style API using a slice, one must be able to extract the internal string and start index values.
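As an example of such an old-style API, StringBuilder.Append(string, int, int) takes exactly this triple. The sketch below is standalone: LegacyApiSketch and AppendSlice are hypothetical names, and the three parameters stand in for the values that InternalString, InternalStart and Length would supply for a real slice.

```csharp
using System;
using System.Text;

static class LegacyApiSketch
{
    // Appends the region (internalString, internalStart, length) to 'sb'
    // without first allocating a separate substring for the slice.
    public static string AppendSlice(StringBuilder sb, string internalString, int internalStart, int length)
    {
        sb.Append(internalString, internalStart, length);
        return sb.ToString();
    }

    public static void Main()
    {
        // The slice "world" inside "hello world": start 6, length 5.
        Console.WriteLine(AppendSlice(new StringBuilder("["), "hello world", 6, 5)); // [world
    }
}
```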

char Loyc.UString.this[int index, char defaultValue] [get]

Returns the code unit (16-bit value) at the specified index, or a default value if the specified index was out of range.

int Loyc.UString.this[int index, int defaultValue] [get]

Returns the code point (21-bit value) at the specified index, or a default value if the specified index was out of range.

char Loyc.UString.this[int index] [get]

Returns the code unit (16-bit value) at the specified index.

Exceptions
IndexOutOfRangeException: The index was out of range.