diff website/src/manual.html.luan @ 556:d02f43598ba3
finish String documentation
author | Franklin Schmidt <fschmidt@gmail.com> |
---|---|
date | Fri, 19 Jun 2015 19:39:41 -0600 |
parents | e25ba7a2e816 |
children | 7cc9d4a53d3b |
--- a/website/src/manual.html.luan Fri Jun 19 04:29:06 2015 -0600 +++ b/website/src/manual.html.luan Fri Jun 19 19:39:41 2015 -0600 @@ -2159,40 +2159,6 @@ - -<p> -<hr><h3><a name="pdf-tonumber"><code>tonumber (e [, base])</code></a></h3> - - -<p> -When called with no <code>base</code>, -<code>tonumber</code> tries to convert its argument to a number. -If the argument is already a number or -a string convertible to a number, -then <code>tonumber</code> returns this number; -otherwise, it returns <b>nil</b>. - - -<p> -The conversion of strings can result in integers or floats, -according to the lexical conventions of Lua (see <a href="#3.1">§3.1</a>). -(The string may have leading and trailing spaces and a sign.) - - -<p> -When called with <code>base</code>, -then <code>e</code> must be a string to be interpreted as -an integer numeral in that base. -The base may be any integer between 2 and 36, inclusive. -In bases above 10, the letter '<code>A</code>' (in either upper or lower case) -represents 10, '<code>B</code>' represents 11, and so forth, -with '<code>Z</code>' representing 35. -If the string <code>e</code> is not a valid numeral in the given base, -the function returns <b>nil</b>. - - - - <h4 <%=heading_options%> ><a name="Luan.to_string"><tt>Luan.to_string (v)</tt></a></h4> <p> @@ -2367,22 +2333,6 @@ - -<p> -<hr><h3><a name="pdf-string.byte"><code>string.byte (s [, i [, j]])</code></a></h3> -Returns the internal numerical codes of the characters <code>s[i]</code>, -<code>s[i+1]</code>, ..., <code>s[j]</code>. -The default value for <code>i</code> is 1; -the default value for <code>j</code> is <code>i</code>. -These indices are corrected -following the same rules of function <a href="#pdf-string.sub"><code>string.sub</code></a>. - - -<p> -Numerical codes are not necessarily portable across platforms. 
- - - <h4 <%=heading_options%> ><a name="String.char"><tt>String.char (···)</tt></a></h4> <p> @@ -2411,7 +2361,7 @@ <p> Looks for the first match of -<tt>pattern</tt> (see <a href="#6.4.1">§6.4.1</a>) in the string <tt>s</tt>. +<tt>pattern</tt> (see <a href="http://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html">Pattern</a>) in the string <tt>s</tt>. If it finds a match, then <tt>find</tt> returns the indices of <tt>s</tt> where this occurrence starts and ends; otherwise, it returns <b>nil</b>. @@ -2451,7 +2401,7 @@ <p> Returns an iterator function that, each time it is called, -returns the next captures from <tt>pattern</tt> (see <a href="#6.4.1">§6.4.1</a>) +returns the next captures from <tt>pattern</tt> (see <a href="http://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html">Pattern</a>) over the string <tt>s</tt>. If <tt>pattern</tt> specifies no captures, then the whole match is produced in each call. @@ -2492,7 +2442,7 @@ <p> Returns a copy of <tt>s</tt> in which all (or the first <tt>n</tt>, if given) -occurrences of the <tt>pattern</tt> (see <a href="#6.4.1">§6.4.1</a>) have been +occurrences of the <tt>pattern</tt> (see <a href="http://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html">Pattern</a>) have been replaced by a replacement string specified by <tt>repl</tt>, which can be a string, a table, or a function. <tt>gsub</tt> also returns, as its second value, @@ -2560,6 +2510,11 @@ +<h4 <%=heading_options%> ><a name="String.literal"><tt>String.literal (s)</tt></a></h4> +<p> +Returns a string which matches the literal string <tt>s</tt> in a regular expression. This function is simply the Java method <a href="http://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html#quote(java.lang.String)"><tt>Pattern.quote</tt></a>. 
+ + <h4 <%=heading_options%> ><a name="String.lower"><tt>String.lower (s)</tt></a></h4> <p> Receives a string and returns a copy of this string with all @@ -2569,109 +2524,128 @@ -<p> -<hr><h3><a name="pdf-string.match"><code>string.match (s, pattern [, init])</code></a></h3> -Looks for the first <em>match</em> of -<code>pattern</code> (see <a href="#6.4.1">§6.4.1</a>) in the string <code>s</code>. -If it finds one, then <code>match</code> returns +<h4 <%=heading_options%> ><a name="String.match"><tt>String.match (s, pattern [, init])</tt></a></h4> + +<p> +Looks for the first <i>match</i> of +<tt>pattern</tt> (see <a href="http://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html">Pattern</a>) in the string <tt>s</tt>. +If it finds one, then <tt>match</tt> returns the captures from the pattern; otherwise it returns <b>nil</b>. -If <code>pattern</code> specifies no captures, +If <tt>pattern</tt> specifies no captures, then the whole match is returned. -A third, optional numerical argument <code>init</code> specifies +A third, optional numerical argument <tt>init</tt> specifies where to start the search; its default value is 1 and can be negative. - - -<p> -<hr><h3><a name="pdf-string.pack"><code>string.pack (fmt, v1, v2, ···)</code></a></h3> - - -<p> -Returns a binary string containing the values <code>v1</code>, <code>v2</code>, etc. -packed (that is, serialized in binary form) -according to the format string <code>fmt</code> (see <a href="#6.4.2">§6.4.2</a>). - - - - -<p> -<hr><h3><a name="pdf-string.packsize"><code>string.packsize (fmt)</code></a></h3> - - -<p> -Returns the size of a string resulting from <a href="#pdf-string.pack"><code>string.pack</code></a> -with the given format. -The format string cannot have the variable-length options -'<code>s</code>' or '<code>z</code>' (see <a href="#6.4.2">§6.4.2</a>). 
- - - - -<p> -<hr><h3><a name="pdf-string.rep"><code>string.rep (s, n [, sep])</code></a></h3> -Returns a string that is the concatenation of <code>n</code> copies of -the string <code>s</code> separated by the string <code>sep</code>. -The default value for <code>sep</code> is the empty string +<h4 <%=heading_options%> ><a name="String.matches"><tt>String.matches (s, pattern)</tt></a></h4> +<p> +Returns a boolean indicating whether the entire string <tt>s</tt> matches <tt>pattern</tt>. + + + +<h4 <%=heading_options%> ><a name="String.rep"><tt>String.rep (s, n [, sep])</tt></a></h4> +<p> +Returns a string that is the concatenation of <tt>n</tt> copies of +the string <tt>s</tt> separated by the string <tt>sep</tt>. +The default value for <tt>sep</tt> is the empty string (that is, no separator). -Returns the empty string if <code>n</code> is not positive. - - - - -<p> -<hr><h3><a name="pdf-string.reverse"><code>string.reverse (s)</code></a></h3> -Returns a string that is the string <code>s</code> reversed. - - - - -<p> -<hr><h3><a name="pdf-string.sub"><code>string.sub (s, i [, j])</code></a></h3> -Returns the substring of <code>s</code> that -starts at <code>i</code> and continues until <code>j</code>; -<code>i</code> and <code>j</code> can be negative. -If <code>j</code> is absent, then it is assumed to be equal to -1 +Returns the empty string if <tt>n</tt> is not positive. + + + + +<h4 <%=heading_options%> ><a name="String.reverse"><tt>String.reverse (s)</tt></a></h4> +<p> +Returns a string that is the string <tt>s</tt> reversed. + + + + +<h4 <%=heading_options%> ><a name="String.sub"><tt>String.sub (s, i [, j])</tt></a></h4> + +<p> +Returns the substring of <tt>s</tt> that +starts at <tt>i</tt> and continues until <tt>j</tt>; +<tt>i</tt> and <tt>j</tt> can be negative. +If <tt>j</tt> is absent, then it is assumed to be equal to -1 (which is the same as the string length). 
In particular, -the call <code>string.sub(s,1,j)</code> returns a prefix of <code>s</code> -with length <code>j</code>, -and <code>string.sub(s, -i)</code> returns a suffix of <code>s</code> -with length <code>i</code>. +the call <tt>string.sub(s,1,j)</tt> returns a prefix of <tt>s</tt> +with length <tt>j</tt>, +and <tt>string.sub(s, -i)</tt> returns a suffix of <tt>s</tt> +with length <tt>i</tt>. <p> If, after the translation of negative indices, -<code>i</code> is less than 1, +<tt>i</tt> is less than 1, it is corrected to 1. -If <code>j</code> is greater than the string length, +If <tt>j</tt> is greater than the string length, it is corrected to that length. If, after these corrections, -<code>i</code> is greater than <code>j</code>, +<tt>i</tt> is greater than <tt>j</tt>, the function returns the empty string. - -<p> -<hr><h3><a name="pdf-string.unpack"><code>string.unpack (fmt, s [, pos])</code></a></h3> - - -<p> -Returns the values packed in string <code>s</code> (see <a href="#pdf-string.pack"><code>string.pack</code></a>) -according to the format string <code>fmt</code> (see <a href="#6.4.2">§6.4.2</a>). -An optional <code>pos</code> marks where -to start reading in <code>s</code> (default is 1). -After the read values, -this function also returns the index of the first unread byte in <code>s</code>. - - - - -<p> -<hr><h3><a name="pdf-string.upper"><code>string.upper (s)</code></a></h3> +<h4 <%=heading_options%> ><a name="String.to_binary"><tt>String.to_binary (s)</tt></a></h4> + +<p> +Converts a string to a binary by calling the Java method <a href="http://docs.oracle.com/javase/7/docs/api/java/lang/String.html#getBytes()"><tt>String.getBytes</tt></a>. + + + +<h4 <%=heading_options%> ><a name="String.to_number"><tt>String.to_number (s [, base])</tt></a></h4> + +<p> +When called with no <tt>base</tt>, +<tt>to_number</tt> tries to convert its argument to a number. 
+If the argument is +a string convertible to a number, +then <tt>to_number</tt> returns this number; +otherwise, it returns <b>nil</b>. + +The conversion of strings can result in integers or floats. + + +<p> +When called with <tt>base</tt>, +then <tt>s</tt> must be a string to be interpreted as +an integer numeral in that base. +In bases above 10, the letter '<tt>A</tt>' (in either upper or lower case) +represents 10, '<tt>B</tt>' represents 11, and so forth, +with '<tt>Z</tt>' representing 35. +If the string <tt>s</tt> is not a valid numeral in the given base, +the function returns <b>nil</b>. + + + +<h4 <%=heading_options%> ><a name="String.trim"><tt>String.trim (s)</tt></a></h4> + +<p> +Removes the leading and trailing whitespace by calling the Java method <a href="http://docs.oracle.com/javase/7/docs/api/java/lang/String.html#trim()"><tt>String.trim</tt></a>. + + + + +<h4 <%=heading_options%> ><a name="String.unicode"><tt>String.unicode (s [, i [, j]])</tt></a></h4> + +<p> +Returns the internal numerical codes of the characters <tt>s[i]</tt>, +<tt>s[i+1]</tt>, ..., <tt>s[j]</tt>. +The default value for <tt>i</tt> is 1; +the default value for <tt>j</tt> is <tt>i</tt>. +These indices are corrected +following the same rules of function <a href="#String.sub"><tt>String.sub</tt></a>. + + + + + +<h4 <%=heading_options%> ><a name="String.upper"><tt>String.upper (s)</tt></a></h4> +<p> Receives a string and returns a copy of this string with all lowercase letters changed to uppercase. All other characters are left unchanged. @@ -2681,404 +2655,6 @@ -<h3>6.4.1 – <a name="6.4.1">Patterns</a></h3> - -<p> -Patterns in Lua are described by regular strings, -which are interpreted as patterns by the pattern-matching functions -<a href="#pdf-string.find"><code>string.find</code></a>, -<a href="#pdf-string.gmatch"><code>string.gmatch</code></a>, -<a href="#pdf-string.gsub"><code>string.gsub</code></a>, -and <a href="#pdf-string.match"><code>string.match</code></a>. 
-This section describes the syntax and the meaning -(that is, what they match) of these strings. - - - -<h4>Character Class:</h4><p> -A <em>character class</em> is used to represent a set of characters. -The following combinations are allowed in describing a character class: - -<ul> - -<li><b><em>x</em>: </b> -(where <em>x</em> is not one of the <em>magic characters</em> -<code>^$()%.[]*+-?</code>) -represents the character <em>x</em> itself. -</li> - -<li><b><code>.</code>: </b> (a dot) represents all characters.</li> - -<li><b><code>%a</code>: </b> represents all letters.</li> - -<li><b><code>%c</code>: </b> represents all control characters.</li> - -<li><b><code>%d</code>: </b> represents all digits.</li> - -<li><b><code>%g</code>: </b> represents all printable characters except space.</li> - -<li><b><code>%l</code>: </b> represents all lowercase letters.</li> - -<li><b><code>%p</code>: </b> represents all punctuation characters.</li> - -<li><b><code>%s</code>: </b> represents all space characters.</li> - -<li><b><code>%u</code>: </b> represents all uppercase letters.</li> - -<li><b><code>%w</code>: </b> represents all alphanumeric characters.</li> - -<li><b><code>%x</code>: </b> represents all hexadecimal digits.</li> - -<li><b><code>%<em>x</em></code>: </b> (where <em>x</em> is any non-alphanumeric character) -represents the character <em>x</em>. -This is the standard way to escape the magic characters. -Any non-alphanumeric character -(including all punctuations, even the non-magical) -can be preceded by a '<code>%</code>' -when used to represent itself in a pattern. -</li> - -<li><b><code>[<em>set</em>]</code>: </b> -represents the class which is the union of all -characters in <em>set</em>. -A range of characters can be specified by -separating the end characters of the range, -in ascending order, with a '<code>-</code>'. -All classes <code>%</code><em>x</em> described above can also be used as -components in <em>set</em>. 
-All other characters in <em>set</em> represent themselves. -For example, <code>[%w_]</code> (or <code>[_%w]</code>) -represents all alphanumeric characters plus the underscore, -<code>[0-7]</code> represents the octal digits, -and <code>[0-7%l%-]</code> represents the octal digits plus -the lowercase letters plus the '<code>-</code>' character. - - -<p> -The interaction between ranges and classes is not defined. -Therefore, patterns like <code>[%a-z]</code> or <code>[a-%%]</code> -have no meaning. -</li> - -<li><b><code>[^<em>set</em>]</code>: </b> -represents the complement of <em>set</em>, -where <em>set</em> is interpreted as above. -</li> - -</ul><p> -For all classes represented by single letters (<code>%a</code>, <code>%c</code>, etc.), -the corresponding uppercase letter represents the complement of the class. -For instance, <code>%S</code> represents all non-space characters. - - -<p> -The definitions of letter, space, and other character groups -depend on the current locale. -In particular, the class <code>[a-z]</code> may not be equivalent to <code>%l</code>. - - - - - -<h4>Pattern Item:</h4><p> -A <em>pattern item</em> can be - -<ul> - -<li> -a single character class, -which matches any single character in the class; -</li> - -<li> -a single character class followed by '<code>*</code>', -which matches zero or more repetitions of characters in the class. -These repetition items will always match the longest possible sequence; -</li> - -<li> -a single character class followed by '<code>+</code>', -which matches one or more repetitions of characters in the class. -These repetition items will always match the longest possible sequence; -</li> - -<li> -a single character class followed by '<code>-</code>', -which also matches zero or more repetitions of characters in the class. 
-Unlike '<code>*</code>', -these repetition items will always match the shortest possible sequence; -</li> - -<li> -a single character class followed by '<code>?</code>', -which matches zero or one occurrence of a character in the class. -It always matches one occurrence if possible; -</li> - -<li> -<code>%<em>n</em></code>, for <em>n</em> between 1 and 9; -such item matches a substring equal to the <em>n</em>-th captured string -(see below); -</li> - -<li> -<code>%b<em>xy</em></code>, where <em>x</em> and <em>y</em> are two distinct characters; -such item matches strings that start with <em>x</em>, end with <em>y</em>, -and where the <em>x</em> and <em>y</em> are <em>balanced</em>. -This means that, if one reads the string from left to right, -counting <em>+1</em> for an <em>x</em> and <em>-1</em> for a <em>y</em>, -the ending <em>y</em> is the first <em>y</em> where the count reaches 0. -For instance, the item <code>%b()</code> matches expressions with -balanced parentheses. -</li> - -<li> -<code>%f[<em>set</em>]</code>, a <em>frontier pattern</em>; -such item matches an empty string at any position such that -the next character belongs to <em>set</em> -and the previous character does not belong to <em>set</em>. -The set <em>set</em> is interpreted as previously described. -The beginning and the end of the subject are handled as if -they were the character '<code>\0</code>'. -</li> - -</ul> - - - - -<h4>Pattern:</h4><p> -A <em>pattern</em> is a sequence of pattern items. -A caret '<code>^</code>' at the beginning of a pattern anchors the match at the -beginning of the subject string. -A '<code>$</code>' at the end of a pattern anchors the match at the -end of the subject string. -At other positions, -'<code>^</code>' and '<code>$</code>' have no special meaning and represent themselves. - - - - - -<h4>Captures:</h4><p> -A pattern can contain sub-patterns enclosed in parentheses; -they describe <em>captures</em>. 
-When a match succeeds, the substrings of the subject string -that match captures are stored (<em>captured</em>) for future use. -Captures are numbered according to their left parentheses. -For instance, in the pattern <code>"(a*(.)%w(%s*))"</code>, -the part of the string matching <code>"a*(.)%w(%s*)"</code> is -stored as the first capture (and therefore has number 1); -the character matching "<code>.</code>" is captured with number 2, -and the part matching "<code>%s*</code>" has number 3. - - -<p> -As a special case, the empty capture <code>()</code> captures -the current string position (a number). -For instance, if we apply the pattern <code>"()aa()"</code> on the -string <code>"flaaap"</code>, there will be two captures: 3 and 5. - - - - - - - -<h3>6.4.2 – <a name="6.4.2">Format Strings for Pack and Unpack</a></h3> - -<p> -The first argument to <a href="#pdf-string.pack"><code>string.pack</code></a>, -<a href="#pdf-string.packsize"><code>string.packsize</code></a>, and <a href="#pdf-string.unpack"><code>string.unpack</code></a> -is a format string, -which describes the layout of the structure being created or read. - - -<p> -A format string is a sequence of conversion options. 
-The conversion options are as follows: - -<ul> -<li><b><code><</code>: </b>sets little endian</li> -<li><b><code>></code>: </b>sets big endian</li> -<li><b><code>=</code>: </b>sets native endian</li> -<li><b><code>![<em>n</em>]</code>: </b>sets maximum alignment to <code>n</code> -(default is native alignment)</li> -<li><b><code>b</code>: </b>a signed byte (<code>char</code>)</li> -<li><b><code>B</code>: </b>an unsigned byte (<code>char</code>)</li> -<li><b><code>h</code>: </b>a signed <code>short</code> (native size)</li> -<li><b><code>H</code>: </b>an unsigned <code>short</code> (native size)</li> -<li><b><code>l</code>: </b>a signed <code>long</code> (native size)</li> -<li><b><code>L</code>: </b>an unsigned <code>long</code> (native size)</li> -<li><b><code>j</code>: </b>a <code>lua_Integer</code></li> -<li><b><code>J</code>: </b>a <code>lua_Unsigned</code></li> -<li><b><code>T</code>: </b>a <code>size_t</code> (native size)</li> -<li><b><code>i[<em>n</em>]</code>: </b>a signed <code>int</code> with <code>n</code> bytes -(default is native size)</li> -<li><b><code>I[<em>n</em>]</code>: </b>an unsigned <code>int</code> with <code>n</code> bytes -(default is native size)</li> -<li><b><code>f</code>: </b>a <code>float</code> (native size)</li> -<li><b><code>d</code>: </b>a <code>double</code> (native size)</li> -<li><b><code>n</code>: </b>a <code>lua_Number</code></li> -<li><b><code>c<em>n</em></code>: </b>a fixed-sized string with <code>n</code> bytes</li> -<li><b><code>z</code>: </b>a zero-terminated string</li> -<li><b><code>s[<em>n</em>]</code>: </b>a string preceded by its length -coded as an unsigned integer with <code>n</code> bytes -(default is a <code>size_t</code>)</li> -<li><b><code>x</code>: </b>one byte of padding</li> -<li><b><code>X<em>op</em></code>: </b>an empty item that aligns -according to option <code>op</code> -(which is otherwise ignored)</li> -<li><b>'<code> </code>': </b>(empty space) ignored</li> -</ul><p> -(A "<code>[<em>n</em>]</code>" 
means an optional integral numeral.) -Except for padding, spaces, and configurations -(options "<code>xX <=>!</code>"), -each option corresponds to an argument (in <a href="#pdf-string.pack"><code>string.pack</code></a>) -or a result (in <a href="#pdf-string.unpack"><code>string.unpack</code></a>). - - -<p> -For options "<code>!<em>n</em></code>", "<code>s<em>n</em></code>", "<code>i<em>n</em></code>", and "<code>I<em>n</em></code>", -<code>n</code> can be any integer between 1 and 16. -All integral options check overflows; -<a href="#pdf-string.pack"><code>string.pack</code></a> checks whether the given value fits in the given size; -<a href="#pdf-string.unpack"><code>string.unpack</code></a> checks whether the read value fits in a Lua integer. - - -<p> -Any format string starts as if prefixed by "<code>!1=</code>", -that is, -with maximum alignment of 1 (no alignment) -and native endianness. - - -<p> -Alignment works as follows: -For each option, -the format gets extra padding until the data starts -at an offset that is a multiple of the minimum between the -option size and the maximum alignment; -this minimum must be a power of 2. -Options "<code>c</code>" and "<code>z</code>" are not aligned; -option "<code>s</code>" follows the alignment of its starting integer. - - -<p> -All padding is filled with zeros by <a href="#pdf-string.pack"><code>string.pack</code></a> -(and ignored by <a href="#pdf-string.unpack"><code>string.unpack</code></a>). - - - - - - - -<h2>6.5 – <a name="6.5">UTF-8 Support</a></h2> - -<p> -This library provides basic support for UTF-8 encoding. -It provides all its functions inside the table <a name="pdf-utf8"><code>utf8</code></a>. -This library does not provide any support for Unicode other -than the handling of the encoding. -Any operation that needs the meaning of a character, -such as character classification, is outside its scope. 
- - -<p> -Unless stated otherwise, -all functions that expect a byte position as a parameter -assume that the given position is either the start of a byte sequence -or one plus the length of the subject string. -As in the string library, -negative indices count from the end of the string. - - -<p> -<hr><h3><a name="pdf-utf8.char"><code>utf8.char (···)</code></a></h3> -Receives zero or more integers, -converts each one to its corresponding UTF-8 byte sequence -and returns a string with the concatenation of all these sequences. - - - - -<p> -<hr><h3><a name="pdf-utf8.charpattern"><code>utf8.charpattern</code></a></h3> -The pattern (a string, not a function) "<code>[\0-\x7F\xC2-\xF4][\x80-\xBF]*</code>" -(see <a href="#6.4.1">§6.4.1</a>), -which matches exactly one UTF-8 byte sequence, -assuming that the subject is a valid UTF-8 string. - - - - -<p> -<hr><h3><a name="pdf-utf8.codes"><code>utf8.codes (s)</code></a></h3> - - -<p> -Returns values so that the construction - -<pre> - for p, c in utf8.codes(s) do <em>body</em> end -</pre><p> -will iterate over all characters in string <code>s</code>, -with <code>p</code> being the position (in bytes) and <code>c</code> the code point -of each character. -It raises an error if it meets any invalid byte sequence. - - - - -<p> -<hr><h3><a name="pdf-utf8.codepoint"><code>utf8.codepoint (s [, i [, j]])</code></a></h3> -Returns the codepoints (as integers) from all characters in <code>s</code> -that start between byte position <code>i</code> and <code>j</code> (both included). -The default for <code>i</code> is 1 and for <code>j</code> is <code>i</code>. -It raises an error if it meets any invalid byte sequence. - - - - -<p> -<hr><h3><a name="pdf-utf8.len"><code>utf8.len (s [, i [, j]])</code></a></h3> -Returns the number of UTF-8 characters in string <code>s</code> -that start between positions <code>i</code> and <code>j</code> (both inclusive). -The default for <code>i</code> is 1 and for <code>j</code> is -1. 
-If it finds any invalid byte sequence, -returns a false value plus the position of the first invalid byte. - - - - -<p> -<hr><h3><a name="pdf-utf8.offset"><code>utf8.offset (s, n [, i])</code></a></h3> -Returns the position (in bytes) where the encoding of the -<code>n</code>-th character of <code>s</code> -(counting from position <code>i</code>) starts. -A negative <code>n</code> gets characters before position <code>i</code>. -The default for <code>i</code> is 1 when <code>n</code> is non-negative -and <code>#s + 1</code> otherwise, -so that <code>utf8.offset(s, -n)</code> gets the offset of the -<code>n</code>-th character from the end of the string. -If the specified character is neither in the subject -nor right after its end, -the function returns <b>nil</b>. - - -<p> -As a special case, -when <code>n</code> is 0 the function returns the start of the encoding -of the character that contains the <code>i</code>-th byte of <code>s</code>. - - -<p> -This function assumes that <code>s</code> is a valid UTF-8 string. - - - -
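Note on the new `String.literal` and `String.matches` entries: the diff documents `String.literal` as simply `Pattern.quote`, and `String.matches` as a test of whether the entire string matches the pattern. The Java sketch below illustrates how the two would combine under that reading. The class name `LiteralDemo` and the helpers `literal` and `matches` are hypothetical stand-ins, not Luan's actual implementation, and the use of `Pattern.matches` for the whole-string test is an assumption based only on the wording of the new documentation.

```java
import java.util.regex.Pattern;

public class LiteralDemo {
    // Hypothetical stand-in for String.literal(s); the changeset states
    // that it is simply the Java method Pattern.quote.
    static String literal(String s) {
        return Pattern.quote(s);
    }

    // Hypothetical stand-in for String.matches(s, pattern); assumed here
    // to be a whole-string match, as the new documentation describes.
    static boolean matches(String s, String pattern) {
        return Pattern.matches(pattern, s);
    }

    public static void main(String[] args) {
        // Unquoted, '.' is a regex metacharacter and matches any character.
        System.out.println(matches("axb", "a.b"));            // true
        // Quoted, the pattern matches only the literal text "a.b".
        System.out.println(matches("axb", literal("a.b")));   // false
        System.out.println(matches("a.b", literal("a.b")));   // true
        // The entire string must match, not just a substring.
        System.out.println(matches("xa.b", literal("a.b")));  // false
    }
}
```

Quoting matters mostly when a pattern is built from user-supplied text, since the functions in this section now take Java regular expressions rather than Lua patterns.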