comparison website/src/manual.html.luan @ 556:d02f43598ba3
finish String documentation
| author | Franklin Schmidt <fschmidt@gmail.com> |
|---|---|
| date | Fri, 19 Jun 2015 19:39:41 -0600 |
| parents | e25ba7a2e816 |
| children | 7cc9d4a53d3b |
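
Several of the String functions documented in this changeset are described as thin wrappers over Java methods (`Pattern.quote`, `String.getBytes`, `String.trim`). The sketch below mirrors that description in plain Java to make the documented behavior concrete. The class `StringDocSketch` and its method names are hypothetical and are not taken from Luan's source. Mapping `String.matches` to `Pattern.matches`, and `String.to_number` with a base to `Long.parseLong`, are assumptions made for illustration only.

```java
import java.util.regex.Pattern;

// Hypothetical helper class: illustrative only, not Luan's implementation.
final class StringDocSketch {

    // String.literal(s): documented as simply Pattern.quote.
    static String literal(String s) {
        return Pattern.quote(s);
    }

    // String.matches(s, pattern): true only if the ENTIRE string matches.
    // Assumption: equivalent to java.util.regex.Pattern.matches.
    static boolean matches(String s, String pattern) {
        return Pattern.matches(pattern, s);
    }

    // String.trim(s): documented as calling String.trim.
    static String trim(String s) {
        return s.trim();
    }

    // String.to_binary(s): documented as calling String.getBytes
    // (which uses the platform default charset).
    static byte[] toBinary(String s) {
        return s.getBytes();
    }

    // String.to_number(s, base): returns null where the manual says nil.
    // Assumption: Long.parseLong stands in for the real conversion.
    static Long toNumber(String s, int base) {
        try {
            return Long.parseLong(s, base);
        } catch (NumberFormatException e) {
            return null; // not a valid numeral in the given base
        }
    }

    public static void main(String[] args) {
        System.out.println(matches("1x5", "1.5"));           // true: '.' matches any character
        System.out.println(matches("1x5", literal("1.5")));  // false: '.' is now literal
        System.out.println(matches("abc", "b"));             // false: whole string must match
        System.out.println(trim("  hi \n"));                 // hi
        System.out.println(toNumber("ff", 16));              // 255
        System.out.println(toNumber("8", 8));                // null
    }
}
```

The pairing of `literal` with `matches` shows why the quoting function is useful: without it, regular-expression metacharacters in user-supplied text would be interpreted as pattern syntax.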
| 555:e25ba7a2e816 | 556:d02f43598ba3 |
|---|---|
| 2157 If the original metatable has a <tt>"__metatable"</tt> field, | 2157 If the original metatable has a <tt>"__metatable"</tt> field, |
| 2158 raises an error. | 2158 raises an error. |
| 2159 | 2159 |
| 2160 | 2160 |
| 2161 | 2161 |
| 2162 | |
| 2163 <p> | |
| 2164 <hr><h3><a name="pdf-tonumber"><code>tonumber (e [, base])</code></a></h3> | |
| 2165 | |
| 2166 | |
| 2167 <p> | |
| 2168 When called with no <code>base</code>, | |
| 2169 <code>tonumber</code> tries to convert its argument to a number. | |
| 2170 If the argument is already a number or | |
| 2171 a string convertible to a number, | |
| 2172 then <code>tonumber</code> returns this number; | |
| 2173 otherwise, it returns <b>nil</b>. | |
| 2174 | |
| 2175 | |
| 2176 <p> | |
| 2177 The conversion of strings can result in integers or floats, | |
| 2178 according to the lexical conventions of Lua (see <a href="#3.1">§3.1</a>). | |
| 2179 (The string may have leading and trailing spaces and a sign.) | |
| 2180 | |
| 2181 | |
| 2182 <p> | |
| 2183 When called with <code>base</code>, | |
| 2184 then <code>e</code> must be a string to be interpreted as | |
| 2185 an integer numeral in that base. | |
| 2186 The base may be any integer between 2 and 36, inclusive. | |
| 2187 In bases above 10, the letter '<code>A</code>' (in either upper or lower case) | |
| 2188 represents 10, '<code>B</code>' represents 11, and so forth, | |
| 2189 with '<code>Z</code>' representing 35. | |
| 2190 If the string <code>e</code> is not a valid numeral in the given base, | |
| 2191 the function returns <b>nil</b>. | |
| 2192 | |
| 2193 | |
| 2194 | |
| 2195 | |
| 2196 <h4 <%=heading_options%> ><a name="Luan.to_string"><tt>Luan.to_string (v)</tt></a></h4> | 2162 <h4 <%=heading_options%> ><a name="Luan.to_string"><tt>Luan.to_string (v)</tt></a></h4> |
| 2197 | 2163 |
| 2198 <p> | 2164 <p> |
| 2199 Receives a value of any type and | 2165 Receives a value of any type and |
| 2200 converts it to a string in a human-readable format. | 2166 converts it to a string in a human-readable format. |
| 2365 from the end of the string. | 2331 from the end of the string. |
| 2366 Thus, the last character is at position -1, and so on. | 2332 Thus, the last character is at position -1, and so on. |
| 2367 | 2333 |
| 2368 | 2334 |
| 2369 | 2335 |
| 2370 | |
| 2371 <p> | |
| 2372 <hr><h3><a name="pdf-string.byte"><code>string.byte (s [, i [, j]])</code></a></h3> | |
| 2373 Returns the internal numerical codes of the characters <code>s[i]</code>, | |
| 2374 <code>s[i+1]</code>, ..., <code>s[j]</code>. | |
| 2375 The default value for <code>i</code> is 1; | |
| 2376 the default value for <code>j</code> is <code>i</code>. | |
| 2377 These indices are corrected | |
| 2378 following the same rules of function <a href="#pdf-string.sub"><code>string.sub</code></a>. | |
| 2379 | |
| 2380 | |
| 2381 <p> | |
| 2382 Numerical codes are not necessarily portable across platforms. | |
| 2383 | |
| 2384 | |
| 2385 | |
| 2386 <h4 <%=heading_options%> ><a name="String.char"><tt>String.char (···)</tt></a></h4> | 2336 <h4 <%=heading_options%> ><a name="String.char"><tt>String.char (···)</tt></a></h4> |
| 2387 | 2337 |
| 2388 <p> | 2338 <p> |
| 2389 Receives zero or more integers. | 2339 Receives zero or more integers. |
| 2390 Returns a string with length equal to the number of arguments, | 2340 Returns a string with length equal to the number of arguments, |
| 2409 | 2359 |
| 2410 <h4 <%=heading_options%> ><a name="String.find"><tt>String.find (s, pattern [, init [, plain]])</tt></a></h4> | 2360 <h4 <%=heading_options%> ><a name="String.find"><tt>String.find (s, pattern [, init [, plain]])</tt></a></h4> |
| 2411 | 2361 |
| 2412 <p> | 2362 <p> |
| 2413 Looks for the first match of | 2363 Looks for the first match of |
| 2414 <tt>pattern</tt> (see <a href="#6.4.1">§6.4.1</a>) in the string <tt>s</tt>. | 2364 <tt>pattern</tt> (see <a href="http://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html">Pattern</a>) in the string <tt>s</tt>. |
| 2415 If it finds a match, then <tt>find</tt> returns the indices of <tt>s</tt> | 2365 If it finds a match, then <tt>find</tt> returns the indices of <tt>s</tt> |
| 2416 where this occurrence starts and ends; | 2366 where this occurrence starts and ends; |
| 2417 otherwise, it returns <b>nil</b>. | 2367 otherwise, it returns <b>nil</b>. |
| 2418 A third, optional numerical argument <tt>init</tt> specifies | 2368 A third, optional numerical argument <tt>init</tt> specifies |
| 2419 where to start the search; | 2369 where to start the search; |
| 2449 <h4 <%=heading_options%> ><a name="String.gmatch"><tt>String.gmatch (s, pattern)</tt></a></h4> | 2399 <h4 <%=heading_options%> ><a name="String.gmatch"><tt>String.gmatch (s, pattern)</tt></a></h4> |
| 2450 | 2400 |
| 2451 <p> | 2401 <p> |
| 2452 Returns an iterator function that, | 2402 Returns an iterator function that, |
| 2453 each time it is called, | 2403 each time it is called, |
| 2454 returns the next captures from <tt>pattern</tt> (see <a href="#6.4.1">§6.4.1</a>) | 2404 returns the next captures from <tt>pattern</tt> (see <a href="http://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html">Pattern</a>) |
| 2455 over the string <tt>s</tt>. | 2405 over the string <tt>s</tt>. |
| 2456 If <tt>pattern</tt> specifies no captures, | 2406 If <tt>pattern</tt> specifies no captures, |
| 2457 then the whole match is produced in each call. | 2407 then the whole match is produced in each call. |
| 2458 | 2408 |
| 2459 | 2409 |
| 2490 <h4 <%=heading_options%> ><a name="String.gsub"><tt>String.gsub (s, pattern, repl [, n])</tt></a></h4> | 2440 <h4 <%=heading_options%> ><a name="String.gsub"><tt>String.gsub (s, pattern, repl [, n])</tt></a></h4> |
| 2491 | 2441 |
| 2492 <p> | 2442 <p> |
| 2493 Returns a copy of <tt>s</tt> | 2443 Returns a copy of <tt>s</tt> |
| 2494 in which all (or the first <tt>n</tt>, if given) | 2444 in which all (or the first <tt>n</tt>, if given) |
| 2495 occurrences of the <tt>pattern</tt> (see <a href="#6.4.1">§6.4.1</a>) have been | 2445 occurrences of the <tt>pattern</tt> (see <a href="http://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html">Pattern</a>) have been |
| 2496 replaced by a replacement string specified by <tt>repl</tt>, | 2446 replaced by a replacement string specified by <tt>repl</tt>, |
| 2497 which can be a string, a table, or a function. | 2447 which can be a string, a table, or a function. |
| 2498 <tt>gsub</tt> also returns, as its second value, | 2448 <tt>gsub</tt> also returns, as its second value, |
| 2499 the total number of matches that occurred. | 2449 the total number of matches that occurred. |
| 2500 The name <tt>gsub</tt> comes from <i>Global SUBstitution</i>. | 2450 The name <tt>gsub</tt> comes from <i>Global SUBstitution</i>. |
| 2558 --> x="lua-5.3.tar.gz" | 2508 --> x="lua-5.3.tar.gz" |
| 2559 </pre></tt></p> | 2509 </pre></tt></p> |
| 2560 | 2510 |
| 2561 | 2511 |
| 2562 | 2512 |
| | 2513 <h4 <%=heading_options%> ><a name="String.literal"><tt>String.literal (s)</tt></a></h4> |
| | 2514 <p> |
| | 2515 Returns a string which matches the literal string <tt>s</tt> in a regular expression. This function is simply the Java method <a href="http://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html#quote(java.lang.String)"><tt>Pattern.quote</tt></a>. |
| | 2516 |
| | 2517 |
| 2563 <h4 <%=heading_options%> ><a name="String.lower"><tt>String.lower (s)</tt></a></h4> | 2518 <h4 <%=heading_options%> ><a name="String.lower"><tt>String.lower (s)</tt></a></h4> |
| 2564 <p> | 2519 <p> |
| 2565 Receives a string and returns a copy of this string with all | 2520 Receives a string and returns a copy of this string with all |
| 2566 uppercase letters changed to lowercase. | 2521 uppercase letters changed to lowercase. |
| 2567 All other characters are left unchanged. | 2522 All other characters are left unchanged. |
| 2568 | 2523 |
| 2569 | 2524 |
| 2570 | 2525 |
| 2571 | 2526 |
| 2572 <p> | 2527 <h4 <%=heading_options%> ><a name="String.match"><tt>String.match (s, pattern [, init])</tt></a></h4> |
| 2573 <hr><h3><a name="pdf-string.match"><code>string.match (s, pattern [, init])</code></a></h3> | 2528 |
| 2574 Looks for the first <em>match</em> of | 2529 <p> |
| 2575 <code>pattern</code> (see <a href="#6.4.1">§6.4.1</a>) in the string <code>s</code>. | 2530 Looks for the first <i>match</i> of |
| 2576 If it finds one, then <code>match</code> returns | 2531 <tt>pattern</tt> (see <a href="http://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html">Pattern</a>) in the string <tt>s</tt>. |
| | 2532 If it finds one, then <tt>match</tt> returns |
| 2577 the captures from the pattern; | 2533 the captures from the pattern; |
| 2578 otherwise it returns <b>nil</b>. | 2534 otherwise it returns <b>nil</b>. |
| 2579 If <code>pattern</code> specifies no captures, | 2535 If <tt>pattern</tt> specifies no captures, |
| 2580 then the whole match is returned. | 2536 then the whole match is returned. |
| 2581 A third, optional numerical argument <code>init</code> specifies | 2537 A third, optional numerical argument <tt>init</tt> specifies |
| 2582 where to start the search; | 2538 where to start the search; |
| 2583 its default value is 1 and can be negative. | 2539 its default value is 1 and can be negative. |
| 2584 | 2540 |
| 2585 | 2541 |
| 2586 | 2542 <h4 <%=heading_options%> ><a name="String.matches"><tt>String.matches (s, pattern)</tt></a></h4> |
| 2587 | 2543 <p> |
| 2588 <p> | 2544 Returns a boolean indicating whether the entire string <tt>s</tt> matches <tt>pattern</tt>. |
| 2589 <hr><h3><a name="pdf-string.pack"><code>string.pack (fmt, v1, v2, ···)</code></a></h3> | 2545 |
| 2590 | 2546 |
| 2591 | 2547 |
| 2592 <p> | 2548 <h4 <%=heading_options%> ><a name="String.rep"><tt>String.rep (s, n [, sep])</tt></a></h4> |
| 2593 Returns a binary string containing the values <code>v1</code>, <code>v2</code>, etc. | 2549 <p> |
| 2594 packed (that is, serialized in binary form) | 2550 Returns a string that is the concatenation of <tt>n</tt> copies of |
| 2595 according to the format string <code>fmt</code> (see <a href="#6.4.2">§6.4.2</a>). | 2551 the string <tt>s</tt> separated by the string <tt>sep</tt>. |
| 2596 | 2552 The default value for <tt>sep</tt> is the empty string |
| 2597 | |
| 2598 | |
| 2599 | |
| 2600 <p> | |
| 2601 <hr><h3><a name="pdf-string.packsize"><code>string.packsize (fmt)</code></a></h3> | |
| 2602 | |
| 2603 | |
| 2604 <p> | |
| 2605 Returns the size of a string resulting from <a href="#pdf-string.pack"><code>string.pack</code></a> | |
| 2606 with the given format. | |
| 2607 The format string cannot have the variable-length options | |
| 2608 '<code>s</code>' or '<code>z</code>' (see <a href="#6.4.2">§6.4.2</a>). | |
| 2609 | |
| 2610 | |
| 2611 | |
| 2612 | |
| 2613 <p> | |
| 2614 <hr><h3><a name="pdf-string.rep"><code>string.rep (s, n [, sep])</code></a></h3> | |
| 2615 Returns a string that is the concatenation of <code>n</code> copies of | |
| 2616 the string <code>s</code> separated by the string <code>sep</code>. | |
| 2617 The default value for <code>sep</code> is the empty string | |
| 2618 (that is, no separator). | 2553 (that is, no separator). |
| 2619 Returns the empty string if <code>n</code> is not positive. | 2554 Returns the empty string if <tt>n</tt> is not positive. |
| 2620 | 2555 |
| 2621 | 2556 |
| 2622 | 2557 |
| 2623 | 2558 |
| 2624 <p> | 2559 <h4 <%=heading_options%> ><a name="String.reverse"><tt>String.reverse (s)</tt></a></h4> |
| 2625 <hr><h3><a name="pdf-string.reverse"><code>string.reverse (s)</code></a></h3> | 2560 <p> |
| 2626 Returns a string that is the string <code>s</code> reversed. | 2561 Returns a string that is the string <tt>s</tt> reversed. |
| 2627 | 2562 |
| 2628 | 2563 |
| 2629 | 2564 |
| 2630 | 2565 |
| 2631 <p> | 2566 <h4 <%=heading_options%> ><a name="String.sub"><tt>String.sub (s, i [, j])</tt></a></h4> |
| 2632 <hr><h3><a name="pdf-string.sub"><code>string.sub (s, i [, j])</code></a></h3> | 2567 |
| 2633 Returns the substring of <code>s</code> that | 2568 <p> |
| 2634 starts at <code>i</code> and continues until <code>j</code>; | 2569 Returns the substring of <tt>s</tt> that |
| 2635 <code>i</code> and <code>j</code> can be negative. | 2570 starts at <tt>i</tt> and continues until <tt>j</tt>; |
| 2636 If <code>j</code> is absent, then it is assumed to be equal to -1 | 2571 <tt>i</tt> and <tt>j</tt> can be negative. |
| | 2572 If <tt>j</tt> is absent, then it is assumed to be equal to -1 |
| 2637 (which is the same as the string length). | 2573 (which is the same as the string length). |
| 2638 In particular, | 2574 In particular, |
| 2639 the call <code>string.sub(s,1,j)</code> returns a prefix of <code>s</code> | 2575 the call <tt>string.sub(s,1,j)</tt> returns a prefix of <tt>s</tt> |
| 2640 with length <code>j</code>, | 2576 with length <tt>j</tt>, |
| 2641 and <code>string.sub(s, -i)</code> returns a suffix of <code>s</code> | 2577 and <tt>string.sub(s, -i)</tt> returns a suffix of <tt>s</tt> |
| 2642 with length <code>i</code>. | 2578 with length <tt>i</tt>. |
| 2643 | 2579 |
| 2644 | 2580 |
| 2645 <p> | 2581 <p> |
| 2646 If, after the translation of negative indices, | 2582 If, after the translation of negative indices, |
| 2647 <code>i</code> is less than 1, | 2583 <tt>i</tt> is less than 1, |
| 2648 it is corrected to 1. | 2584 it is corrected to 1. |
| 2649 If <code>j</code> is greater than the string length, | 2585 If <tt>j</tt> is greater than the string length, |
| 2650 it is corrected to that length. | 2586 it is corrected to that length. |
| 2651 If, after these corrections, | 2587 If, after these corrections, |
| 2652 <code>i</code> is greater than <code>j</code>, | 2588 <tt>i</tt> is greater than <tt>j</tt>, |
| 2653 the function returns the empty string. | 2589 the function returns the empty string. |
| 2654 | 2590 |
| 2655 | 2591 |
| 2656 | 2592 |
| 2657 | 2593 <h4 <%=heading_options%> ><a name="String.to_binary"><tt>String.to_binary (s)</tt></a></h4> |
| 2658 <p> | 2594 |
| 2659 <hr><h3><a name="pdf-string.unpack"><code>string.unpack (fmt, s [, pos])</code></a></h3> | 2595 <p> |
| 2660 | 2596 Converts a string to a binary by calling the Java method <a href="http://docs.oracle.com/javase/7/docs/api/java/lang/String.html#getBytes()"><tt>String.getBytes</tt></a>. |
| 2661 | 2597 |
| 2662 <p> | 2598 |
| 2663 Returns the values packed in string <code>s</code> (see <a href="#pdf-string.pack"><code>string.pack</code></a>) | 2599 |
| 2664 according to the format string <code>fmt</code> (see <a href="#6.4.2">§6.4.2</a>). | 2600 <h4 <%=heading_options%> ><a name="String.to_number"><tt>String.to_number (s [, base])</tt></a></h4> |
| 2665 An optional <code>pos</code> marks where | 2601 |
| 2666 to start reading in <code>s</code> (default is 1). | 2602 <p> |
| 2667 After the read values, | 2603 When called with no <tt>base</tt>, |
| 2668 this function also returns the index of the first unread byte in <code>s</code>. | 2604 <tt>to_number</tt> tries to convert its argument to a number. |
| 2669 | 2605 If the argument is |
| 2670 | 2606 a string convertible to a number, |
| 2671 | 2607 then <tt>to_number</tt> returns this number; |
| 2672 | 2608 otherwise, it returns <b>nil</b>. |
| 2673 <p> | 2609 |
| 2674 <hr><h3><a name="pdf-string.upper"><code>string.upper (s)</code></a></h3> | 2610 The conversion of strings can result in integers or floats. |
| | 2611 |
| | 2612 |
| | 2613 <p> |
| | 2614 When called with <tt>base</tt>, |
| | 2615 then <tt>s</tt> must be a string to be interpreted as |
| | 2616 an integer numeral in that base. |
| | 2617 In bases above 10, the letter '<tt>A</tt>' (in either upper or lower case) |
| | 2618 represents 10, '<tt>B</tt>' represents 11, and so forth, |
| | 2619 with '<tt>Z</tt>' representing 35. |
| | 2620 If the string <tt>s</tt> is not a valid numeral in the given base, |
| | 2621 the function returns <b>nil</b>. |
| | 2622 |
| | 2623 |
| | 2624 |
| | 2625 <h4 <%=heading_options%> ><a name="String.trim"><tt>String.trim (s)</tt></a></h4> |
| | 2626 |
| | 2627 <p> |
| | 2628 Removes the leading and trailing whitespace by calling the Java method <a href="http://docs.oracle.com/javase/7/docs/api/java/lang/String.html#trim()"><tt>String.trim</tt></a>. |
| | 2629 |
| | 2630 |
| | 2631 |
| | 2632 |
| | 2633 <h4 <%=heading_options%> ><a name="String.unicode"><tt>String.unicode (s [, i [, j]])</tt></a></h4> |
| | 2634 |
| | 2635 <p> |
| | 2636 Returns the internal numerical codes of the characters <tt>s[i]</tt>, |
| | 2637 <tt>s[i+1]</tt>, ..., <tt>s[j]</tt>. |
| | 2638 The default value for <tt>i</tt> is 1; |
| | 2639 the default value for <tt>j</tt> is <tt>i</tt>. |
| | 2640 These indices are corrected |
| | 2641 following the same rules of function <a href="#String.sub"><tt>String.sub</tt></a>. |
| | 2642 |
| | 2643 |
| | 2644 |
| | 2645 |
| | 2646 |
| | 2647 <h4 <%=heading_options%> ><a name="String.upper"><tt>String.upper (s)</tt></a></h4> |
| | 2648 <p> |
| 2675 Receives a string and returns a copy of this string with all | 2649 Receives a string and returns a copy of this string with all |
| 2676 lowercase letters changed to uppercase. | 2650 lowercase letters changed to uppercase. |
| 2677 All other characters are left unchanged. | 2651 All other characters are left unchanged. |
| 2678 The definition of what a lowercase letter is depends on the current locale. | 2652 The definition of what a lowercase letter is depends on the current locale. |
| 2679 | 2653 |
| 2680 | |
| 2681 | |
| 2682 | |
| 2683 | |
| 2684 <h3>6.4.1 – <a name="6.4.1">Patterns</a></h3> | |
| 2685 | |
| 2686 <p> | |
| 2687 Patterns in Lua are described by regular strings, | |
| 2688 which are interpreted as patterns by the pattern-matching functions | |
| 2689 <a href="#pdf-string.find"><code>string.find</code></a>, | |
| 2690 <a href="#pdf-string.gmatch"><code>string.gmatch</code></a>, | |
| 2691 <a href="#pdf-string.gsub"><code>string.gsub</code></a>, | |
| 2692 and <a href="#pdf-string.match"><code>string.match</code></a>. | |
| 2693 This section describes the syntax and the meaning | |
| 2694 (that is, what they match) of these strings. | |
| 2695 | |
| 2696 | |
| 2697 | |
| 2698 <h4>Character Class:</h4><p> | |
| 2699 A <em>character class</em> is used to represent a set of characters. | |
| 2700 The following combinations are allowed in describing a character class: | |
| 2701 | |
| 2702 <ul> | |
| 2703 | |
| 2704 <li><b><em>x</em>: </b> | |
| 2705 (where <em>x</em> is not one of the <em>magic characters</em> | |
| 2706 <code>^$()%.[]*+-?</code>) | |
| 2707 represents the character <em>x</em> itself. | |
| 2708 </li> | |
| 2709 | |
| 2710 <li><b><code>.</code>: </b> (a dot) represents all characters.</li> | |
| 2711 | |
| 2712 <li><b><code>%a</code>: </b> represents all letters.</li> | |
| 2713 | |
| 2714 <li><b><code>%c</code>: </b> represents all control characters.</li> | |
| 2715 | |
| 2716 <li><b><code>%d</code>: </b> represents all digits.</li> | |
| 2717 | |
| 2718 <li><b><code>%g</code>: </b> represents all printable characters except space.</li> | |
| 2719 | |
| 2720 <li><b><code>%l</code>: </b> represents all lowercase letters.</li> | |
| 2721 | |
| 2722 <li><b><code>%p</code>: </b> represents all punctuation characters.</li> | |
| 2723 | |
| 2724 <li><b><code>%s</code>: </b> represents all space characters.</li> | |
| 2725 | |
| 2726 <li><b><code>%u</code>: </b> represents all uppercase letters.</li> | |
| 2727 | |
| 2728 <li><b><code>%w</code>: </b> represents all alphanumeric characters.</li> | |
| 2729 | |
| 2730 <li><b><code>%x</code>: </b> represents all hexadecimal digits.</li> | |
| 2731 | |
| 2732 <li><b><code>%<em>x</em></code>: </b> (where <em>x</em> is any non-alphanumeric character) | |
| 2733 represents the character <em>x</em>. | |
| 2734 This is the standard way to escape the magic characters. | |
| 2735 Any non-alphanumeric character | |
| 2736 (including all punctuations, even the non-magical) | |
| 2737 can be preceded by a '<code>%</code>' | |
| 2738 when used to represent itself in a pattern. | |
| 2739 </li> | |
| 2740 | |
| 2741 <li><b><code>[<em>set</em>]</code>: </b> | |
| 2742 represents the class which is the union of all | |
| 2743 characters in <em>set</em>. | |
| 2744 A range of characters can be specified by | |
| 2745 separating the end characters of the range, | |
| 2746 in ascending order, with a '<code>-</code>'. | |
| 2747 All classes <code>%</code><em>x</em> described above can also be used as | |
| 2748 components in <em>set</em>. | |
| 2749 All other characters in <em>set</em> represent themselves. | |
| 2750 For example, <code>[%w_]</code> (or <code>[_%w]</code>) | |
| 2751 represents all alphanumeric characters plus the underscore, | |
| 2752 <code>[0-7]</code> represents the octal digits, | |
| 2753 and <code>[0-7%l%-]</code> represents the octal digits plus | |
| 2754 the lowercase letters plus the '<code>-</code>' character. | |
| 2755 | |
| 2756 | |
| 2757 <p> | |
| 2758 The interaction between ranges and classes is not defined. | |
| 2759 Therefore, patterns like <code>[%a-z]</code> or <code>[a-%%]</code> | |
| 2760 have no meaning. | |
| 2761 </li> | |
| 2762 | |
| 2763 <li><b><code>[^<em>set</em>]</code>: </b> | |
| 2764 represents the complement of <em>set</em>, | |
| 2765 where <em>set</em> is interpreted as above. | |
| 2766 </li> | |
| 2767 | |
| 2768 </ul><p> | |
| 2769 For all classes represented by single letters (<code>%a</code>, <code>%c</code>, etc.), | |
| 2770 the corresponding uppercase letter represents the complement of the class. | |
| 2771 For instance, <code>%S</code> represents all non-space characters. | |
| 2772 | |
| 2773 | |
| 2774 <p> | |
| 2775 The definitions of letter, space, and other character groups | |
| 2776 depend on the current locale. | |
| 2777 In particular, the class <code>[a-z]</code> may not be equivalent to <code>%l</code>. | |
| 2778 | |
| 2779 | |
| 2780 | |
| 2781 | |
| 2782 | |
| 2783 <h4>Pattern Item:</h4><p> | |
| 2784 A <em>pattern item</em> can be | |
| 2785 | |
| 2786 <ul> | |
| 2787 | |
| 2788 <li> | |
| 2789 a single character class, | |
| 2790 which matches any single character in the class; | |
| 2791 </li> | |
| 2792 | |
| 2793 <li> | |
| 2794 a single character class followed by '<code>*</code>', | |
| 2795 which matches zero or more repetitions of characters in the class. | |
| 2796 These repetition items will always match the longest possible sequence; | |
| 2797 </li> | |
| 2798 | |
| 2799 <li> | |
| 2800 a single character class followed by '<code>+</code>', | |
| 2801 which matches one or more repetitions of characters in the class. | |
| 2802 These repetition items will always match the longest possible sequence; | |
| 2803 </li> | |
| 2804 | |
| 2805 <li> | |
| 2806 a single character class followed by '<code>-</code>', | |
| 2807 which also matches zero or more repetitions of characters in the class. | |
| 2808 Unlike '<code>*</code>', | |
| 2809 these repetition items will always match the shortest possible sequence; | |
| 2810 </li> | |
| 2811 | |
| 2812 <li> | |
| 2813 a single character class followed by '<code>?</code>', | |
| 2814 which matches zero or one occurrence of a character in the class. | |
| 2815 It always matches one occurrence if possible; | |
| 2816 </li> | |
| 2817 | |
| 2818 <li> | |
| 2819 <code>%<em>n</em></code>, for <em>n</em> between 1 and 9; | |
| 2820 such item matches a substring equal to the <em>n</em>-th captured string | |
| 2821 (see below); | |
| 2822 </li> | |
| 2823 | |
| 2824 <li> | |
| 2825 <code>%b<em>xy</em></code>, where <em>x</em> and <em>y</em> are two distinct characters; | |
| 2826 such item matches strings that start with <em>x</em>, end with <em>y</em>, | |
| 2827 and where the <em>x</em> and <em>y</em> are <em>balanced</em>. | |
| 2828 This means that, if one reads the string from left to right, | |
| 2829 counting <em>+1</em> for an <em>x</em> and <em>-1</em> for a <em>y</em>, | |
| 2830 the ending <em>y</em> is the first <em>y</em> where the count reaches 0. | |
| 2831 For instance, the item <code>%b()</code> matches expressions with | |
| 2832 balanced parentheses. | |
| 2833 </li> | |
| 2834 | |
| 2835 <li> | |
| 2836 <code>%f[<em>set</em>]</code>, a <em>frontier pattern</em>; | |
| 2837 such item matches an empty string at any position such that | |
| 2838 the next character belongs to <em>set</em> | |
| 2839 and the previous character does not belong to <em>set</em>. | |
| 2840 The set <em>set</em> is interpreted as previously described. | |
| 2841 The beginning and the end of the subject are handled as if | |
| 2842 they were the character '<code>\0</code>'. | |
| 2843 </li> | |
| 2844 | |
| 2845 </ul> | |
| 2846 | |
| 2847 | |
| 2848 | |
| 2849 | |
| 2850 <h4>Pattern:</h4><p> | |
| 2851 A <em>pattern</em> is a sequence of pattern items. | |
| 2852 A caret '<code>^</code>' at the beginning of a pattern anchors the match at the | |
| 2853 beginning of the subject string. | |
| 2854 A '<code>$</code>' at the end of a pattern anchors the match at the | |
| 2855 end of the subject string. | |
| 2856 At other positions, | |
| 2857 '<code>^</code>' and '<code>$</code>' have no special meaning and represent themselves. | |
| 2858 | |
| 2859 | |
| 2860 | |
| 2861 | |
| 2862 | |
| 2863 <h4>Captures:</h4><p> | |
| 2864 A pattern can contain sub-patterns enclosed in parentheses; | |
| 2865 they describe <em>captures</em>. | |
| 2866 When a match succeeds, the substrings of the subject string | |
| 2867 that match captures are stored (<em>captured</em>) for future use. | |
| 2868 Captures are numbered according to their left parentheses. | |
| 2869 For instance, in the pattern <code>"(a*(.)%w(%s*))"</code>, | |
| 2870 the part of the string matching <code>"a*(.)%w(%s*)"</code> is | |
| 2871 stored as the first capture (and therefore has number 1); | |
| 2872 the character matching "<code>.</code>" is captured with number 2, | |
| 2873 and the part matching "<code>%s*</code>" has number 3. | |
| 2874 | |
| 2875 | |
| 2876 <p> | |
| 2877 As a special case, the empty capture <code>()</code> captures | |
| 2878 the current string position (a number). | |
| 2879 For instance, if we apply the pattern <code>"()aa()"</code> on the | |
| 2880 string <code>"flaaap"</code>, there will be two captures: 3 and 5. | |
| 2881 | |
| 2882 | |
| 2883 | |
| 2884 | |
| 2885 | |
| 2886 | |
| 2887 | |
| 2888 <h3>6.4.2 – <a name="6.4.2">Format Strings for Pack and Unpack</a></h3> | |
| 2889 | |
| 2890 <p> | |
| 2891 The first argument to <a href="#pdf-string.pack"><code>string.pack</code></a>, | |
| 2892 <a href="#pdf-string.packsize"><code>string.packsize</code></a>, and <a href="#pdf-string.unpack"><code>string.unpack</code></a> | |
| 2893 is a format string, | |
| 2894 which describes the layout of the structure being created or read. | |
| 2895 | |
| 2896 | |
| 2897 <p> | |
| 2898 A format string is a sequence of conversion options. | |
| 2899 The conversion options are as follows: | |
| 2900 | |
| 2901 <ul> | |
| 2902 <li><b><code><</code>: </b>sets little endian</li> | |
| 2903 <li><b><code>></code>: </b>sets big endian</li> | |
| 2904 <li><b><code>=</code>: </b>sets native endian</li> | |
| 2905 <li><b><code>![<em>n</em>]</code>: </b>sets maximum alignment to <code>n</code> | |
| 2906 (default is native alignment)</li> | |
| 2907 <li><b><code>b</code>: </b>a signed byte (<code>char</code>)</li> | |
| 2908 <li><b><code>B</code>: </b>an unsigned byte (<code>char</code>)</li> | |
| 2909 <li><b><code>h</code>: </b>a signed <code>short</code> (native size)</li> | |
| 2910 <li><b><code>H</code>: </b>an unsigned <code>short</code> (native size)</li> | |
| 2911 <li><b><code>l</code>: </b>a signed <code>long</code> (native size)</li> | |
| 2912 <li><b><code>L</code>: </b>an unsigned <code>long</code> (native size)</li> | |
| 2913 <li><b><code>j</code>: </b>a <code>lua_Integer</code></li> | |
| 2914 <li><b><code>J</code>: </b>a <code>lua_Unsigned</code></li> | |
| 2915 <li><b><code>T</code>: </b>a <code>size_t</code> (native size)</li> | |
| 2916 <li><b><code>i[<em>n</em>]</code>: </b>a signed <code>int</code> with <code>n</code> bytes | |
| 2917 (default is native size)</li> | |
| 2918 <li><b><code>I[<em>n</em>]</code>: </b>an unsigned <code>int</code> with <code>n</code> bytes | |
| 2919 (default is native size)</li> | |
| 2920 <li><b><code>f</code>: </b>a <code>float</code> (native size)</li> | |
| 2921 <li><b><code>d</code>: </b>a <code>double</code> (native size)</li> | |
| 2922 <li><b><code>n</code>: </b>a <code>lua_Number</code></li> | |
| 2923 <li><b><code>c<em>n</em></code>: </b>a fixed-sized string with <code>n</code> bytes</li> | |
| 2924 <li><b><code>z</code>: </b>a zero-terminated string</li> | |
| 2925 <li><b><code>s[<em>n</em>]</code>: </b>a string preceded by its length | |
| 2926 coded as an unsigned integer with <code>n</code> bytes | |
| 2927 (default is a <code>size_t</code>)</li> | |
| 2928 <li><b><code>x</code>: </b>one byte of padding</li> | |
| 2929 <li><b><code>X<em>op</em></code>: </b>an empty item that aligns | |
| 2930 according to option <code>op</code> | |
| 2931 (which is otherwise ignored)</li> | |
| 2932 <li><b>'<code> </code>': </b>(empty space) ignored</li> | |
| 2933 </ul><p> | |
| 2934 (A "<code>[<em>n</em>]</code>" means an optional integral numeral.) | |
| 2935 Except for padding, spaces, and configurations | |
| 2936 (options "<code>xX <=>!</code>"), | |
| 2937 each option corresponds to an argument (in <a href="#pdf-string.pack"><code>string.pack</code></a>) | |
| 2938 or a result (in <a href="#pdf-string.unpack"><code>string.unpack</code></a>). | |
| 2939 | |
| 2940 | |
| 2941 <p> | |
| 2942 For options "<code>!<em>n</em></code>", "<code>s<em>n</em></code>", "<code>i<em>n</em></code>", and "<code>I<em>n</em></code>", | |
| 2943 <code>n</code> can be any integer between 1 and 16. | |
| 2944 All integral options check overflows; | |
| 2945 <a href="#pdf-string.pack"><code>string.pack</code></a> checks whether the given value fits in the given size; | |
| 2946 <a href="#pdf-string.unpack"><code>string.unpack</code></a> checks whether the read value fits in a Lua integer. | |
| 2947 | |
| 2948 | |
| 2949 <p> | |
| 2950 Any format string starts as if prefixed by "<code>!1=</code>", | |
| 2951 that is, | |
| 2952 with maximum alignment of 1 (no alignment) | |
| 2953 and native endianness. | |
| 2954 | |
| 2955 | |
| 2956 <p> | |
| 2957 Alignment works as follows: | |
| 2958 For each option, | |
| 2959 the format gets extra padding until the data starts | |
| 2960 at an offset that is a multiple of the minimum between the | |
| 2961 option size and the maximum alignment; | |
| 2962 this minimum must be a power of 2. | |
| 2963 Options "<code>c</code>" and "<code>z</code>" are not aligned; | |
| 2964 option "<code>s</code>" follows the alignment of its starting integer. | |
| 2965 | |
| 2966 | |
| 2967 <p> | |
| 2968 All padding is filled with zeros by <a href="#pdf-string.pack"><code>string.pack</code></a> | |
| 2969 (and ignored by <a href="#pdf-string.unpack"><code>string.unpack</code></a>). | |
| 2970 | |
| 2971 | |
| 2972 | |
| 2973 | |
| 2974 | |
| 2975 | |
| 2976 | |
| 2977 <h2>6.5 – <a name="6.5">UTF-8 Support</a></h2> | |
| 2978 | |
| 2979 <p> | |
| 2980 This library provides basic support for UTF-8 encoding. | |
| 2981 It provides all its functions inside the table <a name="pdf-utf8"><code>utf8</code></a>. | |
| 2982 This library does not provide any support for Unicode other | |
| 2983 than the handling of the encoding. | |
| 2984 Any operation that needs the meaning of a character, | |
| 2985 such as character classification, is outside its scope. | |
| 2986 | |
| 2987 | |
| 2988 <p> | |
| 2989 Unless stated otherwise, | |
| 2990 all functions that expect a byte position as a parameter | |
| 2991 assume that the given position is either the start of a byte sequence | |
| 2992 or one plus the length of the subject string. | |
| 2993 As in the string library, | |
| 2994 negative indices count from the end of the string. | |
| 2995 | |
| 2996 | |
| 2997 <p> | |
| 2998 <hr><h3><a name="pdf-utf8.char"><code>utf8.char (···)</code></a></h3> | |
| 2999 Receives zero or more integers, | |
| 3000 converts each one to its corresponding UTF-8 byte sequence | |
| 3001 and returns a string with the concatenation of all these sequences. | |
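<p>
A short sketch of <code>utf8.char</code>, assuming Lua 5.3; the code points (U+0048, U+00E4, U+20AC) are arbitrary examples.

```lua
-- 72 encodes as 1 byte, 228 as 2 bytes, 8364 as 3 bytes.
local s = utf8.char(72, 228, 8364)
print(#s)                          --> 6
print(s == "H\u{E4}\u{20AC}")      --> true

-- With no arguments the result is the empty string.
local empty = utf8.char()
print(#empty)                      --> 0
```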
| 3002 | |
| 3003 | |
| 3004 | |
| 3005 | |
| 3006 <p> | |
| 3007 <hr><h3><a name="pdf-utf8.charpattern"><code>utf8.charpattern</code></a></h3> | |
| 3008 The pattern (a string, not a function) "<code>[\0-\x7F\xC2-\xF4][\x80-\xBF]*</code>" | |
| 3009 (see <a href="#6.4.1">§6.4.1</a>), | |
| 3010 which matches exactly one UTF-8 byte sequence, | |
| 3011 assuming that the subject is a valid UTF-8 string. | |
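<p>
One common use of this pattern is to split a string into its UTF-8 byte sequences with <code>string.gmatch</code>. The sketch below assumes Lua 5.3 and a valid UTF-8 subject; the sample string is illustrative.

```lua
-- "a" is 1 byte, U+00E4 is 2 bytes, U+20AC is 3 bytes.
local s = "a\u{E4}\u{20AC}"

-- Collect the byte length of each matched sequence.
local lengths = {}
for ch in s:gmatch(utf8.charpattern) do
  lengths[#lengths + 1] = #ch
end

print(table.concat(lengths, " "))  --> 1 2 3
```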
| 3012 | |
| 3013 | |
| 3014 | |
| 3015 | |
| 3016 <p> | |
| 3017 <hr><h3><a name="pdf-utf8.codes"><code>utf8.codes (s)</code></a></h3> | |
| 3018 | |
| 3019 | |
| 3020 <p> | |
| 3021 Returns values so that the construction | |
| 3022 | |
| 3023 <pre> | |
| 3024 for p, c in utf8.codes(s) do <em>body</em> end | |
| 3025 </pre><p> | |
| 3026 will iterate over all characters in string <code>s</code>, | |
| 3027 with <code>p</code> being the position (in bytes) and <code>c</code> the code point | |
| 3028 of each character. | |
| 3029 It raises an error if it meets any invalid byte sequence. | |
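<p>
A minimal sketch of the iteration and of the error case, assuming Lua 5.3; the sample strings are illustrative.

```lua
local s = "a\u{E4}\u{20AC}"

-- Record "position:codepoint" for every character.
local seen = {}
for p, c in utf8.codes(s) do
  seen[#seen + 1] = p .. ":" .. c
end
print(table.concat(seen, " "))     --> 1:97 2:228 4:8364

-- The byte 0xFF never occurs in valid UTF-8, so the loop raises an error.
local ok = pcall(function()
  for _ in utf8.codes("a\xffb") do end
end)
print(ok)                          --> false
```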
| 3030 | |
| 3031 | |
| 3032 | |
| 3033 | |
| 3034 <p> | |
| 3035 <hr><h3><a name="pdf-utf8.codepoint"><code>utf8.codepoint (s [, i [, j]])</code></a></h3> | |
| 3036 Returns the codepoints (as integers) from all characters in <code>s</code> | |
| 3037 that start between byte positions <code>i</code> and <code>j</code> (both inclusive). | |
| 3038 The default for <code>i</code> is 1 and for <code>j</code> is <code>i</code>. | |
| 3039 It raises an error if it meets any invalid byte sequence. | |
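<p>
The defaults for <code>i</code> and <code>j</code> can be seen in a short sketch, assuming Lua 5.3 and an illustrative sample string.

```lua
local s = "a\u{E4}\u{20AC}"

-- With no positions, i = 1 and j = i, so only the first character is read.
local first = utf8.codepoint(s)
print(first)                       --> 97

-- The character starting at byte 2 is U+00E4.
local second = utf8.codepoint(s, 2)
print(second)                      --> 228

-- From the first byte to the last byte: every code point in the string.
local all = { utf8.codepoint(s, 1, -1) }
print(table.concat(all, " "))      --> 97 228 8364

-- An invalid byte sequence in the range raises an error.
local ok = pcall(utf8.codepoint, "a\xffb", 1, -1)
print(ok)                          --> false
```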
| 3040 | |
| 3041 | |
| 3042 | |
| 3043 | |
| 3044 <p> | |
| 3045 <hr><h3><a name="pdf-utf8.len"><code>utf8.len (s [, i [, j]])</code></a></h3> | |
| 3046 Returns the number of UTF-8 characters in string <code>s</code> | |
| 3047 that start between positions <code>i</code> and <code>j</code> (both inclusive). | |
| 3048 The default for <code>i</code> is 1 and for <code>j</code> is -1. | |
| 3049 If it finds any invalid byte sequence, | |
| 3050 it returns a false value plus the position of the first invalid byte. | |
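<p>
A brief sketch contrasting byte length with character count, and showing the failure result, assuming Lua 5.3; the strings are illustrative.

```lua
local s = "a\u{E4}\u{20AC}"

-- The # operator counts bytes; utf8.len counts characters.
local byte_len, char_len = #s, utf8.len(s)
print(byte_len, char_len)          --> 6   3

-- Only the characters starting at byte 2 or later are counted.
local tail_len = utf8.len(s, 2)
print(tail_len)                    --> 2

-- On invalid input the first result is a false value and the second
-- is the position of the offending byte.
local count, bad_pos = utf8.len("a\xffb")
print(count, bad_pos)              --> nil   2
```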
| 3051 | |
| 3052 | |
| 3053 | |
| 3054 | |
| 3055 <p> | |
| 3056 <hr><h3><a name="pdf-utf8.offset"><code>utf8.offset (s, n [, i])</code></a></h3> | |
| 3057 Returns the position (in bytes) where the encoding of the | |
| 3058 <code>n</code>-th character of <code>s</code> | |
| 3059 (counting from position <code>i</code>) starts. | |
| 3060 A negative <code>n</code> gets characters before position <code>i</code>. | |
| 3061 The default for <code>i</code> is 1 when <code>n</code> is non-negative | |
| 3062 and <code>#s + 1</code> otherwise, | |
| 3063 so that <code>utf8.offset(s, -n)</code> gets the offset of the | |
| 3064 <code>n</code>-th character from the end of the string. | |
| 3065 If the specified character is neither in the subject | |
| 3066 nor right after its end, | |
| 3067 the function returns <b>nil</b>. | |
| 3068 | |
| 3069 | |
| 3070 <p> | |
| 3071 As a special case, | |
| 3072 when <code>n</code> is 0 the function returns the start of the encoding | |
| 3073 of the character that contains the <code>i</code>-th byte of <code>s</code>. | |
| 3074 | |
| 3075 | |
| 3076 <p> | |
| 3077 This function assumes that <code>s</code> is a valid UTF-8 string. | |
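<p>
The cases described for <code>utf8.offset</code> can be sketched as follows, assuming Lua 5.3 and a valid UTF-8 sample string whose characters start at bytes 1, 2 and 4.

```lua
local s = "a\u{E4}\u{20AC}"        -- 6 bytes, 3 characters

-- The third character starts at byte 4.
local third = utf8.offset(s, 3)
-- Counting from the end: the last character also starts at byte 4.
local last = utf8.offset(s, -1)
-- One past the last character is the position right after the end (#s + 1).
local past_end = utf8.offset(s, 4)
-- Anything further is neither in the subject nor right after its end.
local missing = utf8.offset(s, 5)
-- n = 0: byte 5 is a continuation byte of the character starting at byte 4.
local containing = utf8.offset(s, 0, 5)

print(third, last, past_end, missing, containing)  --> 4  4  7  nil  4
```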
| 3078 | 2654 |
| 3079 | 2655 |
| 3080 | 2656 |
| 3081 | 2657 |
| 3082 | 2658 |
