comparison website/src/manual.html.luan @ 556:d02f43598ba3
finish String documentation
author | Franklin Schmidt <fschmidt@gmail.com> |
---|---|
date | Fri, 19 Jun 2015 19:39:41 -0600 |
parents | e25ba7a2e816 |
children | 7cc9d4a53d3b |
comparison
555:e25ba7a2e816 | 556:d02f43598ba3 |
---|---|
2157 If the original metatable has a <tt>"__metatable"</tt> field, | 2157 If the original metatable has a <tt>"__metatable"</tt> field, |
2158 raises an error. | 2158 raises an error. |
2159 | 2159 |
2160 | 2160 |
2161 | 2161 |
2162 | |
2163 <p> | |
2164 <hr><h3><a name="pdf-tonumber"><code>tonumber (e [, base])</code></a></h3> | |
2165 | |
2166 | |
2167 <p> | |
2168 When called with no <code>base</code>, | |
2169 <code>tonumber</code> tries to convert its argument to a number. | |
2170 If the argument is already a number or | |
2171 a string convertible to a number, | |
2172 then <code>tonumber</code> returns this number; | |
2173 otherwise, it returns <b>nil</b>. | |
2174 | |
2175 | |
2176 <p> | |
2177 The conversion of strings can result in integers or floats, | |
2178 according to the lexical conventions of Lua (see <a href="#3.1">§3.1</a>). | |
2179 (The string may have leading and trailing spaces and a sign.) | |
2180 | |
2181 | |
2182 <p> | |
2183 When called with <code>base</code>, | |
2184 then <code>e</code> must be a string to be interpreted as | |
2185 an integer numeral in that base. | |
2186 The base may be any integer between 2 and 36, inclusive. | |
2187 In bases above 10, the letter '<code>A</code>' (in either upper or lower case) | |
2188 represents 10, '<code>B</code>' represents 11, and so forth, | |
2189 with '<code>Z</code>' representing 35. | |
2190 If the string <code>e</code> is not a valid numeral in the given base, | |
2191 the function returns <b>nil</b>. | |
2192 | |
2193 | |
2194 | |
2195 | |
2196 <h4 <%=heading_options%> ><a name="Luan.to_string"><tt>Luan.to_string (v)</tt></a></h4> | 2162 <h4 <%=heading_options%> ><a name="Luan.to_string"><tt>Luan.to_string (v)</tt></a></h4> |
2197 | 2163 |
2198 <p> | 2164 <p> |
2199 Receives a value of any type and | 2165 Receives a value of any type and |
2200 converts it to a string in a human-readable format. | 2166 converts it to a string in a human-readable format. |
2365 from the end of the string. | 2331 from the end of the string. |
2366 Thus, the last character is at position -1, and so on. | 2332 Thus, the last character is at position -1, and so on. |
2367 | 2333 |
2368 | 2334 |
2369 | 2335 |
2370 | |
2371 <p> | |
2372 <hr><h3><a name="pdf-string.byte"><code>string.byte (s [, i [, j]])</code></a></h3> | |
2373 Returns the internal numerical codes of the characters <code>s[i]</code>, | |
2374 <code>s[i+1]</code>, ..., <code>s[j]</code>. | |
2375 The default value for <code>i</code> is 1; | |
2376 the default value for <code>j</code> is <code>i</code>. | |
2377 These indices are corrected | |
2378 following the same rules of function <a href="#pdf-string.sub"><code>string.sub</code></a>. | |
2379 | |
2380 | |
2381 <p> | |
2382 Numerical codes are not necessarily portable across platforms. | |
2383 | |
2384 | |
2385 | |
2386 <h4 <%=heading_options%> ><a name="String.char"><tt>String.char (···)</tt></a></h4> | 2336 <h4 <%=heading_options%> ><a name="String.char"><tt>String.char (···)</tt></a></h4> |
2387 | 2337 |
2388 <p> | 2338 <p> |
2389 Receives zero or more integers. | 2339 Receives zero or more integers. |
2390 Returns a string with length equal to the number of arguments, | 2340 Returns a string with length equal to the number of arguments, |
2409 | 2359 |
2410 <h4 <%=heading_options%> ><a name="String.find"><tt>String.find (s, pattern [, init [, plain]])</tt></a></h4> | 2360 <h4 <%=heading_options%> ><a name="String.find"><tt>String.find (s, pattern [, init [, plain]])</tt></a></h4> |
2411 | 2361 |
2412 <p> | 2362 <p> |
2413 Looks for the first match of | 2363 Looks for the first match of |
2414 <tt>pattern</tt> (see <a href="#6.4.1">§6.4.1</a>) in the string <tt>s</tt>. | 2364 <tt>pattern</tt> (see <a href="http://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html">Pattern</a>) in the string <tt>s</tt>. |
2415 If it finds a match, then <tt>find</tt> returns the indices of <tt>s</tt> | 2365 If it finds a match, then <tt>find</tt> returns the indices of <tt>s</tt> |
2416 where this occurrence starts and ends; | 2366 where this occurrence starts and ends; |
2417 otherwise, it returns <b>nil</b>. | 2367 otherwise, it returns <b>nil</b>. |
2418 A third, optional numerical argument <tt>init</tt> specifies | 2368 A third, optional numerical argument <tt>init</tt> specifies |
2419 where to start the search; | 2369 where to start the search; |
2449 <h4 <%=heading_options%> ><a name="String.gmatch"><tt>String.gmatch (s, pattern)</tt></a></h4> | 2399 <h4 <%=heading_options%> ><a name="String.gmatch"><tt>String.gmatch (s, pattern)</tt></a></h4> |
2450 | 2400 |
2451 <p> | 2401 <p> |
2452 Returns an iterator function that, | 2402 Returns an iterator function that, |
2453 each time it is called, | 2403 each time it is called, |
2454 returns the next captures from <tt>pattern</tt> (see <a href="#6.4.1">§6.4.1</a>) | 2404 returns the next captures from <tt>pattern</tt> (see <a href="http://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html">Pattern</a>) |
2455 over the string <tt>s</tt>. | 2405 over the string <tt>s</tt>. |
2456 If <tt>pattern</tt> specifies no captures, | 2406 If <tt>pattern</tt> specifies no captures, |
2457 then the whole match is produced in each call. | 2407 then the whole match is produced in each call. |
2458 | 2408 |
2459 | 2409 |
2490 <h4 <%=heading_options%> ><a name="String.gsub"><tt>String.gsub (s, pattern, repl [, n])</tt></a></h4> | 2440 <h4 <%=heading_options%> ><a name="String.gsub"><tt>String.gsub (s, pattern, repl [, n])</tt></a></h4> |
2491 | 2441 |
2492 <p> | 2442 <p> |
2493 Returns a copy of <tt>s</tt> | 2443 Returns a copy of <tt>s</tt> |
2494 in which all (or the first <tt>n</tt>, if given) | 2444 in which all (or the first <tt>n</tt>, if given) |
2495 occurrences of the <tt>pattern</tt> (see <a href="#6.4.1">§6.4.1</a>) have been | 2445 occurrences of the <tt>pattern</tt> (see <a href="http://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html">Pattern</a>) have been |
2496 replaced by a replacement string specified by <tt>repl</tt>, | 2446 replaced by a replacement string specified by <tt>repl</tt>, |
2497 which can be a string, a table, or a function. | 2447 which can be a string, a table, or a function. |
2498 <tt>gsub</tt> also returns, as its second value, | 2448 <tt>gsub</tt> also returns, as its second value, |
2499 the total number of matches that occurred. | 2449 the total number of matches that occurred. |
2500 The name <tt>gsub</tt> comes from <i>Global SUBstitution</i>. | 2450 The name <tt>gsub</tt> comes from <i>Global SUBstitution</i>. |
2558 --> x="lua-5.3.tar.gz" | 2508 --> x="lua-5.3.tar.gz" |
2559 </pre></tt></p> | 2509 </pre></tt></p> |
2560 | 2510 |
2561 | 2511 |
2562 | 2512 |
2513 <h4 <%=heading_options%> ><a name="String.literal"><tt>String.literal (s)</tt></a></h4> | |
2514 <p> | |
2515 Returns a string which matches the literal string <tt>s</tt> in a regular expression. This function is simply the Java method <a href="http://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html#quote(java.lang.String)"><tt>Pattern.quote</tt></a>. | |
2516 | |
2517 | |
2563 <h4 <%=heading_options%> ><a name="String.lower"><tt>String.lower (s)</tt></a></h4> | 2518 <h4 <%=heading_options%> ><a name="String.lower"><tt>String.lower (s)</tt></a></h4> |
2564 <p> | 2519 <p> |
2565 Receives a string and returns a copy of this string with all | 2520 Receives a string and returns a copy of this string with all |
2566 uppercase letters changed to lowercase. | 2521 uppercase letters changed to lowercase. |
2567 All other characters are left unchanged. | 2522 All other characters are left unchanged. |
2568 | 2523 |
2569 | 2524 |
2570 | 2525 |
2571 | 2526 |
2572 <p> | 2527 <h4 <%=heading_options%> ><a name="String.match"><tt>String.match (s, pattern [, init])</tt></a></h4> |
2573 <hr><h3><a name="pdf-string.match"><code>string.match (s, pattern [, init])</code></a></h3> | 2528 |
2574 Looks for the first <em>match</em> of | 2529 <p> |
2575 <code>pattern</code> (see <a href="#6.4.1">§6.4.1</a>) in the string <code>s</code>. | 2530 Looks for the first <i>match</i> of |
2576 If it finds one, then <code>match</code> returns | 2531 <tt>pattern</tt> (see <a href="http://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html">Pattern</a>) in the string <tt>s</tt>. |
2532 If it finds one, then <tt>match</tt> returns | |
2577 the captures from the pattern; | 2533 the captures from the pattern; |
2578 otherwise it returns <b>nil</b>. | 2534 otherwise it returns <b>nil</b>. |
2579 If <code>pattern</code> specifies no captures, | 2535 If <tt>pattern</tt> specifies no captures, |
2580 then the whole match is returned. | 2536 then the whole match is returned. |
2581 A third, optional numerical argument <code>init</code> specifies | 2537 A third, optional numerical argument <tt>init</tt> specifies |
2582 where to start the search; | 2538 where to start the search; |
2583 its default value is 1 and can be negative. | 2539 its default value is 1 and can be negative. |
2584 | 2540 |
2585 | 2541 |
2586 | 2542 <h4 <%=heading_options%> ><a name="String.matches"><tt>String.matches (s, pattern)</tt></a></h4> |
2587 | 2543 <p> |
2588 <p> | 2544 Returns a boolean indicating whether the entire string <tt>s</tt> matches <tt>pattern</tt>. |
2589 <hr><h3><a name="pdf-string.pack"><code>string.pack (fmt, v1, v2, ···)</code></a></h3> | 2545 |
2590 | 2546 |
2591 | 2547 |
2592 <p> | 2548 <h4 <%=heading_options%> ><a name="String.rep"><tt>String.rep (s, n [, sep])</tt></a></h4> |
2593 Returns a binary string containing the values <code>v1</code>, <code>v2</code>, etc. | 2549 <p> |
2594 packed (that is, serialized in binary form) | 2550 Returns a string that is the concatenation of <tt>n</tt> copies of |
2595 according to the format string <code>fmt</code> (see <a href="#6.4.2">§6.4.2</a>). | 2551 the string <tt>s</tt> separated by the string <tt>sep</tt>. |
2596 | 2552 The default value for <tt>sep</tt> is the empty string |
2597 | |
2598 | |
2599 | |
2600 <p> | |
2601 <hr><h3><a name="pdf-string.packsize"><code>string.packsize (fmt)</code></a></h3> | |
2602 | |
2603 | |
2604 <p> | |
2605 Returns the size of a string resulting from <a href="#pdf-string.pack"><code>string.pack</code></a> | |
2606 with the given format. | |
2607 The format string cannot have the variable-length options | |
2608 '<code>s</code>' or '<code>z</code>' (see <a href="#6.4.2">§6.4.2</a>). | |
2609 | |
2610 | |
2611 | |
2612 | |
2613 <p> | |
2614 <hr><h3><a name="pdf-string.rep"><code>string.rep (s, n [, sep])</code></a></h3> | |
2615 Returns a string that is the concatenation of <code>n</code> copies of | |
2616 the string <code>s</code> separated by the string <code>sep</code>. | |
2617 The default value for <code>sep</code> is the empty string | |
2618 (that is, no separator). | 2553 (that is, no separator). |
2619 Returns the empty string if <code>n</code> is not positive. | 2554 Returns the empty string if <tt>n</tt> is not positive. |
2620 | 2555 |
2621 | 2556 |
2622 | 2557 |
2623 | 2558 |
2624 <p> | 2559 <h4 <%=heading_options%> ><a name="String.reverse"><tt>String.reverse (s)</tt></a></h4> |
2625 <hr><h3><a name="pdf-string.reverse"><code>string.reverse (s)</code></a></h3> | 2560 <p> |
2626 Returns a string that is the string <code>s</code> reversed. | 2561 Returns a string that is the string <tt>s</tt> reversed. |
2627 | 2562 |
2628 | 2563 |
2629 | 2564 |
2630 | 2565 |
2631 <p> | 2566 <h4 <%=heading_options%> ><a name="String.sub"><tt>String.sub (s, i [, j])</tt></a></h4> |
2632 <hr><h3><a name="pdf-string.sub"><code>string.sub (s, i [, j])</code></a></h3> | 2567 |
2633 Returns the substring of <code>s</code> that | 2568 <p> |
2634 starts at <code>i</code> and continues until <code>j</code>; | 2569 Returns the substring of <tt>s</tt> that |
2635 <code>i</code> and <code>j</code> can be negative. | 2570 starts at <tt>i</tt> and continues until <tt>j</tt>; |
2636 If <code>j</code> is absent, then it is assumed to be equal to -1 | 2571 <tt>i</tt> and <tt>j</tt> can be negative. |
2572 If <tt>j</tt> is absent, then it is assumed to be equal to -1 | |
2637 (which is the same as the string length). | 2573 (which is the same as the string length). |
2638 In particular, | 2574 In particular, |
2639 the call <code>string.sub(s,1,j)</code> returns a prefix of <code>s</code> | 2575 the call <tt>string.sub(s,1,j)</tt> returns a prefix of <tt>s</tt> |
2640 with length <code>j</code>, | 2576 with length <tt>j</tt>, |
2641 and <code>string.sub(s, -i)</code> returns a suffix of <code>s</code> | 2577 and <tt>string.sub(s, -i)</tt> returns a suffix of <tt>s</tt> |
2642 with length <code>i</code>. | 2578 with length <tt>i</tt>. |
2643 | 2579 |
2644 | 2580 |
2645 <p> | 2581 <p> |
2646 If, after the translation of negative indices, | 2582 If, after the translation of negative indices, |
2647 <code>i</code> is less than 1, | 2583 <tt>i</tt> is less than 1, |
2648 it is corrected to 1. | 2584 it is corrected to 1. |
2649 If <code>j</code> is greater than the string length, | 2585 If <tt>j</tt> is greater than the string length, |
2650 it is corrected to that length. | 2586 it is corrected to that length. |
2651 If, after these corrections, | 2587 If, after these corrections, |
2652 <code>i</code> is greater than <code>j</code>, | 2588 <tt>i</tt> is greater than <tt>j</tt>, |
2653 the function returns the empty string. | 2589 the function returns the empty string. |
2654 | 2590 |
2655 | 2591 |
2656 | 2592 |
2657 | 2593 <h4 <%=heading_options%> ><a name="String.to_binary"><tt>String.to_binary (s)</tt></a></h4> |
2658 <p> | 2594 |
2659 <hr><h3><a name="pdf-string.unpack"><code>string.unpack (fmt, s [, pos])</code></a></h3> | 2595 <p> |
2660 | 2596 Converts a string to a binary by calling the Java method <a href="http://docs.oracle.com/javase/7/docs/api/java/lang/String.html#getBytes()"><tt>String.getBytes</tt></a>. |
2661 | 2597 |
2662 <p> | 2598 |
2663 Returns the values packed in string <code>s</code> (see <a href="#pdf-string.pack"><code>string.pack</code></a>) | 2599 |
2664 according to the format string <code>fmt</code> (see <a href="#6.4.2">§6.4.2</a>). | 2600 <h4 <%=heading_options%> ><a name="String.to_number"><tt>String.to_number (s [, base])</tt></a></h4> |
2665 An optional <code>pos</code> marks where | 2601 |
2666 to start reading in <code>s</code> (default is 1). | 2602 <p> |
2667 After the read values, | 2603 When called with no <tt>base</tt>, |
2668 this function also returns the index of the first unread byte in <code>s</code>. | 2604 <tt>to_number</tt> tries to convert its argument to a number. |
2669 | 2605 If the argument is |
2670 | 2606 a string convertible to a number, |
2671 | 2607 then <tt>to_number</tt> returns this number; |
2672 | 2608 otherwise, it returns <b>nil</b>. |
2673 <p> | 2609 |
2674 <hr><h3><a name="pdf-string.upper"><code>string.upper (s)</code></a></h3> | 2610 The conversion of strings can result in integers or floats. |
2611 | |
2612 | |
2613 <p> | |
2614 When called with <tt>base</tt>, | |
2615 then <tt>s</tt> must be a string to be interpreted as | |
2616 an integer numeral in that base. | |
2617 In bases above 10, the letter '<tt>A</tt>' (in either upper or lower case) | |
2618 represents 10, '<tt>B</tt>' represents 11, and so forth, | |
2619 with '<tt>Z</tt>' representing 35. | |
2620 If the string <tt>s</tt> is not a valid numeral in the given base, | |
2621 the function returns <b>nil</b>. | |
2622 | |
2623 | |
2624 | |
2625 <h4 <%=heading_options%> ><a name="String.trim"><tt>String.trim (s)</tt></a></h4> | |
2626 | |
2627 <p> | |
2628 Removes the leading and trailing whitespace by calling the Java method <a href="http://docs.oracle.com/javase/7/docs/api/java/lang/String.html#trim()"><tt>String.trim</tt></a>. | |
2629 | |
2630 | |
2631 | |
2632 | |
2633 <h4 <%=heading_options%> ><a name="String.unicode"><tt>String.unicode (s [, i [, j]])</tt></a></h4> | |
2634 | |
2635 <p> | |
2636 Returns the internal numerical codes of the characters <tt>s[i]</tt>, | |
2637 <tt>s[i+1]</tt>, ..., <tt>s[j]</tt>. | |
2638 The default value for <tt>i</tt> is 1; | |
2639 the default value for <tt>j</tt> is <tt>i</tt>. | |
2640 These indices are corrected | |
2641 following the same rules of function <a href="#String.sub"><tt>String.sub</tt></a>. | |
2642 | |
2643 | |
2644 | |
2645 | |
2646 | |
2647 <h4 <%=heading_options%> ><a name="String.upper"><tt>String.upper (s)</tt></a></h4> | |
2648 <p> | |
2675 Receives a string and returns a copy of this string with all | 2649 Receives a string and returns a copy of this string with all |
2676 lowercase letters changed to uppercase. | 2650 lowercase letters changed to uppercase. |
2677 All other characters are left unchanged. | 2651 All other characters are left unchanged. |
2678 The definition of what a lowercase letter is depends on the current locale. | 2652 The definition of what a lowercase letter is depends on the current locale. |
2679 | 2653 |
2680 | |
2681 | |
2682 | |
2683 | |
2684 <h3>6.4.1 – <a name="6.4.1">Patterns</a></h3> | |
2685 | |
2686 <p> | |
2687 Patterns in Lua are described by regular strings, | |
2688 which are interpreted as patterns by the pattern-matching functions | |
2689 <a href="#pdf-string.find"><code>string.find</code></a>, | |
2690 <a href="#pdf-string.gmatch"><code>string.gmatch</code></a>, | |
2691 <a href="#pdf-string.gsub"><code>string.gsub</code></a>, | |
2692 and <a href="#pdf-string.match"><code>string.match</code></a>. | |
2693 This section describes the syntax and the meaning | |
2694 (that is, what they match) of these strings. | |
2695 | |
2696 | |
2697 | |
2698 <h4>Character Class:</h4><p> | |
2699 A <em>character class</em> is used to represent a set of characters. | |
2700 The following combinations are allowed in describing a character class: | |
2701 | |
2702 <ul> | |
2703 | |
2704 <li><b><em>x</em>: </b> | |
2705 (where <em>x</em> is not one of the <em>magic characters</em> | |
2706 <code>^$()%.[]*+-?</code>) | |
2707 represents the character <em>x</em> itself. | |
2708 </li> | |
2709 | |
2710 <li><b><code>.</code>: </b> (a dot) represents all characters.</li> | |
2711 | |
2712 <li><b><code>%a</code>: </b> represents all letters.</li> | |
2713 | |
2714 <li><b><code>%c</code>: </b> represents all control characters.</li> | |
2715 | |
2716 <li><b><code>%d</code>: </b> represents all digits.</li> | |
2717 | |
2718 <li><b><code>%g</code>: </b> represents all printable characters except space.</li> | |
2719 | |
2720 <li><b><code>%l</code>: </b> represents all lowercase letters.</li> | |
2721 | |
2722 <li><b><code>%p</code>: </b> represents all punctuation characters.</li> | |
2723 | |
2724 <li><b><code>%s</code>: </b> represents all space characters.</li> | |
2725 | |
2726 <li><b><code>%u</code>: </b> represents all uppercase letters.</li> | |
2727 | |
2728 <li><b><code>%w</code>: </b> represents all alphanumeric characters.</li> | |
2729 | |
2730 <li><b><code>%x</code>: </b> represents all hexadecimal digits.</li> | |
2731 | |
2732 <li><b><code>%<em>x</em></code>: </b> (where <em>x</em> is any non-alphanumeric character) | |
2733 represents the character <em>x</em>. | |
2734 This is the standard way to escape the magic characters. | |
2735 Any non-alphanumeric character | |
2736 (including all punctuations, even the non-magical) | |
2737 can be preceded by a '<code>%</code>' | |
2738 when used to represent itself in a pattern. | |
2739 </li> | |
2740 | |
2741 <li><b><code>[<em>set</em>]</code>: </b> | |
2742 represents the class which is the union of all | |
2743 characters in <em>set</em>. | |
2744 A range of characters can be specified by | |
2745 separating the end characters of the range, | |
2746 in ascending order, with a '<code>-</code>'. | |
2747 All classes <code>%</code><em>x</em> described above can also be used as | |
2748 components in <em>set</em>. | |
2749 All other characters in <em>set</em> represent themselves. | |
2750 For example, <code>[%w_]</code> (or <code>[_%w]</code>) | |
2751 represents all alphanumeric characters plus the underscore, | |
2752 <code>[0-7]</code> represents the octal digits, | |
2753 and <code>[0-7%l%-]</code> represents the octal digits plus | |
2754 the lowercase letters plus the '<code>-</code>' character. | |
2755 | |
2756 | |
2757 <p> | |
2758 The interaction between ranges and classes is not defined. | |
2759 Therefore, patterns like <code>[%a-z]</code> or <code>[a-%%]</code> | |
2760 have no meaning. | |
2761 </li> | |
2762 | |
2763 <li><b><code>[^<em>set</em>]</code>: </b> | |
2764 represents the complement of <em>set</em>, | |
2765 where <em>set</em> is interpreted as above. | |
2766 </li> | |
2767 | |
2768 </ul><p> | |
2769 For all classes represented by single letters (<code>%a</code>, <code>%c</code>, etc.), | |
2770 the corresponding uppercase letter represents the complement of the class. | |
2771 For instance, <code>%S</code> represents all non-space characters. | |
2772 | |
2773 | |
2774 <p> | |
2775 The definitions of letter, space, and other character groups | |
2776 depend on the current locale. | |
2777 In particular, the class <code>[a-z]</code> may not be equivalent to <code>%l</code>. | |
2778 | |
2779 | |
2780 | |
2781 | |
2782 | |
2783 <h4>Pattern Item:</h4><p> | |
2784 A <em>pattern item</em> can be | |
2785 | |
2786 <ul> | |
2787 | |
2788 <li> | |
2789 a single character class, | |
2790 which matches any single character in the class; | |
2791 </li> | |
2792 | |
2793 <li> | |
2794 a single character class followed by '<code>*</code>', | |
2795 which matches zero or more repetitions of characters in the class. | |
2796 These repetition items will always match the longest possible sequence; | |
2797 </li> | |
2798 | |
2799 <li> | |
2800 a single character class followed by '<code>+</code>', | |
2801 which matches one or more repetitions of characters in the class. | |
2802 These repetition items will always match the longest possible sequence; | |
2803 </li> | |
2804 | |
2805 <li> | |
2806 a single character class followed by '<code>-</code>', | |
2807 which also matches zero or more repetitions of characters in the class. | |
2808 Unlike '<code>*</code>', | |
2809 these repetition items will always match the shortest possible sequence; | |
2810 </li> | |
2811 | |
2812 <li> | |
2813 a single character class followed by '<code>?</code>', | |
2814 which matches zero or one occurrence of a character in the class. | |
2815 It always matches one occurrence if possible; | |
2816 </li> | |
2817 | |
2818 <li> | |
2819 <code>%<em>n</em></code>, for <em>n</em> between 1 and 9; | |
2820 such item matches a substring equal to the <em>n</em>-th captured string | |
2821 (see below); | |
2822 </li> | |
2823 | |
2824 <li> | |
2825 <code>%b<em>xy</em></code>, where <em>x</em> and <em>y</em> are two distinct characters; | |
2826 such item matches strings that start with <em>x</em>, end with <em>y</em>, | |
2827 and where the <em>x</em> and <em>y</em> are <em>balanced</em>. | |
2828 This means that, if one reads the string from left to right, | |
2829 counting <em>+1</em> for an <em>x</em> and <em>-1</em> for a <em>y</em>, | |
2830 the ending <em>y</em> is the first <em>y</em> where the count reaches 0. | |
2831 For instance, the item <code>%b()</code> matches expressions with | |
2832 balanced parentheses. | |
2833 </li> | |
2834 | |
2835 <li> | |
2836 <code>%f[<em>set</em>]</code>, a <em>frontier pattern</em>; | |
2837 such item matches an empty string at any position such that | |
2838 the next character belongs to <em>set</em> | |
2839 and the previous character does not belong to <em>set</em>. | |
2840 The set <em>set</em> is interpreted as previously described. | |
2841 The beginning and the end of the subject are handled as if | |
2842 they were the character '<code>\0</code>'. | |
2843 </li> | |
2844 | |
2845 </ul> | |
2846 | |
2847 | |
2848 | |
2849 | |
2850 <h4>Pattern:</h4><p> | |
2851 A <em>pattern</em> is a sequence of pattern items. | |
2852 A caret '<code>^</code>' at the beginning of a pattern anchors the match at the | |
2853 beginning of the subject string. | |
2854 A '<code>$</code>' at the end of a pattern anchors the match at the | |
2855 end of the subject string. | |
2856 At other positions, | |
2857 '<code>^</code>' and '<code>$</code>' have no special meaning and represent themselves. | |
2858 | |
2859 | |
2860 | |
2861 | |
2862 | |
2863 <h4>Captures:</h4><p> | |
2864 A pattern can contain sub-patterns enclosed in parentheses; | |
2865 they describe <em>captures</em>. | |
2866 When a match succeeds, the substrings of the subject string | |
2867 that match captures are stored (<em>captured</em>) for future use. | |
2868 Captures are numbered according to their left parentheses. | |
2869 For instance, in the pattern <code>"(a*(.)%w(%s*))"</code>, | |
2870 the part of the string matching <code>"a*(.)%w(%s*)"</code> is | |
2871 stored as the first capture (and therefore has number 1); | |
2872 the character matching "<code>.</code>" is captured with number 2, | |
2873 and the part matching "<code>%s*</code>" has number 3. | |
2874 | |
2875 | |
2876 <p> | |
2877 As a special case, the empty capture <code>()</code> captures | |
2878 the current string position (a number). | |
2879 For instance, if we apply the pattern <code>"()aa()"</code> on the | |
2880 string <code>"flaaap"</code>, there will be two captures: 3 and 5. | |
2881 | |
2882 | |
2883 | |
2884 | |
2885 | |
2886 | |
2887 | |
2888 <h3>6.4.2 – <a name="6.4.2">Format Strings for Pack and Unpack</a></h3> | |
2889 | |
2890 <p> | |
2891 The first argument to <a href="#pdf-string.pack"><code>string.pack</code></a>, | |
2892 <a href="#pdf-string.packsize"><code>string.packsize</code></a>, and <a href="#pdf-string.unpack"><code>string.unpack</code></a> | |
2893 is a format string, | |
2894 which describes the layout of the structure being created or read. | |
2895 | |
2896 | |
2897 <p> | |
2898 A format string is a sequence of conversion options. | |
2899 The conversion options are as follows: | |
2900 | |
2901 <ul> | |
2902 <li><b><code><</code>: </b>sets little endian</li> | |
2903 <li><b><code>></code>: </b>sets big endian</li> | |
2904 <li><b><code>=</code>: </b>sets native endian</li> | |
2905 <li><b><code>![<em>n</em>]</code>: </b>sets maximum alignment to <code>n</code> | |
2906 (default is native alignment)</li> | |
2907 <li><b><code>b</code>: </b>a signed byte (<code>char</code>)</li> | |
2908 <li><b><code>B</code>: </b>an unsigned byte (<code>char</code>)</li> | |
2909 <li><b><code>h</code>: </b>a signed <code>short</code> (native size)</li> | |
2910 <li><b><code>H</code>: </b>an unsigned <code>short</code> (native size)</li> | |
2911 <li><b><code>l</code>: </b>a signed <code>long</code> (native size)</li> | |
2912 <li><b><code>L</code>: </b>an unsigned <code>long</code> (native size)</li> | |
2913 <li><b><code>j</code>: </b>a <code>lua_Integer</code></li> | |
2914 <li><b><code>J</code>: </b>a <code>lua_Unsigned</code></li> | |
2915 <li><b><code>T</code>: </b>a <code>size_t</code> (native size)</li> | |
2916 <li><b><code>i[<em>n</em>]</code>: </b>a signed <code>int</code> with <code>n</code> bytes | |
2917 (default is native size)</li> | |
2918 <li><b><code>I[<em>n</em>]</code>: </b>an unsigned <code>int</code> with <code>n</code> bytes | |
2919 (default is native size)</li> | |
2920 <li><b><code>f</code>: </b>a <code>float</code> (native size)</li> | |
2921 <li><b><code>d</code>: </b>a <code>double</code> (native size)</li> | |
2922 <li><b><code>n</code>: </b>a <code>lua_Number</code></li> | |
2923 <li><b><code>c<em>n</em></code>: </b>a fixed-sized string with <code>n</code> bytes</li> | |
2924 <li><b><code>z</code>: </b>a zero-terminated string</li> | |
2925 <li><b><code>s[<em>n</em>]</code>: </b>a string preceded by its length | |
2926 coded as an unsigned integer with <code>n</code> bytes | |
2927 (default is a <code>size_t</code>)</li> | |
2928 <li><b><code>x</code>: </b>one byte of padding</li> | |
2929 <li><b><code>X<em>op</em></code>: </b>an empty item that aligns | |
2930 according to option <code>op</code> | |
2931 (which is otherwise ignored)</li> | |
2932 <li><b>'<code> </code>': </b>(empty space) ignored</li> | |
2933 </ul><p> | |
2934 (A "<code>[<em>n</em>]</code>" means an optional integral numeral.) | |
2935 Except for padding, spaces, and configurations | |
2936 (options "<code>xX <=>!</code>"), | |
2937 each option corresponds to an argument (in <a href="#pdf-string.pack"><code>string.pack</code></a>) | |
2938 or a result (in <a href="#pdf-string.unpack"><code>string.unpack</code></a>). | |
2939 | |
2940 | |
2941 <p> | |
2942 For options "<code>!<em>n</em></code>", "<code>s<em>n</em></code>", "<code>i<em>n</em></code>", and "<code>I<em>n</em></code>", | |
2943 <code>n</code> can be any integer between 1 and 16. | |
2944 All integral options check overflows; | |
2945 <a href="#pdf-string.pack"><code>string.pack</code></a> checks whether the given value fits in the given size; | |
2946 <a href="#pdf-string.unpack"><code>string.unpack</code></a> checks whether the read value fits in a Lua integer. | |
2947 | |
2948 | |
2949 <p> | |
2950 Any format string starts as if prefixed by "<code>!1=</code>", | |
2951 that is, | |
2952 with maximum alignment of 1 (no alignment) | |
2953 and native endianness. | |
2954 | |
2955 | |
2956 <p> | |
2957 Alignment works as follows: | |
2958 For each option, | |
2959 the format gets extra padding until the data starts | |
2960 at an offset that is a multiple of the minimum between the | |
2961 option size and the maximum alignment; | |
2962 this minimum must be a power of 2. | |
2963 Options "<code>c</code>" and "<code>z</code>" are not aligned; | |
2964 option "<code>s</code>" follows the alignment of its starting integer. | |
2965 | |
2966 | |
2967 <p> | |
2968 All padding is filled with zeros by <a href="#pdf-string.pack"><code>string.pack</code></a> | |
2969 (and ignored by <a href="#pdf-string.unpack"><code>string.unpack</code></a>). | |
2970 | |
2971 | |
2972 | |
2973 | |
2974 | |
2975 | |
2976 | |
2977 <h2>6.5 – <a name="6.5">UTF-8 Support</a></h2> | |
2978 | |
2979 <p> | |
2980 This library provides basic support for UTF-8 encoding. | |
2981 It provides all its functions inside the table <a name="pdf-utf8"><code>utf8</code></a>. | |
2982 This library does not provide any support for Unicode other | |
2983 than the handling of the encoding. | |
2984 Any operation that needs the meaning of a character, | |
2985 such as character classification, is outside its scope. | |
2986 | |
2987 | |
2988 <p> | |
2989 Unless stated otherwise, | |
2990 all functions that expect a byte position as a parameter | |
2991 assume that the given position is either the start of a byte sequence | |
2992 or one plus the length of the subject string. | |
2993 As in the string library, | |
2994 negative indices count from the end of the string. | |
2995 | |
2996 | |
2997 <p> | |
2998 <hr><h3><a name="pdf-utf8.char"><code>utf8.char (···)</code></a></h3> | |
2999 Receives zero or more integers, | |
3000 converts each one to its corresponding UTF-8 byte sequence | |
3001 and returns a string with the concatenation of all these sequences. | |
3002 | |
3003 | |
3004 | |
3005 | |
3006 <p> | |
3007 <hr><h3><a name="pdf-utf8.charpattern"><code>utf8.charpattern</code></a></h3> | |
3008 The pattern (a string, not a function) "<code>[\0-\x7F\xC2-\xF4][\x80-\xBF]*</code>" | |
3009 (see <a href="#6.4.1">§6.4.1</a>), | |
3010 which matches exactly one UTF-8 byte sequence, | |
3011 assuming that the subject is a valid UTF-8 string. | |
3012 | |
3013 | |
3014 | |
3015 | |
3016 <p> | |
3017 <hr><h3><a name="pdf-utf8.codes"><code>utf8.codes (s)</code></a></h3> | |
3018 | |
3019 | |
3020 <p> | |
3021 Returns values so that the construction | |
3022 | |
3023 <pre> | |
3024 for p, c in utf8.codes(s) do <em>body</em> end | |
3025 </pre><p> | |
3026 will iterate over all characters in string <code>s</code>, | |
3027 with <code>p</code> being the position (in bytes) and <code>c</code> the code point | |
3028 of each character. | |
3029 It raises an error if it meets any invalid byte sequence. | |
3030 | |
3031 | |
3032 | |
3033 | |
3034 <p> | |
3035 <hr><h3><a name="pdf-utf8.codepoint"><code>utf8.codepoint (s [, i [, j]])</code></a></h3> | |
3036 Returns the codepoints (as integers) from all characters in <code>s</code> | |
3037 that start between byte position <code>i</code> and <code>j</code> (both included). | |
3038 The default for <code>i</code> is 1 and for <code>j</code> is <code>i</code>. | |
3039 It raises an error if it meets any invalid byte sequence. | |
3040 | |
3041 | |
3042 | |
3043 | |
3044 <p> | |
3045 <hr><h3><a name="pdf-utf8.len"><code>utf8.len (s [, i [, j]])</code></a></h3> | |
3046 Returns the number of UTF-8 characters in string <code>s</code> | |
3047 that start between positions <code>i</code> and <code>j</code> (both inclusive). | |
3048 The default for <code>i</code> is 1 and for <code>j</code> is -1. | |
3049 If it finds any invalid byte sequence, | |
3050 returns a false value plus the position of the first invalid byte. | |
3051 | |
3052 | |
3053 | |
3054 | |
3055 <p> | |
3056 <hr><h3><a name="pdf-utf8.offset"><code>utf8.offset (s, n [, i])</code></a></h3> | |
3057 Returns the position (in bytes) where the encoding of the | |
3058 <code>n</code>-th character of <code>s</code> | |
3059 (counting from position <code>i</code>) starts. | |
3060 A negative <code>n</code> gets characters before position <code>i</code>. | |
3061 The default for <code>i</code> is 1 when <code>n</code> is non-negative | |
3062 and <code>#s + 1</code> otherwise, | |
3063 so that <code>utf8.offset(s, -n)</code> gets the offset of the | |
3064 <code>n</code>-th character from the end of the string. | |
3065 If the specified character is neither in the subject | |
3066 nor right after its end, | |
3067 the function returns <b>nil</b>. | |
3068 | |
3069 | |
3070 <p> | |
3071 As a special case, | |
3072 when <code>n</code> is 0 the function returns the start of the encoding | |
3073 of the character that contains the <code>i</code>-th byte of <code>s</code>. | |
3074 | |
3075 | |
3076 <p> | |
3077 This function assumes that <code>s</code> is a valid UTF-8 string. | |
3078 | 2654 |
3079 | 2655 |
3080 | 2656 |
3081 | 2657 |
3082 | 2658 |
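The new revision documents several `String` functions by pointing at the Java methods behind them (`Pattern.quote`, `String.getBytes`, `String.trim`) and switches pattern syntax to `java.util.regex.Pattern`. The Java sketch below illustrates what those underlying calls do. It is not Luan's implementation: the class name `StringDocSketch` and its helper names are hypothetical, the whole-string check uses `Pattern.matches` on the assumption that `String.matches (s, pattern)` behaves that way, and `getBytes` is given an explicit UTF-8 charset for reproducibility even though the manual links to the no-argument overload.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Pattern;

// Hypothetical helper class; the names mirror the documented Luan functions
// but this is only an illustration of the Java calls the manual links to.
class StringDocSketch {

    // Like String.literal: quote s so that every character matches literally.
    static String literal(String s) {
        return Pattern.quote(s);
    }

    // Like String.matches (assumed): true only if the ENTIRE string matches.
    static boolean matches(String s, String pattern) {
        return Pattern.matches(pattern, s);
    }

    // Like String.trim: strip leading and trailing whitespace.
    static String trim(String s) {
        return s.trim();
    }

    // Like String.to_binary, but with an explicit charset (an assumption;
    // the manual references the platform-default getBytes()).
    static byte[] toBinary(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Unquoted, '.' is a wildcard; quoted, it only matches a literal dot.
        System.out.println(matches("axb", "a.b"));
        System.out.println(matches("axb", literal("a.b")));
        System.out.println(matches("a.b", literal("a.b")));
        // A partial match is not enough for a whole-string match.
        System.out.println(matches("xaxb", "a.b"));
        System.out.println("[" + trim("  hi \n") + "]");
        System.out.println(toBinary("abc").length);
    }
}
```

The contrast between the first two calls is the reason a quoting function is useful once patterns are Java regular expressions: characters such as `.` that were plain text to the caller become metacharacters unless they are quoted first.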