comparison website/src/manual.html.luan @ 555:e25ba7a2e816
some String documentation and fixes
author | Franklin Schmidt <fschmidt@gmail.com> |
---|---|
date | Fri, 19 Jun 2015 04:29:06 -0600 |
parents | b1256e2d19a3 |
children | d02f43598ba3 |
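
The comparison below documents two behavior changes that a short Java sketch may make easier to check. The new text says `String.format` follows Java's `String.format`, which does not convert between integer and floating-point arguments, and the `String.gsub` examples switch to `\w`-style patterns with `$1`-style replacements. The sketch assumes those patterns are handled by `java.util.regex`, which the changeset does not state. The class and method names are hypothetical.

```java
import java.util.IllegalFormatConversionException;

public class LuanStringNotes {

    // Hypothetical stand-in for String.gsub(s, pattern, repl) with a string repl.
    // Assumption: the pattern is a Java regex and "$d" names the d-th capture.
    static String gsub(String s, String pattern, String repl) {
        return s.replaceAll(pattern, repl);
    }

    // Returns true when Java's String.format refuses the argument type
    // for the given conversion instead of converting it.
    static boolean formatRejects(String fmt, Object arg) {
        try {
            String.format(fmt, arg);
            return false;
        } catch (IllegalFormatConversionException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // Mirrors the first and third gsub examples in the new column.
        System.out.println(gsub("hello world", "(\\w+)", "$1 $1"));
        System.out.println(gsub("hello world from Luan", "(\\w+)\\s*(\\w+)", "$2 $1"));

        // %d with a floating-point value and %f with an integer are both rejected.
        System.out.println(formatRejects("%d", 3.0));
        System.out.println(formatRejects("%f", 3));
    }
}
```

If Luan passes numbers through unchanged, this suggests giving `%d` an integer and `%f` a float when calling `String.format` from Luan, as the new note in the manual advises.
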
554:18504c41b0be | 555:e25ba7a2e816 |
---|---|
89 <a href="#libs">Standard Libraries</a> | 89 <a href="#libs">Standard Libraries</a> |
90 <ul> | 90 <ul> |
91 <li><a href="#default_lib">Default Environment</a></li> | 91 <li><a href="#default_lib">Default Environment</a></li> |
92 <li><a href="#luan_lib">Basic Functions</a></li> | 92 <li><a href="#luan_lib">Basic Functions</a></li> |
93 <li><a href="#package_lib">Modules</a></li> | 93 <li><a href="#package_lib">Modules</a></li> |
94 <li><a href="#string_lib">String Manipulation</a></li> | |
94 </ul> | 95 </ul> |
95 </div> | 96 </div> |
96 | 97 |
97 <hr/> | 98 <hr/> |
98 | 99 |
2344 | 2345 |
2345 | 2346 |
2346 | 2347 |
2347 | 2348 |
2348 | 2349 |
2349 <h2>6.4 – <a name="6.4">String Manipulation</a></h2> | 2350 <h3 <%=heading_options%> ><a name="string_lib">String Manipulation</a></h3> |
2351 | |
2352 <p> | |
2353 Include this library by: | |
2354 | |
2355 <p><tt><pre> | |
2356 local String = require "luan:String" | |
2357 </pre></tt></p> | |
2350 | 2358 |
2351 <p> | 2359 <p> |
2352 This library provides generic functions for string manipulation, | 2360 This library provides generic functions for string manipulation, |
2353 such as finding and extracting substrings, and pattern matching. | 2361 such as finding and extracting substrings, and pattern matching. |
2354 When indexing a string in Lua, the first character is at position 1 | 2362 When indexing a string in Luan, the first character is at position 1 |
2355 (not at 0, as in C). | 2363 (not at 0, as in Java). |
2356 Indices are allowed to be negative and are interpreted as indexing backwards, | 2364 Indices are allowed to be negative and are interpreted as indexing backwards, |
2357 from the end of the string. | 2365 from the end of the string. |
2358 Thus, the last character is at position -1, and so on. | 2366 Thus, the last character is at position -1, and so on. |
2359 | 2367 |
2360 | 2368 |
2361 <p> | |
2362 The string library provides all its functions inside the table | |
2363 <a name="pdf-string"><code>string</code></a>. | |
2364 It also sets a metatable for strings | |
2365 where the <code>__index</code> field points to the <code>string</code> table. | |
2366 Therefore, you can use the string functions in object-oriented style. | |
2367 For instance, <code>string.byte(s,i)</code> | |
2368 can be written as <code>s:byte(i)</code>. | |
2369 | |
2370 | |
2371 <p> | |
2372 The string library assumes one-byte character encodings. | |
2373 | 2369 |
2374 | 2370 |
2375 <p> | 2371 <p> |
2376 <hr><h3><a name="pdf-string.byte"><code>string.byte (s [, i [, j]])</code></a></h3> | 2372 <hr><h3><a name="pdf-string.byte"><code>string.byte (s [, i [, j]])</code></a></h3> |
2377 Returns the internal numerical codes of the characters <code>s[i]</code>, | 2373 Returns the internal numerical codes of the characters <code>s[i]</code>, |
2385 <p> | 2381 <p> |
2386 Numerical codes are not necessarily portable across platforms. | 2382 Numerical codes are not necessarily portable across platforms. |
2387 | 2383 |
2388 | 2384 |
2389 | 2385 |
2390 | 2386 <h4 <%=heading_options%> ><a name="String.char"><tt>String.char (···)</tt></a></h4> |
2391 <p> | 2387 |
2392 <hr><h3><a name="pdf-string.char"><code>string.char (···)</code></a></h3> | 2388 <p> |
2393 Receives zero or more integers. | 2389 Receives zero or more integers. |
2394 Returns a string with length equal to the number of arguments, | 2390 Returns a string with length equal to the number of arguments, |
2395 in which each character has the internal numerical code equal | 2391 in which each character has the internal numerical code equal |
2396 to its corresponding argument. | 2392 to its corresponding argument. |
2397 | 2393 |
2398 | 2394 |
2399 <p> | 2395 <h4 <%=heading_options%> ><a name="String.concat"><tt>String.concat (···)</tt></a></h4> |
2400 Numerical codes are not necessarily portable across platforms. | 2396 |
2401 | 2397 <p> |
2402 | 2398 Concatenates the <a href="#Luan.to_string"><tt>to_string</tt></a> value of all arguments. |
2403 | 2399 |
2404 | 2400 |
2405 <p> | 2401 |
2406 <hr><h3><a name="pdf-string.dump"><code>string.dump (function [, strip])</code></a></h3> | 2402 <h4 <%=heading_options%> ><a name="String.encode"><tt>String.encode (s)</tt></a></h4> |
2407 | 2403 |
2408 | 2404 <p> |
2409 <p> | 2405 Encodes argument <tt>s</tt> into a string that can be placed in quotes so as to return the original value of the string. |
2410 Returns a string containing a binary representation | 2406 |
2411 (a <em>binary chunk</em>) | 2407 |
2412 of the given function, | 2408 |
2413 so that a later <a href="#pdf-load"><code>load</code></a> on this string returns | 2409 |
2414 a copy of the function (but with new upvalues). | 2410 <h4 <%=heading_options%> ><a name="String.find"><tt>String.find (s, pattern [, init [, plain]])</tt></a></h4> |
2415 If <code>strip</code> is a true value, | |
2416 the binary representation is created without debug information | |
2417 about the function | |
2418 (local variable names, lines, etc.). | |
2419 | |
2420 | |
2421 <p> | |
2422 Functions with upvalues have only their number of upvalues saved. | |
2423 When (re)loaded, | |
2424 those upvalues receive fresh instances containing <b>nil</b>. | |
2425 (You can use the debug library to serialize | |
2426 and reload the upvalues of a function | |
2427 in a way adequate to your needs.) | |
2428 | |
2429 | |
2430 | |
2431 | |
2432 <p> | |
2433 <hr><h3><a name="pdf-string.find"><code>string.find (s, pattern [, init [, plain]])</code></a></h3> | |
2434 | |
2435 | 2411 |
2436 <p> | 2412 <p> |
2437 Looks for the first match of | 2413 Looks for the first match of |
2438 <code>pattern</code> (see <a href="#6.4.1">§6.4.1</a>) in the string <code>s</code>. | 2414 <tt>pattern</tt> (see <a href="#6.4.1">§6.4.1</a>) in the string <tt>s</tt>. |
2439 If it finds a match, then <code>find</code> returns the indices of <code>s</code> | 2415 If it finds a match, then <tt>find</tt> returns the indices of <tt>s</tt> |
2440 where this occurrence starts and ends; | 2416 where this occurrence starts and ends; |
2441 otherwise, it returns <b>nil</b>. | 2417 otherwise, it returns <b>nil</b>. |
2442 A third, optional numerical argument <code>init</code> specifies | 2418 A third, optional numerical argument <tt>init</tt> specifies |
2443 where to start the search; | 2419 where to start the search; |
2444 its default value is 1 and can be negative. | 2420 its default value is 1 and can be negative. |
2445 A value of <b>true</b> as a fourth, optional argument <code>plain</code> | 2421 A value of <b>true</b> as a fourth, optional argument <tt>plain</tt> |
2446 turns off the pattern matching facilities, | 2422 turns off the pattern matching facilities, |
2447 so the function does a plain "find substring" operation, | 2423 so the function does a plain "find substring" operation, |
2448 with no characters in <code>pattern</code> being considered magic. | 2424 with no characters in <tt>pattern</tt> being considered magic. |
2449 Note that if <code>plain</code> is given, then <code>init</code> must be given as well. | 2425 Note that if <tt>plain</tt> is given, then <tt>init</tt> must be given as well. |
2450 | |
2451 | 2426 |
2452 <p> | 2427 <p> |
2453 If the pattern has captures, | 2428 If the pattern has captures, |
2454 then in a successful match | 2429 then in a successful match |
2455 the captured values are also returned, | 2430 the captured values are also returned, |
2456 after the two indices. | 2431 after the two indices. |
2457 | 2432 |
2458 | 2433 |
2459 | 2434 |
2460 | 2435 |
2461 <p> | 2436 <h4 <%=heading_options%> ><a name="String.format"><tt>String.format (formatstring, ···)</tt></a></h4> |
2462 <hr><h3><a name="pdf-string.format"><code>string.format (formatstring, ···)</code></a></h3> | |
2463 | 2437 |
2464 | 2438 |
2465 <p> | 2439 <p> |
2466 Returns a formatted version of its variable number of arguments | 2440 Returns a formatted version of its variable number of arguments |
2467 following the description given in its first argument (which must be a string). | 2441 following the description given in its first argument (which must be a string). |
2468 The format string follows the same rules as the ISO C function <code>sprintf</code>. | 2442 The format string follows the same rules as the Java function <a href="http://docs.oracle.com/javase/7/docs/api/java/lang/String.html#format(java.lang.String,%20java.lang.Object...)"><tt>String.format</tt></a> because Luan calls this internally. |
2469 The only differences are that the options/modifiers | 2443 |
2470 <code>*</code>, <code>h</code>, <code>L</code>, <code>l</code>, <code>n</code>, | 2444 <p> |
2471 and <code>p</code> are not supported | 2445 Note that Java's <tt>String.format</tt> is too stupid to convert between ints and floats, so you must provide the right kind of number. |
2472 and that there is an extra option, <code>q</code>. | 2446 |
2473 The <code>q</code> option formats a string between double quotes, | 2447 |
2474 using escape sequences when necessary to ensure that | 2448 |
2475 it can safely be read back by the Lua interpreter. | 2449 <h4 <%=heading_options%> ><a name="String.gmatch"><tt>String.gmatch (s, pattern)</tt></a></h4> |
2476 For instance, the call | 2450 |
2477 | 2451 <p> |
2478 <pre> | |
2479 string.format('%q', 'a string with "quotes" and \n new line') | |
2480 </pre><p> | |
2481 may produce the string: | |
2482 | |
2483 <pre> | |
2484 "a string with \"quotes\" and \ | |
2485 new line" | |
2486 </pre> | |
2487 | |
2488 <p> | |
2489 Options | |
2490 <code>A</code> and <code>a</code> (when available), | |
2491 <code>E</code>, <code>e</code>, <code>f</code>, | |
2492 <code>G</code>, and <code>g</code> all expect a number as argument. | |
2493 Options <code>c</code>, <code>d</code>, | |
2494 <code>i</code>, <code>o</code>, <code>u</code>, <code>X</code>, and <code>x</code> | |
2495 expect an integer. | |
2496 Option <code>q</code> expects a string; | |
2497 option <code>s</code> expects a string without embedded zeros. | |
2498 If the argument to option <code>s</code> is not a string, | |
2499 it is converted to one following the same rules of <a href="#pdf-tostring"><code>tostring</code></a>. | |
2500 | |
2501 | |
2502 | |
2503 | |
2504 <p> | |
2505 <hr><h3><a name="pdf-string.gmatch"><code>string.gmatch (s, pattern)</code></a></h3> | |
2506 Returns an iterator function that, | 2452 Returns an iterator function that, |
2507 each time it is called, | 2453 each time it is called, |
2508 returns the next captures from <code>pattern</code> (see <a href="#6.4.1">§6.4.1</a>) | 2454 returns the next captures from <tt>pattern</tt> (see <a href="#6.4.1">§6.4.1</a>) |
2509 over the string <code>s</code>. | 2455 over the string <tt>s</tt>. |
2510 If <code>pattern</code> specifies no captures, | 2456 If <tt>pattern</tt> specifies no captures, |
2511 then the whole match is produced in each call. | 2457 then the whole match is produced in each call. |
2512 | 2458 |
2513 | 2459 |
2514 <p> | 2460 <p> |
2515 As an example, the following loop | 2461 As an example, the following loop |
2516 will iterate over all the words from string <code>s</code>, | 2462 will iterate over all the words from string <tt>s</tt>, |
2517 printing one per line: | 2463 printing one per line: |
2518 | 2464 |
2519 <pre> | 2465 <p><tt><pre> |
2520 s = "hello world from Lua" | 2466 local s = "hello world from Lua" |
2521 for w in string.gmatch(s, "%a+") do | 2467 for w in String.gmatch(s, [[\w+]]) do |
2522 print(w) | 2468 print(w) |
2523 end | 2469 end |
2524 </pre><p> | 2470 </pre></tt></p> |
2525 The next example collects all pairs <code>key=value</code> from the | 2471 |
2472 <p> | |
2473 The next example collects all pairs <tt>key=value</tt> from the | |
2526 given string into a table: | 2474 given string into a table: |
2527 | 2475 |
2528 <pre> | 2476 <p><tt><pre> |
2529 t = {} | 2477 local t = {} |
2530 s = "from=world, to=Lua" | 2478 local s = "from=world, to=Lua" |
2531 for k, v in string.gmatch(s, "(%w+)=(%w+)") do | 2479 for k, v in String.gmatch(s, [[(\w+)=(\w+)]]) do |
2532 t[k] = v | 2480 t[k] = v |
2533 end | 2481 end |
2534 </pre> | 2482 </pre></tt></p> |
2535 | 2483 |
2536 <p> | 2484 <p> |
2537 For this function, a caret '<code>^</code>' at the start of a pattern does not | 2485 For this function, a caret '<tt>^</tt>' at the start of a pattern does not |
2538 work as an anchor, as this would prevent the iteration. | 2486 work as an anchor, as this would prevent the iteration. |
2539 | 2487 |
2540 | 2488 |
2541 | 2489 |
2542 | 2490 <h4 <%=heading_options%> ><a name="String.gsub"><tt>String.gsub (s, pattern, repl [, n])</tt></a></h4> |
2543 <p> | 2491 |
2544 <hr><h3><a name="pdf-string.gsub"><code>string.gsub (s, pattern, repl [, n])</code></a></h3> | 2492 <p> |
2545 Returns a copy of <code>s</code> | 2493 Returns a copy of <tt>s</tt> |
2546 in which all (or the first <code>n</code>, if given) | 2494 in which all (or the first <tt>n</tt>, if given) |
2547 occurrences of the <code>pattern</code> (see <a href="#6.4.1">§6.4.1</a>) have been | 2495 occurrences of the <tt>pattern</tt> (see <a href="#6.4.1">§6.4.1</a>) have been |
2548 replaced by a replacement string specified by <code>repl</code>, | 2496 replaced by a replacement string specified by <tt>repl</tt>, |
2549 which can be a string, a table, or a function. | 2497 which can be a string, a table, or a function. |
2550 <code>gsub</code> also returns, as its second value, | 2498 <tt>gsub</tt> also returns, as its second value, |
2551 the total number of matches that occurred. | 2499 the total number of matches that occurred. |
2552 The name <code>gsub</code> comes from <em>Global SUBstitution</em>. | 2500 The name <tt>gsub</tt> comes from <i>Global SUBstitution</i>. |
2553 | 2501 |
2554 | 2502 |
2555 <p> | 2503 <p> |
2556 If <code>repl</code> is a string, then its value is used for replacement. | 2504 If <tt>repl</tt> is a string, then its value is used for replacement. |
2557 The character <code>%</code> works as an escape character: | 2505 The character <tt>\</tt> works as an escape character. |
2558 any sequence in <code>repl</code> of the form <code>%<em>d</em></code>, | 2506 Any sequence in <tt>repl</tt> of the form <tt>$<i>d</i></tt>, |
2559 with <em>d</em> between 1 and 9, | 2507 with <i>d</i> between 1 and 9, |
2560 stands for the value of the <em>d</em>-th captured substring. | 2508 stands for the value of the <i>d</i>-th captured substring. |
2561 The sequence <code>%0</code> stands for the whole match. | 2509 The sequence <tt>$0</tt> stands for the whole match. |
2562 The sequence <code>%%</code> stands for a single <code>%</code>. | 2510 |
2563 | 2511 |
2564 | 2512 <p> |
2565 <p> | 2513 If <tt>repl</tt> is a table, then the table is queried for every match, |
2566 If <code>repl</code> is a table, then the table is queried for every match, | |
2567 using the first capture as the key. | 2514 using the first capture as the key. |
2568 | 2515 |
2569 | 2516 |
2570 <p> | 2517 <p> |
2571 If <code>repl</code> is a function, then this function is called every time a | 2518 If <tt>repl</tt> is a function, then this function is called every time a |
2572 match occurs, with all captured substrings passed as arguments, | 2519 match occurs, with all captured substrings passed as arguments, |
2573 in order. | 2520 in order. |
2574 | 2521 |
2575 | 2522 |
2576 <p> | 2523 <p> |
2579 then it behaves as if the whole pattern was inside a capture. | 2526 then it behaves as if the whole pattern was inside a capture. |
2580 | 2527 |
2581 | 2528 |
2582 <p> | 2529 <p> |
2583 If the value returned by the table query or by the function call | 2530 If the value returned by the table query or by the function call |
2584 is a string or a number, | 2531 is not <b>nil</b>, |
2585 then it is used as the replacement string; | 2532 then it is used as the replacement string; |
2586 otherwise, if it is <b>false</b> or <b>nil</b>, | 2533 otherwise, if it is <b>nil</b>, |
2587 then there is no replacement | 2534 then there is no replacement |
2588 (that is, the original match is kept in the string). | 2535 (that is, the original match is kept in the string). |
2589 | 2536 |
2590 | 2537 |
2591 <p> | 2538 <p> |
2592 Here are some examples: | 2539 Here are some examples: |
2593 | 2540 |
2594 <pre> | 2541 <p><tt><pre> |
2595 x = string.gsub("hello world", "(%w+)", "%1 %1") | 2542 x = String.gsub("hello world", [[(\w+)]], "$1 $1") |
2596 --> x="hello hello world world" | 2543 --> x="hello hello world world" |
2597 | 2544 |
2598 x = string.gsub("hello world", "%w+", "%0 %0", 1) | 2545 x = String.gsub("hello world", [[\w+]], "$0 $0", 1) |
2599 --> x="hello hello world" | 2546 --> x="hello hello world" |
2600 | 2547 |
2601 x = string.gsub("hello world from Lua", "(%w+)%s*(%w+)", "%2 %1") | 2548 x = String.gsub("hello world from Luan", [[(\w+)\s*(\w+)]], "$2 $1") |
2602 --> x="world hello Lua from" | 2549 --> x="world hello Luan from" |
2603 | 2550 |
2604 x = string.gsub("home = $HOME, user = $USER", "%$(%w+)", os.getenv) | 2551 x = String.gsub("4+5 = $return 4+5$", [[\$(.*?)\$]], function (s) |
2605 --> x="home = /home/roberto, user = roberto" | |
2606 | |
2607 x = string.gsub("4+5 = $return 4+5$", "%$(.-)%$", function (s) | |
2608 return load(s)() | 2552 return load(s)() |
2609 end) | 2553 end) |
2610 --> x="4+5 = 9" | 2554 --> x="4+5 = 9" |
2611 | 2555 |
2612 local t = {name="lua", version="5.3"} | 2556 local t = {name="lua", version="5.3"} |
2613 x = string.gsub("$name-$version.tar.gz", "%$(%w+)", t) | 2557 x = String.gsub("$name-$version.tar.gz", [[\$(\w+)]], t) |
2614 --> x="lua-5.3.tar.gz" | 2558 --> x="lua-5.3.tar.gz" |
2615 </pre> | 2559 </pre></tt></p> |
2616 | 2560 |
2617 | 2561 |
2618 | 2562 |
2619 <p> | 2563 <h4 <%=heading_options%> ><a name="String.lower"><tt>String.lower (s)</tt></a></h4> |
2620 <hr><h3><a name="pdf-string.len"><code>string.len (s)</code></a></h3> | 2564 <p> |
2621 Receives a string and returns its length. | |
2622 The empty string <code>""</code> has length 0. | |
2623 Embedded zeros are counted, | |
2624 so <code>"a\000bc\000"</code> has length 5. | |
2625 | |
2626 | |
2627 | |
2628 | |
2629 <p> | |
2630 <hr><h3><a name="pdf-string.lower"><code>string.lower (s)</code></a></h3> | |
2631 Receives a string and returns a copy of this string with all | 2565 Receives a string and returns a copy of this string with all |
2632 uppercase letters changed to lowercase. | 2566 uppercase letters changed to lowercase. |
2633 All other characters are left unchanged. | 2567 All other characters are left unchanged. |
2634 The definition of what an uppercase letter is depends on the current locale. | |
2635 | 2568 |
2636 | 2569 |
2637 | 2570 |
2638 | 2571 |
2639 <p> | 2572 <p> |