String-Processing Extension for MediaWiki
"String-Processing Extension for MediaWiki" and "StringFunctions Extension for MediaWiki" share the content below.
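--James Qi 23:00, December 5, 2008 (Sat) (CST)
About three years ago, shortly after I first installed MediaWiki, I added the ParserFunctions extension so that I could use function evaluation and conditional logic. It is very useful and basically indispensable. However, ParserFunctions still falls short in several areas, string handling among them. While recently upgrading the area-code lookup and postal-code lookup sites, I found that many visitors type non-standard keywords into the on-site search, so I needed stronger string processing to clean them up. I searched online and found the StringFunctions extension.
Features: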
This module defines these functions: len, pos, rpos, sub, pad, replace, explode, urlencode, and urldecode.
All of these functions operate in O(n) time complexity, making them safe against DoS attacks.
#len:
The #len function returns the length of the given string. The syntax is:
{{#len:string}}
The return value is always the number of characters in the string. If no string is specified, the return value is zero.
Notes:
- Trailing spaces are not counted. Example:
{{#len:Icecream }}
returns 8.
- This function is safe with utf-8 multibyte characters. Example:
{{#len:Žmržlina}}
returns 8.
- Tags such as <nowiki> and other tag extensions will always have a length of one, since their content is hidden from the parser. Example:
{{#len:<nowiki>This is a </nowiki>test}}
returns 5.
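The counting rules above can be modelled outside the wiki. The following Python sketch is only an illustration of the documented behaviour, not the extension's PHP code. The names `hide_tags`, `wiki_len`, and `PLACEHOLDER` are hypothetical, and the sketch assumes that leading whitespace is trimmed as well as trailing whitespace.

```python
import re

# Hypothetical one-character stand-in for a tag whose content is hidden from the parser.
PLACEHOLDER = "\x7f"


def hide_tags(text: str) -> str:
    """Collapse every <nowiki>...</nowiki> span into a single placeholder character."""
    return re.sub(r"<nowiki>.*?</nowiki>", PLACEHOLDER, text, flags=re.DOTALL)


def wiki_len(value: str = "") -> int:
    """Model of {{#len:}}: count characters after hiding tags and trimming whitespace."""
    # Assumption: the argument is trimmed on both sides before counting.
    return len(hide_tags(value).strip())


print(wiki_len("Icecream "))                        # 8
print(wiki_len("Žmržlina"))                         # 8
print(wiki_len("<nowiki>This is a </nowiki>test"))  # 5
```

Python's `len` counts code points rather than bytes, which corresponds to the multibyte-safe behaviour described in the notes.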
#pos:
The #pos function returns the position of a given needle within the string. The syntax is:
{{#pos:string|needle|offset}}
The offset parameter, if specified, gives the position at which the search should begin.
If the needle is found, the return value is a zero-based integer of the first position within the string. If the needle is not found, the function returns an empty string.
Notes:
- This function is case sensitive.
- The maximum allowed length of the needle is limited through the $wgStringFunctionsLimitSearch global setting.
- This function is safe with utf-8 multibyte characters. Example:
{{#pos:Žmržlina|lina}}
returns 4.
- As with #len, <nowiki> and other tag extensions are treated as having a length of 1 for the purposes of character position. Example:
{{#pos:<nowiki>This is a </nowiki>test|test}}
returns 1.
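As a rough illustration of the return values described above, here is a minimal Python sketch. It is not the extension's code; `wiki_pos` is a hypothetical name, and the result is returned as text because the "not found" case is an empty string rather than a number.

```python
def wiki_pos(haystack: str, needle: str, offset: int = 0) -> str:
    """Model of {{#pos:}}: zero-based position as text, or "" when the needle is absent."""
    index = haystack.find(needle, offset)  # str.find is case sensitive, like #pos
    return "" if index == -1 else str(index)


print(wiki_pos("Žmržlina", "lina"))   # 4
print(wiki_pos("Icecream", "c", 2))   # 3 (search starts at offset 2)
print(repr(wiki_pos("Icecream", "C")))  # '' (case sensitive, so no match)
```

Because a missing needle yields an empty string, templates that feed the result into arithmetic need to test for emptiness first.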
#rpos:
The #rpos function returns the last position of a given needle within the string. The syntax is:
{{#rpos:string|needle}}
If the needle is found, the return value is a zero-based integer of its last position within the string. If the needle is not found, the function returns -1.
Tip: When using this to search for the last delimiter, add 1 to the result to get the position just after the last delimiter. This also works when the delimiter is not found, because "-1 + 1" is zero, which is the beginning of the given value.
Notes:
- This function is case sensitive.
- The maximum allowed length of the needle is limited through the $wgStringFunctionsLimitSearch global setting.
- This function is safe with utf-8 multibyte characters. Example:
{{#rpos:Žmržlina|lina}}
returns 4.
- As with #len, <nowiki> and other tag extensions are treated as having a length of 1 for the purposes of character position. Example:
{{#rpos:<nowiki>This is a </nowiki>test|test}}
returns 1.
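The "+1" tip is easier to see with a concrete case. The Python sketch below models it under the documented return values; `wiki_rpos` and `after_last` are hypothetical helper names, and the tip as written assumes a one-character delimiter.

```python
def wiki_rpos(haystack: str, needle: str) -> int:
    """Model of {{#rpos:}}: last zero-based position of needle, or -1 when absent."""
    return haystack.rfind(needle)


def after_last(value: str, delimiter: str) -> str:
    """Return the text after the last one-character delimiter, per the tip above."""
    start = wiki_rpos(value, delimiter) + 1  # -1 + 1 == 0 when the delimiter is missing
    return value[start:]


print(wiki_rpos("Žmržlina", "lina"))                 # 4
print(after_last("String/Functions/Code", "/"))      # Code
print(after_last("Code", "/"))                       # Code (no delimiter, whole value kept)
```

Returning -1 rather than an empty string is what makes the arithmetic safe in the "not found" case.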
#sub:
The #sub function returns a substring from the given string. The syntax is:
{{#sub:string|start|length}}
The start parameter, if positive (or zero), specifies a zero-based index of the first character to be returned.
Example: {{#sub:Icecream|3}}
returns cream.
If the start parameter is negative, it specifies how many characters from the end should be returned.
Example: {{#sub:Icecream|-3}}
returns eam.
The length parameter, if present and positive, specifies the maximum length of the returned string.
Example: {{#sub:Icecream|3|3}}
returns cre.
If the length parameter is negative, it specifies how many characters will be omitted from the end of the string.
Example: {{#sub:Icecream|3|-3}}
returns cr.
Notes:
- If the length parameter is zero, it is not used for truncation at all. Example:
{{#sub:Icecream|3|0}}
returns cream.
- If start denotes a position beyond the truncation from the end by a negative length parameter, an empty string will be returned. Example:
{{#sub:Icecream|3|-6}}
returns an empty string.
- This function is safe with utf-8 multibyte characters. Example:
{{#sub:Žmržlina|3}}
returns žlina.
- As with #len, <nowiki> and other tag extensions are treated as having a length of 1 for the purposes of character position. Example:
{{#sub:<nowiki>This is a </nowiki>test|1}}
returns test.
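The interaction between positive, negative, and zero values of start and length can be summarised in a few lines. This Python sketch models only the rules stated above and is not taken from the extension's source; `wiki_sub` is a hypothetical name.

```python
def wiki_sub(value: str, start: int = 0, length: int = 0) -> str:
    """Model of {{#sub:}} following the documented start/length rules."""
    n = len(value)
    # A negative start counts back from the end of the string.
    begin = start if start >= 0 else max(n + start, 0)
    if length == 0:
        end = n                # zero length: no truncation at all
    elif length > 0:
        end = begin + length   # positive length: maximum size of the result
    else:
        end = n + length       # negative length: omit characters from the end
    # If start lies at or beyond the truncated end, the result is empty.
    return value[begin:end] if begin < end else ""


print(wiki_sub("Icecream", 3))       # cream
print(wiki_sub("Icecream", -3))      # eam
print(wiki_sub("Icecream", 3, 3))    # cre
print(wiki_sub("Icecream", 3, -3))   # cr
print(repr(wiki_sub("Icecream", 3, -6)))  # ''
```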
#pad:
The #pad function returns the given string extended to a given width. The syntax is:
{{#pad:string|length|padstring|direction}}
The length parameter specifies the desired length of the returned string.
The padstring parameter, if specified, is used to fill the missing space. It may be a single character, which will be used as many times as necessary; or a string which will be concatenated as many times as necessary and then trimmed to the required length.
Example: {{#pad:Ice|10|xX}}
returns xXxXxXxIce.
If the padstring is not specified, spaces are used for padding.
The direction parameter, if specified, can be one of these values:
- left - the padding will be on the left side of the string. Example:
{{#pad:Ice|5|x|left}}
returns xxIce.
- right - the padding will be on the right side of the string. Example:
{{#pad:Ice|5|x|right}}
returns Icexx.
- center - the string will be centered in the returned string. Example:
{{#pad:Ice|5|x|center}}
returns xIcex.
If the direction is not specified, the padding will be on the left side of the string.
The return value is the given string extended to length characters, using the padstring to fill the missing part(s). If the given string is already longer than length, it is neither extended nor truncated.
Notes:
- The maximum allowed value for the length is limited through the $wgStringFunctionsLimitPad global setting.
- This function is only partially safe with utf-8 multibyte characters. These characters will be treated appropriately if they appear in the original string, but will not be respected if they appear in the padding. Examples:
{{#pad:Zmrzlina|12|z}}
returns zzzzZmrzlina
{{#pad:Žmržlina|12|z}}
returns zzzzŽmržlina
{{#pad:Žmržlina|12|ž}}
returns žžŽmržlina (only 10 characters, not the requested 12)
- Tags such as <nowiki> and other tag extensions are not permitted in the padding. If the padstring contains such a tag, it will be truncated.
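The padding rules can be sketched in Python as follows. This is a model of the description above, not the extension's implementation, and `wiki_pad` is a hypothetical name. Two assumptions are worth flagging: when centering leaves an odd number of padding characters, the extra one goes on the right (the source only shows an even split), and the sketch counts characters throughout, so it does not reproduce the multibyte limitation in the padding.

```python
def wiki_pad(value: str, length: int, padstring: str = " ", direction: str = "left") -> str:
    """Model of {{#pad:}}: extend value to `length` characters using padstring."""
    missing = length - len(value)
    if missing <= 0:
        return value  # already long enough: neither extended nor truncated
    pad = padstring or " "  # spaces are used when no padstring is given

    def fill(count: int) -> str:
        # Repeat the padstring, then trim it to exactly `count` characters.
        return (pad * count)[:count]

    if direction == "right":
        return value + fill(missing)
    if direction == "center":
        left = missing // 2  # assumption: any odd leftover goes to the right side
        return fill(left) + value + fill(missing - left)
    return fill(missing) + value  # default: pad on the left


print(wiki_pad("Ice", 10, "xX"))          # xXxXxXxIce
print(wiki_pad("Ice", 5, "x", "right"))   # Icexx
print(wiki_pad("Ice", 5, "x", "center"))  # xIcex
```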
#replace:
The #replace function returns the given string with all occurrences of a needle replaced with product. The syntax is:
{{#replace:string|needle|product}}
If the needle is unspecified or empty, a single space will be searched for.
If the product is unspecified or empty, all occurrences of the needle will be removed from the string.
Notes:
- This function is case sensitive.
- The maximum allowed length of the needle is limited through the $wgStringFunctionsLimitSearch global setting.
- The maximum allowed length of the product is limited through the $wgStringFunctionsLimitReplace global setting.
- Even if the product is a space, an empty string is used. This is a side effect of the MediaWiki parser. To use a space as the product, put it in nowiki tags. Example:
{{#replace:My_little_home_page|_|<nowiki> </nowiki>}}
returns My little home page.
Note that this is the only acceptable use of nowiki in the product, as otherwise nowiki could be used to bypass $wgStringFunctionsLimitReplace, injecting an arbitrarily large number of characters into the output. For this reason, all occurrences of <nowiki> or any other tag extension within the product are replaced with spaces.
- This function is safe with utf-8 multibyte characters. Example:
{{#replace:Žmržlina|ž|z}}
returns Žmrzlina.
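The defaults for an empty needle and an empty product can be modelled briefly in Python. This is only a sketch of the documented behaviour with a hypothetical function name, `wiki_replace`; it ignores the configurable length limits and the nowiki handling described in the notes.

```python
def wiki_replace(value: str, needle: str = "", product: str = "") -> str:
    """Model of {{#replace:}}: case-sensitive replacement of every occurrence."""
    # An unspecified or empty needle means "search for a single space".
    return value.replace(needle or " ", product)


print(wiki_replace("My_little_home_page", "_", " "))  # My little home page
print(wiki_replace("Žmržlina", "ž", "z"))             # Žmrzlina (capital Ž is untouched)
print(wiki_replace("a b c"))                          # abc (spaces removed)
```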
#explode:
The #explode function splits the given string into pieces and then returns one of the pieces. The syntax is:
{{#explode:string|delimiter|position}}
The delimiter parameter specifies a string to be used to divide the string into pieces. This delimiter string is then not part of any piece, and when two delimiter strings are next to each other, they create an empty piece between them. If this parameter is not specified, a single space is used.
The position parameter specifies which piece is to be returned. Pieces are counted from 0. If this parameter is not specified, the first piece is used (piece with number 0). When a negative value is used as position, the pieces are counted from the end. In this case, piece number -1 means the last piece. Examples:
{{#explode:And if you tolerate this| |2}}
returns you.
{{#explode:String/Functions/Code|/|-1}}
returns Code.
The return value is the position-th piece. If there are fewer pieces than the position specifies, an empty string is returned.
Notes:
- This function is case sensitive.
- The maximum allowed length of the delimiter is limited through the $wgStringFunctionsLimitSearch global setting.
- This function is safe with utf-8 multibyte characters. Example:
{{#explode:Žmržlina|ž|1}}
returns lina.
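The piece-counting rules, including negative positions and empty pieces between adjacent delimiters, can be modelled in Python. This is an illustrative sketch, not the extension's code; `wiki_explode` is a hypothetical name, and returning an empty string for an out-of-range negative position is an assumption, since the source only describes the positive case.

```python
def wiki_explode(value: str, delimiter: str = " ", position: int = 0) -> str:
    """Model of {{#explode:}}: split on delimiter and return the requested piece."""
    # A missing or empty delimiter falls back to a single space.
    pieces = value.split(delimiter or " ")
    # Negative positions count from the end, so -1 is the last piece.
    if -len(pieces) <= position < len(pieces):
        return pieces[position]
    return ""  # fewer pieces than the position asks for


print(wiki_explode("And if you tolerate this", " ", 2))  # you
print(wiki_explode("String/Functions/Code", "/", -1))    # Code
print(wiki_explode("Žmržlina", "ž", 1))                  # lina
print(repr(wiki_explode("a//b", "/", 1)))                # '' (empty piece between delimiters)
```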
#urlencode: and #urldecode:
These two functions operate in tandem: #urlencode converts a string into a URL-safe syntax, and #urldecode converts such a string back. The syntax is:
{{#urlencode:value}} {{#urldecode:value}}
Notes:
- These functions work by directly exposing PHP's urlencode() and urldecode() functions.
- For anchors within a page use {{anchorencode}} instead of {{#urlencode}}. The results of a call to {{anchorencode}} are compatible with intra-page references generated with [[#link]] syntax, while {{#urlencode}}-generated values are not necessarily so.
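To get a feel for the round trip, the closest standard-library analogue in Python is `quote_plus` and `unquote_plus` from `urllib.parse`. This is an analogy only: the extension calls PHP's functions, and the two implementations may differ for a few characters (for example `~`), so treat the output below as illustrative rather than as the extension's exact result.

```python
from urllib.parse import quote_plus, unquote_plus

original = "My home page?x=1&y=2"

# Spaces become "+", and reserved characters are percent-encoded.
encoded = quote_plus(original)
print(encoded)  # My+home+page%3Fx%3D1%26y%3D2

# Decoding reverses the transformation.
decoded = unquote_plus(encoded)
print(decoded == original)  # True
```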
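The extension is small, so it downloads and installs quickly. In my initial tests it also handled Chinese strings well, so it looks like a good tool!
Tags: MediaWiki, extensions, StringFunctions.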
Aliases: String-Processing Extension for MediaWiki; StringFunctions Extension for the Wiki Platform.