Chapter 18 Other extensions
This chapter covers some of the useful MediaWiki extensions that didn't fit in anywhere else in this book. This is far from a comprehensive list.
External Data
No matter how large your wiki is, the vast majority of the world's data will not be contained in it – and that goes even for massive wikis like Wikipedia and Wikidata. Thankfully, any specific information that is contained in a structured way elsewhere can potentially be used and displayed within your wiki, without having to actually add it to your wiki, using the External Data extension. External Data provides an easy way to query data stored in various computerized formats. It can then be displayed on wiki pages or, via Cargo or Semantic MediaWiki, stored for usage alongside the wiki's own data.
There are five basic data sources that External Data can query: web pages (directly and via SOAP), databases, LDAP servers, local files, and the output of local executables.
This extension has undergone significant changes over the last two years, and is currently undergoing more: some of its functions, like #get_web_data, are effectively deprecated at this point, and will eventually be formally deprecated and then removed. For that reason, and simply for brevity, this section covers only a relatively small fraction of External Data's functionality; to see the entire set, read the External Data documentation at:
Getting data from the web
External Data can, in theory, retrieve data from any page on the web that holds structured data. It is usually used to retrieve data from a URL-based API, or what's sometimes known as a RESTful API (the "REST" here stands for "Representational State Transfer"); but it can also be used to get data from an online standalone file. If it can read the contents of such a page, it can then retrieve some or all of the values that that page contains, and display them on the wiki.
The parser function #for_external_table is the standard way to get and display a group of values - it's useful even if the “table” of data being retrieved consists of just one “row”. Here is the syntax for such a call, when retrieving data from a standard web page:
{{#for_external_table:display text|url=url|format=format|data=local variable name 1=external variable name 1,local variable name 2=external variable name 2,...|filters=external variable name 1=filter value 1,external variable name 2=filter value 2,...}}
The first, unnamed, parameter holds a string that contains the names of one or more of the “local variables” that were set, with each one surrounded by three curly brackets, among whatever other text you want.
The 'url' parameter is the URL being accessed. It does not have to be accessible to the user viewing the wiki, but it does have to be accessible to the wiki's own server.
The 'format' parameter holds the format that the data is in. The allowed values are 'CSV', 'CSV with header', 'GFF', 'JSON', 'HTML', 'INI', 'XML' and 'YAML'. These represent, as you might guess, the data formats CSV, GFF, JSON, HTML, INI, XML and YAML, respectively.
CSV, JSON and XML you may well have heard of: these are standard formats for representing data. CSV is an old, very simple format. It stands for “comma-separated values”, and that's essentially all it is. The difference between 'CSV' and 'CSV with header' is that, for 'CSV', the data is assumed to start at the very first line, while for 'CSV with header', the first line holds the header information. JSON and XML are formats mostly associated with the web. YAML is another somewhat standard data format. INI indicates a file containing a set of lines that look like key=value. GFF is used only for representing genomics data.
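As an illustration, a small file in 'CSV with header' format – like the hypothetical books_data.csv file used in the examples in this section – might look like this:

```
title,author
Shadow of Paradise,Vicente Aleixandre
The Family Moskat,Isaac Bashevis Singer
The Sovereign Sun,Odysseas Elytis
```

In the plain 'CSV' format, the same file would simply lack the first line.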
The 'data' parameter defines a series of what could be called mappings. A mapping sets a local variable to have the value of a tag or field in the source document with the given name. So, for instance, if the URL being accessed holds XML that contains a section like "<dept>Accounting</dept>", and the #for_external_table call has the parameter "data=department=dept", then a local variable, "department", will get set to the value "Accounting". This will similarly work, for XML, if the relevant value is a tag attribute (instead of tag contents), e.g. something like "<employee dept="Accounting">".
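Putting those pieces together, a minimal call to retrieve and display that one value (with a made-up URL) might look like the following:

```
{{#for_external_table:The department is {{{department}}}.
|url=http://example.com/employee.xml
|format=xml
|data=department=dept}}
```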
If the data that was retrieved contains more than one row, i.e. it's a table of data, displaying it is slightly more complicated. For that, we use the function #for_external_table, which takes in a string that holds one or more variables, and loops through the values for them, printing out the string for each row. For example, let's say there's a web page holding information on books and their authors in CSV format, and we want to display the entire set of information in a wiki page. We can accomplish that using the following call:
{{#for_external_table:The book ''{{{book name}}}'' was written by {{{author}}}.|url=http://example.com/books_data.csv |format=csv with header |data=book name=title,author=author}}
This will print out text in the form:
The book Shadow of Paradise was written by Vicente Aleixandre. The book The Family Moskat was written by Isaac Bashevis Singer. The book The Sovereign Sun was written by Odysseas Elytis.
Within #for_external_table, each field is displayed by putting its name in triple curly brackets; #for_external_table then loops all the way through the arrays of these values from start to finish, printing out the string for each.
Chances are good that you wouldn't want to print out a table of data as a long series of sentences – instead, you'll probably want to display them as a table. That can be done with a call that looks like the following:
{| class="wikitable"
! Book
! Author
{{#for_external_table:<nowiki/>
{{!}}-
{{!}} {{{book name}}}
{{!}} {{{author}}}
|url=http://example.com/books_data.csv
|format=csv with header
|data=book name=title,author=author}}
|}
This will print out the correct wikitext for a table, including header rows.
Getting web data: additional complexity
Handling XML documents can be tricky because their format can involve using the same generic tag or attribute name for different types of data. In the extreme case, you could imagine XML formatted like:
<employee name="Charles Coletti">
<department name="Sales" />
<position name="Head of sales" />
</employee>
In this case, each value is directly tied to an attribute called "name", so you can't just use the attribute name, as you normally would. Instead, you would need to use the longer chain of tag and attribute names pointing to each value, using a simple query language known as XPath. You can do that by adding the parameter "use xpath". Here is how you could get the information from the previous example:
{{#for_external_table:* {{{employee}}} - {{{position}}} in {{{department}}}|url=http://example.com/employee_data.xml |format=xml |use xpath|data=employee=/employee/@name, department=/department/@name, position=/position/@name}}
This problem of disambiguation can also occur with JSON data; thankfully, there is a syntax called JSONPath that does for JSON what XPath does for XML. It is quite a bit less well-known than XPath, but fortunately it too is supported, using the parameter "use jsonpath". If the example above were in JSON and not XML, the corresponding call might look like this:
{{#for_external_table:* {{{employee}}} - {{{position}}} in {{{department}}}|url=http://example.com/employee_data.json |format=json |use jsonpath|data=employee=$.employee.name, department=$..department.name, position=$..position.name}}
For CSV documents, the naming of fields depends on whether the file has a header row. If it does, i.e. it's of 'CSV with header' format, then each column gets the name assigned to it at the top; otherwise the names of the columns are just the sequential numbers 1, 2, 3 etc. For the basic 'CSV' format, a mapping parameter could look like "|data=department=4".
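For example, assuming a hypothetical headerless CSV file whose fourth column holds each employee's department, a call could look like:

```
{{#for_external_table:* {{{department}}}
|url=http://example.com/employees.csv
|format=csv
|data=department=4}}
```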
Any call to #for_external_table (or the other retrieval functions) needs at least one value for the "data" parameter to be worthwhile; after all, some value needs to be set. By contrast, the "filters" parameter is optional. When used, the filters filter down the set of values in the accessed page by keeping only the row or rows of data that have the specified value(s) for the specified field name(s). For instance, if the file contains information about every employee, having a parameter like "filters=name=Irving Ivanov" will return only the row (if there is one) where the "name" field is set to "Irving Ivanov".
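A call using that filter, against a hypothetical employees file, might look like this:

```
{{#for_external_table:{{{name}}}'s phone number is {{{phone}}}.
|url=http://example.com/employees.csv
|format=csv with header
|data=name=name,phone=phone
|filters=name=Irving Ivanov}}
```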
This filtering system leaves a lot to be desired – there's no way to match on substrings, for instance, or to use inequality operators for number fields. But in practice, that type of more complex filtering isn't often needed, because the URLs being accessed are often part of a web-based API, where necessary filtering can often be done via the URL itself. Here's an example of what such a call to an API could look like:
{{#for_external_table:{{{population}}}|url=http://example.com/country_data_api?country=Morocco |format=json |data=population=Population}}
Secret keys and whitelists
Some APIs require a “key”, placed within the URL, which serves as an identifier in order to prevent public abuse of the API. In some cases, this key is meant to be a secret; when that happens, it wouldn't work to place the full URL of the API directly within the wiki page. For that reason, External Data also allows the use of secret strings, whose real value is defined within LocalSettings.php. So if you want to access an API whose URLs are in the form “http://worlddata.com/api?country=Guatemala&key=123abc”, you can add the following to your LocalSettings.php file, after the inclusion of External Data:
$wgExternalDataSources['worlddata.com']['replacements'] = [ 'WORLDDATA_KEY' => '123abc' ];
After that, you can instead place within the wiki URLs like:
http://worlddata.com/api?country=Guatemala&key=WORLDDATA_KEY
...and the replacement will happen behind the scenes.
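A full call using that secret string might then look like the following (here "Population" is a made-up field name in the API's output):

```
{{#for_external_table:{{{population}}}
|url=http://worlddata.com/api?country=Guatemala&key=WORLDDATA_KEY
|format=json
|data=population=Population}}
```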
You may be wondering why “worlddata.com” is located in this line. That's because there is an attack that a malicious user could mount to find out the value of a secret string: put somewhere in the wiki a call to #for_external_table (or one of the others) pointing to a URL in a domain that that user controls, which also contains the replacement string. Then the user just has to check that domain's server logs to find out the true value. Placing “worlddata.com” within this line tells External Data to only send the secret value to URLs within that one domain, which blocks this attack.
Getting data from a database
Data can also be retrieved from databases. External Data can access most major database systems, including MySQL, PostgreSQL, Microsoft SQLServer, Oracle, SQLite and the non-SQL MongoDB. The process for each of them is the same. First, the login information to access any particular database has to be added to LocalSettings.php (after the inclusion of External Data), in a format like this one:
$wgExternalDataSources['database ID'] = [
	'server' => "server URL",
	'type' => "DB type",
	'name' => "DB name",
	'user' => "username",
	'password' => "password"
];
Here the string “database ID” is an arbitrary ID for the database; it can be any string. You can set information for as many databases as you want to in LocalSettings.php, with a different ID for each.
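For instance, a (made-up) set of settings for a MySQL database, given the ID "myDB", might look like this:

```
$wgExternalDataSources['myDB'] = [
	'server' => "localhost",
	'type' => "mysql",
	'name' => "building_management",
	'user' => "wikiuser",
	'password' => "wikipassword"
];
```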
For the database systems SQLite and MongoDB, the group of settings is slightly different, and for SQLServer, some additional software may be required; see the extension homepage for details.
A call to, say, #for_external_table then takes the following form:
{{#for_external_table:display syntax|source=database ID |from=table name |where=filters |data=mappings}}
The idea is the same as for accessing web data, though the parameters are different. The source parameter holds the database ID, which is defined in LocalSettings.php. The next two parameters are based on elements of the “SELECT” statement in SQL, if you're familiar with that. The from parameter holds the database table name, or a join of tables if there's more than one; it's the equivalent of a SELECT statement's FROM clause. The where parameter holds the filters, or the set of conditions that we are restricting results on, with each filter separated by " AND "; this is equivalent to the SELECT statement's WHERE clause. Just as with web data retrieval, the data parameter holds the set of fields we want to retrieve, along with the mapping of a local variable for each field. It's similar to a SELECT statement's SELECT clause, though not identical.
Here's an example of a #for_external_table call for a database – this one retrieves some information about a room in a building whose name is the same as the current page name (it retrieves, in this case, a table with just one row):
{{#for_external_table:The building {{{building name}}} has capacity {{{capacity}}}, and status {{{status}}}.|source=myDB|from=rooms r JOIN room_status rs ON r.room_status_id = rs.id|where=r.name={{PAGENAME}}|data=building name=r.building, capacity=r.capacity, status=rs.name}}
Note that table aliases (here, "r" and "rs") can be used, just as with a SQL statement.
There are some additional parameters that can be passed in for a database retrieval:
- limit= – adds a “LIMIT” clause to the SELECT statement, setting the number of values to be returned.
- order by= – adds an “ORDER BY” clause to the SELECT statement, setting the order of results.
- group by= – adds a “GROUP BY” clause to the SELECT statement, grouping results by the values for a field.
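As a sketch, a call listing the ten highest-capacity rooms from a hypothetical "rooms" table might use these parameters like so (assuming the "ORDER BY" direction is passed through to SQL as written):

```
{{#for_external_table:* {{{room name}}}: {{{capacity}}}
|source=myDB
|from=rooms
|data=room name=name,capacity=capacity
|order by=capacity DESC
|limit=10}}
```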
Other display functions
Besides #for_external_table, there are two other important display functions that External Data defines: #display_external_table and #format_external_table. #display_external_table takes in the parameter “template=”, and passes all the data variables in to the specified template, which is used to display each row of data. #format_external_table takes in the parameter “display format=”, and passes in the entire table of data to the specified display template from the Cargo extension (see chapter 16); Cargo must be installed for #format_external_table to work.
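As a sketch, assuming a template called "Book" that takes "title" and "author" parameters, a #display_external_table call might look like this:

```
{{#display_external_table:template=Book
|url=http://example.com/books_data.csv
|format=csv with header
|data=title=title,author=author}}
```

Each row of the retrieved data would then be displayed via one call to the "Book" template.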
CategoryTree
The CategoryTree extension adds the ability to drill down into large category trees from within a single category page – it's used on Wikipedia for that purpose. Earlier there was an image of the default listing of subcategories in a category page; but Figure 18 shows that same display, with some of the subcategories “opened up” to show their own subcategories.
CategoryTree also lets you put, on any page, a similar collapsible, hierarchical list of categories, but this one also listing the pages that each category holds.
The CategoryTree homepage is at:
DynamicPageList
There are two (formerly three!) extensions referred to as DynamicPageList, or DPL. The first was created in 2005; in 2007 the code was forked into two versions, when the decision was made to use the code on the Wikinews site. This required removing some of the DPL functionality, for both security and performance reasons, which led to the spinoff. The version intended for use by Wikimedia sites gained the alternate name “Intersection”, while the other, “third-party” version got the alternate name “DPL2” (though it, ironically, was more like the original code). In 2015, the third version, “DynamicPageList3”, was created; its functionality is the same as DPL2's, but it has a rewritten code base that is supposed to run much faster. In 2020, DPL2 became unmaintained.
Here are the homepages for the two still-working extensions:
They are fairly similar in their functionality.
In theory, DynamicPageList (both versions) works somewhat similarly to Cargo and Semantic MediaWiki, in that it lets you query pages and display the results in different ways. The difference is that DPL doesn't store any data of its own; it uses existing data like categories, namespaces, and – if the FlaggedRevs extension is installed – various FlaggedRevs metrics of page quality, to generate lists and galleries of pages.
Most notably, DPL is used to find intersections of categories, as well as to display all the pages in one category. For instance, to show a sorted list of pages about 18th-century composers, you might have the following:
<DynamicPageList>
category = Composers
category = 18th century births
order = ascending
</DynamicPageList>
(This specific call would work with all versions of DPL. DPL3 also defines the <DPL> tag and the #dpl parser function, which have some additional features that the <DynamicPageList> tag does not support.)
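With DPL3, the same query could also be written using the #dpl parser function; a sketch of such a call might look like:

```
{{#dpl:
|category = Composers
|category = 18th century births
|order = ascending
}}
```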
UrlGetParameters
UrlGetParameters is an extension that can read and display the values of parameters from the URL query string. For instance, if the URL for a wiki page contains “&abc=def”, you could have the following call on the page:
{{#urlget:abc}}
...and it would display “def”. It also works for array values, so that, if there's a string like “vegetables[1]=cauliflower” in the query string, the call {{#urlget:vegetables[1]}} will display the right thing (“cauliflower”).
UrlGetParameters isn't usually necessary, but at times it can be very helpful, because it lets wiki pages pass information to one another. For instance, you could create a page that functions as a “wizard”, guiding users through a list of choices and offering them advice based on their inputs. The user's inputs can be embedded in the query string via either links (the #fullurl parser function could help), or a mini-form (the Widgets extension could be used for that – see here). Then, the resulting page, which could be the same page or a different one, could use #urlget, in conjunction with #if or #switch from ParserFunctions, to display the right text.
The #urlget function can also be used in conjunction with Page Forms, especially with PF's “run query” option (here), which similarly can be used to display a wizard.
For more information about UrlGetParameters, see:
Ratings extensions
People have become used to being able to provide quick feedback on content on the web, whether it's “liking” a post, giving a thumbs up or thumbs down to a comment, rating a movie, etc. So not surprisingly, tools exist for doing the same within wiki pages. Such ratings extensions are always somewhat awkward in a wiki, though, because the content can change at any time. What if a page is poorly written, and you give it a low rating, and then the next day someone rewrites it and makes it a lot better? You might be able to change your rating at that point, but until you do, your rating is a meaningless metric that's just confusing.
(To be sure, in many cases blog and social networking posts can be rewritten too, making votes on such pages awkward as well – but wholesale rewriting of a post is rarely done.)
There's the added awkwardness that it may not be totally clear whether users are supposed to rate the page, or the topic that the page applies to – like in a wiki page about a restaurant. This is the same confusion that can affect commenting functionality on wiki pages (see here).
Nonetheless, there are a few ratings and article feedback extensions – though not as many as there used to be. Here are two still-maintained ones:
- VoteNY (https://www.mediawiki.org/wiki/Extension:VoteNY) – lets users vote on pages, then shows the highest-rated pages elsewhere. This is part of the SocialProfile family of extensions; see here.
- Semantic Rating (https://www.mediawiki.org/wiki/Extension:Semantic_Rating) – allows for displaying numbers (which can also be Semantic MediaWiki or Cargo query results – see chapter 16) as star ratings.
Additionally, the Page Forms extension includes a “rating” input type, which can be used to let users enter an individual rating via a form – see here.
Page Schemas
Page Schemas is an extension that aims to solve a general problem with Cargo, Semantic MediaWiki and Page Forms: maintaining data structures can be difficult. Creating a new data structure is pretty easy – Page Forms offers various tools for automatically creating data structure pages, the most popular being Special:CreateClass – but once you've created a data structure, you're on your own; there's no way to edit an existing form or template to, say, add one field, other than manually editing the page source. This is easier with Cargo than with SMW, but it's a challenge in either case.
With Page Schemas, you use a single “master” form to define the data structure; that form in turn creates a single piece of XML (stored on the category page), which can then be used to automatically generate all those other pages. You can then keep re-editing that schema, using the same helper form, and then re-generating all the corresponding pages. Page Schemas can be used with either Cargo or SMW.
You can read more about it at:
Figure 18 shows an image of the interface provided by Page Schemas for editing a schema.
SocialProfile
Social networking in the enterprise is big business, with lots of companies wanting to add replicas of Facebook and the like to their internal network. Social networking can in theory encompass a lot of different features: messaging, user profiles, user groups, blogging, microblogging/”status updates”, and so on. There are some that MediaWiki already has natively; talk pages are arguably one example.
The SocialProfile extension adds a variety of additional social networking features, and it has roughly 25 spinoff extensions that add even more. Not all of them are well-maintained, but most seem to be.
Here are the main features of the core SocialProfile extension:
- User profile – a wizard lets users easily create a detailed user profile, including uploading an avatar image that is then used in discussions.
- Public and private messaging – users can write both private messages to one another, and public messages on a shared “user board”.
- Friending/”foeing” – users can publicly specify the other users that they know.
- User status – users can set their current status, and users' status history is preserved, allowing for Twitter-style microblogging.
- Rewards system – you can assign points to different actions, like editing a page and friending someone, and then set ranks that users are publicly given when they reach a certain number of points.
And here are some of the additional SocialProfile-based extensions:
- BlogPage – lets users create (non-wiki-page) blog posts.
- PollNY – lets users create polls.
- QuizGame – lets users create quizzes.
- VoteNY – lets users vote on articles.
- SiteMetrics – shows administrators various metrics related to usage of SocialProfile tools.
You can read more about SocialProfile, its current status, and all its related extensions here: