Working with MediaWiki
2nd ed., HTML version

18 Other extensions

This chapter covers some of the useful MediaWiki extensions that didn't fit in anywhere else in this book. This is far from a comprehensive list.

External Data

We live in a world with an enormous and ever-growing amount of data, contained in lots of different sources: databases, spreadsheets, APIs, offline files and more. Sadly, most of it will never be entered into any wiki. But it can still be displayed and used within wikis, via the External Data extension. External Data provides an easy way to query data stored in various computerized formats. It can then be displayed on wiki pages or, via Cargo or Semantic MediaWiki, stored for usage alongside the wiki's own data.
There are four basic data sources that External Data can query, handled by four different parser functions: #get_web_data, #get_db_data, #get_ldap_data and #get_file_data. Let's go through each one in turn.

Getting data from the web

#get_web_data is used to retrieve data from any page on the web that holds structured data. It is usually used to retrieve data from a URL-based API, or what's sometimes known as a RESTful API (the "REST" here stands for "Representational State Transfer"); but it can also be used to get data from a standalone file. If it can read the contents of such a page, it can then retrieve some or all of the values that that page contains, and display them on the wiki.
#get_web_data is called in the following way:
{{#get_web_data:url=url|format=format
|data=local variable name 1=external variable name 1,local variable name 2=external variable name 2,...
|filters=external variable name 1=filter value 1,external variable name 2=filter value 2,...}}
The 'url' parameter is the URL being accessed. It does not have to be accessible to the user viewing the wiki, but it does have to be accessible to the wiki's own server.
The 'format' parameter holds the format that the data is in. The allowed values are 'CSV', 'CSV with header', 'GFF', 'JSON' and 'XML'. These represent, as you might guess, the data formats CSV, GFF, JSON and XML, respectively.
CSV, JSON and XML you may well have heard of: these are standard formats for representing data. CSV is an old, very simple format. It stands for "comma-separated values", and that's essentially all it is. The difference between 'CSV' and 'CSV with header' is that, for 'CSV', the data is assumed to start at the very first line, while for 'CSV with header', the first line holds the header information. JSON and XML are formats mostly associated with the web. GFF is used only for representing genomics data.
The "data" parameter defines a series of what could be called mappings. A mapping sets a local variable to have the value of a tag or field in the source document with the given name. So, for instance if the URL being accessed holds XML that contains a section like "<dept>Accounting<dept>", and the #get_web_data call has the parameter "data=department=dept", then a local variable, "department", will get set to the value "Accounting". This will similarly work, for XML, if the relevant value is a tag attribute (instead of tag contents), e.g. something like "<employee dept="Accounting">".
Handling XML documents can be tricky because their format can involve using the same generic tag or attribute name for different types of data. In the extreme case, you could imagine XML formatted like:
<employee name="Charles Coletti">
  <department name="Accounting" />
  <position name="Accountant" />
</employee>
In this case, each value is directly tied to an attribute called "name", so you can't just use the attribute name, as you normally would with #get_web_data. Instead, you would need to use the longer chain of tag and attribute names pointing to each value, using a simple query language known as XPath. You can do that in #get_web_data, by adding the parameter "use xpath". Here is how you could get the information from the previous example:
{{#get_web_data:url=http://example.com/employee_data.xml |format=xml |use xpath |data=employee=//employee/@name, department=//department/@name, position=//position/@name}}
This problem of disambiguation can also occur with JSON data, and in fact there's a syntax called JSONPath that does for JSON what XPath does for XML, but unfortunately External Data doesn't support it.
For CSV documents, the naming of fields depends on whether the file has a header row. If it does, i.e. it's of 'CSV with header' format, then each column gets the name assigned to it at the top; otherwise the names of the columns are just the sequential numbers 1, 2, 3 etc. For the basic 'CSV' format, a mapping parameter could look like "|data=department=4".
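For instance, for a hypothetical headerless CSV file whose first column holds employee names and whose fourth column holds departments, the call might look like:
{{#get_web_data:url=http://example.com/employees.csv |format=csv |data=employee=1,department=4}}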
A #get_web_data call needs at least one value for the "data" parameter to be worthwhile; after all, some value needs to be set. By contrast, the "filters" parameter is optional. When used, the filters filter down the set of values in the accessed page by keeping only the row or rows of data that have the specified value(s) for the specified field name(s). For instance, if the file contains information about every employee, having a parameter like "filters=name=Irving Ivanov" will return only the row (if there is one) where the "name" field is set to "Irving Ivanov".
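Putting the pieces together, for a hypothetical file whose header row is "name,department,position", a call using such a filter might look like:
{{#get_web_data:url=http://example.com/employees.csv |format=csv with header |data=department=department,position=position |filters=name=Irving Ivanov}}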
This filtering system leaves a lot to be desired – there's no way to match on substrings, for instance, or to use inequality operators for number fields. But in practice, that type of more complex filtering isn't often needed, because the URLs being accessed are often part of a web-based API, where necessary filtering can often be done via the URL itself. Here's an example of what such a call to an API could look like:
{{#get_web_data:
url=http://example.com/country_data_api?country=Morocco |format=json |data=population=Population}}

Displaying and storing values

Once we have our local variables set, the next step is to display, or otherwise use them. How that's done depends on whether there is one, or more than one, value for each variable. In the simple case, we have one value that's been retrieved for each field. In that case, the parser function #external_value is used to display it on the screen. After the previous call to #get_web_data, for instance, the wikitext could contain the following:
The population of Morocco is {{#external_value: population}}.
Assuming the "population" field was correctly retrieved before, this will insert a number into the text that's displayed.
If you want to also store the value so that it can be queried alongside the wiki's native data, that's easy to do. If you're using Cargo, you need to do this within an infobox template - though really that's where the data storage should always be happening anyway. Within the template's #cargo_store call, add a parameter like:
|Population={{#external_value: population}}
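So, for a hypothetical Cargo table called "Countries", the relevant part of the template might look something like this:
{{#cargo_store:_table=Countries |Name={{PAGENAME}} |Population={{#external_value: population}} }}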
Storing external data this way should be done with caution, though, since if the underlying data gets changed, the wiki's value for it won't be updated until and unless it gets manually refreshed in some way.

Displaying and storing a table of data

If the data that was retrieved contains more than one row, i.e. it's a table of data, displaying it is slightly more complicated. For that, we use the function #for_external_table, which takes in a string that holds one or more variables, and loops through the values for them, printing out the string for each row. For example, let's say there's a web page holding information on books and their authors in CSV format, and we want to display the entire set of information in a wiki page. We can accomplish that using the following two calls:
{{#get_web_data:url=http://example.com/books_data.csv |format=csv with header |data=book name=title,author=author}}
{{#for_external_table:The book ''{{{book name}}}'' was written by {{{author}}}. }}
This will print out text in the form:
The book Shadow of Paradise was written by Vicente Aleixandre. The book The Family Moskat was written by Isaac Bashevis Singer. The book The Sovereign Sun was written by Odysseas Elytis.
Within #for_external_table, each field is displayed by putting its name in triple curly brackets; #for_external_table then loops all the way through the arrays of these values from start to finish, printing out the string for each.
Interestingly, there's no reference in the #for_external_table call to the #get_web_data query that created the arrays of these two values – they could have even come from two different #get_web_data calls. In general, though, it's assumed that a call to #for_external_table will handle the values retrieved from a single #get_web_data call, and that all the arrays will hold the same number of rows. If the two arrays are of different sizes – i.e. if there are more book rows than author rows, or vice versa – then you'll probably get some strangely-formatted results.
Chances are good that you wouldn't want to print out a table of data as a long series of sentences – instead, you'll probably want to display them as a table. That can be done with a call that looks like the following:
{| class="wikitable"
! Book
! Author {{#for_external_table:<nowiki/>
{{!}}-
{{!}} {{{book name}}}
{{!}} {{{author}}}
|}
This will print out the correct wikitext for a table, including header rows. Note the odd-looking text "{{!}}", which displays a pipe – if you are using MediaWiki 1.23 or lower, you will have to create a template to get this to work; see here.
There's one other interesting feature of #for_external_table, which is that it lets you URL-encode specific values, by calling them with {{{field-name.urlencode}}} instead of just {{{field-name}}}. For instance, if you wanted to show links to Google searches on a set of terms retrieved, you could call:
{{#for_external_table: http://google.com/search?q={{{term.urlencode}}} }}
Another option for displaying the table of values is #display_external_table, which lets you pass all the values to a template; the template is called once for each row of data.
It's also possible to store the table of data, using either Cargo or Semantic MediaWiki – you just need to call #display_external_table, and put the relevant storage tags (from either extension) into the template.
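As a rough sketch, assuming a template named "Book listing" that takes the parameters "title" and "author" (both names are hypothetical; check the External Data documentation for the exact parameter syntax), such a call might look like:
{{#display_external_table:template=Book listing |data=title=book name,author=author}}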

Secret keys and whitelists

Some APIs require a "key", placed within the URL, which serves as an identifier in order to prevent public abuse of the API. In some cases, this key is meant to be a secret; when that happens, it wouldn't work to place the full URL of the API directly within the wiki page. For that reason, External Data also allows the use of secret strings, whose real value is defined within LocalSettings.php. So if you want to access an API whose URLs are in the form "http://worlddata.com/api?country=Guatemala&key=123abc", you can add the following to your LocalSettings.php file, after the inclusion of External Data:
$edgStringReplacements['WORLDDATA_KEY'] = '123abc';
After that, you can instead place within the wiki URLs like:
http://worlddata.com/api?country=Guatemala&key=WORLDDATA_KEY
...and the replacement will happen behind the scenes.
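For example, a #get_web_data call could then use the placeholder directly (the API and its field name here are, as before, hypothetical):
{{#get_web_data:url=http://worlddata.com/api?country=Guatemala&key=WORLDDATA_KEY |format=json |data=population=Population}}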
If you're a security-minded individual, you may have already thought of a possible attack that a malicious user could carry out to find out the value of a secret string: put a call to #get_web_data somewhere in the wiki, pointing to a URL in a domain that the user controls, that also contains the replacement string. Then the user just has to check that domain's server logs to find out the true value. Thankfully, a defense exists for this: you can create a "whitelist" of domains for External Data, so that only URLs contained within that list can be accessed by External Data.
To create a whitelist with multiple domains, you should add something like the following to LocalSettings.php, after the inclusion of External Data:
$edgAllowExternalDataFrom = array('http://example.org', 'http://example2.com');
And if the whitelist has only one domain, you can just have something like this:
$edgAllowExternalDataFrom = 'http://example.org';

Getting data from a database

Data can also be retrieved from databases, using the call #get_db_data. #get_db_data can access most major database systems, including MySQL, PostgreSQL, Microsoft SQLServer, Oracle, SQLite and the non-SQL MongoDB. The process for each of them is the same. First, the login information to access any particular database has to be added to LocalSettings.php (after the inclusion of External Data), in a format like this one:
$edgDBServer['database ID'] = "server name";
$edgDBServerType['database ID'] = "DB type";
$edgDBName['database ID'] = "DB name";
$edgDBUser['database ID'] = "username";
$edgDBPass['database ID'] = "password";
Here the string "database ID" is an arbitrary ID for the database; it can be any string. You can set information for as many databases as you want to in LocalSettings.php, with a different ID for each.
For the database systems SQLite and MongoDB, the group of settings is slightly different, and for SQLServer, some additional software may be required; see the extension homepage for details.
The call to #get_db_data then takes the following form:
{{#get_db_data:db=database ID |from=table name |where=filters |data=mappings}}
The idea is the same as for #get_web_data, though the parameters are different. The db parameter holds the database ID, which is defined in LocalSettings.php. The next two parameters are based on elements of the "SELECT" statement in SQL, if you're familiar with that. The from parameter holds the database table name, or a join of tables if there's more than one; it's the equivalent of a SELECT statement's FROM clause. The where parameter holds the filters, or the set of conditions that we are restricting results on, with each filter separated by " AND "; this is equivalent to the SELECT statement's WHERE clause. Just as with #get_web_data, the data parameter holds the set of fields we want to retrieve, along with the mapping of a local variable for each field. It's similar to a SELECT statement's SELECT clause, though not identical.
Here's an example of a #get_db_data call – this one retrieves some information about a room in a building whose name is the same as the current page name:
{{#get_db_data:db=myDB
|from=rooms r JOIN room_status rs ON r.room_status_id = rs.id
|where=r.name={{PAGENAME}}
|data=capacity=r.capacity, building name=r.building, status=rs.name
}}
Note that table aliases (here, "r" and "rs") can be used, just as with a SQL statement.
There are also some additional parameters that can be passed in to #get_db_data, such as "limit=" and "order by="; see the extension's documentation for the full list.
If you're accessing MongoDB, the syntax is slightly different: there are some restrictions, such as that you can't include "OR" in the "where=" parameter, and you can set queries directly using the parameter "find query=", in place of "from=" and "where=". See the documentation for more details.
Once the data is retrieved, it can be displayed and stored using #external_value, or, if the data is an array, using #for_external_table and #display_external_table – once the data is set to local variables, it's indistinguishable from data retrieved by #get_web_data.
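For instance, continuing the room example above, the retrieved capacity could be displayed with a line like:
This room holds {{#external_value: capacity}} people.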

Getting data from an LDAP server

You can also get data from an LDAP server, if your organization has one, in a similar manner to how data is extracted from databases. As with getting data from a database, you first need to set the connection details in LocalSettings.php, then query the data using #get_ldap_data. We won't get into the details here, since it's a less frequently-used feature, but you can see the full syntax and examples on the External Data homepage.
As you would expect, #external_value, #for_external_table and #display_external_table can then all be called on the local values that have been set as a result.

Accessing offline data

What about offline data – data that's not available via the network, or perhaps is not even on a computer? The External Data extension doesn't have any special power to bend the laws of physics, but it does offer a utility that makes it possible to put such data online with a minimum of fuss: the page "Special:GetData". To use it, you first need to get this data into CSV format, i.e. rows of comma-separated values, with a single header row defining the name of each "column". This may or may not be a challenge, depending on the nature of the data and what form it's currently in, but CSV is, at the very least, a data format that's easy to create.
Once the CSV is created, it should be put into a page in the wiki – any name will work for the page. As an example, you could have a collection of information about a company's employees, and put it into a wiki page called 'Employee CSV data'. The page's contents might look like this:
<noinclude>
This page holds information about Acme Corp's employees.
</noinclude><includeonly>
Employee name,Department,Position,Phone Number
Alice Adams,Accounting,Accountant,5-1234
Bob Benson,Sales,IT administrator,5-2345
...
</includeonly>
The "<noinclude>" and "<includeonly>" tags are optional – they let you set a nice explanatory display when users view the page. If those tags aren't included, the entire page will be parsed as CSV, while if they are included, the "<noinclude>" sections will get ignored.
The page Special:GetData then serves as a wrapper around that data, providing a sort of "mini-API" for accessing its content. A typical URL for 'GetData' would look like this:
http://example.com/wiki/Special:GetData/Employee_CSV_data?Employee%20name=Alice%20Adams
The name of the page with the CSV data is placed after "Special:GetData", with a slash between them. Then, in the URL's query string (i.e., after the '?'), values can be set for different column names to filter down the set of values. In this case, Special:GetData will return all the rows of the page 'Employee CSV data' that have a value of 'Alice Adams' for the column named 'Employee name'.
That URL can then be passed in to a call to #get_web_data, just like any other API URL. Note that if you're using #get_web_data to query a 'GetData' page, you can apply filtering in either place: either within the API, or as a #get_web_data filter. Barring any reason to use one versus the other, it's recommended to do the filtering within the URL itself, since that's slightly faster.
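For instance, building on the 'Employee CSV data' page above, such a call might look like this (a sketch, using the column names from that page's header row):
{{#get_web_data:url=http://example.com/wiki/Special:GetData/Employee_CSV_data?Employee%20name=Alice%20Adams |format=csv with header |data=department=Department,position=Position}}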

Caching data

You can configure External Data to cache the data contained in the URLs that it accesses, both to speed up retrieval of values and to reduce the load on the system whose data is being accessed. To do this, run the SQL contained in the extension file "ExternalData.sql" in your database, which will create the table "ed_url_cache", then add the following to your LocalSettings.php file, after the inclusion of External Data:
$edgCacheTable = 'ed_url_cache';
You should also add a line like the following, to set the expiration time of the cache, in seconds; this example line will cache the data for a week:
$edgCacheExpireTime = 7 * 24 * 60 * 60;

CategoryTree

The CategoryTree extension adds the ability to drill down into large category trees from within a single category page – it's used on Wikipedia for that purpose. Earlier in this book there was an image of the default listing of subcategories on a category page; Figure 18.1 shows that same display, but with some of the subcategories "opened up" to show their own subcategories.
Figure 18.1 Hierarchical display of subcategories, enabled by CategoryTree extension
CategoryTree also lets you put a similar collapsible, hierarchical list of categories on any page – one that also lists the pages that each category holds.
The CategoryTree homepage is at:
https://www.mediawiki.org/wiki/Extension:CategoryTree

DynamicPageList

Bizarrely, there are three extensions referred to as DynamicPageList, or DPL. The first was created in 2005; in 2007 the code was forked into two versions, when the decision was made to use the code on the Wikinews site. This required removing some of the DPL functionality, for both security and performance reasons, which led to the spinoff. The version intended for use by Wikimedia sites gained the alternate name "Intersection", while the other, "third-party" version got the alternate name "DPL2" (though, ironically, it was more like the original code). Then in 2015, a third version, "DynamicPageList3", was created; its functionality is the same as DPL2's, but it has a rewritten code base that is supposed to run much faster. Here are the homepages for the three:
https://www.mediawiki.org/wiki/Extension:DynamicPageList_(Wikimedia)
https://www.mediawiki.org/wiki/Extension:DynamicPageList_(third-party)
https://www.mediawiki.org/wiki/Extension:DynamicPageList3
They are all fairly similar in their functionality.
In theory, DynamicPageList (all versions) works somewhat similarly to Cargo and Semantic MediaWiki, in that it lets you query pages and display the results in different ways. The difference is that DPL doesn't store any data of its own; it uses existing data like categories, namespaces, and – if the FlaggedRevs extension is installed – various FlaggedRevs metrics of page quality, to generate lists and galleries of pages.
Most notably, DPL is used to find intersections of categories, as well as to display all the pages in one category. For instance, to show a sorted list of pages about 18th-century composers, you might have the following:
<DynamicPageList>
category = Composers
category = 18th century births
order = ascending
</DynamicPageList>
(This specific call would work with all versions of DPL. The non-Wikimedia versions, i.e. DPL2 and DPL3, also support the <DPL> tag and the #dpl parser function, which provide some additional features that the <DynamicPageList> tag does not.)

UrlGetParameters

UrlGetParameters is an extension that can read and display the values of parameters from the URL query string. For instance, if the URL for a wiki page contains "&abc=def", you could have the following call on the page:
{{#urlget:abc}}
...and it would display "def". It also works for array values, so that, if there's a string like "vegetables[1]=cauliflower" in the query string, the call {{#urlget:vegetables[1]}} will display the right thing ("cauliflower").
UrlGetParameters isn't usually necessary, but at times it can be very helpful, because it lets wiki pages pass information to one another. For instance, you could create a page that functions as a "wizard", guiding users through a list of choices and offering them advice based on their inputs. The user's inputs can be embedded in the query string via either links (the {{fullurl:}} parser function could help), or a mini-form (the Widgets extension could be used for that – see here). Then, the resulting page, which could be the same page or a different one, could use #urlget, in conjunction with #if or #switch from ParserFunctions, to display the right text.
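As a rough sketch, a page that receives a hypothetical "topic" parameter in its query string could branch on it like this:
{{#switch: {{#urlget:topic}}
| hardware = For hardware problems, see [[Hardware troubleshooting]].
| software = For software problems, see [[Software troubleshooting]].
| #default = Please select a topic to continue.
}}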
The #urlget function can also be used in conjunction with Page Forms, especially with PF's "run query" option (see here), which similarly can be used to display a wizard.
For more information about UrlGetParameters, see:
https://www.mediawiki.org/wiki/Extension:UrlGetParameters

Ratings extensions

People have become used to being able to provide quick feedback on content on the web, whether it's "liking" a post, giving a thumbs up or thumbs down to a comment, rating a movie, etc. So not surprisingly, tools exist for doing the same within wiki pages. Such ratings extensions are always somewhat awkward in a wiki, though, because the content can change at any time. What if a page is poorly written, and you give it a low rating, and then the next day someone rewrites it and makes it a lot better? You might be able to change your rating at that point, but until you do, your rating is a meaningless metric that's just confusing.
(To be sure, in many cases blog and social networking posts can be rewritten too, making votes on such pages awkward as well – but wholesale rewriting of a post is rarely done.)
There's the added awkwardness where it may not be totally clear whether users are supposed to rate the page, or the topic that the page applies to – like in a wiki page about a restaurant. This is the same confusion that can affect commenting functionality on wiki pages (see here).
Nonetheless, there are a few ratings and article feedback extensions – though not as many as there used to be. Here are two still-maintained ones:
Additionally, the Page Forms extension includes a "rating" input type, which can be used to let users enter an individual rating via a form – see here.

Page Schemas

Page Schemas is an extension that aims to solve a general problem with Cargo, Semantic MediaWiki and Page Forms, which is that maintaining data structures can be difficult. Creating a new data structure is pretty easy: Page Forms offers various tools for automatically creating data structure pages – the most popular tool being Special:CreateClass – but once you've created a data structure, you're on your own; there's no way to edit an existing form or template to, say, add one field, other than manually editing the page source. This is easier with Cargo than with SMW, but it's a challenge in either case.
With Page Schemas, you use a single "master" form to define the data structure; that form in turn creates a single piece of XML (stored on the category page), which can then be used to automatically generate all those other pages. You can then keep re-editing that schema, using the same helper form, and then re-generating all the corresponding pages. Page Schemas can be used with either Cargo or SMW.
You can read more about it at:
https://www.mediawiki.org/wiki/Extension:Page_Schemas
Figure 18.2 shows an image of the interface provided by Page Schemas for editing a schema.
Figure 18.2 Page Schemas "Edit schema" screen
Page Schemas makes it easy to modify the data structure, although it should be noted that, if you modify a data structure which is already in use, you may have to separately modify all the pages that use it. So, for instance, if you change the name of a template (and form) field, and there are pages that call this template and its field, they will all need to be changed as well. The easiest way to do that is via the Replace Text extension – see here.

SocialProfile

Social networking in the enterprise is big business, with lots of companies wanting to add replicas of Facebook and the like to their internal network. Social networking can in theory encompass a lot of different features: messaging, user profiles, user groups, blogging, microblogging/"status updates", and so on. MediaWiki already has some of these natively; talk pages are arguably one example.
The SocialProfile extension adds a variety of additional social networking features, and it has roughly 25 spinoff extensions that add even more. Not all of them are well-maintained, but most seem to be.
Here are the main features of the core SocialProfile extension:
And here are some of the additional SocialProfile-based extensions:
You can read more about SocialProfile, its current status, and all its related extensions here:
https://www.mediawiki.org/wiki/Extension:SocialProfile