The Extending PeerTube project aims to extend PeerTube to support the availability, accessibility, and discoverability of large-scale public media collections on the next generation internet.
As a national audiovisual heritage institute, we share thousands of collection items as open video content, which is free and open for everyone to reuse. In the past we developed a custom solution to distribute this content according to these open principles: openimages.eu
Since then, our custom solution has aged, while we see that sister organisations and partners in our international network have requirements similar to ours: open source technology, state-of-the-art video distribution and play-out, and the ability to self-host.
In this project we extend the open source video platform PeerTube to fit our use case, while sharing our code and insights upstream so that other users with a similar use case can also benefit from our efforts, contributing to a vibrant open video sharing ecosystem.
This fits our long-term vision of federating multiple large-scale open video collections, building an international community that shares and reuses our audiovisual heritage on the next generation internet.
Although PeerTube is technically capable of supporting the distribution of large public media collections, the platform currently lacks practical examples and extensive documentation to achieve this in a timely and cost-efficient way. This project functions as a proof of concept, showcasing several compelling improvements to the PeerTube software.
Since PeerTube's original aim was to provide an alternative to YouTube as the dominant video sharing platform, it is optimized for the single user who periodically uploads user-generated content and shares it as a single video within their own channel. Our use case, however, involves migrating many thousands of videos to a PeerTube instance, so we worked on a toolkit and instructions to programmatically import large numbers of videos from a legacy video platform into PeerTube:
https://beeldengeluid.github.io/extending-peertube/category/migration.html
This also included instructions on implementing URL redirection in PeerTube after migrating from a large video streaming platform:
https://beeldengeluid.github.io/extending-peertube/category/url-redirection.html
In the cultural heritage domain, the standard for free and open sharing of resources on the web has been Creative Commons licenses for over a decade now. Although PeerTube already offered an option for its users to attach a 'license' to their contributions to a PeerTube instance, this solution wasn't (fully) compliant with Creative Commons.
As Sound and Vision values transparent and legally sound communication about the extent to which the collection items it shares on the web can be shared and reused by the public, it took upon itself the task of developing a Creative Commons compliant licensing plugin for PeerTube.
This plugin extends PeerTube with the needed user interface elements to select and display the licenses, but also inserts the correct microformats into the video item page, making the license machine-readable as well:
https://beeldengeluid.github.io/extending-peertube/category/cc-plugin.html
The final strand of work in the project looked at improving the accessibility and discoverability of videos within large collections hosted on a PeerTube instance. It provided a guide on using the PeerTube API to add subtitles to an instance at scale, and also suggested ways to enrich existing video content with subtitles using open source Automatic Speech Recognition software:
https://beeldengeluid.github.io/extending-peertube/category/subtitles.html
We documented our progress here on this website and published our tools at:
https://github.com/beeldengeluid/extending-peertube/
Sound and Vision runs a self-hosted PeerTube instance at:
https://peertube.beeldengeluid.nl
The Creative Commons plugin for PeerTube is maintained in a separate repository at:
https://github.com/beeldengeluid/peertube-plugin-creative-commons
and published on NPM:
https://www.npmjs.com/package/peertube-plugin-creative-commons
The reverse proxy will receive client HTTP requests, and:
- Proxy them to the API server
- Serve requested static files (Video files, stylesheets, javascript, fonts…)
So we extend this with another functionality:
Nginx keeps its configuration in the expected /etc/nginx directory, which is broken up as described on the Debian wiki page. As we can see from that directory structure, the default nginx.conf file includes a line that loads additional configuration files into the http { } context from /etc/nginx/conf.d/. This is where we drop in the global mapping directive as a file called redirect.conf, although any other filename will work the same:
map_hash_max_size 8192;
map_hash_bucket_size 128;

map $request_uri $old_id {
    ~^/media/([0-9]+) $1;
}

map $old_id $new_id {
    include /etc/nginx/snippets/rewritemap.conf;
}
In this file we include the configuration snippet file rewritemap.conf from the directory /etc/nginx/snippets/. This is the big file with the old and new ids, which looks like this (first 10 lines):
1215870 35d6b543-d830-4678-a7f1-fda30c0ec95d;
1193100 4d76aaa4-a991-444d-a282-42e2cebf5912;
1216245 cebe1695-a289-4698-8a95-1efaaf9f13fd;
1216823 ec131ab0-429a-444c-b262-18dd6a1a57b8;
1216986 6196ddec-f7b7-454e-89e6-9d29740db259;
1216173 7e8b8069-4897-4db1-875c-199a62b2e279;
1215985 4b28ce84-7a33-46d5-964c-e315518e48ba;
1216848 b889fa83-d860-4d57-9e23-de76f2fdc688;
1196886 bccaa3b0-3baf-4ece-9afa-e0347a66dc0d;
1196956 a94c4000-1ee1-4bf6-bac2-40e5e05029f3;
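Because a single malformed line will make Nginx reject the whole configuration, it can be worth sanity-checking the generated snippet first. This is a small Python sketch of our own (not part of the published tools) that validates lines of the `old_id uuid;` form:

```python
import re

# Each line of rewritemap.conf maps a numeric old id to a UUID, ending in ';'
LINE_RE = re.compile(
    r'^(?P<old>\d+)\s+(?P<new>[0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12});$')

def parse_rewritemap(text):
    """Return a dict of old_id -> new_id, raising on malformed lines."""
    mapping = {}
    for lineno, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue
        match = LINE_RE.match(line)
        if not match:
            raise ValueError(f"malformed line {lineno}: {line!r}")
        mapping[match.group('old')] = match.group('new')
    return mapping

sample = ("1215870 35d6b543-d830-4678-a7f1-fda30c0ec95d;\n"
          "1193100 4d76aaa4-a991-444d-a282-42e2cebf5912;")
print(parse_rewritemap(sample)['1215870'])  # 35d6b543-d830-4678-a7f1-fda30c0ec95d
```

Running this over the whole snippet before `nginx -t` gives a line number for any bad entry instead of a generic Nginx parse error.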
Now there are two ways to integrate the redirects in PeerTube, depending on whether you redirect URLs from an old domain name to the new one of your PeerTube instance, or keep the old domain name.
If you migrate from an old domain name, you can create a separate configuration file for that specific server_name once you point this domain name to the same IP as the PeerTube instance by changing its DNS record. This file can be dropped into /etc/nginx/conf.d/ as well to have it loaded automatically, or you can use the usual /etc/nginx/sites-available/ and /etc/nginx/sites-enabled/ structure for defining virtual hosts.
In our example we could name the file openbeelden.conf to refer to the old domain name openbeelden.nl.
server {
    listen 80;
    listen [::]:80;
    listen 443 ssl http2;
    listen [::]:443 ssl http2;
    server_name openbeelden.nl;

    if ($new_id) {
        return 301 https://peertube.beeldengeluid.nl/videos/watch/$new_id;
    }

    return 301 https://peertube.beeldengeluid.nl$request_uri;
}
As you can see from the last line, we also redirect old URLs that didn't match a video web page to our new domain, including the requested path in $request_uri. This means that other URLs like https://www.openbeelden.nl/about will be redirected to https://peertube.beeldengeluid.nl/about, whether that page exists or not. If you don't want this to result in possible 404 "Not found" responses, you can leave out the $request_uri part and let the user land on your PeerTube home page.
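For instance, a sketch of that variant for the old-domain server block, with the catch-all pointing at the home page instead of echoing $request_uri (the SSL listen lines are omitted here for brevity):

```nginx
server {
    listen 80;
    listen [::]:80;
    server_name openbeelden.nl;

    # Old video pages still get their exact new URL
    if ($new_id) {
        return 301 https://peertube.beeldengeluid.nl/videos/watch/$new_id;
    }

    # Everything else lands on the PeerTube home page instead of a possible 404
    return 301 https://peertube.beeldengeluid.nl/;
}
```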
If instead you keep your old domain name, which will then be the same domain name as for your PeerTube server, you have to edit the existing /etc/nginx/sites-available/peertube file that was created during the installation of PeerTube (in the 'Webserver' part of the production guide). In this file you can add the if statement just after the 'Application' comment:
server {
    # ...

    ##
    # Application
    ##

    if ($new_id) {
        return 301 /videos/watch/$new_id;
    }

    # ...
}
We have tested this second setup with our own instance, where we haven't migrated from an old domain name. With the example used before, you can see how you get properly redirected by following this link:
https://peertube.beeldengeluid.nl/media/1000452
It shows the same video with the new PeerTube URL as on the page where we have imported the video from:
https://www.openbeelden.nl/media/1000452
Either way, after implementing this added Nginx configuration, you only have to reload Nginx to activate your URL-redirection solution:
$ sudo systemctl reload nginx
If you want to implement your own URL redirection for Nginx, we have added all files as examples/templates to the 'tools' folder of our GitHub repository:
https://github.com/beeldengeluid/extending-peertube/tree/main/tools/nginx
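To spot-check the redirects after deployment, a small helper of our own (not part of the published tools; the hostnames are the ones used in this article) can compute the Location header the rules above should produce, for comparison against a live `curl -I` on each old URL:

```python
def expected_location(old_url, mapping,
                      new_host="https://peertube.beeldengeluid.nl"):
    """Compute the Location header the Nginx rules above should send."""
    path = old_url.split('//', 1)[-1].split('/', 1)[1]  # strip scheme + host
    if path.startswith('media/'):
        old_id = path.split('/')[1]
        if old_id in mapping:
            return f"{new_host}/videos/watch/{mapping[old_id]}"
    # anything unmatched falls through to the $request_uri catch-all
    return f"{new_host}/{path}"

mapping = {'1215870': '35d6b543-d830-4678-a7f1-fda30c0ec95d'}
print(expected_location('https://www.openbeelden.nl/media/1215870', mapping))
```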
For small websites with only a few pages, simple if conditional statements can be used for redirects and similar tasks. However, such a configuration is not easy to maintain or extend in the long run as the list of conditions grows.
The map module is a core Nginx module, which means it doesn’t need to be installed separately. It allows you to compare Nginx variable values against a list of conditions and then associate a new value with the variable depending on the match.
To create the necessary map and redirect configuration, we adapt an Nginx configuration file (for instance /etc/nginx/sites-available/default):
. . .
# Default server configuration
#

# Old website redirect map
#
map $uri $new_uri {
    /old.html /index.html;
}

server {
    listen 80 default_server;
    listen [::]:80 default_server;

    # Old website redirect
    if ($new_uri) {
        rewrite ^ $new_uri permanent;
    }
. . .
The section before the server block is a new map block, which defines the mapping between the old URLs and the new ones using the map module. The section inside the server block is the redirect itself.
In our example where we want to redirect our video web page from:
https://www.openbeelden.nl/media/1000452
to our new PeerTube web page:
https://peertube.beeldengeluid.nl/videos/watch/b0fd82cf-68b7-447d-a5fe-ed5706cd991a
you could map this with:
map $uri $new_uri {
    /media/1000452 /videos/watch/b0fd82cf-68b7-447d-a5fe-ed5706cd991a;
}
However, this becomes unwieldy when you have thousands of redirects like this. A cleaner way is to define all the redirects in a separate 'snippet' file /etc/nginx/snippets/rewritemap.conf and include it as follows:
map $uri $new_uri {
    include /etc/nginx/snippets/rewritemap.conf;
}
As the redirects always map a URL starting with /media + old_id to one starting with /videos/watch + new_id, a further optimization is to map just old_id to new_id and match the request with a separate map block:
map $request_uri $old_id {
    ~^/media/([0-9]+) $1;
}

map $old_id $new_id {
    include /etc/nginx/snippets/rewritemap.conf;
}
with the included snippet that looks like:
1215870 35d6b543-d830-4678-a7f1-fda30c0ec95d;
1193100 4d76aaa4-a991-444d-a282-42e2cebf5912;
1216245 cebe1695-a289-4698-8a95-1efaaf9f13fd;
1216823 ec131ab0-429a-444c-b262-18dd6a1a57b8;
1216986 6196ddec-f7b7-454e-89e6-9d29740db259;
1216173 7e8b8069-4897-4db1-875c-199a62b2e279;
1215985 4b28ce84-7a33-46d5-964c-e315518e48ba;
1216848 b889fa83-d860-4d57-9e23-de76f2fdc688;
1196886 bccaa3b0-3baf-4ece-9afa-e0347a66dc0d;
1196956 a94c4000-1ee1-4bf6-bac2-40e5e05029f3;
and the redirect in the server block, which checks whether the variable $new_id has an assigned value:
server {
    if ($new_id) {
        return 301 /videos/watch/$new_id;
    }
}
Another reason for keeping the map directive’s key-value pairs (included via the snippet file) as small as possible is that Nginx uses hash tables to quickly process static sets of data.
When dealing with large sets of data (in our case thousands of id mappings) Nginx will throw this warning on start:
nginx: [warn] could not build optimal map_hash, you should increase either map_hash_max_size: 2048 or map_hash_bucket_size: 64; ignoring map_hash_bucket_size
As stated in the Nginx documentation:
During the start and each re-configuration nginx selects the minimum possible sizes of hash tables such that the bucket size that stores keys with identical hash values does not exceed the configured parameter (hash bucket size). The size of a table is expressed in buckets. The adjustment is continued until the table size exceeds the hash max size parameter.
This means we have to set the map_hash_max_size and map_hash_bucket_size values before our mapping directive. The problem is that, apart from the size of our data, the processor also impacts what the optimal values are, so we have to tweak these values and test them; it's practically impossible to estimate them beforehand.
The general recommendation would be to keep both values as small as possible.
In our case, with an id mapping set of 7200 items, we first increased map_hash_max_size and then tested our Nginx reconfiguration with:
sudo nginx -t
We kept increasing it until there were no more Nginx warnings, but that resulted in a very big number, so we then doubled map_hash_bucket_size to 128 and started again with map_hash_max_size, increasing it by a factor of 2 from the original 1024. After testing, this resulted in the following mapping directive:
map_hash_max_size 8192;
map_hash_bucket_size 128;

map $request_uri $old_id {
    ~^/media/([0-9]+) $1;
}

map $old_id $new_id {
    include /etc/nginx/snippets/rewritemap.conf;
}
In our previous article on bulk importing videos to a PeerTube instance with the API, you can find how and where we generated the snippet file rewritemap.conf, with the old ids from our CSV data file and the new ids from the PeerTube API response on successful import/upload.
In our next article we will show how to integrate this Nginx URL redirection setup in PeerTube (or in any other existing Nginx webserver), by recommending where to add these Nginx configuration files.
URL redirection is done for various reasons, as listed on Wikipedia: URL redirection.
In our case we focus on the second reason: preventing broken links after migrating a video streaming website to PeerTube.
Several different kinds of response to the browser will result in a redirection. These vary in whether they affect HTTP headers or HTML content. The techniques used typically depend on the role of the person implementing it and their access to different parts of the system. For example, a web author with no control over the headers might use a Refresh meta tag whereas a web server administrator redirecting all pages on a site is more likely to use server configuration.
We will focus on the implementation of redirects in a server configuration, for which we need to send an HTTP header with status code 301 to specify that the URL has "moved permanently".
For instance if we want to redirect this video web page URL:
https://www.openbeelden.nl/media/1000452
to our new PeerTube web page:
https://peertube.beeldengeluid.nl/videos/watch/b0fd82cf-68b7-447d-a5fe-ed5706cd991a
In a configuration for Apache HTTP Server this would be implemented like:
<VirtualHost *:80>
    ServerName www.openbeelden.nl
    Redirect 301 /media/1000452 https://peertube.beeldengeluid.nl/videos/watch/b0fd82cf-68b7-447d-a5fe-ed5706cd991a
</VirtualHost>
There are also some other ways to achieve the same result, and the documentation on Redirecting and Remapping for the Apache HTTP Server is extensive and of very high quality. All of this can be done in a very similar manner with the Nginx HTTP server, but its documentation is a bit more sparse.
Adding redirects like this (a Redirect directive for every page) to a webserver configuration works perfectly fine if you only want to redirect several pages, but runs into a problem at scale.
To understand this, consider what happens if you have a couple of hundred, a thousand, or even tens of thousands of redirects. For every request, the webserver will try to match all redirect directives line by line, which impacts the time in which the webserver responds.
Fortunately, webservers have a way to do a fast lookup of key-value pairs using so-called mapping files, which are loaded in memory/cache.
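The difference is easy to see in a toy Python model (our own illustration with made-up ids): a linear rule list is scanned entry by entry on every request, while a hash table, which is what such mapping files are loaded into, resolves a key in a single step on average:

```python
# Linear matching: every request walks the whole rule list (O(n))
rules = [(f"/media/{i}", f"/videos/watch/uuid-{i}") for i in range(10000)]

def linear_lookup(path):
    for old, new in rules:
        if old == path:
            return new
    return None

# Hash lookup: one probe on average, regardless of table size (O(1))
table = dict(rules)

def hash_lookup(path):
    return table.get(path)

# Both return the same answer; only the cost differs
assert linear_lookup("/media/9999") == hash_lookup("/media/9999")
```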
With a text file that looks like this:
##
## rewritemap.txt - old ID to new ID map file
##
1215870 35d6b543-d830-4678-a7f1-fda30c0ec95d
1193100 4d76aaa4-a991-444d-a282-42e2cebf5912
1216245 cebe1695-a289-4698-8a95-1efaaf9f13fd
1216823 ec131ab0-429a-444c-b262-18dd6a1a57b8
1216986 6196ddec-f7b7-454e-89e6-9d29740db259
1216173 7e8b8069-4897-4db1-875c-199a62b2e279
1215985 4b28ce84-7a33-46d5-964c-e315518e48ba
1216848 b889fa83-d860-4d57-9e23-de76f2fdc688
1196886 bccaa3b0-3baf-4ece-9afa-e0347a66dc0d
1196956 a94c4000-1ee1-4bf6-bac2-40e5e05029f3
you could implement a solution like this with RewriteMap for the Apache HTTP Server:
RewriteMap old2newid "txt:/etc/apache2/rewritemap.txt"
RewriteEngine on
RewriteRule "^/media/(.*)" "/videos/watch/${old2newid:$1}" [PT]
Note: The RewriteMap directive may only be used in the server config or virtual host context; it may not be used in <Directory> sections or .htaccess files.
After some searching we found that you can do the same with Nginx, although it's slightly trickier to get right. In the next article we will show how to set this up for Nginx and how we integrated it in the PeerTube architecture.
When you add a subtitle file (an .srt or .vtt file) to a PeerTube video, it is uploaded separately and transformed into the WebVTT (.vtt) format so it can be used in the video player. In this article we will show how to bulk upload existing subtitle files with the PeerTube API.
As an example we will show you how we added existing subtitle files for most videos in the ‘themindoftheuniverse’ channel with the use of the Video Captions endpoint of the API.
For adding a subtitle we need 3 pieces of data:
- the identifier (uuid) of the video
- the subtitle file (.srt or .vtt)
- the language code of the subtitle (e.g. en)
As we have shown in the article on bulk importing videos, you only know the identifier of the video after it has been assigned by a successful upload/import. This means you could add the subtitle file right after this part of the video-imports script, where you collect the uuid from the response:
if response.status_code == requests.codes.ok:
    data = response.json()
    uuid = data['video']['uuid']
If you generate subtitles after upload/import, it's important to find a way to couple the video identifier to the subtitle file if you want to bulk upload your subtitle files as well. For instance, you could give your subtitle file the same name as the identifier (like 192727.srt if you use the object id as identifier). In our case we generated a small CSV file to 'map' the video uuid to the subtitle file, like below (first 10 records):
5a71eb42-d31d-4cf0-9a63-42319bfeb9c6|Guy_Consolmagno__ex2.srt
0721c557-b4d9-4ef9-ad97-91d663b38125|Yuri_Oganessian__ex2.srt
9f1b6821-a930-4653-b400-c1f517da7da6|Carolina_Cruz__br.srt
b6d136c8-dd4d-4792-97ec-979f27a7d833|Artur_Avila__br.srt
1ab7edd6-a41a-451e-946c-cfb2b97947e7|Joanna_Aizenberg.srt
0c53e6c5-9890-4696-88b6-7d3246405ce5|Pascale_Fung.srt
1fddd827-d868-4f1e-b649-ff8a2893dae6|Susant_Pattnaik__ex2.srt
cd1fbc56-110c-4243-80a5-f36aa3082840|Lee_Cronin__ex2.srt
7a66ecd9-f5e1-4638-a62c-26532c804632|Lee_Cronin__ex.srt
28ad638d-a795-459c-9f1e-4775a5174cec|George_Church__ex.srt
This file is then read line by line, adding each subtitle file through the API:
with open('data/vpro_srt.csv', 'r') as csvfile:
    csv_data = csv.reader(csvfile, delimiter='|')
    for row in csv_data:
        uuid = row[0]
        srt = row[1]
        srt_exists = os.path.exists('data/transcripts/' + srt)
        if srt_exists:
            files = {
                'captionfile': (srt, open('data/transcripts/' + srt, 'rb'))
            }
            response = requests.put(api_url + '/videos/' + uuid + '/captions/en', headers=headers, files=files)
            time.sleep(1)
As you can see from the PUT request, it either adds a subtitle file for the language specified in the API URL (/captions/en) or replaces the existing subtitle file for that language.
Generating subtitle files is mostly done in the post-production of a video. There is still a lot of development in (semi-)automatic subtitling (for instance by ASR, Automatic Speech Recognition), and it's not equally advanced for all natural languages; in a future article we will suggest some recommendations for tools and solutions in this domain.
Creating subtitles manually involves a lot of human work, which makes it less feasible in the context of audiovisual archives with large collections. An increasingly popular alternative is to transcribe spoken audio using automatic speech recognition (ASR). As part of this project, we have explored the use of ASR transcription to generate subtitles for videos published via our PeerTube instance.
Have a look at this video about ‘The work of the harbour cleaning service’ for an example of a PeerTube video that is fully subtitled using the workflow described in this post.
This post walks through the process of generating ASR transcripts for Dutch spoken videos using Kaldi NL, converting them into the right subtitle format for PeerTube and adding them to their respective videos on the PeerTube instance.
An open-source solution available for ASR in Dutch is Kaldi NL, which builds on the broader Kaldi ASR software. The easiest way to install Kaldi NL is to use the bootstrapping script included in LaMachine. LaMachine is a linguistics toolkit which supports the installation of a rich selection of tools, but can also be customized to only set up a barebones installation of Kaldi NL and its requirements.
Installation of the required Kaldi packages is most easily done on commonly used Linux distributions. In our case we used Ubuntu 20.04.2 LTS. For the full installation notes please refer to the installation instructions included in the repository.
The first step is to ensure Kaldi NL has access to the source videos to be transcribed. If the videos are published on the PeerTube instance, one way to download these videos is using their torrent URIs and an application such as transmission-cli.
Kaldi NL can be started by running the Kaldi NL docker image, specifying a bind mount to give it access to a directory on the host machine. In the following example, we have created a directory in the home directory called docker_share. The files to be processed should be placed inside this folder, nested in a subdirectory called 'input'. Our full example command becomes:
docker run -it --mount type=bind,source=/home/ubuntu/docker_share,target=/docker_share proycon/lamachine:lamachine_1
Following this, from within the docker instance, the following command can be used to transcribe a video:
./decode_OH.sh /<input-directory>/* /<output-directory>/
This will populate the output directory with a TXT file and a CTM file, containing the generated transcript.
The CTM and TXT formats are not directly supported for subtitles in PeerTube, so we will convert them to the supported SRT format.
The CTM output from Kaldi NL is formatted as a list of words per timecode. As reconstructing sentences of appropriate length from this source would be non-trivial, we work with the TXT file instead. The TXT file has one sentence per line, followed by the filename, the sentence number and the starting timecode, as in this subtitle about the port city of Rotterdam:
Zoals elke havenstad kent ook Rotterdam. (BG_10284.00001 0.070)
The target SRT format uses a subtitle index, a starting and ending timecode, and the subtitle text itself. The above example would look like this in the SRT format:
2
00:00:00,070 --> 00:00:02,750
Zoals elke havenstad kent ook Rotterdam.
As is visible from these examples, the SRT format requires information from two lines of the Kaldi NL TXT file in order to get both the start and end point of a subtitle. We've written a Python script to convert a Kaldi NL TXT file into an SRT file, which can be used as follows:
python3 kalditxt2srt.py -i <kaldi-txt>.txt
The script takes a Kaldi NL TXT file as input and generates a new file with the same name in the SRT format, with the .srt extension.
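The heart of such a conversion is turning Kaldi's seconds into SRT HH:MM:SS,mmm timecodes and using each line's start time as the previous cue's end time. A minimal sketch of that logic (our own simplified illustration, not the published script; the 3-second fallback duration for the last cue is an assumption):

```python
def srt_time(seconds):
    """Format seconds as an SRT timecode: HH:MM:SS,mmm."""
    millis = round(seconds * 1000)
    hours, rest = divmod(millis, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"

def to_srt(entries):
    """entries: list of (start_seconds, text); each cue ends where the next begins."""
    blocks = []
    for index, (start, text) in enumerate(entries, start=1):
        # assumption: the last cue gets a fixed 3-second duration
        end = entries[index][0] if index < len(entries) else start + 3.0
        blocks.append(f"{index}\n{srt_time(start)} --> {srt_time(end)}\n{text}\n")
    return "\n".join(blocks)

print(srt_time(0.07))  # 00:00:00,070
```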
Once the video has been processed by the ASR software and its output transformed into the SRT format, the final step is the ingestion of the subtitles into a PeerTube instance. Adding subtitles to existing videos on a PeerTube instance can be scripted via the PeerTube REST API. Using this part of the API requires an OAuth2 authorization token, as described in the documentation.
First the client ID and secret need to be known. These can be retrieved with a simple GET request:
curl https://host.name.nl/api/v1/oauth-clients/local
Then a user must generate an OAuth2 bearer token, which can be done by sending a request for a bearer token:
curl -X POST -d "client_id=<client-id>&client_secret=<client-secret>&grant_type=password&response_type=code&username=<username>&password=<password>" https://host.name.nl/api/v1/users/token >> token.json
Ingestion of subtitles is done using a call to the captions endpoint of the API. Subtitles are added to existing videos using a PUT HTTP request. For example, the HTTP request below adds a subtitle file for a specified language to an existing video, using the bearer token acquired in the previous step:
curl -X PUT -F "captionfile=@<file-name>.srt" -H "Authorization:Bearer <token>" -H "Accept:application/json" -v https://host.name.nl/api/v1/videos/<your-video-identifier>/captions/<language-tag>
As an example, we will show how we set a category for all videos in a specific channel. The channel we targeted has the handle 'themindoftheuniverse' and the category is 'Science & Technology', which has id 15:
https://peertube.beeldengeluid.nl/c/themindoftheuniverse
https://peertube.beeldengeluid.nl/api/v1/videos/categories
As the API returns a maximum of 100 videos per request, we first need to establish the total number of videos in the channel, so we can figure out how many sets of 100 videos we need to update.
channel_handle = 'themindoftheuniverse'
response = requests.get(api_url + '/video-channels/' + channel_handle + '/videos?count=0')
videos = response.json()
total = videos['total']
loops = math.ceil(total/100)
Basically this gets the total by making this call to the API without retrieving the video data (count=0):
https://peertube.beeldengeluid.nl/api/v1/video-channels/themindoftheuniverse/videos?count=0
With the math.ceil function we calculate how many loops/pages we need to iterate. We start with an offset of zero and then get the next 100 videos by multiplying the iterator i by 100.
i = 0
while i < loops:
    offset = i * 100
    # GET videos from channel
    response = requests.get(api_url + '/video-channels/' + channel_handle + '/videos?start=' + str(offset) + '&count=100&skipCount=true')
    videos = response.json()
    for video in videos['data']:
        requests.put(api_url + '/videos/' + video['uuid'], headers=headers, data=data)
        time.sleep(1)
    i += 1
For each batch of 100 videos we send a PUT API call to update each video by its uuid with the data we have defined for the category; it's possible to update several fields at once:
data = {
    'category': 15
}
Once again we pause our script for 1 second after each API update call, so as not to overload the API.
There are a lot of other optional data fields you can set, and in this article we will show how we imported videos in bulk with their metadata.
Before you start uploading videos, you first have to know the required channelId of the channel that will contain the videos. Once you have created a channel through the PeerTube interface (or have a default main channel after signup), you can find the channel id by listing the channels with the API:
https://peertube.beeldengeluid.nl/api/v1/video-channels
As we were targeting our channel named 'openbeelden', we found that its channelId is 2.
To understand what other data can be added to our video imports, it's important to first look at the options and/or restrictions on the data. The PeerTube REST API documentation describes this in the relatively easy to understand OpenAPI format.
As we exported data for around 7000 videos into a big CSV file, we first had to map the data to something that would fit the PeerTube data model. This is an example of one line/record in our data file:
oai:openimages.eu:1215870|"Het gerestaureerde carillon speelt weer"||carillons;gemeentehuizen;klokken|"Weekjournaal van Polygoon Hollands Nieuws van week 27 uit 1949."|"Het gerestaureerde carillon van de stadhuistoren in Veere wordt opnieuw in gebruik genomen. De stadsbeiaardier van Rotterdam, Ferdinand Timmermans, speelt oa het Zeeuwse volkslied. SHOTS: - stadsshots en straatshots; - ext. stadhuis; op het bordes luisteren de burgemeester en gezelschap; - int. stadhuistoren; beiaard en beiaardier; - enkele shots van luisterend publiek dat grotendeels bestaat uit mensen in Zeeuwse klederdracht."|"Polygoon Hollands Nieuws (producent) / Nederlands Instituut voor Beeld en Geluid (beheerder)"|1949-06-27|https://www.openbeelden.nl/media/1215870|https://creativecommons.org/publicdomain/mark/1.0/|https://www.openbeelden.nl/files/12/15/1217877.1215874.WEEKNUMMER492-HRE000132B4_2551000_2626960.mp4|https://www.openbeelden.nl/files/12/15/1217883.1215874.WEEKNUMMER492-HRE000132B4_2551000_2626960.ogv
First we define a text cap function, to prevent failed imports due to overly long text fields, and a dictionary to map the licence links in our data to the PeerTube licence ids (see: https://peertube.beeldengeluid.nl/api/v1/videos/licences):
def cap(text, length):
    return text if len(text) <= length else text[0:length-3] + '...'
# NOTE: there are only 7 default licenses in PeerTube. For any number above that you need a plugin which adds more license options
# https://github.com/beeldengeluid/peertube-plugin-creative-commons
licence_links = {
    'https://creativecommons.org/licenses/by/3.0/nl/': '1',
    'https://creativecommons.org/licenses/by-sa/3.0/nl/': '2',
    'https://creativecommons.org/licenses/by-nd/3.0/nl/': '3',
    'https://creativecommons.org/licenses/by-nc/3.0/nl/': '4',
    'https://creativecommons.org/licenses/by-nc-sa/3.0/nl/': '5',
    'https://creativecommons.org/licenses/by-nc-nd/3.0/nl/': '6',
    'https://creativecommons.org/publicdomain/zero/1.0/': '7',
    'https://creativecommons.org/publicdomain/mark/1.0/': '8'
}
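As a quick illustration of the cap helper (its definition repeated here so the snippet runs standalone): text within the limit passes through unchanged, while longer text is cut to length - 3 characters plus an ellipsis, so the result never exceeds the limit.

```python
def cap(text, length):
    return text if len(text) <= length else text[0:length-3] + '...'

print(cap('carillon', 120))  # short enough: passes through unchanged
print(cap('Het gerestaureerde carillon speelt weer', 20))  # 'Het gerestaureerd...'
```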
Next we open 3 files we need during the import: the data source CSV file ('openbeelden.csv'), an empty CSV file ('delta.csv') where we add the records that fail during import, and another file ('rewritemap.conf') where we add the old and new video id after import, to later create a URL redirect map.
file_delta = open('data/delta.csv', 'a', newline='')
delta_writer = csv.writer(file_delta, delimiter='|')
file_rewritemap = open('rewritemap.conf', 'a')

with open('data/openbeelden.csv', 'r') as csvfile:
    csv_data = csv.reader(csvfile, delimiter='|')
    for row in csv_data:
While looping over all the rows/lines in our data file, we match the fields by their index number (0 for the first field), strip unnecessary spaces and put them in named variables for further processing:
uri = row[0].split(':')
old_id = uri[2]
title = row[1].strip()
alternative = row[2].strip()
tags = row[3].split(';')
description = row[4].strip()
abstract = row[5].strip()
creator = row[6].strip()
date = row[7].strip()
url_old = row[8].strip()
licence_link = row[9].strip()
video = row[10].strip()  # mp4 HD
if not video:
    video = row[11].strip()  # ogv HD
The following steps transform this data so it's acceptable for import. We cap the texts to their maximum length, look up the licence id, and restrict the tags to 5 unique items between 2 and 30 characters long. We also combine the descriptive texts from our data into one extended description.
title = cap(title, 120)
licence = licence_links.get(licence_link, '')
tags = list(dict.fromkeys(tags))
tags = list(filter(lambda a: len(a) >= 2, tags))
tags = list(filter(lambda a: len(a) <= 30, tags))
tags = tags[:5]
description_ext = ''
if alternative:
    description_ext += alternative + '\n\n'
if description:
    description_ext += description + '\n\n'
if abstract:
    description_ext += abstract + '\n\n'
description_ext = cap(description_ext, 9800)
if creator:
    description_ext += creator
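Walking through the tag clean-up with a concrete, made-up example shows how duplicates and out-of-range items are removed:

```python
# Made-up tag list with a duplicate, a too-short and a too-long item
tags = ['carillons', 'gemeentehuizen', 'carillons', 'x',
        'een-veel-te-lange-tag-van-meer-dan-dertig-tekens',
        'klokken', 'veere', 'polygoon', 'journaal']

tags = list(dict.fromkeys(tags))                   # dedupe, keeping order
tags = list(filter(lambda a: len(a) >= 2, tags))   # drop too-short tags
tags = list(filter(lambda a: len(a) <= 30, tags))  # drop too-long tags
tags = tags[:5]                                    # at most 5 tags

print(tags)  # ['carillons', 'gemeentehuizen', 'klokken', 'veere', 'polygoon']
```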
Next we prepare our transformed data and post it to the video import API endpoint. Some data, like language (nl), privacy (1 = public) and the booleans commentsEnabled and downloadEnabled, we set to fixed values for all videos:
# Import video, use multipart/form-data request with 'files'
data = {
    'name': (None, title),
    'channelId': (None, str(channel_id)),
    'targetUrl': (None, video),
    'language': (None, 'nl'),
    'privacy': (None, '1'),
    'commentsEnabled': (None, 'false'),
    'downloadEnabled': (None, 'true'),
    'description': (None, description_ext),
    'originallyPublishedAt': (None, date)
}

# create indexed array for tags
for j in range(len(tags)):
    data['tags[' + str(j) + ']'] = (None, tags[j])

if licence:
    data['licence'] = (None, licence)

response = requests.post(api_url + '/videos/imports', headers=headers, files=data)
By checking whether the import succeeds or fails, we either write the old and new id to our rewrite map file or add the failed record to our delta file:
        if response.status_code == requests.codes.ok:
            data = response.json()
            uuid = data['video']['uuid']
            file_rewritemap.write(old_id + " " + uuid + ";\r\n")
        else:
            delta_writer.writerow(row)
        time.sleep(1)
We end this iteration of the loop with a 1 second pause, to prevent flooding the API with too many requests in a short period.
Once we were done importing the videos, we checked our delta file for failed imports and adjusted the data accordingly, so we could use this file for another import run.
Only recently (since version 3.3) have the error responses been normalized in PeerTube, so it might be wise to add this error response text as another field in the delta file, to get a clear indication of why an import fails.
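Appending the error text could look like the sketch below (the record and the error message are made-up placeholders for the loop variables; the exact shape of the error response depends on your PeerTube version):

```python
import csv

# Sketch: write a failed record to the delta file with the API error
# text appended as an extra column, so the file also shows why each
# import failed. 'row' and 'error_text' stand in for the loop
# variables (error_text could come from response.text).
row = ['oai:openimages.eu:12345', 'Example title']
error_text = 'Video licence is required.'

with open('delta.csv', 'a', newline='') as file_delta:
    delta_writer = csv.writer(file_delta, delimiter='|')
    delta_writer.writerow(row + [error_text])
```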
As one of the examples of using the PeerTube API, you can make an HTTP(S) GET request to the /api/v1/videos endpoint. As you can see by following the link https://peertube.beeldengeluid.nl/api/v1/videos in your browser, this returns the data of the videos as JSON.
Requests is also the name of a popular HTTP library for the Python programming language. This library is essential to our scripts and is therefore imported first, at the top:
#!/usr/bin/python3
import requests
With this library we can retrieve the same data as above by sending a GET request and outputting the response:
response = requests.get('https://peertube.beeldengeluid.nl/api/v1/videos')
print(response.json())
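The list endpoints are paginated, so a single request only returns the first page of videos. A sketch of paging through them with the start and count query parameters (the parameter values here are arbitrary examples; we prepare the request to show the resulting URL without sending it):

```python
import requests

# Sketch: PeerTube list endpoints accept 'start' and 'count' query
# parameters to page through results, and 'sort' to order them.
# Preparing the request shows the URL that would be fetched.
api_url = 'https://peertube.beeldengeluid.nl/api/v1'
request = requests.Request('GET', api_url + '/videos',
                           params={'start': 0, 'count': 5, 'sort': '-publishedAt'})
prepared = request.prepare()
print(prepared.url)
```

Sending it (for example with requests.get and the same params) returns JSON with a 'total' count and a 'data' list of video objects.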
For a lot of things you can do with the PeerTube API (like create, update or delete a video), you need to be authorized and therefore authenticated first. When you sign up for an account on a PeerTube instance, you are given the possibility to generate sessions on it, and authenticate there using an access token.
Authenticating via OAuth requires the following steps: first retrieve the OAuth client id and secret from the /api/v1/oauth-clients/local endpoint, then exchange these, together with your username and password, for an access token at the /api/v1/users/token endpoint.
This is how we implement these steps. First we define our activated account (api_pass is the password of the account):
api_url = 'https://peertube.beeldengeluid.nl/api/v1'
api_user = 'nisv'
api_pass = 'xxxxxxxxxxxx'
Then we get the OAuth client token with the help of the requests library:
response = requests.get(api_url + '/oauth-clients/local')
data = response.json()
client_id = data['client_id']
client_secret = data['client_secret']
Now we can fetch the user access token by sending the client token with our username and password:
data = {
'client_id': client_id,
'client_secret': client_secret,
'grant_type': 'password',
'response_type': 'code',
'username': api_user,
'password': api_pass
}
response = requests.post(api_url + '/users/token', data=data)
data = response.json()
token_type = data['token_type']
access_token = data['access_token']
With the access token we can define the headers that we will use with our requests:
headers = {
'Authorization': token_type + ' ' + access_token
}
For example, this is what it looks like when we update a video in our script with a PUT request on the /api/v1/videos API endpoint:
requests.put(api_url + '/videos/sY6rPiwzy85rQxwp1E7LMa', headers=headers, data=data)
Being able to make authorized requests in this way opens up a lot of functionality to do things programmatically through the API that are not possible in the web interface of PeerTube, like bulk importing and/or updating videos.
Before thinking about creating your own tool for things like bulk importing videos, it’s wise to check the existing official CLI tools that are part of PeerTube. These tools are tried and tested, and supported by the PeerTube community.
One of these tools is peertube-import-videos.js, which comes close to what we wanted, but we imported videos from a platform which is not supported by this tool.
You can use this script to import videos from all supported sites of youtube-dl into PeerTube.
As the admin of your instance you have to set up your preferred configuration for video uploads. After logging in as an admin you can navigate to Administration > Configuration > Basic > Videos and enable ‘Allow import with HTTP URL (e.g. YouTube)’ if you are importing this way. More importantly, you have to decide how you want your videos transcoded for playback, through Administration > Configuration > VOD Transcoding, as this has quite an impact on your available storage.
We decided on the following configuration after careful consideration.
We chose the recommended ‘HLS with P2P support enabled’ instead of the default ‘Webtorrent enabled’ for reasons that are stated in the help text (behind the question mark icon):
Requires ffmpeg >= 4.1

Generate HLS playlists and fragmented MP4 files resulting in a better playback than with plain WebTorrent:
- Resolution change is smoother
- Faster playback especially with long videos
- More stable playback (less bugs/infinite loading)

If you also enabled WebTorrent support, it will multiply videos storage by 2
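In the instance’s production.yaml this choice maps roughly onto the transcoding section. A sketch (key names and available resolutions vary per PeerTube version, so check the production.yaml.example shipped with your release rather than copying this verbatim):

```yaml
transcoding:
  enabled: true
  webtorrent:
    enabled: false   # skip plain WebTorrent to avoid doubling storage
  hls:
    enabled: true    # HLS with P2P support
  resolutions:
    360p: true
    480p: true
    720p: true
    1080p: true
```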
When using the PeerTube API there are a couple of things you need to be aware of:
When testing our scripts/tools that utilise the PeerTube API, we noticed some inconsistencies, omissions and even discovered some small bugs. A lot of our findings found their way back into fixes in the documentation and source code, through pull requests we submitted upstream.
As PeerTube is still in development towards v4 and new features are being added, the API and its documentation are still a moving target. When you upgrade your PeerTube instance, always re-test your scripts/tools and adapt them accordingly.
Since v2 of PeerTube, API throttling is in place to prevent user agents (like bots) from hammering your API and degrading the performance of your instance. Your main production configuration file shows the rate limits:
rates_limit:
  api:
    # 50 attempts in 10 seconds
    window: 10 seconds
    max: 50
  login:
    # 15 attempts in 5 min
    window: 5 minutes
    max: 15
  signup:
    # 2 attempts in 5 min (only succeeded attempts are taken into account)
    window: 5 minutes
    max: 2
  ask_send_email:
    # 3 attempts in 5 min
    window: 5 minutes
    max: 3
When bulk importing videos we needed to take this into account as each video import needs a separate API call. We implemented a 1 second pause between each API call in our script to be on the safe side.
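A fixed pause works, but a script could also react to the limit itself by retrying with an increasing delay whenever the instance answers 429 Too Many Requests. A minimal sketch of such a helper (our own idea, not part of PeerTube or its tooling; the do_request argument would wrap the actual call, e.g. lambda: requests.post(url, ...)):

```python
import time

# Sketch: call do_request and retry with exponential backoff while it
# reports HTTP 429 (Too Many Requests); return the last response.
def call_with_backoff(do_request, max_retries=3, base_delay=1.0):
    for attempt in range(max_retries):
        response = do_request()
        if response.status_code != 429:
            return response
        time.sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s, ...
    return response
```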
Of all the 7000 videos we imported, only a dozen or so failed. All of those failed not because of PeerTube or our script, but because of problems with the data source.
As data source we used a CSV file that was generated by exporting data (harvested via OAI-PMH) from the openbeelden.nl platform. In this dataset of 7000 videos there were some odd and unexpected data inconsistencies, and even a couple of missing video files that had gone unnoticed for years.
Our videos had a lot of tags added to them, and we already knew we could only use a maximum of 5 tags per video in PeerTube, so we scripted it to use the first 5 tags. What we missed was that these 5 tags also have to be unique, which we assumed they were, but apparently not in our data.
The error reporting of the PeerTube API is pretty good, so you can see why an import failed. For every failed import we appended the source data to a separate CSV file in our script, so we could fix the data and import it again.