vmx

the blllog.

Video uploads for an online conference

2021-06-12 16:35

This blog post should give some insights into what happens behind the scenes in the preparation of an online conference, and I also hope that some of the scripts I created might be useful for others as well. We were using pretalx for the submissions and Seafile for video uploads. Both systems are accessed over their HTTP APIs.

This year’s FOSSGIS 2021 conference was a pure online conference, though it had the same format as every year: three days of conference, with four tracks in parallel. This leads to about 100 talks. I joined the organizing team about 10 weeks before the conference took place. The task sounded easy: the speakers should be able to upload their talks prior to the conference, so that less could go wrong during the conference.

All scripts are available at https://github.com/vmx/conference-tools licensed under the MIT License.

The software

The speakers submitted their talks through pretalx, a conference management system I highly recommend. It is open source and has an active community. I’ve worked on/with it over the past few years to make it suitable for OSGeo conferences. The latest addition is the public community voting plugin, which has been used for the FOSS4G 2021 as well as this conference. pretalx has a great HTTP API to get data out of the system. It doesn’t yet have much support for manipulating the data, but pull-requests are welcome.

For storing the video files, Seafile was used. I didn’t have any prior experience with it. It took me a bit to figure out that the Python API is for local access only and that the public API is a pure HTTP API. You can clearly see that their API is tailored to the needs of their web interface and not really designed for third-party usage. Nonetheless, this guarantees that everything that can be done through the web UI can also be done via the HTTP API.

My scripts are heavily based on command line tools like b2sum, curl, cut, jq and jo, hence a lot of shell is used. For more complex data manipulation, like merging data, I use Python.

The task

The basic task is providing pre-recorded videos for a conference that were uploaded by the speakers themselves. The actual finer-grained steps are:

  • Sending the speakers upload links
  • Looking through the videos to make sure they are good
  • Re-organizing the files so they can be played back according to the schedule
  • Making the final files easily downloadable
  • Creating a schedule which lists the live/pre-recorded talks

In Seafile you can create directories and make them publicly available so that people can upload files. Once a file is uploaded, you won’t see what else is in that directory. In order to be able to easily map the uploaded videos back to the corresponding talk, it was important to create one dedicated directory per talk, as you won’t know which filenames people will use for their videos.

The speakers will receive an email containing dedicated upload links for each of their talks. See the email_upload_links directory for all the scripts that are needed for this step.

pretalx

First you need to get all the talks. In pretalx that’s easy: go to your conference, e.g. https://pretalx.com/api/events/democon/submissions/. We only care about the accepted talks, which can be done by selecting a filter. If you access it through curl, you’ll get a JSON response like this one: https://pretalx.com/api/events/democon/submissions/?format=json. pretalx returns 25 results per request. I’ve created a script called pretalx-get-all.py that automatically pages through all the results and concatenates them.
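
The paging itself boils down to following the next links that pretalx returns. Here is a minimal sketch of the same idea in shell with curl and jq (the actual pretalx-get-all.py script is written in Python):

url='https://pretalx.com/api/events/democon/submissions/?format=json'
while [ "$url" != "null" ]; do
    response=$(curl --silent "$url")
    # Emit the results of this page, then move on to the next page.
    echo "$response" | jq '.results[]'
    url=$(echo "$response" | jq --raw-output '.next')
done | jq --slurp '.'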

A talk might be associated with multiple speakers. Each speaker should get an email with an upload link. There were also submissions that are not really talks in the traditional sense, for which no email should go out. The query for jq looks like this:

[.results[] | select((.submission_type[] | contains("Workshop")) or (.submission_type[] == "Anwendertreffen / BoF") | not) | { code: .code, speaker: .speakers[].code, title: .title, submission_type: .submission_type[]}]

The submissions contain only the speaker IDs and names, but no other details like their email address. So we query the speakers API (e.g. https://pretalx.com/api/events/democon/speakers/) and post-process the data again with jq, as we care about the email addresses.
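
A minimal sketch of that post-processing, assuming the paged responses were saved to a file called speakers.json:

# Keep only the speaker code and email address of each speaker.
jq '[.results[] | {code: .code, email: .email}]' speakers.json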

You can find all the requests and filters in the email_upload_links/upload_talks_to_seafile.sh script.

Seafile

Creating an upload link is a two-step process in Seafile: first you create the directory, then you create a publicly accessible upload link for that directory. The directories are named after the pretalx ID of the talk (Full script for creating directories).
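
A minimal sketch of those two steps with curl, assuming the Seafile Web API endpoints for directory creation and upload links (token, repository ID and paths are placeholders):

# Step 1: create a directory named after the pretalx ID of the talk.
curl -X POST --header 'Authorization: Token <seafile-token>' 'https://seafile.example.org/api2/repos/<repo-id>/dir/?p=/<dir-with-talks>/<pretalx-id>' -d 'operation=mkdir'

# Step 2: create a publicly accessible upload link for that directory.
curl -X POST --header 'Authorization: Token <seafile-token>' 'https://seafile.example.org/api/v2.1/upload-links/' -d 'repo_id=<repo-id>' -d 'path=/<dir-with-talks>/<pretalx-id>'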

Creating emails

After acquiring the data, the next step is to process it and create the individual emails. Combining the data is done with the combine_talks_speakers_upload_links.py script, whose output is again post-processed with jq. The data_to_email.py script takes that data and a template file to create the actual emails as files. The template file is used as a Python format string, where the variables are filled with the data provided.

Those email files are then posted to pretalx, so that we can send them through its email system. That step is more complicated, as there is currently no API in pretalx for it. I logged in through the web interface and manually added a new email while having the developer tools open. I then copied the POST request “as cURL” to have a look at the data it sent. From that I manually extracted the session and cookie information in order to add emails from the command line. The script that takes the pre-generated emails and puts them into pretalx is called email_to_pretalx.sh.
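
The replayed request looks roughly like the following; the URL and form field names are illustrative (they are whatever the copied cURL command shows), not a documented API:

curl 'https://pretalx.com/orga/event/<your-conference>/mails/compose' --header 'Cookie: pretalx_csrftoken=<csrf-token>; pretalx_session=<session-id>' --data-urlencode 'csrfmiddlewaretoken=<csrf-token>' --data-urlencode 'recipients=<speaker-email>' --data-urlencode 'subject=<subject>' --data-urlencode 'text=<email-body>'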

Reviewing the uploaded videos

Once a video is uploaded, it gets reviewed. The idea was that the speakers don’t need to care too much about the start and the end of the video, e.g. when they start the recording and there are a few seconds of silence while switching to the presentation. The reviewer will cut the beginning and end of the video and also convert it to a common format.

We wanted to preserve the original video quality, hence we used LosslessCut and then converted the videos to the Matroska format. The reviewers would also check that a video isn’t longer than the planned slot.

See the copy_uploads directory for all the scripts that are needed for this step.

pretalx

The reviewers get a file with things to check for each video file. We get the needed metadata again from pretalx and post-process it with jq. As above for the emails, there is again a template file, which this time generates Markdown files with the information for the reviewers. The full script is called create_info_files.sh.

Seafile

Once videos are uploaded they should be available for the reviewers. The uploaded files are the primary source, hence it makes sense to always make copies of the talks, so that the original uploads are not lost. The sync_files_and_upload_info.sh script copies the talks into a new directory (together with the information files), which is then writeable for the reviewers. They will download the file, review it, cut it if needed, convert it to Matroska and upload it again. Once uploaded, they move the directory into one called fertig (“done” in German) as an indicator that no one else needs to review it.

I ran the script daily as a cron job; it only copies the new uploads. Please note that it only checks for existence on the directory level. This means that if a talk was already reviewed and a speaker uploads a new version, it won’t be copied. That case didn’t happen often and speakers actually let us know about it, so it’s mostly a non-issue (also see the miscellaneous scripts section for more).
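
Such a cron entry is nothing special; it could look like this (the paths and the schedule are placeholders):

# Copy new uploads once a day at 03:00.
0 3 * * * /path/to/sync_files_and_upload_info.sh >> /var/log/sync_uploads.log 2>&1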

The last step is that someone looks through the filled-out Markdown files to check whether everything was alright, makes sure that e.g. the audio volume gets fixed, or asks the speaker for a new upload. The checked videos are then moved to yet another directory, which finally contains all the talks that are ready to be streamed.

Re-org files for schedule

So far, the video files were organized in directories named after the pretalx ID of the talk. For running the conference we used OBS for streaming. The operator needs to play the right video at the right time, therefore it makes sense to sort the files by the schedule. The cut_to_schedule.sh script does that re-organization; it can be found in the cut_to_schedule directory.

pretalx

To prevent accidental inconsistencies, the root directory is named after the current version of the pretalx schedule. So if you publish a new version of the schedule and run the script again, you’ll get a new directory structure. The video files still have an arbitrary name, chosen by the uploader/reviewer; we want a common naming scheme instead. The get_filepath.py script creates such a name, which also sorts chronologically and contains all the information the OBS operators need. The current scheme is <room>/day<day-of-the-conference>/day<day-of-the-conference>_<day-of-the-week>_<date>_<time>_<pretalx-id>_<title>.mkv.
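
A made-up example of a resulting path, with hypothetical room, date and title:

buehne1/day2/day2_tuesday_2021-06-08_11:00_ABC123_introduction_to_qgis.mkv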

Seafile

The directories do not only contain the single final video, but also the metadata and perhaps the original video or a presentation. The file we actually copy is the *.mkv file that was modified last, which will be the cut video. The get_files_to_copy.sh script creates a list of the files that should be copied; it will only list the files that weren’t copied yet (based on the filename). The copy_files.sh script does the actual copying and is rather generic; it only depends on a file list and Seafile.
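
A sketch of how the cut video can be singled out for one talk directory, assuming the Seafile directory listing endpoint (its entries carry a name and an mtime):

curl --silent --header 'Authorization: Token <seafile-token>' 'https://seafile.example.org/api2/repos/<repo-id>/dir/?p=/<talk-dir>&t=f'|jq --raw-output '[.[] | select(.name | endswith(".mkv"))] | max_by(.mtime) | .name'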

Easily downloadable files

Seafile has a feature to download a full directory as a zip file, which I originally planned to use. It turns out that the size of the files can be too large; I got the error message Unable to download directory "day1": size is too large.. So I needed to provide another tool, as I didn’t want people to have to click through and download every single talk individually.

The access to the files should be as easy as possible, i.e. the operators that need the files shouldn’t need a Seafile account. As the videos also shouldn’t be public, the compromise was a download link secured with a password. This means that an authentication step is needed, which isn’t trivial. The download_files.sh script does the login and then downloads all the files in the given directory. For simplicity, it doesn’t recurse into subdirectories. This means that whoever needs the files has to run the script once for each day of the conference.

I also added a checksum check for more robustness. I created those checksums manually by running b2sum * > B2SUMS in each of the directories and then uploaded them to Seafile.
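
Whoever downloads the files can then verify them with b2sum as well:

b2sum --check B2SUMS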

List of live/pre-recorded talks

Some talks are pre-recorded and some are live. The list_recorded_talks.py script creates a Markdown file that contains a schedule with that information, including the lengths of the talks if they are pre-recorded. This is useful for the moderators to know how much time there will be for questions. At the FOSSGIS we have 5 minutes for questions, but if the talk runs longer, there will be less time.

You need the schedule and the lengths of the recorded talks. This time I haven’t fully automated the process; it’s a bit more manual than the other steps. All scripts can be found in the list_recorded_talks directory.

Get the schedule:

curl https://pretalx.com/<your-conference>/schedule.json > schedule.json

For getting the lengths of the videos, download them all with the download script from the Easily downloadable files section above. Then run the get_lengths.sh script in each of the directories and pipe the output into a file. For example:

cd your-talks-day1
/path/to/get_lengths.sh > ../lengths/day1.txt
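
If you don’t have the script at hand, the gist of such a length listing can be reproduced with ffprobe (a sketch; the actual get_lengths.sh may format its output differently):

# Print the duration (in seconds) of every Matroska file in the current directory.
for file in *.mkv; do
    duration=$(ffprobe -v error -show_entries format=duration -of default=noprint_wrappers=1:nokey=1 "$file")
    echo "$file $duration"
done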

Then combine the lengths of all days into a single file:

cat ../lengths/*.txt > ../talk_lengths.txt

Now you can create the final schedule:

cd ..
python3 /path/to/list_recorded_talks.py schedule.json talk_lengths.txt

Here’s a sample schedule from the FOSSGIS 2021.

Miscellaneous Scripts

Speaker notification

The speakers didn’t get feedback on whether their video was correctly uploaded/processed (other than seeing a successful upload in Seafile). A short time before the conference, we were sending out the latest information that speakers need to know. We decided to take the chance to also add the information whether their video upload was successful, so that they could contact us in case something with the upload didn’t go as they expected (there weren’t any issues :).

It is very similar to sending out the emails with the upload links. You get the information about the speakers and talks in the same way. The only difference is that we now also need the information whether a talk was pre-recorded or not. We get that from Seafile:

curl --silent -X GET --header 'Authorization: Token <seafile-token>' 'https://seafile.example.org/api2/repos/<repo-id>/dir/?p=/<dir-with-talks>&t=d'|jq --raw-output '.[].name' > prerecorded_talks.txt

The full script to create the emails can be found at email_speaker_final.sh. In order to post them to pretalx, you can use the email_to_pretalx.sh script and follow the description in the creating emails section.

Number of uploads

It could happen that people upload a new version of their talk. The current scripts won’t recognize that if a previous version was already reviewed. Hence, I manually checked for directories with more than one file in them. This can easily be done with a single curl command against the Seafile HTTP API:

curl --silent -X GET --header 'Authorization: Token <seafile-token>' 'https://seafile.example.org/api2/repos/<repo-id>/dir/?p=/<dir-with-talks>&t=f&recursive=1'|jq --raw-output '.[].parent_dir'|sort|uniq -c|sort

The output is sorted by the number of files in that directory:

  1 /talks_conference/ZVAZQQ
  1 /talks_conference/DXCNKG
  2 /talks_conference/H7TWNG
  2 /talks_conference/M1PR79
  2 /talks_conference/QW9KTH
  3 /talks_conference/VMM8MX

Normalize volume level

If the volume of the talk was too low, it was normalized. I used ffmpeg-normalize for it:

ffmpeg-normalize --audio-codec aac --progress talk.mkv

Conclusion

Doing all this with scripts was a good idea; the less manual work the better. It also enabled me to process talks even during the conference in a semi-automated way. I created lots of small scripts and sometimes used just a subset of them, e.g. only the copy_files.sh script, or quickly modified them to deal with a special case. For example, all lightning talks of a single slot (2-4 talks) were merged together into one video file. That file of course isn’t associated with a single pretalx ID any more.

During the conference, the volume levels of the pre-recorded talks varied a lot. I think next time I’d like to do some automated audio level normalization after people have uploaded their files. It should be done before the reviewers have a look, so that they can report in case the normalization broke the audio.

The speakers were confused about whether the upload really worked. Seafile doesn’t have an “upload now” button or similar; it does its JavaScript magic once you’ve selected a file. That’s convenient, but it also confused me when I used it for the first time. And if you reload the page, you won’t see that something was already uploaded. So perhaps sending speakers a “we received your upload” email could also be automated.

Overall I’m really happy with how the whole process went; there weren’t any major failures like lost videos. I also haven’t heard any complaints from the people that needed to use the videos at any stage of the pipeline. I’d also like to thank all the speakers that uploaded a pre-recorded video; it really helped a lot in running the FOSSGIS conference as smoothly as it ran.

Categories: en, conference, geo


By Volker Mische
