Saturday, December 03, 2016

What is Multi-Tenancy? A closer look

Lately, I have had a lot of conversations about multi-tenancy (MT), so I finally wrote up my thoughts on that term.

In this post I will argue two points:
  1. MT is a value that depends on a continuous variable. Therefore, any statement about a system being “MT” can only be made in the context of the given requirements. It is not a property of the system itself.
  2. Perfect multi-tenancy is indistinguishable from single-tenancy (ST).

MT is a value that depends on a continuous variable

Imagine a step-function "ST-MT" (values are either 0 or 1) that determines if a given system is MT (1) or ST (0). That function will look like this:

ST-MT = function(system, business requirements)

Look at the function’s arguments: the first one is obvious – the result will depend on the system itself.
The second one is more interesting: it is the cumulative set of business requirements. Typically, these requirements will include:
  • Resource sharing: systems typically declared as being MT share some resources. That can be network, storage, compute, etc. The business requirements specify which of these resources are acceptable to be shared and which are not.
  • Tenant isolation: the business requirements specify the required level of tenant isolation, meaning which level of “noisy neighbour” problem is acceptable. That noisy neighbour could affect the resource sharing, but also has a heavy influence on the acceptable security requirements.
  • Extensibility requirements: how can tenants extend or tweak the system. Consider setting variables or deploying custom code into the system.
The first observation to make is that these input variables are continuous. That means:
Any given system can either be MT or ST depending on the values of the cumulative requirements.
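
To make this dependence concrete, here is a minimal Python sketch of the ST-MT step function. The names `System`, `Requirements` and `st_mt`, and the way requirements are modelled (a set of resources that may be shared plus a flag for tenant code), are illustrative assumptions only, not a formal definition.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class System:
    """Hypothetical description of a system: what its tenants share."""
    shared_resources: frozenset       # e.g. {"network", "storage"}
    runs_tenant_code: bool = False    # tenants may upload executable code


@dataclass(frozen=True)
class Requirements:
    """Hypothetical business requirements towards multi-tenancy."""
    shareable_resources: frozenset    # resources the business accepts to be shared
    tenant_code_allowed: bool = False # is tenant code in a shared system acceptable?


def st_mt(system, requirements):
    """Step function: 1 if the system counts as MT under the requirements, else 0."""
    sharing_ok = system.shared_resources <= requirements.shareable_resources
    code_ok = requirements.tenant_code_allowed or not system.runs_tenant_code
    return 1 if (sharing_ok and code_ok) else 0


shared_db = System(shared_resources=frozenset({"network", "storage", "compute"}))
relaxed = Requirements(shareable_resources=frozenset({"network", "storage", "compute"}))
strict = Requirements(shareable_resources=frozenset({"network"}))

print(st_mt(shared_db, relaxed))  # same system ...
print(st_mt(shared_db, strict))   # ... different requirements
```

The system is identical in both calls; only the requirements change, and with them the value of the function.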

Let me work through some example values for the requirements that could affect the value of the ST-MT-function:

 

Resource sharing

  • Network: multiple tenants might share parts of the network, e.g. local network interfaces, internet connection, routers, etc. One given tenant might saturate the network and affect other tenants, maybe even make the system inaccessible for these tenants.
  • Storage: the system might allow tenants to submit queries to the storage subsystem. Those queries could be expensive/long-running and affect the storage’s response times to other tenants. 
  • Compute: same as above for shared compute resources.

 

Tenant isolation

It is to be specified by business requirements which isolation levels between tenants are required. The considerations above regarding “resource sharing” fall into this bucket, but there is more:
It is a business consideration whether the physical storage system can be shared between tenants. For example, some customers require storage to be physically separated from their competitors’ data/content.

 

Extensibility

Business requirements might demand that tenants can independently adjust system settings (properties). For example, it might be required that the URL space can be created for each tenant independently, so that any tenant could independently claim the URL “/abc”. Similarly, it could also be required that any tenant can independently change settings like “max time to execute a function”, “max number of users in a group”, etc.
More interestingly, the business might require that the tenants are allowed to upload executable code into the system. This input variable usually has a large impact on the value of the ST-MT-function.

In the light of this let’s look at some given systems and ponder whether they are MT or ST:

 

DBs 

Consider a plain old RDBMS like MySQL or Oracle. Do you think it is MT or ST?
One can create DB users with respective read/write rights for each tenant such that the users cannot “see” each other. However, what if one tenant saturates the JDBC/ODBC connections? The same question applies to the compute resources required for queries.
As a concrete example: consider the (now deprecated) Parse platform, an MBaaS (mobile backend as a service). Customers could simply sign up and get a backend. However, they would share the same underlying MongoDB. Guess what: it is extremely easy to write a Mongo query that eats up all system resources – which would slow down or break the system for all other tenants.
It is clear: the very same DB technology can be considered either ST or MT – it entirely depends on your requirements.
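
Whether a shared database passes as MT often comes down to whether a single tenant can exhaust a shared resource. The following sketch uses plain Python and no real database driver; the class `TenantConnectionQuota` and its limit are hypothetical. It shows one way a per-tenant cap could keep one tenant from saturating a shared connection pool.

```python
import threading
from collections import defaultdict


class TenantConnectionQuota:
    """Caps the number of simultaneously open connections per tenant (illustrative only)."""

    def __init__(self, max_per_tenant):
        self.max_per_tenant = max_per_tenant
        self._open = defaultdict(int)   # tenant -> currently open connections
        self._lock = threading.Lock()

    def acquire(self, tenant):
        """Return True if the tenant may open another connection, False otherwise."""
        with self._lock:
            if self._open[tenant] >= self.max_per_tenant:
                return False
            self._open[tenant] += 1
            return True

    def release(self, tenant):
        """Give back one connection slot previously acquired by the tenant."""
        with self._lock:
            if self._open[tenant] > 0:
                self._open[tenant] -= 1


quota = TenantConnectionQuota(max_per_tenant=2)
greedy = [quota.acquire("tenant-a") for _ in range(3)]
print(greedy)                     # the third request is refused
print(quota.acquire("tenant-b"))  # other tenants are unaffected
```

Whether such a cap is sufficient is again a question for the business requirements: it limits the number of connections, not the cost of the queries that run over them.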

 

A web gateway

Consider a reverse proxy (or even a cluster of reverse proxies) that serves as an entry point to many backend services. Let's say for the sake of the argument that it is Nginx-based. Most would probably argue that such a setup is MT (because adding new tenants does not require changing the system). Now let me come up with a new requirement: a malicious tenant shall not be able to break the system (i.e. make the system unavailable for other tenants). Obviously, it is always possible that there is a bug in Nginx which one tenant might exploit or hit accidentally (see old CVE).
Is this an unacceptable risk that makes the gateway unsuitable for MT purposes? Would it make the system ST? Again: this is just a business decision.

 

A consumer-facing storage system

Consider a content repository that is largely thought of as being “MT” (e.g. Dropbox). Consider a new enterprise customer who demands that their files be stored physically separate from those of any other tenant. If you bring in this requirement and the backend of the system does not physically separate files, then these new tenants must be deployed onto their own backend storage servers. Does this make the system “ST”?

At this point I hope to have made the point clear that a system by itself cannot be called MT or ST without specifying further business constraints.
Let me move to the second point:

Perfect multi-tenancy is indistinguishable from single-tenancy

Reading the above you might be tempted to think that MT is a continuum with ST at one end and MT at the other end. This is somewhat true – but:
The continuum is a circle and MT and ST fall onto the same point.

How is that possible?
Imagine that you set out to design a system that satisfies ALL POSSIBLE business requirements towards multi-tenancy. You would separate storage, network, compute, etc. The resulting system would be indistinguishable from a number of ST systems that sit next to each other.
In conclusion: the systems that are commonly called multi-tenant could be called “single-tenant systems in which the business requirements allow certain resources to be shared”. Or, to put it the other way around: “what are commonly known as ST systems are simply perfectly designed MT systems that satisfy all possible requirements on multi-tenancy”.
Funny, no?

Friday, May 28, 2010

Second FISE Hackathon


At this week's IKS meeting at Paderborn the second FISE Hackathon took place. FISE is an open source semantic engine that provides semantic annotation algorithms like semantic lifting. The actual annotation algorithms are pluggable through OSGi. Existing CMSs can integrate the engine through an HTTP interface (inspired by Solr). Last week, Bertrand gave an introductory talk about FISE that is available online.


There was no explicitly set goal for the second Hackathon. Rather, the existing code base was extended in various directions. Some examples:

  • a language detection enhancement engine (I am particularly glad to see this - automatic language detection in CMSs is a pet passion of mine)
  • a UI for FISE users that allows humans to resolve ambiguities
  • myself, I coded a JCR-based storage engine for the content and annotations

There was also a good amount of work done on the annotation structure used by FISE and documented on the IKS wiki.

A complete report of the Hackathon is available on the IKS wiki (the only thing it fails to mention: the event's good spirit).

One major non-code step was to get many participants up to speed with the FISE engine and enable them to deploy the engine as well as become familiar with the architecture and code base.

It was only last week that I took a deeper look into FISE. I like its architecture a lot. The HTTP interface makes it easy to play with FISE as well as integrate it. Even more important, the pluggable architecture that is mostly inherited from the OSGi services architecture makes FISE very flexible and extensible. This is particularly important given the different natures of the enhancement engines that we want to be able to deploy (hosted services, proprietary, open source, etc). I consider FISE to be a particularly well suited use case for OSGi.

(cross-posting from here)

Saturday, April 10, 2010

NoSQL talk at Developer Summit

Three days ago I had the chance to talk about NoSQL at the Internet Briefing's Developer Summit. On top of general ideas and concepts like the CAP theorem I chose to talk about Apache Jackrabbit, CouchDB and Cassandra. My slides are embedded below.

It was a really good event with interesting speakers and a knowledgeable audience. I was especially pleased that when I talked about CouchDB's HTTP API someone from the audience mentioned that Apache Sling does something very similar for Jackrabbit.

Special kudos to Christian Stocker of Liip for daring to do a live demo of the "real-time web" - he took a picture from his phone and had it pop up on Jabber and Twitter in about 5 secs.

Vlad Trifa has posted a good summary of the whole event (part 1, part 2) - he also gave a great presentation about the application of the REST architectural style to the "Web of Things".


[Embedded slides: "No Sql", by mmarth]

Friday, March 12, 2010

CMS vendors now and then

CMS analyst Janus Boye has just published a post on CMS vendors that discontinue their products (because they get bought out or similar):
"During the past 10 years, a number of software products used by online professionals have been discontinued".
That sentence reminded me that I had given a talk almost 10 years ago (it was in 2001 exactly) that contained a slide on the CMS market at that time:



The circles denote vendors that were part of CMS market overview articles by popular German IT magazines in that year (I wanted to show how differently the market place could be perceived). A vendor placed in any of the circles had enough attention to be part of at least one evaluation. The vendors outside of the circles were not part of any of these overview articles, but somehow present in the market place - at least I knew their names back then.

It is interesting to look at the landscape from that time. Of course there are a number of well-known vendors that got bought (Vignette, Obtree, Gauss), but the majority still seems to linger on - at least, a web site still exists, for example iRacer, Schema Text, or Contens.

On the other hand, one can ask how many vendors that were important enough to make it into a (German) market overview are still relevant in the market place today. I have used Janus Boye's spreadsheet of relevant European CMS vendors as a benchmark and checked which vendors on today's list were already in my 2001 presentation: Day, Coremedia and Open Text were "in the circles". Tridion was there, but outside of the circles. The rest of the vendors that Janus considers relevant today were not on my radar in 2001.

The end of my presentation involved a couple of CMS-related predictions. Let's see how I did. I predicted:
  • product borders between CMS, DMS and app servers will blur further - my take now: wrong. I do not think that these borders are more blurry than they were in 2001
  • more standards and standards-based software (Java, JSP/ASP, XML, XSL) - true. The underlying technologies of CMSs are more homogeneous than they were at that time. Remember TCL?
  • But no true compatibility. True. Nothing more to say.
  • Improved Personalization. Improved Multi-Channel support. Both not really true, but rather fads of those days.
  • Improved DMS features and Office integration. Don't ask me why I said that.
  • No quick market consolidation in sight. Right on the money here.
Mostly correct on general market considerations, mostly wrong on features.

Saturday, January 09, 2010

mp3tagger on GitHub

On the mp3 tagger post I have received quite a bit of feedback and feature requests. Therefore, I thought it might be a good idea to do "social coding" and put the code on GitHub where it can easily be forked (and the forks can be watched).

Other than that, the latest version of the tagger contains these improvements:
  • the Last.fm keys and secret are not stored in the code anymore, but entered on the first run and stored in ~/.mp3tagger.cfg
  • you can run the script in two additional modes: simulation and ask. In simulation mode no changes to mp3s will be saved, in ask mode you will be asked to save each change. Start the script with flags "-m simulation" or "-m ask", respectively.
  • It is now possible to specify a list of genre tags that will be considered (in addition to the mp3 default genre tags). The list needs to be stored in a config file at ~/.mp3tagger_genres.cfg (in the "generic" section of the file). The full format this file needs to have is shown below.
  • The last improvement is a tricky one: after tagging all my mp3s I ended up with hundreds of albums tagged with genre Electronic or Indie. I wanted to refine these genres into sub-genres. This again works by putting a list of possible sub-genres into ~/.mp3tagger_genres.cfg and running the tagger with flag "-r genre", e.g. "-r Electronic". You would run this option when you find that you have too many albums of one genre and want to split them up.
So in summary my config file ~/.mp3tagger_genres.cfg looks like:


[generic]
genres=Shoegaze,Dubstep,Grime,Dub,Drum And Bass
[refinements]
Electronic=Idm,Turntableism,Techno,Minimal,Dub,Big Beat,Ambient,Breakbeat,House,Lounge,Electroclash,Drum And Bass,Chillout
Indie=Indie Rock,Indie Pop,Singer-Songwriter,Indie Pop,Shoegaze,Post-Rock,Americana,New Wave,Alt-Country
Reggae=Dancehall,Dub,Ska
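
A file in this format could be read with Python's standard configparser module, for example. The sketch below is only an illustration: it is not taken from the tagger's code, `parse_genre_config` is a made-up name, and a shortened sample is inlined instead of reading ~/.mp3tagger_genres.cfg.

```python
import configparser

# Shortened sample in the same layout as the config file, inlined to keep the sketch self-contained.
SAMPLE = """
[generic]
genres=Shoegaze,Dubstep,Grime,Dub,Drum And Bass
[refinements]
Electronic=Idm,Techno,Minimal
Reggae=Dancehall,Dub,Ska
"""


def parse_genre_config(text):
    """Return (extra_genres, refinements) parsed from the config text."""
    parser = configparser.ConfigParser()
    # Keep option names case-sensitive ("Electronic", not "electronic")
    parser.optionxform = str
    parser.read_string(text)
    extra = [g.strip() for g in parser.get("generic", "genres").split(",")]
    refinements = {
        genre: [s.strip() for s in subs.split(",")]
        for genre, subs in parser.items("refinements")
    }
    return extra, refinements


extra_genres, refinements = parse_genre_config(SAMPLE)
print(extra_genres)
print(refinements["Reggae"])
```

With a structure like this, a run with "-r Reggae" would only need to look up `refinements["Reggae"]` to get the candidate sub-genres.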

Sunday, November 15, 2009

Running the iTunes genre tagger script with OS X Automator

By popular demand, here's a little recipe for running last post's mp3 tagger on OS X without using the command line:
  • Open Automator
  • Start a new "Application" project
  • Drag the "Run Shell Script" action into the right workflow panel, set the "pass input" drop-down to "as arguments" and edit the script to (see screenshot below):
for f in "$@"
do
    /opt/local/bin/python /Users/michaelmarth/Development/Code/mp3tagger/tag_groupings.py -d "$f"
done

(you will have to adapt the paths to your local setup)
  • Save the application and happily start dropping mp3 folders onto the application's icon.

Tuesday, November 03, 2009

Python script to set genre in iTunes with Last.fm tags

Now that I have started to seriously use iTunes I figured it might be nice to have the genre tag set in a meaningful way. Since I have a reasonably large collection of mp3s, doing that manually was out of the question, so I wrote myself a Python script to do it. There seems to be a large demand for such functionality (at least I found a lot of questions on how to automatically set the genre tag), so maybe someone else finds the script useful. It is pasted below.

General Strategy

The basic idea is to use Last.fm's tags for genre tagging. In iTunes the genre tag is IMO best used when it only contains one single genre, i.e. something like "Electronica", not something like "Electronica / Dance". On the other hand dropping all but one tag would lose a lot of information, so I decided to use the groupings tag for additional information that is contained in the list of tags that an artist has on Last.fm. In the example above that would be something like "Electronica, Dance, 80s, German". In that way it is simple to use iTunes' Smart Playlist feature to create play lists of all, say, dance music. This approach is probably not suitable for classical music.

The ID3 field that is exposed in iTunes' UI as "grouping" is defined in the ID3v2 spec as:
TIT1
The 'Content group description' frame is used if the sound belongs to a larger category of sounds/music. For example, classical music is often sorted in different musical sections (e.g. "Piano Concerto", "Weather - Hurricane").
So, the strategy I described above seems to be kind of in line with the spec. In general, it is a good idea to have a look at the ID3v2 spec if you consider dabbling with mp3 tags.

Practical Considerations

If one would just take an artist's highest-rated Last.fm tag for the genre one would end up with pretty inconsistent genre tags (think "hip-hop", "hip hop", and "hiphop"). Therefore, I chose to use a fixed set of values for genre. In a previous version of ID3 the list of possible genres was fixed. While this is clearly a terrible idea to start with, it came in handy in this case. I used this as a fixed list for genres.

The second practical consideration was which Last.fm tags to include. In Last.fm parlance each artist tag comes with a weight (values from 0 to 100). Selecting only the tags with a weight of 50 or more worked out fine for me (usually I had 1-5 tags per artist).
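
As a rough sketch of these two selection rules, independent of the full script: the tag data, the shortened genre list and the function names `pick_genre` and `pick_groupings` below are made up for illustration, and no Last.fm call is involved. The threshold is applied as `>= 50`, as in the script further down.

```python
# Hypothetical (tag, weight) pairs as Last.fm might report them for one artist.
artist_tags = [("electronic", 100), ("german", 72), ("dance", 55),
               ("hip hop", 12), ("seen live", 8)]

# Small stand-in for the fixed ID3v1 genre list.
FIXED_GENRES = {"Electronic", "Hip-Hop", "Rock", "Reggae"}

MIN_WEIGHT = 50


def pick_genre(tags, fixed_genres):
    """Return the highest-weighted tag whose title-cased name is in the fixed list, or None."""
    for name, weight in sorted(tags, key=lambda t: t[1], reverse=True):
        if name.title() in fixed_genres:
            return name.title()
    return None


def pick_groupings(tags, min_weight=MIN_WEIGHT):
    """Return the title-cased names of all tags with weight >= min_weight, comma-separated."""
    return ", ".join(name.title() for name, weight in tags if weight >= min_weight)


print(pick_genre(artist_tags, FIXED_GENRES))
print(pick_groupings(artist_tags))
```

Note that a spelling such as "hip hop" becomes "Hip Hop" and therefore does not match the fixed entry "Hip-Hop"; tags that do not match the fixed list are simply never used as a genre.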

A third thing you might want to be aware of: if you programmatically change tags in an mp3, iTunes will not pick up these changes automatically. A simple way of letting it know: select the "Get Info" command on these items. This will trigger a reload of the new tag values.

Script

To run the script you will need the Python libraries mutagen and pylast installed. Run it with the option
-d directory_with_mp3s
The script will walk along this directory and modify all mp3s it finds. Also, you will need a Last.fm API key and set your API_KEY and API_SECRET accordingly in the script.


#!/usr/bin/env python
# encoding: utf-8
"""
tag_groupings.py

Created by Michael Marth on 2009-11-02.
Copyright (c) 2009 marth.software.services. All rights reserved.
"""

import sys
import getopt
import pylast
import os
from mutagen.id3 import TCON, ID3, TIT1

help_message = '''
Adds ID3 tags to mp3 files for genre and groupings. Tag values are retrieved from Last.FM. Usage:
-d mp3_directory
'''


class Usage(Exception):
    def __init__(self, msg):
        self.msg = msg


# Fixed list of genre names that ships with mutagen
all_genres = TCON.GENRES
# Per-artist caches so that Last.fm is queried only once per artist
genre_cache = {}
groupings_cache = {}
API_KEY = "your key here"
API_SECRET = "your secret here"
# Note: newer pylast releases provide pylast.LastFMNetwork(...) instead of this helper
network = pylast.get_lastfm_network(api_key=API_KEY, api_secret=API_SECRET)


def artist_to_genre(artist):
    """Return the artist's first top tag that is also a known genre, or None."""
    if artist in genre_cache:
        return genre_cache[artist]
    tags = network.get_artist(artist).get_top_tags()
    for tag in tags:
        # tag[0] is the tag object, tag[1] its weight
        name = tag[0].name.title()
        if name in all_genres:
            genre_cache[artist] = name
            print("%20s %s" % (artist, name))
            return name
    return None


def artist_to_groupings(artist):
    """Return a comma-separated string of the artist's tags with weight >= 50."""
    if artist in groupings_cache:
        return groupings_cache[artist]
    tags = network.get_artist(artist).get_top_tags()
    relevant_tags = []
    for tag in tags:
        if int(tag[1]) >= 50:
            relevant_tags.append(tag[0].name.title())
    groupings = ", ".join(relevant_tags)
    groupings_cache[artist] = groupings
    print("%20s %s" % (artist, groupings))
    return groupings


def walk_mp3s():
    """Set genre (TCON) and grouping (TIT1) on every mp3 below the current directory."""
    for root, dirs, files in os.walk('.'):
        for name in files:
            if name.endswith(".mp3"):
                audio = ID3(os.path.join(root, name))
                artist = audio["TPE1"]
                genre = artist_to_genre(artist[0])
                grouping = artist_to_groupings(artist[0])
                if genre is not None:
                    audio["TCON"] = TCON(encoding=3, text=genre)
                if grouping is not None:
                    audio["TIT1"] = TIT1(encoding=3, text=grouping)
                audio.save()


def main(argv=None):
    if argv is None:
        argv = sys.argv
    try:
        try:
            opts, args = getopt.getopt(argv[1:], "ho:vd:", ["help", "output="])
        except getopt.error as msg:
            raise Usage(msg)

        # option processing
        for option, value in opts:
            if option == "-v":
                verbose = True
            if option in ("-h", "--help"):
                raise Usage(help_message)
            if option in ("-o", "--output"):
                output = value
            if option == "-d":
                try:
                    os.chdir(value)
                except Exception as e:
                    print("error with directory " + value)
                    print(e)
                    # do not tag files in whatever directory we happen to be in
                    return 1
        walk_mp3s()

    except Usage as err:
        sys.stderr.write(sys.argv[0].split("/")[-1] + ": " + str(err.msg) + "\n")
        sys.stderr.write("\t for help use --help\n")
        return 2


if __name__ == "__main__":
    sys.exit(main())