Tuesday, July 08, 2008

FriendFeed or FollowerFunnel?

Continuing in the vein of the Pied Piper effect. The noise around how bad Twitter is and how soon it's going to collapse and how great FriendFeed is, is all getting rather tiring.

The word "shrill" comes to mind. Along with the word "shill".

I find both services useful, but I notice that most people complaining about Twitter seem to be the people who are more concerned about followers rather than friends. Note that most comments about how FriendFeed is so great start with "I have X followers on Twitter and Y followers on FriendFeed and Y is rapidly going to surpass X. It took me P years to get my followers on Twitter and only Q months on FriendFeed and this is baaad, baaad, baad for Twitter."

Yes, yes the repetitive noise with different values of X, Y, P and Q is getting rather tiring and all you're doing is splitting your own followers into those that follow you on FriendFeed and those that follow you on Twitter.

Note again that there is no mention of how FriendFeed is helping them to follow their friends, but how FriendFeed is helping them create a more effective attention vortex centered around them so they have more followers.

So is a Pied Piper more interested in FriendFeed because the Pied Piper really is interested in the personal lives of the long line of followers? Or just because it plays a better, faster, catchier tune that attracts more children?

Ok, so this sounds really cynical, but the noise from some people about why FriendFeed is better boils down to one simple thing - it allows those same people to aggregate followers faster. FriendFeed, in that particular usage scenario, isn't about friends - in that scenario it should be called FollowerFunnel. That would be a more accurate description of how it's being used, at least judging from the posts of the people clamoring for similar features from Twitter. So FriendFeed and similar emerging services have a "dark side", the FollowerFunnel, and while the "friends" aspect is the advertising, the "follower" aspect is the man behind the curtain.

Twitter isn't a megaphone in front of a circus tent and I suspect it never will be, so the Follower Funnelers should move on and stop trying to make it into one and leave the rest of us in peace.

Tuesday, June 17, 2008

Robert Scoble and The Pied-Piper Effect

I've noticed some interesting happenings on Twitter recently that have inspired me to coin a term "The Pied Piper Effect". The setup is as follows.

A user on a social network starts to accrete a large number of followers to the extent that this user becomes the most followed user on the network. This user is popular for one of a number of reasons -

a) They have something useful to say about the world outside the web.
b) They have something useful to say about the web.
c) The number of messages that they send attracts attention and people are drawn just to see what the noise is all about.
d) The number of messages that they send about themselves being on the web talking about the web or about other people talking about the web draws people even more to see why that might even be faintly interesting.

On Twitter Robert Scoble falls into category d) but he is not the only member of the category. This category of user, at the head of the power law distribution, becomes at some point the test of the infrastructure of the service. Some services like Facebook (5000 followers max), LinkedIn (500 connections max) limit the number of links to other users. Some services (Twitter, FriendFeed...) don't. It is the latter that provide fertile ground for Pied Piper behavior.

A Scoble-like user accretes a large number of users, stresses the network talking mainly about themselves talking about themselves - "come see me I am livecasting myself now and I will be talking about my next livecasting event". And then when the network begins to show signs of stress the Scoble-like user threatens to move, or actually moves, the focus of their attention to a new and as yet not fully saturated network.

The hypnotized children follow.

The Pied Piper does this because the denizens of the previous town refused to "pay the piper". The children follow because they are afraid they might miss a note or a beat and then, oh my god the horrors. And all this will happen to the next town at some point and the next.

But the more important question is - how many of these newly created social networks get populated in the first place just because a Scoble-like user happened to pass by the town with the kids in tow and happened to stop by? To massively mix metaphors, is there a Pied-Piper-Pollination effect in play?

And then there's the much bigger question about whether the tune that the Piper is playing is even music at all, and whether the price is worth it - but that's a whole other story.

Monday, May 19, 2008

It's all about Tupperware(tm)

On Sunday, I spent an afternoon talking with Om Malik of GigaOm, Matt Mullenweg of Wordpress and Stanislav Shalunov of shlang about Social data and Facebook and monetization of data and Google and data ownership and data portability.

You know, nothing topical ;-).

One of the insights for me, from this discussion, was that monetizing Facebook social connections, i.e. spamming your friends to sell them something, is a model that's at least 40 years old. In the past it was the "tupperware party" model.

The model behind "tupperware parties" was that friends were invited over to a party and then they all admired the Tupperware(tm) - which was plastic kitchenware. You focused on getting orders before the party was over, sold stuff to your friends and bootstrapped your business that way, hoping to convert some of them to Tupperware distributors.

This was in the era when plastic kitchen stuff was considered cool, because plastic anything was novel. Let that sink in. That's how old this model is.

But the huge difference there was you made real money by selling real stuff.


To quote:-

"This plan has been used primarily to sell items whose main appeal is to women, such as Tupperware itself (a food-storage system), kitchen utensils, home decor items, jewelry, skincare, cosmetics, and similar products; recent additions to the field include lingerie, sex toys and Landmark Education."

With the so called social network ad model - you sell stuff to your friends (oh ok, "get them to install apps"). You piss them off and then you make .... zip, nada, bupkus.

So how long before people catch on that this isn't really about being "social", which means actually connecting with people in a real way? This is like being in a big tent for hosting Tupperware parties. You invite people to meet you at this tent because that's where all the cool people are going to have their parties. And you're going because you are with someone who is following someone else who says it's cool to go to this party.

[According to Stanislav, "the core demographic on Facebook is young women and the guys are there because the women are there." ]

And the women and the guys all come over and then after the party's over, prematurely, they get solicited for Tupperware, relentlessly, endlessly, crassly.

And eventually people in the tent start muttering to each other that this is not what they expected when they first got there. And that it was all so much fun meeting new people and connecting with old friends until it all became about Tupperware. And they all move on to the next tent where it's about, oh does it matter ... Chipperware, or something equally ludicrous and plastic.

So how long before the muttering starts. I'll give it two years max. Or else the tent folds up because not enough Tupperware gets sold and they can't pay the rent and the people who bought the Tupperware to sell it now are stuck with ... well plastic social stuff that no one considers cool any more.

I know I am probably in the minority but have you been to a Tupperware party - a real one - lately? Make that ever? Don't you think if selling stuff at social events was something that people loved, it would have been around for a little longer than a few years?

And just because it's over the web somehow people are supposed to develop a taste for buying plastic at parties?

Perhaps the wizards behind the curtain are forgetting that the entities on the other side of the Innertube are real people and that people move on from fake stuff and long for real stuff.

Ok, so call me a curmudgeon, I've been called worse. Just don't sell me stuff pretending to make it a party. After all wasn't this social media thing supposed to be about "the conversation" in the first place?

.. because you know maybe people prefer organic social stuff, that doesn't abuse the social, you know, environment - let's call it "green social". Something that people can grow in their neighborhoods and you can actually talk in person to the farmer at the farmers market. And what's the equivalent of that for social networks?

Once the Tupperware party in the sky is over, let's have a real conversation where no one is selling anything. Like Om said at the end of the post where he announced the mini-meetup on Sunday - "I will buy coffee and cakes, but please don’t pitch me your company. I want some honesty about this topic."

It was a refreshing conversation and I'd like to have more of those where people aren't selling plastic social.

Thursday, May 08, 2008

Guten Tag. Und Wilkommen in Tag Schema

This blog was originally created to discuss database designs underlying the emerging tagging or folksonomy applications (Flickr, del.icio.us ....) at the leading edge of Web 2.0. Over the next year or so the frenetic activity in that area stabilised and the focus moved to the so called 'social network' applications. While those were not the subject of the original folksonomy discussions, they underlined some important and related database issues.

One such issue is that of the database schemas induced by the need to model 'friend' relationships. As it turns out these are just as interesting as the folksonomy schemas. In parallel, other issues emerged - "data in the cloud" and "data portability". The former is about the move away from centralized, relational databases based on SQL and the latter is about data ownership issues created by having personal digital assets distributed all over the Internet captive inside Web 2.0 applications.

So the current trends on the net have major implications for underlying data structure. As the application architectures change and as new disruptive ones emerge, the underlying data layers experience corresponding tectonic shifts. And it is the new agenda of this blog to track all these data related issues as they emerge. This is much broader than the original narrow issue of folksonomy database design but in retrospect it is a natural evolution. The only problem that remains is the name 'tagschema' which seems so narrowly focused on tagging database schemas.

Luckily at a recent MySQL event the solution emerged in a conversation with Kaj Arno of MySQL, now Sun. "Ahhh ..." he said looking at the word 'tagschema' on my name tag. "That sounds like a German daily tv show 'Guten tag und wilkommen in Tag Schema' - that means 'Good day and welcome to Tag Schema - schema of the day'". I thought nothing more of it but the phrase 'Guten tag und wilkommen in Tag Schema' kept playing in my mind. Later, when the question of a name for the new blog came back to mind, I realized that Tag Schema could mean 'schema of the day', or 'Current schema', or 'Current trends in schema', or more generally 'discussions of underlying structure of the day', which is where we will be going with this blog: exploring new data structures and technologies as they emerge.

So thanks very much Kaj Arno for that moment of zen serendipity.

And so 'Guten Tag. Und Wilkommen in Tag Schema'

How many times do I have to tell you? ........ Don’t …. Repeat ….. Yourself.

It is 2008 – do you know where your avatar is? I don’t. I have copied it into so many different web apps I have no idea where it’s been. And I just got a request to update my address book from yet another address book provider with an address of mine copied 5 years ago.

The social web has become a giant cookie monster – growling “Gimme Copy, Gimme Copy, yeah yeah yeah … mmmmmm Copy”

The DRY (Don’t Repeat Yourself) principle has been touted by designers of Rails and other modern web app frameworks. It is ironic then that these frameworks have been used to build a whole generation (Web 2.0) of apps that force the user to make copies of data again and again into each web app. In the data world the DRY principle reads "Don't Make Copies" (DoMaCo).

How many times do I have to tell you – Don’t …. Repeat ….. Yourself. Dear web app builders – you created the Internet Copy Monster – you need to help stamp it out. But how? Read on.



Fig 1. Yesterday's web app architecture – forced violation of DRY principle at the data level.

Most Web 2.0 apps do not expose REST URI’s to every data element. This means the user can’t access their data freely which means they can’t reuse their data in other web apps. The real added value of a successful social web app lies in the community interactions and the UI elegance that enables community interactions, not in the data management layer.

For example the popularity of a service such as Flickr is primarily due to their innovations around tagging and “interestingness” and the very active community, not because of their massive data storage facilities or their disk farms, which are a cost center.

That means Flickr would still be Flickr even if the data layer in Fig 1 were not owned by Flickr. Think about that for a minute and apply that across the social web. Note also that Flickr allows you to embed Flickr photos into other applications – so Flickr photos can become definitive instances of your photo data. Flickr exposes data pointers – in effect they have become a next generation Internet data layer for photos.

This leads to the possibility of a general purpose data layer - not part of the web app but part of the Internet infrastructure - a data layer which contains user digital assets and all social data. This would be provided by a new class of service provider the “data service provider” who would give you URI’s to all your content, give you full control and access to your data and would be a for-fee service.

Web apps would only point to data in this data layer and not be part of the huge Internet Copy Monster.

Now consider tomorrow’s web application architecture which is already in place in parts. I call this the Yinas approach – YINAS being a recursive acronym for Yinas Is Not A Silo.



Fig 2 Yinas. A web app architecture that doesn’t violate the DRY principle at the data level and respects user data rights.

This may seem like it needs a massive redesign of all web apps, but it doesn’t. It would just require a uniform approach to data embedding in web apps, most of which is already in place. The needed work is already done for most content except text, avatars and structured content such as address books etc. We already embed photos, video, and audio via URI’s to remote content hosting services. We just need to extend it uniformly to all content types, not just image, audio, video and we need to use it as a pervasive design principle across the web.

In summary – let’s recognize “Don’t Make Copies” as a useful design principle for web app data and let’s consume pointers instead of copies.
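To make "pointers instead of copies" concrete, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the DataService and WebApp classes, the example URI, and the in-memory dictionary standing in for a real data service provider reached over HTTP. It only shows the shape of the idea, not how any existing service works.

```python
class DataService:
    """Hypothetical 'data service provider': the single home of each asset."""

    def __init__(self):
        self._assets = {}  # URI -> content

    def put(self, uri, content):
        self._assets[uri] = content

    def get(self, uri):
        return self._assets[uri]


class WebApp:
    """A web app that keeps only pointers (URIs) to user data, never copies."""

    def __init__(self, name, data_service):
        self.name = name
        self._data = data_service
        self._avatar_uri = {}  # user -> URI of that user's avatar

    def link_avatar(self, user, uri):
        self._avatar_uri[user] = uri

    def avatar(self, user):
        # Dereference the pointer at display time instead of storing a copy.
        return self._data.get(self._avatar_uri[user])


service = DataService()
uri = "https://data.example/users/alice/avatar"  # made-up URI
service.put(uri, "avatar-v1.png")

photos = WebApp("photos", service)
chat = WebApp("chat", service)
photos.link_avatar("alice", uri)
chat.link_avatar("alice", uri)

# One update at the source; every app sees it because none holds a copy.
service.put(uri, "avatar-v2.png")
print(photos.avatar("alice"), chat.avatar("alice"))
```

The design choice worth noticing is that the apps never call put: they add value around the data, while the definitive instance lives in one place.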

Let's stamp out the Internet Copy Monster. Let's stamp out unnecessary repetition. Shall we? Shall we?

P.S.

And please forward a permalink to your friends, not a copy ;-)

Friday, February 01, 2008

Why Data Portability is a non-solution to a non-problem

I have written a draft on Backpack
Note: As of Feb 6 2008 the draft is now a post on GigaOm

Please leave comments there - comments on this post here are now closed.

Wednesday, July 25, 2007

Some thoughts on data rights

Tim O'Reilly was talking about data and data access in his keynote at Oscon2007 today (Wed Jul 25th 2007). I thought I'd post some thoughts I have been chewing on for a while, even while I am still in the keynote. These issues have technical and philosophical implications. They are not about tags per se but do apply very strongly to data currently captive in contemporary folksonomy applications as well as other Web 2.0 applications. Comments and Criticism invited.

A manifesto for data rights in a globally networked world

(Draft 1 Jul 25th 2007) (cc) Published under Creative Commons "Attribution No Derivatives" Licence

We consider the following to be axiomatic and universal

  1. Data is a first class citizen of the network.

  2. Data must not be held captive in an application or locked in proprietary application-specific file formats.

  3. Data must be readable and exportable directly, programmatically, completely without restriction and stored in open, non-proprietary formats.

    1. Programmatic data access must allow FULL export and read capability independent of what the human UI allows
    2. Arbitrary restrictions must not be placed on data access by the application controlling the data, whether due to unintentional limitations of the application architecture or due to intentional design.


  4. Every unit of data must be independently addressable via a URI

    1. On the Internet, data should be accessible via REST based architectures


  5. Every unit of data must be capable of having an associated access policy, separately from other such units of data

    1. Each data unit must be able to have a possibly different access control policy
    2. The default access control policy of a data unit created by an individual must be "private"
    3. Policy change must be under the free control of the individual.
    4. Policy change must be under the control only of the individual.


  6. Data is property. Hence data access and ownership must be subject to rights strongly similar to or identical to physical property rights.

    1. No application, service, organization or other entity may require data exposure or implicit surrender of data ownership as a price of use or access to some facility

    2. Data exposure must be separately negotiated and be freely negotiable without coercion, according to the needs of the individual.
    3. "Website shrink wrapped licenses" are not considered to be a meaningful negotiation in this context.

    4. Data about an individual belongs to that individual and only to that individual, who may choose to share the data subject to their needs and no one else's

    5. Data does not belong to the incidental keepers of data representations (internet service providers, medical service providers, financial service providers, state and federal govt agencies)
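As a rough illustration of points 4 and 5 of the draft, the following Python sketch models a single addressable data unit that carries its own access policy. The DataUnit class, the three policy names and the example URI are assumptions invented for this sketch; the manifesto itself does not prescribe any particular representation.

```python
from dataclasses import dataclass, field

POLICIES = ("private", "shared", "public")  # assumed policy vocabulary


@dataclass
class DataUnit:
    uri: str                  # point 4: every unit is addressable via a URI
    owner: str
    value: str
    policy: str = "private"   # point 5.2: the default policy is "private"
    shared_with: set = field(default_factory=set)

    def set_policy(self, actor, policy, shared_with=()):
        # Points 5.3 and 5.4: only the owning individual may change the policy.
        if actor != self.owner:
            raise PermissionError("only the owner may change the policy")
        if policy not in POLICIES:
            raise ValueError(f"unknown policy: {policy}")
        self.policy = policy
        self.shared_with = set(shared_with)

    def can_read(self, actor):
        if actor == self.owner or self.policy == "public":
            return True
        return self.policy == "shared" and actor in self.shared_with


phone = DataUnit(uri="https://data.example/alice/phone", owner="alice", value="555-0100")
print(phone.can_read("bob"))            # private by default
phone.set_policy("alice", "shared", {"bob"})
print(phone.can_read("bob"), phone.can_read("carol"))
```

Because the policy lives on the unit rather than on the application, two units owned by the same person can be exposed differently, which is what point 5.1 asks for.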

Sunday, October 01, 2006

Putting the "folk" back in folksonomy

Or ... The fat belly and recommendation systems



Since the beginning of Web 2.0 time, "folksonomy" has been synonymous with tagging. It's time to fill out the picture. As readers of this blog know, folksonomy involves tags, tagged-items, and tagger-users. This post digs deeper re: the role of users in the "holy trinity" of user-tag-item. And examines the relationship of users to recommendation systems, ... and to the "fat belly".

Yes, that does sound like a whole lot of ground to cover but

a) I have been gone for a while so need to catch up in a hurry - what can I say?
b) It's not that much ground to cover when we see the interesting relationships
c) The notation described in the previous post makes it possible to cover a lot of ground without too much verbiage.

So without further ado, here goes.

In a typical folksonomy system we have users attaching tags to items. As the system evolves we have, given an item 'i', the sets: -
T(i), the tags associated with i and
U(i) the users who use the item i.

Typical folksonomy apps have focused on navigating the various relationships with a focus on T(i). Recommendation systems that suggest 'related items' are also most often based on T(i), as follows. Given an item we find all tag related items via I(T(i)). Then we use some algorithm to trim this down to the "best" 5 or 10 by some definition of "best". Then we use these as recommendations. Given a user of item i, these are the recommended other items, or 'related items' based on tags.

For the rest of this discussion, we denote this set of recommendations as Rt(i), i.e. given an item i, the recommended other items based on tags.
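The tag-based route T(i) -> I(T(i)) -> Rt(i) can be sketched in a few lines of Python. The triples below are made-up toy data, the function names are hypothetical, and "best" is assumed here to mean "shares the most taggings with i", which is only one of many possible definitions.

```python
from collections import Counter

# Hypothetical toy data: (user, tag, item) triples.
TRIPLES = [
    ("ann", "jazz", "podcast_a"),
    ("ann", "jazz", "podcast_b"),
    ("bob", "jazz", "podcast_b"),
    ("bob", "news", "podcast_c"),
    ("cat", "news", "podcast_a"),
]


def tags_of(item):
    """T(i): the set of tags attached to item i."""
    return {tag for (_, tag, it) in TRIPLES if it == item}


def tag_related_items(item):
    """I(T(i)): other items carrying any tag of i, with a count per item."""
    tags = tags_of(item)
    return Counter(it for (_, tag, it) in TRIPLES if tag in tags and it != item)


def recommend_by_tags(item, n=5):
    """Rt(i): trim I(T(i)) to the n items with the highest counts (assumed 'best')."""
    return [it for it, _ in tag_related_items(item).most_common(n)]


print(recommend_by_tags("podcast_a"))
```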

Consider now, the other way to get related items, i.e. user-related items.
This is the famous "users who bought this item also bought ...." approach that we know and love.

Given an item i we get U(i) all the users of i, and then I(U(i)), all the items used by those users. Again we use some way to trim this down to the best 5 to 10 or so and recommend these. Given a user of item i, these are the recommended other items, based on users.
We denote this set of recommendations as Ru(i), i.e. the recommendations based on users of item i.
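The user-based route U(i) -> I(U(i)) -> Ru(i) looks much the same in code. Again this is a sketch on invented data: the USES table and function names are hypothetical, and trimming by highest count is just the simplest placeholder for the filter step.

```python
from collections import Counter

# Hypothetical toy data: user -> set of items that user uses.
USES = {
    "ann": {"podcast_a", "podcast_b"},
    "bob": {"podcast_a", "podcast_b", "podcast_c"},
    "cat": {"podcast_c", "podcast_d"},
}


def users_of(item):
    """U(i): the users who use item i."""
    return {user for user, items in USES.items() if item in items}


def user_related_items(item):
    """I(U(i)): other items used by the users of i, counted once per user."""
    counts = Counter()
    for user in users_of(item):
        counts.update(USES[user] - {item})
    return counts


def recommend_by_users(item, n=5):
    """Ru(i): trim I(U(i)) to the n items with the highest counts (placeholder filter)."""
    return [it for it, _ in user_related_items(item).most_common(n)]


print(recommend_by_users("podcast_a"))
```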

Now comes the interesting part derived from work done at Odeo and Greenplum over the last year or so. Experiments suggest the following two major results, which need much more qualification by further work and study. This is only an indicator of interesting areas for research, not a formal proof of anything.

a) Empirical results suggest that for even a small set of users Ru(i) gives better recommendations than Rt(i), i.e. using user-related items gives better recommendations than using tag-related items.

b) Empirical results suggest that the "algorithm" we use to go from I(U(i)) to Ru(i) makes a lot of difference to the relevance and 'interestingness' of recommendations.

Ok, b) was really cryptic so we'll take the rest of this post to unpack it into useful results and pretty pictures.

Step by step,

I(U(i)) is the raw set of user-related items for item i (people who bought item i also bought a whole ton of other shtuff, namely I(U(i))).

But that is too huge a set to use as recommendations - it could have anywhere from tens to tens of thousands of items depending on what data we are operating on. So we need to trim this down with a filter that filters out and keeps the best recommendations.

So I(U(i)) ---> Filter ---> Ru(i) ie. after filtering the raw set of user-related items we get user-related recommendations.

Now we need to decide how to filter. Let's do the simple thing first.

First we sort the collection I(U(i)) by count, i.e. how many times does some item turn up in this collection.

The temptation is to take the top 10 items by count and use these as the recommendations. This is what I did in practice, and I found that the recommendations generated are only mildly customized, i.e. they are interesting in general but not necessarily interesting to me. Most of the time they are almost identical to the "most popular" items on the front page.

Why is this?

Because I *took* the most popular ones by count, I sampled the head of the distribution and didn't get anything new.

So then I decided to go the other way - I looked at the lower end of the counts and picked reco's from there, i.e. the proverbial "long tail". Now I got some strange and freaky recommendations - if you had subscribed to the Catholic podcast on Odeo you would have been recommended the Open Source Sex podcast. Not quite what we have in mind, when we say "recommendations".

This led me by accident to explore the remaining area of the range of counts, the middle, named the "fat belly" by Robert Young in a recent post on GigaOm.

Here is where things got very, very interesting in the recommendations generated. For example,
Evan Williams who has an interest in modern furniture got a recommendation for a podcast related to furniture although none of his current subscriptions had anything to do with furniture!

This was very exciting and stimulated further exploration which confirmed that the best recommendations came from the fat belly.


So

I(U(i)) ----> Sort by count, filter from the head ----> "popular (i.e. obvious) "

I(U(i)) ----> Sort by count, filter from the long tail ----> "freaky (i.e. too different)"

I(U(i)) ----> Sort by count, filter from the fat belly ----> "relevant and interesting"
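The three filters can be sketched as one sort followed by two cuts. The post does not say where the head ends or the tail begins, so the 20% and 40% cut points below are purely illustrative assumptions, as are the function name and the sample counts.

```python
from collections import Counter


def split_by_count(counts, head_frac=0.2, tail_frac=0.4):
    """Rank items by count, then cut the ranking into head, fat belly and long tail.

    head_frac and tail_frac are assumed cut points, not values from the experiments.
    """
    # Sort by descending count; break ties by name so the result is deterministic.
    ranked = [item for item, _ in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))]
    head_end = int(len(ranked) * head_frac)
    tail_start = len(ranked) - int(len(ranked) * tail_frac)
    return ranked[:head_end], ranked[head_end:tail_start], ranked[tail_start:]


# Made-up I(U(i)) counts for some item i.
raw = Counter({
    "top_hit": 90, "big_show": 70,
    "design_talk": 12, "furniture_cast": 9, "garden_hour": 8, "chess_news": 7,
    "odd_one": 1, "odd_two": 1, "odd_three": 1, "odd_four": 1,
})

head, belly, tail = split_by_count(raw)
print("popular:", head)
print("relevant and interesting:", belly)
print("freaky:", tail)
```

In a real system the cut points would have to be tuned against the actual shape of the count distribution rather than fixed as fractions.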


Recommendation systems and the powerlaw curve


Now the other interesting observation was that using similar techniques on I(T(i)), all the tag-related items for a given item, did not give such crisp recommendations, i.e. collections of tags are not as useful as collections of users in creating a recommendation engine.

Why might this be and how do we understand it from first principles? Here's my little theory.

Let's think about this in terms of gestures, primary and secondary gestures. Users express interest in an item by various gestures. One of them is tagging an item, but prior to tagging an item is the act of focusing on an item and picking it out of the vast universe of items.
This primary selection process appears to be a far more powerful indication of interest than the secondary act of tagging or describing the already selected item. Hence, I hypothesize, a recommendation system based on user-related items is more crisp than one based on tag-related items.

The bigger picture here suggests that the user or people dimension in folksonomy is just as or more interesting than just the tag dimension. We need to look more deeply at the "folk" and not just the "..sonomy".

(This subject was discussed in a talk I gave at FooCamp, where some very smart people were present, like Hal Varian of Google, DeWitt Clinton ex of Amazon, Luke Lonergan CTO of Greenplum, Mary Hodder of Dabble, James Levine of SimplyHired, and Todd "the SEO Guy" .... who participated in a very energetic discussion and helped me refine these ideas. Thanks for that, guys.)