Improving Search for Twitter Handles

Hello Twitter,

I have been using your service for a while, and I love it!

At first, I was skeptical about what you could offer: Broadcasting to all my friends that I was eating a pizza, or taking a walk, is not really my cup of tea. But 3 years ago I figured out what Twitter was really meant for and how it could help me in a totally different way from what I first thought:

  • sharing interesting articles,
  • checking if /replace by the service provider you want/ is down,
  • or catching up on HackerNews.

More recently, I discovered you had a feature that could help me even more: I can now ask for support by tweeting. Tweeting is often faster and more productive than sending an email. You taught me to include the recipient’s Handle in my tweets, and your current Handle auto-completion implementation works pretty well. But what if you could provide better typo-tolerance and ranking? (I’m NOT speaking about your official OSX/iOS native clients and their totally unusable auto-completion feature… btw, could you explain to me why it is different from the one on your website?)

I have been leading a search-engine development team over the last 5 years and I’m now VP of engineering at Algolia. I am aware that considering my job, I have kind of an “expert” point of view about search. But search has become so essential that I am convinced it must be irreproachable. Did you know that 1.7M+ people are currently following

expecting great things from your search-engine, Twitter :) Here is how I would improve search for Twitter handles:

For example, it would be nice if I could find President @barackobama with his last name:

Search for Twitter handles including @obama yields less-than-stellar results.

Same for Justin:

Search for Twitter handles that could be Justin Bieber's yields less-than-stellar results.

Typo-tolerance is now a must-have, especially because we’re all using smartphones and tablets:

Search for Twitter handles should have typo tolerance.

More and more handles are now prefixed/suffixed by “official”, which makes finding @OfficialAdele just impossible:

Search for Twitter handles that start with @official is broken.

For sure we can improve it, let’s code!

First of all Twitter, I need your Handles database :)

  • I used your Streaming API to crawl 20M+ accounts in ~2 weeks: it’s not blazing fast, but I must admit it does the job (and it’s free). That’s about 5 lines of Ruby with TweetStream, good job guys!
  • and Daemonize to create a bin/crawler executable.
#! /usr/bin/env ruby

require File.expand_path(File.join(File.dirname(__FILE__), '..', 'config', 'environment'))

daemon = TweetStream::Daemon.new('crawler', :log_output => true)
daemon.on_inited do
  ActiveRecord::Base.connection.reconnect!
  ActiveRecord::Base.logger = Logger.new(File.join(Rails.root, 'log/stream.log'), 'w+')
end
daemon.on_error do |message|
  puts "Error: #{message}"
end
daemon.sample do |status|
  Handle.create_from_status(status)
end

For each new tweet you send to me, I store the author (name + screen_name + description + followers_count) and all his/her user mentions.

class Handle < ActiveRecord::Base

  def self.create_from_user(user)
    h = Handle.find_or_initialize_by(screen_name: user.screen_name)
    puts h.screen_name if h.new_record?
    h.name = user.name
    h.description = (user.description || "")[0..255]
    h.followers_count = user.followers_count
    h.updated_at ||= DateTime.now
    h.save
    h
  end

  def self.create_from_status(status)
    Handle.create_from_user(status.user)
    status.user_mentions.each do |mention|
      m = Handle.find_or_initialize_by(screen_name: mention.screen_name)
      m.updated_at ||= DateTime.now
      m.name = mention.name
      m.mentions_count ||= 0
      m.mentions_count += 1
      m.save
    end
  end

end

And every minute, I re-index the last-updated accounts with a batch request using algoliasearch-rails:

every 1.minute, roles: [:cron] do
  runner "Handle.where('updated_at >= ?', 1.minute.ago).reindex!"
end

The result order is based on several criteria:

  • the number of typos,
  • the matching attributes: the name/handle is more important than the description,
  • the proximity between matched words,
  • and the followers count (I also use the “mentions count” if my crawler didn’t get the followers count yet).
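
To make that ordering concrete, here is a minimal sketch in plain Ruby. It is not the engine's actual implementation: the `Hit` struct, its field names, the handles, and the numbers are all made up for illustration, and it assumes the criteria are applied in the listed order, each one only breaking ties left by the previous ones.

```ruby
# Hypothetical record describing how one handle matched a query.
# attribute_rank: 0 = matched in name/handle, 1 = matched in description.
Hit = Struct.new(:screen_name, :typos, :attribute_rank, :proximity, :score)

# Sort hits by comparing the criteria in order; a later criterion is only
# consulted when all earlier ones are tied (arrays compare element by element).
def rank(hits)
  hits.sort_by do |h|
    [
      h.typos,           # fewer typos first
      h.attribute_rank,  # name/handle before description
      h.proximity,       # closer matched words first
      -h.score           # higher followers/mentions score first
    ]
  end
end

# Invented sample data, not real accounts or counts.
hits = [
  Hit.new('pizza_in_bio', 0, 1, 1, 500),
  Hit.new('pizzaking',    0, 0, 1, 40_000),
  Hit.new('pizzza',       1, 0, 1, 90_000),
  Hit.new('pizzanews',    0, 0, 1, 12_000)
]

ranked = rank(hits).map(&:screen_name)
puts ranked.inspect
```

Note how the account with a typo ends up last even though it has the highest score: the score only matters once typos, matching attribute, and proximity are tied.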

I could have improved the results by using the user’s list of followers/following, but I was limited by your Rate Limits. Instead, I chose to emphasize your top-users (accounts having 10M+ followers).

Here is the configuration I used:

class Handle < ActiveRecord::Base

  include AlgoliaSearch
  algoliasearch per_environment: true, auto_index: false, auto_remove: false do
    # add an extra score attribute
    add_attribute :score

    # add an extra full_name attribute: screen_name + name
    add_attribute :full_name

    # do not take `full_name`'s words order into account, `full_name` is more important than `description`
    attributesToIndex ['unordered(full_name)', :description]

    # list of attributes to highlight
    attributesToHighlight [:screen_name, :name, :description]

    # use followers_count OR mentions_count to sort results (last sort criteria)
    customRanking ['desc(score)']

    # @I_love_you
    separatorsToIndex '_'

    # tag top-users
    tags do
      # followers_count may be nil for handles only seen in mentions
      followers_count.to_i > 10_000_000 ? ['top'] : []
    end
  end

  def full_name
    # consider screen_name and name equal
    # the name should not match exact so we concatenate it with the screen_name
    [screen_name, "#{screen_name} #{name}"]
  end

  # the custom score
  def score
    # both counters may be nil for handles only seen in mentions: treat nil as 0
    followers = followers_count.to_i
    return followers if followers > 0
    mentions = mentions_count.to_i
    if mentions < 10
      mentions
    elsif mentions < 100
      mentions * 10
    elsif mentions < 1000
      mentions * 100
    else
      mentions * 1000
    end
  end

end
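
To see what the custom score does in practice, here is a standalone sketch of the same rule outside of ActiveRecord. The function name `handle_score` is hypothetical, and treating a missing count as 0 is an assumption on my side; the thresholds and multipliers are the ones from the configuration above.

```ruby
# Standalone version of the scoring rule: use the followers count when it is
# known, otherwise boost the mentions count by its order of magnitude.
# Assumption: a nil count is treated as 0.
def handle_score(followers_count, mentions_count)
  followers = followers_count.to_i
  return followers if followers > 0

  mentions = mentions_count.to_i
  multiplier =
    if mentions < 10 then 1
    elsif mentions < 100 then 10
    elsif mentions < 1000 then 100
    else 1000
    end
  mentions * multiplier
end

puts handle_score(2_500, 40)   # followers known, mentions ignored
puts handle_score(nil, 5)      # few mentions, no boost
puts handle_score(0, 50)       # boosted x10
puts handle_score(0, 500)      # boosted x100
puts handle_score(nil, 2_000)  # boosted x1000
```

The boost is a rough way to make a frequently mentioned account comparable to accounts whose followers count is already known, until the crawler sees one of its tweets.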

The user query is composed of two backend queries:

  • the first one retrieves all matching top-users (could be replaced by a query targeting your followers/following only)
  • the second one retrieves all the others.
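
Combining the two result sets can be sketched as follows. This is an illustration only: `search_top` and `search_others` are hypothetical lambdas standing in for the two backend calls (the real client API is not reproduced here), and each is assumed to return an already-ranked array of hashes with a `:screen_name` key.

```ruby
# Merge two ranked result lists: top-users first, then everyone else,
# dropping duplicates (first occurrence wins) and keeping `limit` hits.
def merged_results(query, search_top:, search_others:, limit: 5)
  top    = search_top.call(query)
  others = search_others.call(query)
  (top + others).uniq { |hit| hit[:screen_name] }.first(limit)
end

# Fake backends returning canned, made-up data.
search_top = ->(_query) { [{ screen_name: 'bigstar' }] }
search_others = lambda do |_query|
  [
    { screen_name: 'bigstar' },      # also a top-user, must not appear twice
    { screen_name: 'bigstar_fan' },
    { screen_name: 'bigstore' }
  ]
end

results = merged_results('bigst', search_top: search_top,
                                  search_others: search_others, limit: 3)
names = results.map { |hit| hit[:screen_name] }
puts names.inspect
```

Keeping the top-users in their own query guarantees they are shown first whenever they match, whatever the rest of the ranking says.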

Try it for yourself, and enjoy relevant and highlighted results after the first keystroke: Twitter Handles Search.

Our Search-as-a-Service offer now has 10 API Clients!

We recently reached a new milestone towards the release of our Search as a Service offer. We’re now proud to offer 10 API clients, covering all major languages.

Ease of use was a major focus during development. We began by offering a complete and easy-to-integrate REST API. Providing API clients was a logical way to improve ease of use. You can now get started and test the engine with your data in a couple of minutes, with no prior configuration whatsoever. Each API Client is released under the MIT License and comes with a quick start and complete documentation.

This variety of languages and platforms reveals the diversity of our beta testers:

  • Customer size: from a small startup developing its MVP to a big social network searching across its 130M+ users.
  • Volume: from a few queries to tens of millions per day.
  • Technical environments: mobile, desktop, and web apps.

Interested in trying it out yourself? Ask for an invite!

Asian Language support in our Offline Search SDK 2.2

Like most search engines, version 2.1 did not include any specific processing for Asian Languages. Version 2.2 significantly improves Asian language support (Chinese, Japanese, Korean) by including specific processing like the automatic conversion between Simplified Chinese and Traditional Chinese using the Unicode UniHan Database. This advanced processing was only possible because we built our own Unicode library. Many thanks to Stephen for his help!

This release also contains other improvements we released first for our SaaS version:

  • The out-of-the-box ranking was greatly improved when queries contained more than two words,
  • Indexing speed was greatly improved on mobile (2 times more efficient),
  • Search speed was improved by about 20%.

We hope you’ll like these new features, and as ever, we welcome your feedback!

New iOS and OS X API clients for our Search-as-a-Service offer

Build new things with our iOS and OS X API client.

One week after releasing our Java & Android clients, we are happy to release our iOS and OS X API clients for our search-as-a-service offer.

To ease the setup, we support CocoaPods. Installing the client requires just one line in your Podfile:

pod 'AlgoliaSearch-Client', '~> 1.0'

And don’t forget we also provide developers with an offline SDK that they can use to search directly on iOS devices with no connection to the network. Developers now have the perfect tools to build a great search experience both online and offline.

With this new client, we now have API Clients for the most popular languages and platforms. They are all released under the MIT license and available on our GitHub account.

Ease of integration just improved again! Your feedback (and pull requests) is most welcome.

New Java & Android Search-as-a-Service API Clients at DroidCon Paris!

Our Search-as-a-Service offer is progressing toward its official release. We launched our Java and Android search API Clients at DroidCon Paris today! Come see us if you’re attending!

And don’t forget we also provide developers with an offline SDK they can use to search directly on Android devices with no connection to the network. Developers now have the perfect tools to build a great search experience both online and offline.

The Android API Client is based on the Java client and adds support for asynchronous API calls. You can thus easily trigger a search query from the UI thread and get the result in a listener without any additional line of code. You just have to implement the IndexListener interface.

With these two new clients, we now have eight API Clients released under the MIT license to simplify integration of Algolia Search as a Service.

Ease of integration just improved again! Your feedback (and pull requests) is most welcome.
