Back By Popular Demand: Salesforce scraper
(by Erik Peterson on February 1st 2008)

In a previous incarnation of this here blog, I had a post about how to construct an app in Rails that scrapes Salesforce.com data to a local database. This was about a year and a half ago, and the blog that it was posted on has been defunct for at least a year now. To this day I still get requests for that post, so I figure I should repost it. A lot of things have changed in that year and a half, however. My project went from "integrate with Salesforce so we can get better reports" to "Replace Salesforce.com and then some." Therefore, I haven't used the code I had written about in a long time. I have no idea if it even still works. So while my use-case for this code doesn't exist any more, it is apparently useful to other people.

First off, this uses the activesalesforce gem. I think the last version that I tested it on was 1.0.0, so any updates might break this code. Also, there's apparently an activerecord-activesalesforce-adapter gem now, which works with Rails 2.0, so things are probably quite different these days. Still, this code should give you some insight.

config/database.yml

sf_production:
  adapter: activesalesforce
  url: https://www.salesforce.com/services/Soap/u/8.0
  username: #your salesforce username
  password: #your salesforce password

production:
  adapter: postgresql #You can probably use MySQL or whatever.  I doubt it matters
  database: production
  username: #whatever
  password: #whatever
  host: #whatever

account.rb

class Account < ActiveRecord::Base
  establish_connection "sf_#{RAILS_ENV}"
end

class Account_Postgres < ActiveRecord::Base
  set_primary_key "id"
  establish_connection "#{RAILS_ENV}"
  set_table_name 'accounts'
end

opportunity.rb

class Opportunity < ActiveRecord::Base
  establish_connection "sf_#{RAILS_ENV}"
end

class Opportunity_Postgres < ActiveRecord::Base
  establish_connection "#{RAILS_ENV}"
  set_primary_key "id"
  set_table_name 'opportunities'
end
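
Both classes in each file pick their connection by name: establish_connection gets a string, and that string is looked up as a key in database.yml. Here's a minimal sketch of that lookup in plain Ruby, with no Rails loaded. The `rails_env` variable and the inline YAML with trimmed-down values are stand-ins I made up for RAILS_ENV and the real config file.

```ruby
require 'yaml'

# Stand-in for config/database.yml; credentials are left out on purpose.
config = YAML.load(<<~YAML)
  sf_production:
    adapter: activesalesforce
    url: https://www.salesforce.com/services/Soap/u/8.0
  production:
    adapter: postgresql
    database: production
YAML

rails_env = 'production' # stand-in for the RAILS_ENV constant

# The same string interpolation the models use to pick a connection name.
salesforce_config = config["sf_#{rails_env}"]
local_config      = config[rails_env]

puts salesforce_config['adapter'] # activesalesforce
puts local_config['adapter']      # postgresql
```

So for every environment you run this in, you need both an entry named after the environment and one with the sf_ prefix.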

I think you get the point. You'll need to do the same for any other Salesforce objects you plan on using. It really sucks that the Salesforce version gets the normal AR name while the Postgres version gets the altered name, but the activesalesforce gem depends on this naming scheme. Next, here's a big fat dump of some of the helper methods that I use. You might want to put this in a file in lib.

lib/salesforce_helpers.rb

#Creates a "migration" for a specified Salesforce class.  Eval this in an ActiveRecord::Schema.define() block
def dumpclass( aclass )
  dumpstr = "  create_table \"" + aclass.table_name + "\", :id => false, :force => true do |t|\n"
  for column in aclass.columns
    dumpstr += "    t.column \"" + column.name + "\", "
    if column.type.to_s.eql?('text') && column.limit.to_i < 1000
      dumpstr += ":string, :limit => " + column.limit.to_s
    else
      if column.type.nil?
        dumpstr += ":string, :limit => 255"
      else
        dumpstr += ":" + column.type.to_s
      end
    end
    dumpstr += "\n"
  end
  return dumpstr + "  end\n"
end

#Scrapes a single record from Salesforce to the local database
def scrape( aobj, aclass, adbclass )
  begin
    new_obj = adbclass.new(convert(aobj, aclass))
    new_obj.id = aobj.id
    new_obj.save!
  rescue
    return 1
  end
  return 0
end

#Scrapes a whole Salesforce Object harshly: deletes all the local data, then dumps everything again.  Good for empty tables
def hard_update_class(aclass, adbclass)
  count = 0
  adbclass.delete_all
  for aobj in aclass.find(:all, :limit => 0)
    scrape(aobj, aclass, adbclass)
    count += 1
  end
  return count
end  

#Scrapes a whole Salesforce Object softly.  Only looks for objects that were created/updated since the last scrape.
def update_class(aclass, adbclass)
  #I honestly don't know why I did it this way.  It isn't very DRY.  There must be a reason, so tinker with caution.
  begin
    lastcreated = adbclass.find(:first, :order => 'created_date desc')
    lastmodified = adbclass.find(:first, :order => 'last_modified_date desc')
    for aobj in aclass.find(:all, :limit => 0, :conditions => 'createddate > ' + (lastcreated.created_date - 18000).to_s(:iso_8601_special))
      scrape(aobj, aclass, adbclass)
    end
    for aobj in aclass.find(:all, :limit => 0, :conditions => 'lastmodifieddate > ' + (lastmodified.last_modified_date - 18000).to_s(:iso_8601_special))
      adbclass.delete(aobj.id)
      scrape(aobj, aclass, adbclass)
    end
  rescue
    begin
      lastcreated = adbclass.find(:first, :order => 'created_date desc')
      lastmodified = adbclass.find(:first, :order => 'last_modified_date desc')
      for aobj in aclass.find(:all, :limit => 0, :conditions => 'created_date > ' + (lastcreated.created_date - 18000).to_s(:iso_8601_special))
        scrape(aobj, aclass, adbclass)
      end
      for aobj in aclass.find(:all, :limit => 0, :conditions => 'last_modified_date > ' + (lastmodified.last_modified_date - 18000).to_s(:iso_8601_special))
        adbclass.delete(aobj.id)
        scrape(aobj, aclass, adbclass)
      end
    rescue
      puts "Skipping " + aclass.to_s
    end
  end
end

#Converter for a single object
def convert ( aobj, aclass )
  hash = {}
  aobj.attributes.each { | key, value | 
    hash[key] = value if aclass.column_names.include?(key)
  }
  hash
end

#For some reason Salesforce didn't interpret the ISO 8601 date format spec correctly.
#Note that this format hardcodes a -05:00 UTC offset.
ActiveSupport::CoreExtensions::Time::Conversions::DATE_FORMATS.merge!(
  :iso_8601_special => "%Y-%m-%dT%H:%M:%S-05:00"
)

Ok, whew. Sorry that was so long. Now we can go about using these methods. Most of this stuff is in rake tasks.

#Create local schema for Account
ActiveRecord::Schema.define() do
  eval(dumpclass(Account))
end
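
If you're wondering what actually gets eval'd there, here's a rough sketch you can run without Rails or Salesforce. FakeColumn, FakeAccount and dump_sketch are names I made up for illustration; dump_sketch is a condensed copy of the mapping rules in dumpclass, not a replacement for it, and the column list is invented.

```ruby
# Hypothetical stand-ins for an ActiveRecord class and its column objects.
FakeColumn = Struct.new(:name, :type, :limit)

class FakeAccount
  def self.table_name
    'accounts'
  end

  def self.columns
    [
      FakeColumn.new('name', :text, 255),           # short text -> :string with a limit
      FakeColumn.new('description', :text, 32_000), # long text stays :text
      FakeColumn.new('mystery', nil, nil),          # unknown type -> :string, limit 255
      FakeColumn.new('created_date', :datetime, nil)
    ]
  end
end

# Same mapping rules as dumpclass, condensed for illustration.
def dump_sketch(klass)
  lines = klass.columns.map do |column|
    type =
      if column.type.to_s == 'text' && column.limit.to_i < 1000
        ":string, :limit => #{column.limit}"
      elsif column.type.nil?
        ':string, :limit => 255'
      else
        ":#{column.type}"
      end
    "    t.column \"#{column.name}\", #{type}\n"
  end
  "  create_table \"#{klass.table_name}\", :id => false, :force => true do |t|\n" +
    lines.join + "  end\n"
end

migration = dump_sketch(FakeAccount)
puts migration
```

The printed string is an ordinary create_table block, which is why it can be eval'd inside ActiveRecord::Schema.define.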

#Prefill the data for Account
hard_update_class(Account, Account_Postgres)

#Update the data for Account
update_class(Account, Account_Postgres)
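
The soft update hinges on the condition string it hands to Salesforce, so here's a small sketch of how that string comes together, in plain Ruby. The cutoff_condition method and its parameter names are made up for illustration, and the timestamp is an example value, not real data.

```ruby
# Format copied from the :iso_8601_special entry in the helpers.
ISO_8601_SPECIAL = '%Y-%m-%dT%H:%M:%S-05:00'

# Builds a condition string like the ones update_class passes to find.
# `field` and `newest_local` are hypothetical parameter names.
def cutoff_condition(field, newest_local)
  cutoff = newest_local - 18_000 # 18,000 seconds = 5 hours
  "#{field} > #{cutoff.strftime(ISO_8601_SPECIAL)}"
end

newest = Time.utc(2008, 2, 1, 12, 0, 0) # example value, not real data
condition = cutoff_condition('lastmodifieddate', newest)
puts condition # lastmodifieddate > 2008-02-01T07:00:00-05:00
```

If the local timestamps are stored in UTC, subtracting five hours and then labeling the result -05:00 describes the same instant. That's my guess at the intent, so check it against your own time zone before trusting it.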

Conclusion

I hope that was useful to someone. I'm sorry that the code is in such a decrepit state. If you have any questions, just comment here and I'll try to help you out. I highly encourage anyone to modify, update, or clean up this code and make a Rails plugin or gem out of it.