Looking forward to Azure from Microsoft

October 27th, 2008 admin Posted in Appengine, azure, microsoft No Comments »

Microsoft has announced a new “hosted computing” offering called “Azure”. It’s a new service rivaling Amazon’s Web Services and Google’s half-baked Appengine.

The problem with this announcement is that the signup page doesn’t take me anywhere, nor can I find any “sample” sites running on Azure. It looks like a work in progress, and it will be some time before it is released to developers. I just wish they (Microsoft) would keep their websites simple and not generate so much noise with no significant information. Make the announcement when it is ready and I can take it for a spin.

The previously released Microsoft SSDS (SQL Server Data Services) is a part of this whole Azure tech stack:

  • Windows Azure for service hosting and management, low-level scalable storage, computation and networking
  • Microsoft SQL Services for a wide range of database services and reporting
  • Microsoft .NET Services which are service-based implementations of familiar .NET Framework concepts such as workflow and access control
  • Live Services for a consistent way for users to store, share and synchronize documents, photos, files and information across their PCs, phones, PC applications and Web sites
  • Microsoft SharePoint Services and Microsoft Dynamics CRM Services for business content, collaboration and rapid solution development in the cloud

One of the most significant advantages of Azure is going to be support for “ALL” languages. I don’t know when this is going to happen, but it is something they are promising. Support for PHP, Rails, etc. is fundamental to the success of Azure because of the body of open source code that already exists for these tech stacks. It should be quite simple to port an existing PHP application onto Azure when it becomes available.

One of the biggest problems with Appengine is that there is no “open source” code base that can be leveraged to build apps. Simple functionality, which would normally be pulled off the web and implemented with a copy and paste, has to be laboriously hand-coded for Appengine.

I just hope Azure serves as a wake-up call for the Appengine team and they speed up their releases.


Google Appengine Limitations and why startups should avoid it

October 6th, 2008 admin Posted in Appengine, google 3 Comments »

I built a couple of applications using Google Appengine. Google Appengine was marketed as highly scalable and highly available. It looked like an ideal choice for the “salary search” application that I had been planning to build for a while. The data set I was looking at had more than 2 million records to begin with, and I felt that Appengine could scale well. Two weeks after releasing it, I am frustrated with how little Google Appengine delivers on its promises.

  1. Abrupt time-out errors: This is the single biggest issue with Google Appengine. Nobody, including Google, has explained what causes these abrupt “Google Appengine is Over Quota” errors or how to avoid them. We are not talking about the “Digg” effect, but something like 3K page views and 100 users per hour. Google Appengine couldn’t scale to handle that kind of load, even though the app is:
    • Not doing any writes to the datastore, which are more expensive.
    • Memcaching every request.
    • Retrieving no more than 100 entities per request.
  2. 1000 Entity retrieval limit: There is a limit on how many entities of a particular type can be retrieved. For example, when I try to list all the jobs offered by Infosys, the results cannot go beyond a count of 1000. There is an offset parameter, but that offset works within this 1000-entity limit. So if the offset exceeds 1000, the query bombs.
  3. SDK is broken: The SDK and the production versions are out of sync. What works in the SDK does not necessarily work in the PROD version.
  4. Indexes: Building indexes takes more than 24 hours, if it succeeds at all.
  5. Updating Data is cumbersome: Once the data is uploaded, it’s very difficult to update that data. There are no “bulk” update operations that can be performed on the data. The only way I am aware of is to create a page and set it to “auto-refresh” using a meta tag. The data operation functions are so rudimentary that no serious iterative development can be done on Appengine. Small changes to the data model can take forever to implement.
  6. Documentation and Support: Documentation is limited. For support, one has to be active on the Google Appengine group. Though the engineers on the group try their best, the support needs a lot of improvement.
  7. CRON/Batch jobs not supported: There is no support for CRON jobs or batch processing.
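To make the offset problem in item 2 concrete, here is a minimal sketch in plain Python (not the Appengine API) of paging by the last key seen instead of by offset, so that no single query ever needs an offset beyond the cap. The names `fetch_page`, `fetch_all` and `PAGE_LIMIT` are hypothetical, and whether this maps cleanly onto a given datastore query is an assumption, not something I have verified on this app.

```python
import bisect

PAGE_LIMIT = 1000  # stand-in for the per-query cap described in item 2


def fetch_page(sorted_keys, after_key=None, limit=PAGE_LIMIT):
    """Return up to `limit` keys strictly greater than `after_key`."""
    if after_key is None:
        start = 0
    else:
        start = bisect.bisect_right(sorted_keys, after_key)
    return sorted_keys[start:start + limit]


def fetch_all(sorted_keys, limit=PAGE_LIMIT):
    """Walk the whole key range one capped page at a time."""
    results = []
    last_key = None
    while True:
        page = fetch_page(sorted_keys, last_key, limit)
        if not page:
            break
        results.extend(page)
        last_key = page[-1]  # the next page starts after this key
    return results
```

Each call stays within the cap, and the caller only has to remember the last key of the previous page.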

In conclusion, Appengine is a work in progress and it has a long way to go before it can live up to expectations. It is definitely not suited for iterative development, where companies want to release incremental functionality regularly. If the data model is not well thought out, then the Appengine framework will lock all your data or not scale at all, which defeats the purpose.


Resuming Google Bulk Data uploader for Appengine

October 2nd, 2008 admin Posted in Appengine 2 Comments »

Google Appengine does not provide a straightforward way to upload data to the datastore. They have one article that describes a way to do it. Using this method, one can upload a CSV file to a Google Appengine app by making repeated “HTTP” requests. While this works for smaller data sets (1000 records or less), it fails for bigger ones. I had a requirement to upload over 2 million records for my salary search app, and the program kept failing. Resuming a bulk upload resets the counter, and it starts all the way from the beginning, causing duplicate data.

I have modified the bulk upload program to do the following things:

  1. Introduced a parameter called “skip” which will skip “N” number of records when the bulk data uploader resumes
  2. A lot of the time, the bulk data uploader fails because of either time-out issues or network issues. Instead of aborting in such a scenario, the program just goes to sleep for 1 min before resuming from where it left off.
  3. This program writes a progress log to a text file. So in case of a failure, the log file contains an entry showing where the program failed last time. You can skip that many records when resuming.
  4. Sample Call :

bulkload_client_resume.py --filename upload_me.csv --kind Visa --url http://<your app>.appspot.com/load --skip 1000
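To find the value to pass to --skip, the last count can be read back from the progress log. The following is a small sketch and not part of the SDK: the function name `last_logged_count` is hypothetical, and the pattern assumes the log contains the “current count N” and “ERROR Count:N” text written by the modified uploader.

```python
import re

# Matches the counts written to the progress log by the modified uploader.
COUNT_PATTERN = re.compile(r'(?:current count |ERROR Count:)(\d+)')


def last_logged_count(log_path='bulkupload.log'):
    """Return the last record count found in the log, or 0 if there is none."""
    try:
        with open(log_path) as logfile:
            counts = COUNT_PATTERN.findall(logfile.read())
    except IOError:
        # No log file yet: nothing has been uploaded, so skip nothing.
        return 0
    return int(counts[-1]) if counts else 0
```

The count is logged before each batch is posted, so if the program died in the middle of a post, the batch starting at that count may or may not have been stored. It is worth checking the datastore for that batch before resuming.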

In order to resume the bulk data uploader, you have to modify the following sources in your SDK:

  • bulkload_client.py under Google\google_appengine\tools
    • Make a copy of the existing file and create a new file with the name bulkload_client_resume.py
    • Change following line of code in bulkload_client_resume.py

          BULKLOAD_CLIENT_PATH = 'google/appengine/tools/bulkload_client_resume.py'

  • bulkload_client.py under the following folder Google\google_appengine\google\appengine\tools
    • Make a copy of existing file and create a new file with the name bulkload_client_resume.py
    • Modify ImportCSV function
    • Introduce a new parameter called skip, which will skip “X” number of records before resuming upload.

def ImportCSV(filename,
              post_url,
              cookie,
              batch_size,
              kind,
              skip,
              split_url=SplitURL,
              openfile=file,
              create_content_generator=ContentGenerator,
              post_entities=PostEntities):
  """Imports CSV data using a series of HTTP posts.

  Args:
    filename: File on disk containing CSV data.
    post_url: URL to post the Entity data to.
    cookie: Full cookie header to use while connecting.
    batch_size: Maximum number of Entity objects to post with each request.
    kind: Entity kind of the objects being posted.
    skip: Number of records to skip before the upload starts posting.
    split_url, openfile, create_content_generator, post_entities: Used for
      dependency injection.

  Returns:
    True if all entities were imported successfully; False otherwise.
  """
  # Note: this function relies on the time module (import time at the
  # top of the file).
  host_port, uri = split_url(post_url)
  csv_file = openfile(filename, 'r')
  skip = int(skip)
  i = 0  # number of records passed so far (skipped or uploaded)
  try:
    content_gen = create_content_generator(csv_file, batch_size)
    logging.info('Starting import; maximum %d entities per post', batch_size)
    for num_entities, content in content_gen:
      # Logic to skip: ignore batches until `skip` records have been passed.
      if i < skip:
        logging.info('Skipping entities: at %d, skip %d', i, skip)
        i = i + int(batch_size)
      else:
        logging.info('Importing %d entities in %d bytes current count %d',
                     num_entities, len(content), i)
        LogMessage(time.ctime(time.time()) +
                   ' :Importing entities current count ' + str(i) + '\n')
        retry = 1
        while retry == 1:
          try:
            content = post_entities(host_port, uri, cookie, kind, content)
            retry = 0
            i = i + int(batch_size)
          except PostError, e:
            logging.error('An error occurred while importing: %s', e)
            logging.error('Going to sleep')
            LogMessage(time.ctime(time.time()) + ' ERROR Count:' + str(i) + '\n')
            # Sleep for 1 min, then retry the same batch (retry stays 1).
            time.sleep(60)
            logging.error('Retrying batch at count %d', i)
  finally:
    csv_file.close()
  return True

  • Introduce a new function, which will write an upload log to a log file.

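# Requires "import sys" at the top of the file.
def LogMessage(message):
  """Appends a message to the bulk upload log file."""
  logfile = None
  try:
    logfile = open('bulkupload.log', 'a')
    logfile.write(message)
  except:
    logging.error(sys.exc_info()[0])
  finally:
    # Always close the file, whether or not the write succeeded.
    if logfile:
      logfile.close()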
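The new skip value also has to reach ImportCSV from the command line. The SDK’s own argument handling is not shown in this post, so the following is only a standalone sketch of how a --skip option could be parsed. It uses argparse from newer Python versions, the function name `parse_args` is hypothetical, and the default batch size of 10 is an assumption.

```python
import argparse


def parse_args(argv):
    """Parse the options used in the sample call, plus --skip."""
    parser = argparse.ArgumentParser(description='Resumable bulk upload (sketch).')
    parser.add_argument('--filename', required=True, help='CSV file to upload')
    parser.add_argument('--kind', required=True, help='Entity kind to create')
    parser.add_argument('--url', required=True, help='URL of the load handler')
    # Assumed default; check the batch size your copy of the SDK uses.
    parser.add_argument('--batch_size', type=int, default=10)
    # Number of records to skip when resuming; 0 means start from the top.
    parser.add_argument('--skip', type=int, default=0)
    return parser.parse_args(argv)
```

The parsed `skip` value would then be passed to ImportCSV along with the other arguments.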

Feel free to drop a note if you have trouble getting this to work


Salary Search Engine using Google Appengine

October 2nd, 2008 admin Posted in Appengine 2 Comments »

I have created a new “salary search” application using Google’s Appengine. The salary search is based on data released by the Department of Labor. The data set is large, with up to 2 million job listings to search. Considering the size of the data, Google Appengine sounded like a good idea.

Some interesting searches from the search app are listed below:

Tech Companies:

  1. Google Engineer Salary
  2. Microsoft Engineer Salary
  3. Apple Exploratory Design Engineer

Consulting Companies:

  1. Bearing Point Salary
  2. KPMG
  3. Ernst & Young