Talk On Multi Stage Ranking



QUS : Query Understanding Service


The journey of a search query through an e-commerce engineering stack can be broadly divided into three phases: a query text processing phase; a retrieval phase, where relevant products are fetched from the index; and, last but not least, a product re-ranking phase, where a machine learning ranking engine re-sorts the products, primarily based on a combination of KPIs like click-through rate, add-to-cart rate, checkout rate, etc. The focus of this post is primarily the first phase, i.e. query text processing via a Query Understanding Service (QUS). I will be discussing the applications and workings of QUS in e-commerce search. QUS is one of the most critical services needed to resolve a user query and find the key search intent. Among the plethora of machine learning (ML) services working across the engineering stack of any e-commerce company, QUS is usually the first to hold the fort, acting as the backbone ML service of the pre-retrieval phase.

When a user enters a query in the search box, the first step is to ingest that raw text and generate structured data from it. The objective is to extract as much relevant information from the query as possible and retrieve the most relevant results for the user.

A search query in e-commerce can contain many clues that guide us toward results pertinent to the user’s intent. A query like “levi black jeans for men” consists of an un-normalized brand name “levi”, gender “men”, product type “jeans”, and color “black”, and the top-level taxonomy category of the query could be Clothing & Accessories. The aim of QUS is to find these attributes, normalize them (levi => Levi Strauss & Co, mens => Men, etc.) and send this information to the retrieval endpoint to fetch relevant results. QUS comprises an ensemble of sub-services, such as a query category classification service, a query tagging service, and an attribute normalization service. In the case of long-tail queries (queries that are not common and result in either very few products or null results), the output of QUS can further be used to relax the query by rewriting it, a process known as query relaxation. Furthermore, we can also expand the query (query expansion), showing the user results that are near-similar to the search query, e.g. if a user searches for “light blue tee”, we can expand the result set to colors light blue, blue, or violet. Also, in the case of brand-sensitive queries, the results can be expanded to near-similar brands to give the user exposure to available alternatives in the inventory.
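To make the structured output concrete, here is a toy sketch of what a QUS parse might produce. The lexicons and normalized names below are made up for illustration; a real QUS would rely on trained classifiers and a sequence tagger rather than dictionary lookups.

```python
# Toy QUS parse for a query like "levi black jeans for men".
# These lexicons are illustrative stand-ins, not a real catalog.
BRANDS = {"levi": "Levi Strauss & Co", "levis": "Levi Strauss & Co"}
COLORS = {"black", "blue", "white"}
GENDERS = {"men": "Men", "mens": "Men", "women": "Women"}
PRODUCT_TYPES = {"jeans": "Jeans", "tee": "T-Shirts"}

def parse_query(query: str) -> dict:
    """Tiny lexicon-lookup parser; production systems use ML models."""
    parsed = {"brand": None, "color": None, "gender": None, "product_type": None}
    for tok in query.lower().split():
        if tok in BRANDS:
            parsed["brand"] = BRANDS[tok]
        elif tok in COLORS:
            parsed["color"] = tok
        elif tok in GENDERS:
            parsed["gender"] = GENDERS[tok]
        elif tok in PRODUCT_TYPES:
            parsed["product_type"] = PRODUCT_TYPES[tok]
    return parsed

print(parse_query("levi black jeans for men"))
# -> {'brand': 'Levi Strauss & Co', 'color': 'black', 'gender': 'Men', 'product_type': 'Jeans'}
```

The structured dictionary is what gets forwarded to the retrieval endpoint.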

Common Issues

QUS helps move search beyond raw keyword matching. In a world without QUS, the following issues can occur if we depend on pure SOLR retrieval:

  1. Wrong product categorization: queries like “hershey cocoa powder” that belong to the Grocery category can retrieve fashion products because of the word “powder”.
  2. Retrieval sensitivity:
    1. Queries like “office desk wooden” vs “office desk wood”, or “halloween costumes for girls” vs “halloween costumes girls”, can produce different results although they carry the same search intent.
    2. Solr gives equal weight to all tokens. This can produce situations where the token “range” gets the same weight as “eggs”, so the result set also includes “free range chicken”.
  3. No query relaxation: overly narrow queries mostly go unresolved, e.g. “fleece queen size sheets” can return blankets; the query should be relaxed to “queen size sheets”.
  4. Price-agnostic search: another feature of QUS is extracting price information from the query; this can be done either with a set of regular expressions or with a tagger plus a normalizer. Raw query: “black pants under $50”.
  5. Unresolved brand variants: for queries like “coke six pack” vs “coca-cola six pack”, results can differ.

Search Phases

We can divide the e-commerce search process into two phases.

Post Retrieval :

This phase deals with the relevant results retrieved for the query. It is here that we rank the results, add recommendations, and insert sponsored products into the final list.

Pre Retrieval :

This is the phase where we have not yet retrieved results from the backend indexing engine. The only information we have is the raw user query. Usually this phase comprises the following components (each can be a separate microservice):

Spell corrector

Intent Detection

Query Classifiers

Category Classifier: This service classifies the query into leaf-level categories corresponding to the catalog structure/taxonomy. Its output is a set of leaf-level categories. Solr can either filter the results on the basis of the predicted categories or boost products belonging to the predicted category set.

Product Type (PT) Classifier: Usually taxonomies are hierarchical, e.g. under the top-level Clothing & Accessories category you would have Shirts at level 2, Men/Women/Children at level 3, formal/casual at level 4, and so on. Due to noise in catalog content and semantic issues in the catalog structure (near-identical leaf categories under different L1s, e.g. tea can sit in Grocery as well as in Office > Pantry Items), it is usually better to classify the query into a flat hierarchy of Product Types. In the context of the last query, the PT would be just Shirts; for the query “iphone 9 charger” the PT would be “Phone Chargers”.

Query Tagger

Like the query category classifier and the query product type classifier, the query tagger is a component of QUS, but unlike them it tags individual tokens in the query rather than categorizing the whole string.

The aim here is to detect the customer’s intent and improve retrieval by finding the tokens in the query that denote the key product being searched for, along with brand, gender, color, size, price range, etc. This helps us in:

  1. refining the retrieval in a faceted manner
  2. resolving long tail queries
  3. query rewriting and query relaxation
  4. product promotion / recommendation 
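As a concrete, if simplistic, picture of what the tagger emits, the sketch below produces BIO labels (B-BR/I-BR for brand, and made-up keys CL, PT, GD for color, product type, and gender) from a hand-built phrase lexicon. A production tagger would learn this mapping with a BiLSTM-CRF rather than a lookup.

```python
# Illustrative BIO tagging via longest-phrase lexicon match.
# The phrase table and attribute keys are assumptions for this sketch.
PHRASES = {
    ("black", "n", "decker"): "BR",
    ("levi",): "BR",
    ("black",): "CL",
    ("jeans",): "PT",
    ("men",): "GD",
}

def bio_tag(tokens):
    tags = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        # greedily match the longest known phrase starting at position i
        for length in range(len(tokens) - i, 0, -1):
            key = tuple(tokens[i:i + length])
            if key in PHRASES:
                label = PHRASES[key]
                tags[i] = "B-" + label
                for j in range(i + 1, i + length):
                    tags[j] = "I-" + label
                i += length
                break
        else:
            i += 1
    return tags

query = "coffee maker black n decker".split()
print(list(zip(query, bio_tag(query))))
# -> [('coffee', 'O'), ('maker', 'O'), ('black', 'B-BR'), ('n', 'I-BR'), ('decker', 'I-BR')]
```

Note how the longest match lets brand context win over the standalone color reading of “black”.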
Architecture: Bidirectional LSTM-CRF models for sequence tagging.

Tagger Attributes

Broadly speaking there are two types of attributes: global and local. Local attributes are highly specific to particular leaf categories, e.g. Home > Furniture > Table would have attribute key/value pairs like:

Top Material | Engineered Wood
Material | Particle Board
Furniture Finish | Espresso

Local Attributes

Since each leaf-level category can have specific attributes that do not apply to other categories, we can end up with a very large number of local attribute key/value pairs. That is why it is better not to use the tagger to detect these attributes: we would face performance issues scaling the tagger to that many attributes.

On the contrary, other attributes are global in nature, i.e. not tied to any particular category: size, for example, can be found in clothing, furniture, appliances, etc. The values of these attributes can still be category-specific, e.g. size in clothing can take values like XS, S, M, L, while the Home Appliances > Microwave category could have valid size values like Compact, Mid-Sized, Family Size. Because they are present across categories, or at least across a group of categories, it is better to use a tagger to detect them. The table below lists what we call global attributes.

Attribute Key | Description | Example Values
Brand | Companies and their subsidiaries that offer products | sony, philips, great value, coke, etc.
Gender | | men, boys, women, unisex, girl, etc.
Color | Specified colors; also includes materials that represent colors such as gold, silver, and bronze |
Character | Recognizable entities that exist across multiple Brands and Product Lines | Batman, Golden State Warriors, Taylor Swift, UCLA Bruins, etc.
PT Descriptor | Features pertaining to the product as well as media titles | with shelves, led lightbulbs, round dining table
Product Line | Product lines from brands | playstation, sonicare, air jordans, etc.
Miscellaneous | All other relevant tokens, including themes, versions, years, and model numbers | star wars, 2013, paris, ucla, in-store, highest rated, etc.
Price | | $, dollars, bucks, sale, clearance
Age |
  a. Age Value – Numeric value for age (e.g. 8, 12)
  b. Age Unit – Context for defining value (e.g. month, year)
  c. Age Type – Qualitative measurement for age (e.g. baby, teenage, elder, young, jr)
Size |
  a. Size Value – Numeric value and word representation for sizing (e.g. 3, 120, double)
  b. Size Unit – Context for defining size (e.g. oz, lb, gb)
  c. Size Category – Grouping for size units (e.g. weight, volume, length, diameter)
  d. Size Type – Qualitative measurement for size (e.g. small, medium, large, 4xl, mini, giant, tall, short, wide, slim)
Quantity |
  a. Quantity Value – Numeric value for quantities
  b. Quantity Unit – Context for defining quantity (e.g. piece, sheets)
  c. Quantity Type – Qualitative measurement for quantity (e.g. value size, bulk, set)
Global Attributes

Attribute Normalization

Once the tagger detects a mention in the query and tags individual tokens, the next step is normalizing these tokens. For attribute types like color, normalization is pretty straightforward: {red, light red, pink, …} can be mapped to one color family with the normalized name RED; similarly for price, we can create a standardized denomination using a set of regular expressions. With normalization we aim to standardize the attribute key/value pair w.r.t. the values in the catalog. The prior requirement here is that products in the catalog have canonicalized attribute values, e.g. all men’s shirts have a size attribute mapped to only a predefined set of values {XS, S, M, L, XL, XXL, …}. Once we detect a size attribute in a query like “mens small checked shirt”, the next step is to normalize the size token “small” to the normalized catalog value “S”. This helps us either make a faceted query to SOLR or boost products whose size attribute is “S”, thereby enhancing retrieval quality.
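The lookup flavor of this step can be sketched in a few lines; the normalization tables below are illustrative, not a real catalog mapping.

```python
# Rule-based normalization sketch: map raw attribute values to the
# canonical values used in the catalog. Tables are illustrative.
SIZE_NORM = {"small": "S", "sm": "S", "medium": "M", "large": "L",
             "extra small": "XS", "xl": "XL", "xxl": "XXL"}
COLOR_NORM = {"red": "RED", "light red": "RED", "pink": "RED",
              "navy": "BLUE", "light blue": "BLUE"}

def normalize(attribute: str, value: str) -> str:
    """Return the canonical catalog value, falling back to the raw value."""
    table = {"size": SIZE_NORM, "color": COLOR_NORM}.get(attribute, {})
    return table.get(value.lower(), value)

print(normalize("size", "small"))   # "small" -> "S", usable as a Solr facet
print(normalize("color", "pink"))   # "pink"  -> "RED"
```

Unknown values fall through unchanged, which keeps the lookup safe to apply to every tagged token.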

Numerical attributes like price, quantity (e.g. “1 gallon milk”), and age (“toys for 8-12 years old”) can be handled with a regular-expression-driven approach. Once we detect the category and product type of a query, we can apply only the set of regular expressions applicable to those categories and PTs to extract numerical attributes. E.g. for a query like “2 gallon whole milk”, the category can be “Grocery > Milk > Whole Milk” and the PT can be Milk; knowing these values, we can apply a set of regular expressions created exclusively to handle grocery/milk quantity normalization. The following set of queries all have a price attribute value of 20 that can easily be extracted using a couple of regular expressions:

a. “tees under 20”

b. “tees under $20”

c. “tees under 20 dollars”

d. “tees under 20 usd”
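A single hedged pattern covering the four query shapes above might look like this; the pattern is illustrative and far from exhaustive.

```python
import re

# Illustrative price-extraction regex for "tees under 20"-style queries.
PRICE_RE = re.compile(
    r"(?:under|below|less than)\s*\$?\s*(\d+(?:\.\d+)?)\s*(?:dollars|usd|bucks)?",
    re.IGNORECASE,
)

def extract_max_price(query: str):
    """Return the upper price bound mentioned in the query, if any."""
    m = PRICE_RE.search(query)
    return float(m.group(1)) if m else None

for q in ["tees under 20", "tees under $20",
          "tees under 20 dollars", "tees under 20 usd"]:
    print(q, "->", extract_max_price(q))   # all four yield 20.0
```

The extracted bound can then feed a range filter such as a price facet in retrieval.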

Overall, attribute normalization can be achieved using the following approaches:

  1. Regular-expression based methods
  2. Rule based methods
    1. This is a simple yet very effective approach. Before jumping to NLP-based methods, a good way to get some quick wins and draw a performance baseline is to manually create rules for normalization.
    2. A SQL-like table can be created with each unnormalized attribute mapped to its normalized variant.
    3. A simple lookup in the table can then generate normalized attribute values.
  3. Classification based approach
    1. The key drawback of the rule-based approach is that it is not scalable: it would take too much manual effort to analyze the queries, find the varied patterns in them, and create explicit rules mapping them to normalized values.
    2. But the previous approach does provide us a labeled data set to work with, i.e. attribute values mapped to their canonicalized versions.
    3. This data set can be used to train an attribute classifier. The difference from the category classifier mentioned earlier is that here we classify the tagged mention (e.g. Product Line), whereas earlier we were classifying the whole query string.
  4. Entity-linking based approach
    1. Entity linking is a broad topic that I plan to cover in a separate post. To give the gist of the idea: entity linking detects mention strings in text that may correspond to named entities (just as tagging finds mentions and maps them to attribute keys like Brand and Size) and then tries to link them to entities in a knowledge base. This method can be useful for detecting brands in queries as well as in product titles and descriptions.
    2. Although there are neural architectures that can detect the mention string and then link it to the best candidate entity, in the next section we discuss a similar model-based approach.

Entity Linking Based Brand Normalization

Let’s say we have a mention string tagged as Brand in the search query. The entity linking task can be broken into two steps: candidate generation and candidate ranking. Candidate generation means fetching normalized brand candidates that are syntactically or semantically similar to the mention string. E.g. for the search query “paw-patrol fire truck”, the tagger would mark “paw-patrol” as a Brand mention, and the candidate generation phase can find a set of syntactically similar brands from the catalog for the category Toys. Traditionally, an information-retrieval approach has been used for candidate generation, e.g. BM25, a variant of TF-IDF, measuring similarity between the mention string and candidate brands and their descriptions. Once we have a set of candidates, we rank them in the candidate ranking phase.

A context-aware ranking can be done by using the left context of the mention, the mention string itself, and the right context as separate inputs to a model. We can train a model to score the contextual similarity of a mention string to the description of a brand. To get brand descriptions (and hence their representations), we can either create a custom dataset or fetch brand pages from Wikipedia and learn a representation from the title and introduction of each brand.

In “Learning Dense Representations for Entity Retrieval” the authors use a model architecture similar to sentence-similarity architectures to place the entity description and mention representations near each other in the same embedding space. This kind of approach is highly preferable because the brand representations can be precomputed: since there is no direct interaction between the encoders on each side, we just compute the context-aware representation of the mention string and take a dot product between it and the precomputed brand representations. This enables efficient retrieval, but constrains the set of allowable network structures. The image from the mentioned publication depicts how the components of the query string and entity description are used to compute similarity between the two.
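The retrieval step of that dual-encoder setup reduces to dot products against a precomputed table. A minimal sketch, where the 3-dimensional vectors are made-up stand-ins for learned representations:

```python
# Dual-encoder retrieval sketch: brand vectors are precomputed offline;
# at query time we encode the mention once and score with dot products.
# These tiny vectors are illustrative, not real learned embeddings.
BRAND_VECS = {
    "Paw Patrol": [0.9, 0.1, 0.0],
    "Hot Wheels": [0.1, 0.8, 0.2],
    "Lego":       [0.0, 0.2, 0.9],
}

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def rank_brands(mention_vec, brand_vecs):
    """Score each precomputed brand vector against the mention encoding."""
    scores = {b: dot(mention_vec, v) for b, v in brand_vecs.items()}
    return sorted(scores, key=scores.get, reverse=True)

# Pretend encoder output for the mention "paw-patrol" in context:
mention_vec = [0.85, 0.15, 0.05]
print(rank_brands(mention_vec, BRAND_VECS))  # "Paw Patrol" ranks first
```

Because only the mention side is encoded at query time, this scales to large brand inventories (with approximate nearest-neighbor search replacing the full scan in practice).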

Brand Resolution In A Noisy Catalog

It may happen that brand names are not normalized in the catalog. In this case a brand like Coca-Cola can be referred to by different products in the catalog using different variants: coke, coca cola, coca-cola, coca + cola, etc. Here we cannot normalize the brand in the query, since brand names in the catalog themselves are not normalized. So instead of canonicalizing the brand in the query, we should aim to fetch products that refer to any variant of the searched brand.

Brand Normalization Process Flow

A simple yet handy way to canonicalize brand names in queries involves the following steps:

  1. Parse the catalog to create a mapping of product type → list of available brands
    1. E.g. coffee maker : [black decker, black & decker, black and decker, black + decker, Mr. Coffee, Keurig, …]
  2. Use the Product Type classifier to get the query PT
  3. Use the query tagger to get the token(s) tagged as B-BR and I-BR
  4. Match the tagged tokens against the list of brands for the predicted product type and use string-similarity approaches to select candidate brands
    1. Trigram-Jaccard
    2. Levenshtein edit distance
    3. Jaro-Winkler
    4. FuzzyWuzzy


For a query like “coffee maker black n decker”, the predicted Product Type can be “Coffee Maker” and the mention string tagged as brand can be “black n decker”. A lookup in the PT-to-brand-list map returns the valid brand variants in the catalog for the PT “Coffee Maker”: [black decker, black & decker, black and decker, black + decker, Mr. Coffee, Keurig, …]. Now, using an edit distance of 1 or 2, we can find the candidate brands [black decker, black & decker, black and decker, black + decker]. Later, Solr can boost all these brands while retrieving results for the query. In this approach we don’t even need brands to be normalized in the catalog, since we can boost all variants in one go.
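The string-similarity step can be sketched with character-trigram Jaccard plus a plain Levenshtein distance; the thresholds below are illustrative.

```python
# Brand candidate selection via trigram Jaccard and Levenshtein distance.
def trigrams(s):
    s = " " + s.lower() + " "
    return {s[i:i + 3] for i in range(len(s) - 2)}

def trigram_jaccard(a, b):
    ta, tb = trigrams(a), trigrams(b)
    return len(ta & tb) / len(ta | tb)

def levenshtein(a, b):
    # classic dynamic-programming edit distance
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1,
                           prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

variants = ["black decker", "black & decker", "black and decker",
            "black + decker", "Mr. Coffee", "Keurig"]
mention = "black n decker"
candidates = [b for b in variants
              if trigram_jaccard(mention, b) > 0.5
              or levenshtein(mention, b.lower()) <= 2]
print(candidates)  # the four "black ... decker" variants survive
```

All surviving variants can then be boosted together in the retrieval call.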

Connecting the Dots

Once QUS has extracted all the key insights from the query, the results are forwarded to SOLR. The prerequisite is that the catalog is indexed with separate dimensions for category, brand, product type, etc. For retrieval we can use a weighting scheme where a higher boost is given to the predicted categories, product types, brands, and so on. For attributes like category and PT we can even make a faceted search call while adding boosts for the predicted brand, product line, size, etc. Furthermore, we can add a relaxed secondary query alongside the main query so that recall stays high; this helps resolve long-tail and null-result queries. The product ranker layer can take care of ordering the relaxed query’s supplemental results w.r.t. the products returned by the main query. More advanced techniques, like a separate optimization service that predicts the boost weights per attribute based on the user query, can further enhance the relevance of returned results, e.g. for a clothing query the SOLR weight-prediction algorithm can give more weight to brand and price rather than style/pattern.
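As a sketch of the hand-off, QUS output can be turned into a Solr request like this. `defType=edismax` and `bq` (boost query) are standard Solr parameters; the field names and boost values are assumptions about the index schema.

```python
from urllib.parse import urlencode

# Turn QUS output into a Solr edismax request with boost queries.
# Field names (category, product_type, brand) are schema assumptions.
def build_solr_params(query, qus):
    bq = []
    for cat in qus.get("categories", []):
        bq.append('category:"{}"^5.0'.format(cat))
    if qus.get("product_type"):
        bq.append('product_type:"{}"^4.0'.format(qus["product_type"]))
    for brand in qus.get("brand_variants", []):
        bq.append('brand:"{}"^3.0'.format(brand))
    return urlencode({"q": query, "defType": "edismax", "bq": " ".join(bq)})

params = build_solr_params(
    "coffee maker black n decker",
    {"product_type": "Coffee Maker",
     "brand_variants": ["black decker", "black & decker"]},
)
print(params)
```

Boosting (rather than filtering) keeps recall intact while still pushing the predicted attributes to the top.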


Demand Forecasting – Online Advertisements

Objective: Given an ad campaign’s targeting constraints, ad slots, start date, and end date, we want to forecast the available inventory for that line item. If the ad slots in question are being targeted by other active campaigns during the concerned time period, then the forecasting system should project demand taking that into account.


To effectively forecast the demand, we can use the two-phase approach described below:

1. Forecast overall inventory from a particular ad slot
2. Project the proportion of the forecasted overall inventory available for the new campaign given some active campaigns targeting the ad slot.

  • Trend Detection: In this step we analyze the global trend, e.g. is the number of requests increasing linearly, and if so at what rate.
  • Seasonal Behavior: This includes weekly and monthly fluctuations in inventory, particularly for e-commerce publishers. This step involves capturing the monthly seasonal fluctuations and including them in the overall projected inventory.
  • Noise Removal: Here we smooth the series by removing the inherent noise in the data.
  • Impulse Handling: This covers unprecedented, and sometimes unaccounted-for, sudden rises in inventory which, if not handled properly, lead to over-projection of inventory. Sometimes publishers introduce new features (or a production issue causes unserved requests), or an unpredictable event (in the case of a news site) produces huge peaks (impulses) in the page-views time series. Our aim is to detect that such impulses are not a long-term trend and ignore them while projecting available inventory.

To accomplish the above goals, the research phase will involve, but not be restricted to, testing the performance of time-series algorithms like:

  • Holt-Winters
  • Kalman Filters (State Space Models)
  • Elastic Smooth Season Filtering
  • Discrete Fourier Transforms based forecasting techniques
  • regression models for global trend detection etc
  • Some heuristic based techniques (e.g. moving average) to handle impulse and noise
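To make the Holt-Winters idea concrete, here is a minimal from-scratch additive implementation with weekly seasonality. The initialization and smoothing constants are illustrative, not tuned values.

```python
# Minimal additive Holt-Winters sketch (level + trend + weekly season).
def holt_winters_additive(series, m=7, alpha=0.3, beta=0.05, gamma=0.2, horizon=7):
    level = sum(series[:m]) / m
    # average per-day trend from the first two seasonal cycles
    trend = (sum(series[m:2 * m]) - sum(series[:m])) / (m * m)
    season = [series[i] - level for i in range(m)]
    for t in range(m, len(series)):
        s = season[t % m]
        last_level = level
        level = alpha * (series[t] - s) + (1 - alpha) * (level + trend)
        trend = beta * (level - last_level) + (1 - beta) * trend
        season[t % m] = gamma * (series[t] - level) + (1 - gamma) * s
    n = len(series)
    return [level + (h + 1) * trend + season[(n + h) % m]
            for h in range(horizon)]

# Synthetic impressions: weekend peaks riding an upward trend.
weekly = [100, 100, 100, 100, 100, 180, 200]            # Mon..Sun pattern
series = [weekly[t % 7] + 2 * t for t in range(8 * 7)]  # 8 weeks of data
forecast = holt_winters_additive(series)
print([round(f) for f in forecast])  # weekend values stand out, trend continues
```

Even this toy version captures the two things the post cares about: the weekday/weekend pattern and the global growth rate.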

Experiments and Observations

I did some descriptive analysis of the impression-delivery plots for ad ID X from publisher XX, using data fetched from the Google DFP API. Interestingly, just by visual inspection we could see a weekly pattern in the data. To model the weekly pattern (weekly seasonality), a week is taken as one period of the time series; to model the monthly pattern (monthly seasonality), a year is considered one period. By analyzing yearly data we can find the global trend, i.e. whether the overall impression rate for a site is increasing or not. But since we only have data from July 2013 to August 2014, we cannot infer anything statistically significant about the monthly seasonal behavior and trend, though we can clearly see weekly seasonal behavior.

2014 monthly impressions plot – Jan(01) to Oct(10)


Initial Experimentation

We trained, tested, and tuned the model using XX’s (publisher) DFP impression data, and then compared the daily forecasted impression values with the actual delivered impressions. We will update the model parameters on a daily basis to effectively incorporate new insights from the data.

To create an initial baseline model and benchmark performance statistics, we zeroed in on a variation of the Holt-Winters forecasting algorithm with an additional multi-season feature: along with daily patterns within a week (e.g. learning that more impressions are received on weekends than on weekdays), it also adjusts for monthly seasonality. Since we didn’t have even two years of data, the algorithm will not be very accurate when dealing with monthly variations (though it will do far better than naive approaches, e.g. Google DFP’s approach of using just the last 28 days of data without considering seasonal trends), but I expect it will at least learn a good amount from the weekly data and distinguish weekday from weekend impression delivery.


Overall Features of the algorithm

  • Trend Detection: Finding whether overall impression delivery is increasing or decreasing over time.
  • Monthly Seasonal Decomposition: Finding seasonal traffic behavior, e.g. if traffic is much higher during year end, using this information for a better forecast at the following year’s end.
  • Weekly Seasonal Component: Learning from fluctuating daily traffic and identifying which particular days of the week receive more traffic.

Experiment 1.1: Holt-Winters Additive, 12×7 Seasons. In this prototype we tried to learn the global trend and seasonal patterns: a pattern for each day of the week, Monday to Sunday, for each month, so 12×7 = 84 distinct seasons in total. Since overall we didn’t have many data points for 84 distinct seasons (only 4 data points per season), the results have significant variance, but we still capture the overall seasonal trends, day-wise and monthly shifts, and the increasing global trend.

The plot below shows the predicted (one day ahead) and actual impressions. The training data spans 07 July 2013 to 15 Oct 2014.



Experiment 1.2 : Holt Winters Additive – 12 Month Seasons

*In this approach we smooth out the weekly fluctuations by taking a centered moving average with period 7.

In this manner we remove the noise due to weekends and can concentrate on learning the monthly behavior. The RMSE on test data reduced significantly with this approach; the only caveat is that errors will be large if we want to predict the number of impressions for one particular day. The algorithm performs well, though, when we want the overall number of impressions over a time period.
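The smoothing step can be sketched as a centered moving average with period 7; since 7 is odd, a simple symmetric window suffices (an even period would need a 2×MA).

```python
# Centered moving average with period 7 (pure-Python sketch).
def centered_ma(series, period=7):
    half = period // 2
    out = [None] * len(series)  # edges stay undefined
    for i in range(half, len(series) - half):
        window = series[i - half:i + half + 1]
        out[i] = sum(window) / period
    return out

daily = [100, 100, 100, 100, 100, 180, 200] * 3   # three identical weeks
smoothed = centered_ma(daily)
print(smoothed[3:10])  # the weekly oscillation flattens to the weekly mean
```

Each full window spans exactly one weekly cycle, so the weekday/weekend swing cancels out and only the underlying level (and any trend) remains.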


test data = overall impressions served from October 17, 2014 to October 27, 2014 = 346246

smoothed value of test data = 372105



COVARIATE SHIFT: If the seasonal behavior of impressions corresponding to some ad unit changes completely, the algorithm has trouble adjusting to this abrupt change.


Results of modeling level of series


I will try to find a workable suboptimal value and use it to draw confidence intervals (CIs) around the predicted values.

Generalized Additive Model (GAM)

Extracts from “Forecasting at Scale”, Sean J. Taylor et al.


The above-mentioned paper is a really good study of how to model a time series as a GAM with changepoint detection to tackle covariate shift.

It further explores the following notions while modeling the trend component:

  1. A saturating nonlinear growth function
  2. A linear trend with changepoints
  3. Automatic changepoint selection

For the seasonal component, the paper uses a Fourier series to provide a flexible model of periodic effects.


The third component models Holidays and Events, to account for calendar effects (spikes and drops).

The above methodology is implemented in Facebook’s Prophet library. In the next section I show its use on a toy data set.

Data Set: 


Advantage of Facebook Prophet 

  1. The formulation is flexible: we can easily accommodate seasonality with multiple periods and different assumptions about trends.
  2. Unlike ARIMA models, the time-series measurements need not have a regular period, and we do not need to interpolate missing values to fit the model.
  3. Automatic Bayesian changepoint detection.
  4. We don’t need to explicitly split the time series into level and seasonality components in a linear fashion.
  5. Models the level/trend via a nonlinear growth function.
  6. Models periodic seasonality via a Fourier series.
  7. Holiday modeling: incorporates a list of holidays into the model in a straightforward way, assuming the effects of holidays are independent.
  8. Can incorporate changes in seasonal behavior, e.g. if we used to see peaks on Mondays and this behavior changes, Prophet can detect the changepoint and fold the effect into the seasonality component.

Fitting the model and forecasting 


Extracting Components:


The attached link (a pdf generated from an IPython notebook) showcases a comprehensive analysis of the toy time series used above. In it I explore various ways of modeling the components of a time series and highlight their corresponding pros and cons.

Timeseries Analysis IPython Notebook(pdf)


CTR Prediction System – Design Doc

The Anatomy Of A Large-Scale CTR* Prediction System

* With little or no modification, the proposed system design and algorithms can be used to optimize other metrics like Cost Per Viewable Completion, Cost Per Completion, Cost Per Engagement, Cost, etc.

Aim: To build a scalable, distributed machine learning framework based on a stacked classifier comprising Gradient Boosting Machines and Logistic Regression trained with online stochastic gradient descent. The system is specifically designed to achieve high performance for display advertising campaigns with a varied range of targeting criteria. The final aim of the prediction system is not restricted to achieving high predictive performance; it also extends to ease of deployment, frequent model updates in an online fashion, high scalability, and distributed learning.

A Glimpse Into The Modeling Aspects:

Click and conversion prediction for display advertising presents a different set of challenges.

  1. Sparse Features: Categorical features like domain, publisher, advertiser ID, creative ID, cookie ID, etc., when converted to a binary representation with one-hot encoding, lead to very sparse features.
  2. Biased Training Data: Normally the number of instances with a click is far smaller (often less than 0.1%) than the number of negative training instances. This leads to a class-imbalance problem, which further complicates the training of a conventional classifier.
  3. Feature Generation: Usually we generate different permutations of features, e.g. quadratic features and feature combinations, to help the classifier form a non-linear decision boundary. Though this increases the performance of the system, it leads to an explosion in feature dimensionality and makes the model harder to train.
  4. Tuning the Log-Linear Model: L1 vs L2 regularization (conventional wisdom suggests going with L1, since it preserves sparsity).
  5. Tuning Stochastic Gradient Descent: Finding optimal rates for the weight updates.
  6. Subsampling: Finding an optimal sampling rate for the negative class to get a balanced data set.
  7. Periodic Model Updates: The logistic regression model will be trained using online stochastic gradient descent. This makes training faster and scalable in a distributed manner, though we will need to retrain the model every few hours to incorporate new data points.
  8. Gradient Boosting Machines: I will train GBMs separately to generate features that are consumed by the log-linear model. Since training GBMs takes considerably more time (I will have to test training time vs. number of trees and training-data size on a Spark cluster), we will schedule a daily job to update the GBM.
  9. Dealing with New Ads/Creatives: We will need a separate, generalized model to handle new advertisers/creatives. Though the performance of this model will not be up to the benchmark, it will still help us exploit historical patterns and collect better training data for the new ad campaign.
  10. Generating Historical Features: We will add the historical performance of the advertiser and the user as additional features to our feature vector.
  11. Mobile Advertisement: Whether to build a separate model for mobile/app-based ads, since the feature set involves many new elements, e.g. device ID, device type, app.
  12. Feature Hashing: Reducing the dimensionality of the generated feature vector.
  13. Explore and Exploit: How to sometimes show ads to the “wrong” users with the sole intention of exploring new users, finding new patterns, and discovering covariate shift. If we only show ads according to our system, then the next day we will get a highly biased training sample.
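Point 12 above, feature hashing, can be sketched in a few lines. The dimensionality `D` and the md5-based hash are illustrative choices; production systems typically use a faster non-cryptographic hash.

```python
import hashlib

# Feature-hashing sketch: project arbitrarily many categorical
# "field=value" strings into a fixed D-dimensional sparse vector.
D = 2 ** 18  # fixed dimensionality of the hashed feature space

def hash_features(features):
    """Map feature strings to a sparse {index: count} vector; hash
    collisions are the price paid for bounded dimensionality."""
    vec = {}
    for f in features:
        idx = int(hashlib.md5(f.encode()).hexdigest(), 16) % D
        vec[idx] = vec.get(idx, 0.0) + 1.0
    return vec

x = hash_features([
    "domain=example.com", "advertiser_id=42", "creative_id=7",
    "device_type=mobile",
    "domain=example.com x device_type=mobile",  # a quadratic feature
])
print(len(x), "non-zero dimensions out of", D)
```

New advertiser or creative IDs need no vocabulary update: they hash into the same fixed space, which is exactly why this pairs well with an online-updated log-linear model.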

Advanced Non-Linear Approaches

I will avoid using fancy ensemble techniques (the winners of the Avazu and Criteo CTR prediction challenges used 20-model ensembles and many other hacks) and algorithms like Field-aware Factorization Machines, which have been shown to perform slightly better (improving the performance metric in the fourth decimal place) in Kaggle competitions but lead to grave overfitting and are not at all scalable in production. We need a mechanism that can be easily trained on a cluster and efficiently updated within a few hours. After going through more than a dozen research papers and other literature, I narrowed the final model down to an online-gradient-descent-based log-linear model with feature generation via gradient boosted machines. With appropriate computing power (a Spark cluster) we should be able to train the model every few hours, hence exploring new data. Many people have used Vowpal Wabbit to train log-linear models in Kaggle competitions; though it takes comparatively little time with stochastic gradient descent, competitors still spent days generating basic features and doing other data munging. Hence I plan to go with Spark, which claims to provide lightning-fast computation thanks to its in-memory computation framework.

One interesting approach mentioned in the ad click prediction literature to solve the problem of exploration is Bayesian Logistic Regression. The exploration problem can be explained with the following example: imagine we trained our CTR prediction model using 2 days of training data. On the third day we used the trained model to serve ads only to requests with high predicted CTR. Now the third day's training data will only contain instances that had high predicted CTR, i.e. we will be getting a narrow/biased training sample. If new segment requests are encountered, or segment behavior changes, our model will not have the training data necessary to adapt to that change. So we need a procedure to gradually explore along with exploiting high-performing segments.

When it comes to the "Explore and Exploit" paradigm I have a strong proclivity towards Bayesian methods. Bayesian Logistic Regression places a prior distribution on the weights of Logistic Regression and gradually updates them using the incoming data stream, which makes it highly suited to online stream learning. Finally, we draw a sample (using Thompson Sampling) from the weights' posterior distributions and use these weights to predict the performance of a segment. I have used Bayesian-Bandit-based Thompson Sampling techniques to design an RTB optimal bid price detection algorithm, and hence I am well acquainted with the underlying statistics.

Graepel et al.[3] give a detailed account of how this method is nearly equivalent to Logistic Regression trained with Stochastic Gradient Descent:

  • With a Gaussian prior on the weights, Bayesian Logistic Regression is equivalent to Logistic Regression trained with Stochastic Gradient Descent with L2 regularization.
  • With a Laplacian prior on the weights, Bayesian Logistic Regression is equivalent to Logistic Regression trained with Stochastic Gradient Descent with L1 regularization.

In the initial version I will not be using Bayesian Logistic Regression (BLR). Instead I will go with Logistic Regression (LR) trained with Stochastic/Online Gradient Descent (SGD). Research has shown that LR trained with SGD inherently leads to exploration, since the gradient deviates from the optimal weights instantaneously (convergence time is longer); also, due to anomalies in the data the gradient deviates fairly often, which makes this algorithm inherently suited to exploration. So sometimes we will predict sub-optimally, i.e. after the hourly updates our algorithm will predict inaccurate CTR for some segments, but this can act as a blessing in disguise, since it lets us explore more unseen segments, which will become part of the training data set in the next training iteration.
Research[1,3,4] has shown that these algorithms perform on nearly the same scale, but in later versions I would definitely like to try BLR, as it has a more robust update rule.
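A bare-bones sketch of one online SGD step for logistic regression, with sparse features as a dict (the learning rate and L2 strength are illustrative):

```python
import math

def sgd_lr_update(w, x, y, lr=0.05, l2=1e-4):
    """One online SGD step for logistic regression.
    w: weight dict (feature index -> weight), updated in place.
    x: sparse feature vector (feature index -> value), y: label in {0,1}.
    Returns the predicted click probability before the update."""
    z = sum(w.get(i, 0.0) * v for i, v in x.items())
    p = 1.0 / (1.0 + math.exp(-max(min(z, 35.0), -35.0)))  # clipped sigmoid
    g = p - y  # gradient of cross-entropy loss w.r.t. z
    for i, v in x.items():
        # L2 regularization applied lazily, only to touched coordinates.
        w[i] = w.get(i, 0.0) - lr * (g * v + l2 * w.get(i, 0.0))
    return p

w = {}
# A click (y=1) on a segment should push its feature weights up.
sgd_lr_update(w, {0: 1.0, 7: 1.0}, y=1)
assert w[0] > 0 and w[7] > 0
```

Because each update moves the weights by a noisy per-example gradient, predictions keep jittering around the optimum, which is exactly the incidental exploration described above.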

Model Training And Tuning Procedure
This will be the most critical and time-consuming step. Once done, we will simply hard-code the various tuned model parameters and generated features in the model generation job. I have a few ideas for generating more useful features, like clustering existing users w.r.t. their behavior, clustering advertiser ids, and including the respective cluster ids in the feature vector. This would act as a hierarchical feature based on historical behavioral data and may lead to increased performance, but all such experimentation will be constrained by the project timeline.
For modeling I will follow the main findings and steps described in the four landmark papers on CTR prediction from Facebook, Google, Microsoft and Criteo, together with the results and findings from the two Kaggle competitions on CTR prediction hosted by Criteo and Avazu:

1. Practical Lessons from Predicting Clicks on Ads at Facebook, He et al., ADKDD'14.
2. Ad Click Prediction: a View from the Trenches, McMahan et al., KDD'13.
3. Web-Scale Bayesian Click-Through Rate Prediction for Sponsored Search Advertising in Microsoft's Bing Search Engine, Graepel et al., Proceedings of the 27th International Conference on Machine Learning.
4. Simple and Scalable Response Prediction for Display Advertising, Chapelle et al., ACM Transactions on Intelligent Systems and Technology.


Main Modeling Steps:

  1. Feature Generation: Try quadratic, cubic and other combinations of features; check the performance and use the best combination. Continuous features can be binned, treating the bin index as a feature.
  2. Historical Features: Generate the following historical features and check whether they improve performance
    a. Counting Features:
    i. Device_IP count
    ii. device id count
    iii. hourly user count
    iv. user count
    v. hourly impression count
    vi. #impressions to the user in a given day
    vii. #impressions to user per pub or app id
    viii. time interval since last visit
    ix. #impressions to user per pub or app id in the last hour
    x. features that occurred less than 10 times will be converted to a "rare" category
    b. Bag Features:
    i. user, pub id or app id, bag of app ids
    c. Click History:
    i. #clicks (overall) by a user
    ii. #clicks (overall) per pub/app id
  3.  Feature Hashing:
    The issue with the dummy coding presented above is that the dimensionality can get very large when there are variables of high cardinality. The idea is to use a hash function to reduce the number of values a feature can take. We still make use of the dummy coding described in the previous section, but instead of a c-dimensional code we end up with a d-dimensional one, where d is the number of bins used with hashing. If d < c, this results in a compressed representation; in that case collisions are bound to occur, but as explained later this is not a major concern.
    When dealing with several features, there are two possible strategies:
    (1) Hash each feature f into a d_f-dimensional space and concatenate the codes, resulting in Σ d_f dimensions.
    (2) Hash all features into the same space, using a different hash function for each feature.
    We use the latter approach as it is easier to implement. The tuning procedure will then consist of finding the optimal size d of the hashed space.
  4. Tuning Gradient Boosted Machines: This step involves training GBMs on the data set and generating features from them. We treat each individual tree as a categorical feature that takes as its value the index of the leaf an instance ends up falling in; the final representation of the input feature vector is in binary-coded format. The GBM parameters to tune are:
    • shrinkage
    • number of trees
    • interaction depth
    The goal is to find the optimal value of all these parameters using k-fold cross validation.
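A sketch of the leaf-index feature generation just described, assuming scikit-learn is available (the data and GBM hyper-parameters are toy/illustrative):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.preprocessing import OneHotEncoder

rng = np.random.RandomState(0)
X = rng.rand(500, 10)                    # toy feature matrix
y = (X[:, 0] + X[:, 1] > 1).astype(int)  # toy click labels

# Fit a small GBM; each tree becomes one categorical feature whose value
# is the index of the leaf an instance falls into.
gbm = GradientBoostingClassifier(n_estimators=30, max_depth=3,
                                 learning_rate=0.1).fit(X, y)
leaves = gbm.apply(X)[:, :, 0]           # shape: (n_samples, n_trees)

# Binary-code the leaf indices: one dummy column per (tree, leaf) pair.
encoded = OneHotEncoder().fit_transform(leaves)
```

The `encoded` matrix (one row per impression, one binary column per distinct leaf) is what gets fed into the downstream log-linear model.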

5. Learning Rate Parameters of Stochastic Gradient Descent:

A per-coordinate learning rate has been shown to achieve the best accuracy. Following [1, 2], the learning rate for feature i at iteration t is set to

η(t,i) = α / (β + sqrt(Σ_{j=1..t} g(j,i)²))

where g(j,i) is the gradient of the loss w.r.t. feature i at iteration j, and α, β are tuning parameters.

6. Learning the weights and regularization parameter of Logistic Regression using the cross-entropy cost function
* To Do: I have to study and test Follow The Regularized Leader (FTRL), recently published by Google[2] and claimed to perform a bit better than L1 regularization on sparse data.
7. Finding the optimal negative sampling rate: Since we have a class imbalance problem, we need a subsampling rate parameter for sampling from the negative (no-click) instances. We will experiment with different negative down-sampling rates to test the prediction accuracy of the learned model, varying the rate over {0.1, 0.01, 0.001, 0.0001} and checking the effect on the final cross entropy achieved.
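Negative down-sampling changes the base click rate, so predicted probabilities have to be re-calibrated back to the original scale; a sketch (the re-calibration formula q = p / (p + (1 - p)/w) is the one used in the Facebook paper [1]):

```python
import random

def downsample_negatives(rows, rate, rng):
    """Keep every positive (click) instance and a `rate` fraction of the
    negative (no-click) instances."""
    return [(x, y) for x, y in rows if y == 1 or rng.random() < rate]

def recalibrate(p, rate):
    """Map a probability predicted on down-sampled data back to the
    original scale: q = p / (p + (1 - p) / rate)."""
    return p / (p + (1.0 - p) / rate)

rng = random.Random(42)
rows = [(i, 1 if i % 100 == 0 else 0) for i in range(10000)]  # ~1% CTR toy data
sampled = downsample_negatives(rows, rate=0.01, rng=rng)
assert all(y == 1 for x, y in sampled if x % 100 == 0)        # positives kept
assert abs(recalibrate(0.5, 0.01) - 0.5 / (0.5 + 50.0)) < 1e-12
```

Without the re-calibration step, a model trained on the down-sampled data would grossly over-predict CTR in production.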

Infrastructure And Data Pipeline
I will have to discuss this section with you in person in more detail. Putting it very broadly (a rough sketch), we will need the following batch jobs:
1. Hourly Spark job to process the RTB logs (read logs from HDFS and create training segments).
2. Hourly Spark job for preprocessing and for generating features and training data (one-hot encoded vectors).
3. Daily Spark job to generate the Gradient Boosted Machines model (if training on the Spark cluster is not computationally expensive, maybe we can train this model much more often).
4. Hourly Spark job to generate the Logistic Regression model.
5. Job to deploy the updated model into production.
6. Daily job to generate a generalized model to handle new advertisers/creatives.

Posted in ctr, machine learning, online ads, Uncategorized | 3 Comments

Of Bandits And Bidding

Real-time bidding (RTB) refers to the buying and selling of online ad impressions through real-time auctions that occur in the time it takes a webpage to load. Those auctions are often facilitated by ad exchanges or supply side platforms (SSPs). An RTB agent has a set of active ad campaigns, each with its own targeting criteria. So when an ad request is broadcast by an ad exchange or SSP, each RTB agent has to make two important decisions: first, given the segment (a segment is just the unique set of features of the incoming request, e.g. geo location, content category, ad slot size, ad slot position), which campaign to serve from the list of active campaigns; and second, how much to bid.

The first problem is handled by smooth pacing algorithms, which spread the daily budget of a campaign throughout (if possible uniformly across) the 24-hour timeline. The main aim of a pacing algorithm is to allocate a campaign's budget to different time periods in such a way that the campaign exhausts its daily budget while impression delivery is not skewed or biased. In this article I will not discuss the details of pacing algorithms; rather I will dig into the intricacies of the bidding algorithm, in particular how Bayesian Bandits can be used to determine the optimal bidding price in real time.

Under The Analytical Hood 

We start with the assumption that, given a segment S (impression request), N RTBs are actively bidding on it. We are unaware of the number N, and it is a dynamic parameter: the campaign running on a competitor RTB targeting the concerned segment may end, leaving us with N-1 bidders, while on the other hand new bidders can join the bidding process, increasing the number of active bidders.

Furthermore, an impression has different significance for each RTB agent. Say some RTB has an active campaign that targets only impressions belonging to one particular segment, with a high CPM and a huge daily budget: that RTB will bid higher than the rest of the bidders whenever an impression belonging to that segment is received.

Another interesting aspect that makes this problem more challenging is that if an RTB loses the bid, it never gets to know the auction's winning price.

  • The number of players in the game changes dynamically
  • The bid response is either 1 (you won the bid) or 0 (you lost the bid)
  • At a particular bid price for a segment there is an associated win rate distribution
  • The payoff matrix is unknown, i.e. we don't know how important that segment is to the competing RTBs

We further assume that, given an impression request, a pacing algorithm has already selected a campaign and the win rate to achieve. There will be a specific win rate associated with each campaign and segment pair. Since each segment has different available inventory, we can distribute our budget accordingly: even though in the next thirty minutes we expect to receive 10000 impressions belonging to a particular customer segment, we needn't win them all. Based on all active campaigns targeting the segment, their respective budgets, and the availability of all other segments that can be targeted, our pacing algorithm outputs a win rate value. Now we will try to analytically formulate a model for the bid price prediction problem. I will try to be as comprehensive as possible for the benefit of the uninitiated reader.

Symbols used and assumptions made : 

  • We denote the parameters of the i-th competing bidder for segment s by Bidder(i,s).
  • Each Bidder(i,s) is assumed to draw its bid price at any time t from a Gaussian distribution with mean μ(i,s) and some constant standard deviation σ.
  • We assume the standard deviation σ is very small, but over time a bidder can change the mean of its bid price.
  • From a game-theoretic standpoint this is a game of incomplete information, i.e. a bidder only learns the winning price if it wins the bid and never knows what the other bidders are bidding. Every player has a different set of information regarding the payoffs.
  • The payoffs are continuously changing, i.e. the win rate of bids at a constant bid amount (a fixed strategy) changes over time.
  • Each bidder i bids on a proportion of the incoming requests from the SSP for a particular segment s; therefore even a bidder who bids way less will win sometimes, since it may happen that the RTBs that usually bid higher for segment s haven't bid on some particular request. Let's denote the normalized bidding rate of player i on segment s as p(i,s); we can also say that p(i,s) represents the probability that RTB i bids on segment s.
  • Since a bidder can increase or decrease its pacing rate depending on the time of day and the remaining budget of the campaign, p(i,s) is a dynamic parameter.

Win Price Distribution 

The win price distribution at time t will be a Gaussian mixture model with probability density function (pdf)

pdf(x) = p(1,s) · N(x | μ(1,s), σ) + p(2,s) · N(x | μ(2,s), σ) + … + p(n,s) · N(x | μ(n,s), σ)

where n is the total number of bidders and x is the bid price. The above distribution gives us the probability density of the winning bid price.

So, given a bid price x, the pdf gives us the probability density of winning the bid at that price.

To estimate the distribution parameters p(i,s), μ(i,s) and σ we would need to generate a sample (of points) from the distribution and then calculate the parameters using Expectation Maximization. In the figure below the distribution is the sum of two Gaussians: imagine two bidders, each drawing its bid price from one Gaussian and each bidding only on a proportion of requests. The one with the higher bid price always wins on the bids it makes, while the other only wins a proportion of its bids, i.e. when the first bidder doesn't bid. The overall distribution of the winning price then looks as follows:
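A quick toy simulation of such a winning-price mixture (the bid rates and prices are illustrative, loosely matching the two-RTB example):

```python
import random

# Toy winning-price simulation for two bidders: each bids on only a
# fraction of requests, drawing its price from a Gaussian; the winning
# price is the max over the bids actually placed.
bidders = [             # (bid probability, mean price in $CPM, sd) -- illustrative
    (0.6, 18.0, 0.5),   # "RTB1": high price, low variance
    (0.9, 11.0, 1.5),   # "RTB2": lower price, higher variance
]

def winning_price(rng):
    bids = [rng.gauss(mu, sd) for p, mu, sd in bidders if rng.random() < p]
    return max(bids) if bids else None   # None: nobody bid on this request

rng = random.Random(7)
sample = [w for w in (winning_price(rng) for _ in range(20000)) if w is not None]
# The sample is bimodal: ~$18 whenever RTB1 bids, ~$11 otherwise.
high = sum(1 for w in sample if w > 15) / len(sample)
```

A histogram of `sample` reproduces the two-hump mixture shape, and `high` roughly recovers RTB1's bidding rate among observed auctions.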

RTB1 = Green

RTB2 = Red

RTB1 has higher mean bidding price, around $18 CPM but has lower variance, whereas RTB2 has lower mean bidding price of approximately $11 CPM but has higher variance. Now given a bid price we can get an estimate of winning rate over a long run.

*So sometimes RTB2  wins since it bids more than RTB1 while other times it wins because RTB1 didn’t bid on those requests. 


Now, to calculate all the unknown parameters and draw this distribution, we would need a sample of data points from it; in our case these are the winning bid prices over a significant interval of time. However:

  • In our case we never get to know the winning bid price if we lose the bid. Even if we win, many exchanges don't reveal the second-highest price, since they don't use Vickrey auctions.
  • To use EM (Expectation Maximization) we would need to know how many RTBs are taking part in the auction, but we have no way of knowing the number n used in the pdf.
  • Another important issue is the temporal aspect of the distribution. It may happen that RTB1 or RTB2 stops bidding on the segment (maybe their respective campaigns targeting the segment finish), or they increase their bidding rate, resulting in a major change in the shape of the distribution. This kind of behavior is very much expected in an RTB environment.
  • So basically we can't analytically solve the Gaussian mixture model using Expectation Maximization.

Reinforcement Learning : Explore And Exploit 

To solve the mixture model defined above we will use a variant of a technique known as reinforcement learning, which works by allocating a fixed amount of resources to exploring for the optimal solution and spending the rest on the best known option. After spending due time on various explore-and-exploit techniques like Multi-Armed Bandits, Bayesian Bandits etc., I settled on using a Boltzmann Exploration / Simulated Annealing style technique.


Bidding Algorithm : Bayesian Bandits

We will use a Bayesian Bandit approach with Thompson sampling for selecting the optimal bandit arm [5]. Though Bayesian bandits are mostly used in A/B testing and CTR-based ad selection, according to our research their usage suits the requirements and dynamics of the RTB environment. In [6] the authors compare the performance of Bayesian Bandit techniques with other bandit algorithms like UCB and epsilon-greedy, with the aim of selecting the best ad to deliver to maximize CTR. All the referenced studies show that the Bayesian Bandit method outperforms other bandit techniques w.r.t. total regret, proportion of best arm played, and convergence time, and readily converges to the best choice/action in the explore-exploit dilemma.

Please go through references 3, 5 and 6 for a comprehensive look into bandit-based techniques. Henceforth I will explain the application of the Bayesian Bandit method to RTB bidding, skipping the underlying theoretical details.

As described in the earlier section (titled Win Price Distribution), the joint distribution of win rate and bidding price is a mixture distribution with latent parameters. Since we will never know the winning price of a lost bid, we can't solve this optimization problem in a closed-form analytical way. Therefore we will try to discover the bid price that achieves the desired win rate by exploring the bid price space.

Let's say a campaign has a CPM of $5. We divide the bid price space into bins as follows: [$1, $1.5, $2, $2.5, $3, $3.5, $4, $4.5]. We will then explore over these bins to discover the corresponding win rates.

Experiment Scenario :  

Scenario 1:

RTBs: We have 3 hypothetical RTBs bidding on a particular segment, each generating its bid price via a normal distribution with the following parameters

RTB1: mean bid price = $3, standard deviation = $0.1, proportion of bids placed on incoming requests = 70% i.e. on an average this rtb bids on 70% of all requests sent by the exchange

RTB2: mean bid price = $4, standard deviation = $0.1, proportion of bids placed on incoming requests = 50%

RTB3: mean bid price = $5, standard deviation = $0.1, proportion of bids placed on incoming requests = 30%

Target Win Rate required by our test campaign = 40%

CPM = $5

Given the list of bid price bins [$1, $1.5, $2, $2.5, $3, $3.5, $4, $4.5], the aim of the algorithm is to find the bin whose win rate is closest to 40%.

Examples : 

  • Now if we bid at $3.5 CPM then the probability of winning = (probability that RTB3 doesn't bid on that request) * (probability that RTB2 doesn't bid on that request), since RTB1's bids (mean $3, sd $0.1) essentially never exceed $3.5
    = (1-0.3) * (1-0.5) = 0.35
    i.e. if we bid $3.5 we will win 35% of the time.
  • On the other hand, if we bid $4 CPM then our probability of winning = (probability that RTB3 doesn't bid on that request) * (probability that RTB2 bids, but below $4) + (probability that RTB3 doesn't bid on that request) * (probability that RTB2 doesn't bid on that request)
    = (1-0.3) * 0.5 * 0.5 + (1-0.3) * (1-0.5)
    = 0.175 + 0.35 = 0.525
    i.e. if we bid at $4 we will win 52.5% of the time.

Since the required win rate is 40% and a bid price of $3.5 gives us the win rate closest to the target, the algorithm should place most of its bids at $3.5.
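The analytic win-rate computation in the examples above can be written as a small helper; the (bid probability, mean, sd) triples are the three hypothetical RTBs from Scenario 1:

```python
import math

def phi(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# (bid probability, mean bid price, sd) for the three hypothetical RTBs
rtbs = [(0.7, 3.0, 0.1), (0.5, 4.0, 0.1), (0.3, 5.0, 0.1)]

def win_rate(x):
    """P(win at bid x) = product over bidders of
    [P(doesn't bid) + P(bids) * P(its bid < x)]."""
    rate = 1.0
    for p, mu, sd in rtbs:
        rate *= (1.0 - p) + p * phi((x - mu) / sd)
    return rate

assert abs(win_rate(3.5) - 0.35) < 0.01
assert abs(win_rate(4.0) - 0.525) < 0.01
```

Of course, in production we never know the competitors' parameters, which is exactly why the bandit has to estimate these win rates empirically.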

Experiment Results: 

I ran a simulation of the above scenario for 400 trials. At each request the algorithm selects a bid bin/bucket based on its assessment of that bin's win rate: the closer a bin's win rate is to the target, the more bids will be made at that price. After each bid the algorithm updates the winning probability. After some trials the algorithm gets more and more confident about its assessment of each bid price's win rate and finally selects the best bid price for all further bids.
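A minimal sketch of this simulation using Beta posteriors and Thompson sampling (the per-bin true win rates are precomputed analytically from Scenario 1; the exact split of bids across bins will vary from run to run):

```python
import random

bins = [1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5]   # candidate bid prices ($ CPM)
target = 0.40                                      # win rate requested by pacing
alpha = [1.0] * len(bins)                          # Beta posteriors: wins + 1
beta = [1.0] * len(bins)                           # Beta posteriors: losses + 1

# True win rate of each bin under Scenario 1, precomputed analytically.
true_rates = [0.105, 0.105, 0.105, 0.105, 0.2275, 0.35, 0.525, 0.7]

def choose_bin(rng):
    """Thompson sampling: draw one win-rate sample per bin from its Beta
    posterior and bid in the bin whose sample is closest to the target."""
    samples = [rng.betavariate(a, b) for a, b in zip(alpha, beta)]
    return min(range(len(bins)), key=lambda i: abs(samples[i] - target))

rng = random.Random(0)
for _ in range(400):
    i = choose_bin(rng)
    won = rng.random() < true_rates[i]    # simulated auction outcome
    if won:
        alpha[i] += 1
    else:
        beta[i] += 1

most_played = max(range(len(bins)), key=lambda i: alpha[i] + beta[i])
# most_played typically points at $3.5, whose 35% win rate is closest to 40%.
```

As the posteriors tighten, the sampled win rates concentrate around the true ones and the bin closest to the target dominates the later trials.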

Figure 1.1

Figure 1.2

Figure 1.3

Figure 1.1 shows that for the initial runs (around 100 bids) the algorithm did not have enough information to make intelligent guesses about the winning rate of each bid price. Once the exploration phase was over, the algorithm selected $3.5 CPM as the bid price over and over. Figure 1.3 shows that out of 400 bids, around 225 were made at $3.5. The interesting aspect of the Bayesian bandit is that once the algorithm had enough confidence in its estimates of the winning probability for each bid price, it selected $3.5 almost every time (after bid #150).

Issues with the Algorithm: 

At each bid the algorithm updates the posterior distribution of the win probability for the corresponding bid price. As more and more bids take place and the algorithm's Beta posterior estimates converge, the posterior variance becomes low and accurate predictions of the win rate are made. But since the bidding environment is dynamic, i.e. a new RTB can enter or an old RTB can leave the bidding process, thereby changing the winning probability at a particular bid price, our algorithm will not be able to adapt to this change: our posterior variance is too low and we have grown too confident in our estimates.

Scenario 2: A new RTB enters the bidding process after 300 trials. By this time our algorithm has drawn a win rate estimate for each bid price and has grown quite confident about them, but this new RTB will change the real win rate at a $3.5 CPM bid, so this bid will no longer be the optimal choice for us.

RTB4: mean bid price = $3.5, standard deviation = $0.01, proportion of bids placed on incoming requests = 90%

Now if we bid at $3.5 our win probability will be = (1-0.3) * (1-0.5) * (1-0.9) = 0.035, i.e. 3.5%


Figure 2.1


Figure 2.2


Figure 2.3

Figure 2.3 shows that as soon as RTB4 enters the bidding process we stop winning at the $3.5 price. But since the algorithm keeps believing that the win rate is optimal at this bid price (think of it in terms of inertia or momentum), it keeps bidding at $3.5. With time it loses that inertia and starts selecting $4 CPM (after iteration 400). Overall, though, this algorithm takes too much time to adapt to the new scenario.

Restless Bandit And Variance Inflation 

To tackle the issue of a dynamic environment described in the last section, I experimented with a technique called variance inflation. As we saw in the last experiment, the Bayesian Bandit technique was able to continuously pick the best bid price for an ad request after only around 100 overall bids. As we select the optimal bid price more often, the variance of the posterior Beta decreases; this means that our algorithm is learning continuously and becoming more and more confident about its choice. But as soon as the environment changed, e.g. a new RTB bidding more than our optimal bidding price entered the bidding environment, we started losing all the bids, and it took us around 200 lost bids to learn that the current bid price was stale and that we should select some other bid price. In a way, only after losing hundreds of bids did our algorithm lose confidence (the distribution changed) in the bid price it thought was optimal.

To tackle this problem I started inflating the variance of the win rate distribution associated with a bid price by 10% as soon as we had made 50 bids at that price. This means that as soon as our algorithm starts selecting one particular bid price more often for a segment, we decrease its confidence by 10% and ask it to explore other bid prices. If the environment doesn't change, this exploration wastes some resources, since we will bid at prices higher or lower than the optimal one; but if the environment changes, i.e. some new RTB enters or leaves the bidding environment, or some existing RTB starts bidding higher or lower, our algorithm will grasp this change.
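One way to implement this kind of variance inflation is to shrink the Beta posterior's pseudo-counts while preserving its mean; a sketch (the 0.9 factor corresponds to the ~10% confidence cut described above):

```python
def inflate_variance(alpha, beta, factor=0.9):
    """Shrink a Beta posterior's pseudo-counts toward the uniform prior
    while keeping its mean fixed: fewer effective observations means a
    larger variance, so a heavily exploited bid price gets re-explored."""
    n = alpha + beta
    mean = alpha / n
    new_n = max(2.0, n * factor)   # never drop below the Beta(1,1) prior mass
    return mean * new_n, (1.0 - mean) * new_n

# A bin with 40 wins / 60 losses: same mean, fewer effective observations.
a, b = inflate_variance(41.0, 61.0)
assert abs(a / (a + b) - 41.0 / 102.0) < 1e-9   # mean preserved
assert a + b < 102.0                            # variance inflated
```

Applying this periodically (e.g. every 50 bids at a price) keeps the posterior "restless" enough to notice when a new RTB changes the true win rate.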

Simulation Scenario:

I made the appropriate implementation changes and created a scenario similar to the one defined in the last section. We have 3 RTBs bidding for a segment with the following configuration

RTB1: mean bid price = $3, standard deviation = $0.1, proportion of bids placed on incoming requests = 70% i.e. on an average this rtb bids on 70% of all requests sent by the exchange

RTB2: mean bid price = $4, standard deviation = $0.1, proportion of bids placed on incoming requests = 50%

RTB3: mean bid price = $5, standard deviation = $0.1, proportion of bids placed on incoming requests = 30%

Target Win Rate required by our test campaign = 40%

CPM = $5

Given the list of bid price bins [$1, $1.5, $2, $2.5, $3, $3.5, $4, $4.5], the aim of the algorithm is to find the bin whose win rate is closest to 40%.

Objective: The bidding algorithm will find the optimal bid price, i.e. $3.5, after some trials; from that point onwards it will bid on most requests at $3.5. But after 300 bids a new RTB, RTB4, will enter the bidding domain and start bidding a little more than us, i.e. $3.6. Now we will see, compared to the results posted in the last section, how fast our new algorithm adapts to this change and relearns the optimal bidding price.



Figure 3.1

Figure 3.1 shows the overall bids made on the X axis and the number of bids made at each bid price on the Y axis. We can see that before bid #300 the optimal bid price was $3.5 and it was being used most of the time. At trial #300 a new RTB enters the bidding environment; from the plot we can see that it took the algorithm only 20-30 trials to find that the optimal bid price had changed from $3.5 to $4. Henceforth the $4 bid price was selected most often.


Figure 3.2


Figure 3.3

Figure 3.3 shows that the win rate for bid price $3.5 drops to nearly zero after trial #300, since RTB4 was bidding at a higher price.


  1. Bid Landscape Forecasting in Online Ad Exchange Marketplace
  2. Real Time Bid Optimization with Smooth Budget Delivery in Online Advertising
  6. Learning Techniques In Multi Armed Bandits 
Posted in Uncategorized | 4 Comments

Proposal : An Integrated Zero Knowledge Authentication and Public key verification protocol using Gap Diffie Hellman Groups to counter MITM attack

Introduction: The basic idea behind this proposal is to integrate authentication and key distribution protocols. Immense work has been done on designing secure key distribution protocols, starting with Needham-Schroeder and evolving through OAKLEY, SKEME and IKE, and ultimately to the Diffie-Hellman protocol and PKI. Though much work has been done in the fields of data integrity, non-repudiation and secrecy, password-based authentication is still the Achilles heel of cryptography. In the presence of Byzantine adversaries like a Man In The Middle (MITM), even a combination of multi-factor authentication and PKI fails.

The Gap-DH problem
Okamoto and Pointcheval formalized the gap between inverting and decisional problems. In particular, they applied it to the Diffie-Hellman problems:
– The Inverting Diffie-Hellman Problem (C-DH) (a.k.a. the Computational Diffie-Hellman Problem): given a triple of G elements (g, g^a, g^b), find the element C = g^(ab).
– The Decision Diffie-Hellman Problem (D-DH): given a quadruple of G elements (g, g^a, g^b, g^c), decide whether c = ab mod q or not.
– The Gap Diffie-Hellman Problem (G-DH): given a triple (g, g^a, g^b), find the element C = g^(ab) with the help of a Decision Diffie-Hellman oracle (which answers whether a given quadruple is a Diffie-Hellman quadruple or not).

Bilinear maps and pairings 

The bilinear maps used in cryptography are the Weil and Tate pairings on some elliptic curves.

Let G and G1 be two groups of order some large prime q, denoted multiplicatively. An admissible bilinear map is a map e : G × G -> G1 that is:
– bilinear: e(g^a, h^b) = e(g, h)^(ab) for all g, h ∈ G and all a, b ∈ Z,
– non-degenerate: for g and h two generators of G, we have e(g, h) ≠ 1,
– computable: there exists an efficient algorithm to compute e(g, h) for any g, h ∈ G.
In the case of the Weil and Tate pairings, G is a subgroup of the additive group of points of an elliptic curve E/Fp, and G1 is the multiplicative group of the extension field F(p^2). However, in order to remain close to the identification scheme of Schnorr, we choose to keep the multiplicative notation for both G and G1.

Proposed Protocol 

From now on, G is a group in which the Gap Diffie-Hellman problem is intractable. We assume the existence of an admissible bilinear map e : G × G -> G1, where G and G1 are of order q, and we denote by g a generator of G.
The prover holds public parameters

g, g^a, g^n, e(g, g), v = e(g, g)^a

and a secret S = g^a. Here the public key is g^n, where n is the corresponding private key. The value v is given in the public parameters in order to spare the verifier its computation. The scheme we propose is a zero-knowledge proof of knowledge of the value g^a, obtained by iterating the algorithm l times. Along with the proof of knowledge, the public key of the prover is embedded into the proof, so that while verifying knowledge of the secret, the verifier also enters the public key of the prover as a parameter and verifies the authenticity of both the secret and the key. From now on, a Man In The Middle cannot just forward the prover's proof together with his own public key to establish a secure channel: the MITM is unable to separate the key part from the proof of the secret, and is thus incapable of carrying out attacks like active phishing, where he first forwards his own public key to the server and then the user's credentials. With the proposed protocol the server will detect that the credentials forwarded do not belong to the particular public key supplied earlier to establish the secure channel.

Step 1: The prover chooses a random number r in the prescribed range and computes W (a mapping in G1). Then we bring the public key into the picture and compute a mapping of the public key using the random number r; r is known only to the prover and is created for obfuscation. The result V is sent to the verifier.

Step 2: The verifier picks another random number c and sends it to the prover.

Step 3: The prover uses c to compute a product Y in G, which combines the secret, the unknown r, and the public key raised using r.

Step 4: The verifier multiplies the resulting Y with the known public key of the prover and computes its mapping. If the mapping is equivalent to the product of the mapping received in Step 1 and the mapping of the secret taken c times, the proof is accepted.
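To sanity-check the algebra of one round, here is a toy model in which every element of G is g^x and every element of G1 is e(g,g)^x, so we can track exponents mod q directly. This is purely illustrative arithmetic, not a real pairing (a real implementation would need a pairing-friendly elliptic curve):

```python
import random

# Toy model: group elements are represented by their exponents mod q,
# so e(g^x, g^z) corresponds to x*z in the exponent of e(g,g).
q = 7919                     # small prime standing in for the group order
rng = random.Random(1)

a = rng.randrange(1, q)      # prover's secret exponent (S = g^a)
n = rng.randrange(1, q)      # prover's private key (public key = g^n)

# Step 1: prover picks r and commits; V corresponds to e(g,g)^(r + n*r).
r = rng.randrange(1, q)
V = (r + n * r) % q

# Step 2: verifier sends a random challenge c.
c = rng.randrange(1, q)

# Step 3: prover answers with Z = g^(r + a*c + n*r), i.e. the exponent:
Z = (r + a * c + n * r) % q

# Step 4: verifier checks e(g, Z) == V * v^c with v = e(g,g)^a, which in
# exponent arithmetic reads Z == V + a*c (mod q).
assert Z == (V + a * c) % q
```

The check passes for any choice of a, n, r and c, which is exactly the completeness property argued below.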

To prove the security of the scheme under the G-DH problem, we follow the outline of general zero-knowledge proofs: namely, we prove the completeness and the soundness of the scheme, and finally that the protocol is zero-knowledge. Throughout the proof, the prover and the verifier are modeled by Probabilistic Polynomial-time Turing Machines (PPTMs).


If the legitimate prover and the verifier both follow the scheme, then it always succeeds. Indeed, at each iteration the probability of success is equal to 1, since:

e(g, Z) = e(g, g^r × g^(ac) × g^(nr)) = e(g, g)^(r+ac+nr) = e(g, g)^r × e(g, g)^(ac) × e(g, g)^(nr) = {e(g, g)^(nr) × e(g, g)^r} × e(g, g)^(ac) = V × v^c

Zero Knowledge

The present identification scheme satisfies the zero-knowledge property. To simulate in polynomial time (in |I|) the communications between a real prover and a (not necessarily honest) verifier, we use the following algorithm M: for the simulation of one round, M randomly picks c in [[0, 2^k[[, randomly picks Y in G, and computes W = e(g, Y) × v^(−c). M then sends W to the verifier, which answers c̃. If c = c̃, the triple (W, c, Y) is kept; otherwise M restarts the round.

On average, the equality c = c̃ holds once in 2^k tests. To obtain l triples, M constructs l × 2^k triples. If l × 2^k is polynomial in |I|, then we obtain a polynomial-time algorithm.
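The simulator can be sketched in the same toy exponent bookkeeping (our simplification: a GT element e(g,g)^z is stored as its exponent z mod q, so no real pairing is involved). The point the code illustrates is that accepting triples can be produced without ever knowing the secret a, at an expected cost of 2^k attempts per triple.

```python
import secrets

q = (1 << 31) - 1            # toy prime group order (an assumption)
v = secrets.randbelow(q)     # public commitment e(g,g)^a, exponent only
k = 6                        # challenge bit-length: challenges lie in [0, 2^k)
l = 3                        # number of accepting triples to collect

triples = []
while len(triples) < l:
    c = secrets.randbelow(1 << k)        # guess the verifier's challenge
    Y = secrets.randbelow(q)             # random response g^Y
    W = (Y - v * c) % q                  # W = e(g,Y) * v^(-c), as exponents
    c_tilde = secrets.randbelow(1 << k)  # verifier's actual challenge
    if c == c_tilde:                     # keep the triple only when the guess hits
        triples.append((W, c, Y))

# every kept triple satisfies the round's verification equation by construction
assert all(Y == (W + v * c) % q for W, c, Y in triples)
print(len(triples))  # 3
```

With k = 6 the loop runs about l × 2^k = 192 iterations on average, matching the l × 2^k bound in the text.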


(work in progress)

Posted in Uncategorized | Tagged , , | 1 Comment


Download paper :


In 1976, Diffie and Hellman [7] introduced the concept of public key cryptography. They revolutionized the world of cryptography by developing the key exchange system popularly known as Diffie–Hellman key exchange, which introduced the applicability of the discrete logarithm to cryptography. The purpose of the algorithm is to enable two users to securely exchange a key, and it depends for its effectiveness on the difficulty of computing discrete logarithms. The discrete logarithm problem employed was defined explicitly as the problem of finding logarithms with respect to a generator in the multiplicative group of the integers modulo a prime, but the idea can be extended to other groups. Elliptic curves were introduced into cryptography in 1985 independently by Koblitz [5] and Miller [6], who proposed using elliptic curves as the groups over which all calculations are performed, generating more complex discrete log problems.

Other public key cryptographic systems, like RSA, rely on the difficulty of integer factorization. In [1] the authors discuss the issues with RSA: 1024-bit keys may be breakable in the near future. The key length for secure RSA use has increased over the years due to the development of fast processors, which aid brute-force computation, and larger key lengths put a heavier processing load on applications using RSA. [1] also discusses the advantages of ECC over RSA: a sub-exponential attack has already been developed for RSA, whereas for ECC attacks have been developed only for a few classes of curves.

Jurisic and Menezes [2] present a comparative study of RSA and ECC. They compared the security achieved by both methods versus the key length used, and the results showed that ECC outperforms RSA. Table 1 of [1] compares RSA and ECC on the basis of key size used and complexity achieved.

RSA key size (bits)        ECC key size (bits)        MIPS years

512                        106                        10^4

768                        132                        10^8

1024                       160                        10^11

2048                       210                        10^20

Here key sizes are in bits, and a MIPS year represents the computational power of a computer executing a million instructions per second when used for one year.

[3] describes the advantage that elliptic curve systems have over systems based on the multiplicative group of a finite field (and over systems based on the intractability of integer factorization): the absence of a sub-exponential-time algorithm (such as those of "index-calculus" type) that could find discrete logs in these groups. Compared to RSA, ECC offers equal security for smaller key sizes. The result is smaller keys, bandwidth savings, and faster implementations, features that are especially attractive for security applications where computational power and integrated-circuit space are limited, such as smart cards, PC (personal computer) cards, and wireless devices. [4] also discusses various motivating factors describing the advantages of ECC over other cryptosystems.


Diffie and Hellman [7] used the discrete logarithm problem in their key exchange protocol. They defined the problem as finding logarithms with respect to a generator in the multiplicative group of the integers modulo a prime, but the problem of finding discrete logarithms can be extended to other groups; in particular, when it is posed over elliptic curves its computational complexity increases many fold. [3] states the discrete logarithm problem as follows: let G be a finite group of order n, and let α be an element of G. The discrete logarithm problem for G is: given an element β ∈ G, find an integer x, 0 ≤ x ≤ n − 1, such that α^x = β. Various groups have been proposed over the years for cryptographic purposes: Agnew et al. [8] proposed multiplicative groups of characteristic-two finite fields, while Koblitz [5] and Miller [6] used the group of points on an elliptic curve defined over a finite field.


An elliptic curve used for cryptographic purposes is defined as follows:

y^2 mod p = (x^3 + ax + b) mod p    (1)

where a and b are integer constants. The set of points E(a, b) is the set of all pairs (x, y) satisfying the above equation.

For an elliptic curve over a finite field Z_p, we use the above cubic equation, in which the variables and coefficients all take values in the set of integers from 0 to p − 1 for some prime p, and calculations are performed modulo p.

Assume first that Fq has characteristic greater than 3. An elliptic curve E over Fq is the set of all solutions (x, y) ∈ Fq × Fq to an equation

y^2 = x^3 + ax + b,

where a, b ∈ Fq and 4a^3 + 27b^2 ≠ 0, together with a special point ∞ called the point at infinity.

Addition formulas for the curve (1). Let P = (x1, y1) ∈ E; then −P = (x1, −y1). If Q = (x2, y2) ∈ E and Q ≠ −P, then P + Q = (x3, y3), where

x3 = λ^2 − x1 − x2

y3 = λ(x1 − x3) − y1,

and

λ = (y2 − y1) / (x2 − x1)   if P ≠ Q

λ = (3x1^2 + a) / (2y1)     if P = Q
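The chord-and-tangent formulas above transcribe directly into code. The sketch below runs on a small toy curve y^2 = x^3 + 2x + 3 over F_97 (the curve parameters are our choice for illustration, not from the paper):

```python
# Point addition on y^2 = x^3 + ax + b over F_p via the lambda formulas.
p, a, b = 97, 2, 3
INF = None  # the point at infinity

def ec_add(P, Q):
    """Add two points on the curve using the chord/tangent slopes."""
    if P is INF: return Q
    if Q is INF: return P
    x1, y1 = P
    x2, y2 = Q
    if x1 == x2 and (y1 + y2) % p == 0:
        return INF                                         # Q = -P
    if P == Q:
        lam = (3 * x1 * x1 + a) * pow(2 * y1, -1, p) % p   # tangent case
    else:
        lam = (y2 - y1) * pow(x2 - x1, -1, p) % p          # chord case
    x3 = (lam * lam - x1 - x2) % p
    y3 = (lam * (x1 - x3) - y1) % p
    return (x3, y3)

P = (0, 10)           # 10^2 = 100 = 3 mod 97, so P lies on the curve
print(ec_add(P, P))   # doubling uses the P = Q branch; gives (65, 32)
```

`pow(x, -1, p)` (Python 3.8+) computes the modular inverse needed for the division in λ; the resulting point (65, 32) again satisfies the curve equation.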



Let us assume that both users of the system have agreed on a generating point G on the curve. Each selects a private key and keeps it secret. Let the private key of user A be nA and that of user B be nB. Their respective public keys pA and pB are calculated as

pA = nA × G,    pB = nB × G



Now we use the Map2Group method to map a message to a point in the elliptic curve group: the message m is converted to a point Pm. Let k be a secret parameter. User A calculates k*pB and k*G (where G is the generating point).


The adversary can access G and pB, as they are in the public domain; but given G and kG it is computationally hard to find k.

To decrypt, user B multiplies his private key with the hint provided as kG and subtracts the result from the second argument.
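The encrypt/decrypt round trip described above can be sketched end to end on a toy curve (y^2 = x^3 + 2x + 3 over F_97; curve parameters, key values and the generating point are our illustrative assumptions):

```python
# ElGamal-style EC encryption: cipher = {kG, Pm + k*pB}, decrypted as
# Pm = (Pm + k*pB) - nB*(kG).
p, a = 97, 2

def ec_add(P, Q):
    if P is None: return Q
    if Q is None: return P
    x1, y1 = P; x2, y2 = Q
    if x1 == x2 and (y1 + y2) % p == 0:
        return None                                        # Q = -P
    if P == Q:
        lam = (3 * x1 * x1 + a) * pow(2 * y1, -1, p) % p
    else:
        lam = (y2 - y1) * pow(x2 - x1, -1, p) % p
    x3 = (lam * lam - x1 - x2) % p
    return (x3, (lam * (x1 - x3) - y1) % p)

def ec_mul(k, P):
    """Scalar multiplication by double-and-add."""
    R = None
    while k:
        if k & 1: R = ec_add(R, P)
        P = ec_add(P, P)
        k >>= 1
    return R

def ec_neg(P):
    return None if P is None else (P[0], (-P[1]) % p)

G = (0, 10)                  # generating point on the toy curve
nB = 7                       # Bob's private key (assumption)
pB = ec_mul(nB, G)           # Bob's public key
Pm = ec_mul(5, G)            # message already mapped to a curve point
k = 11                       # sender's secret parameter
cipher = (ec_mul(k, G), ec_add(Pm, ec_mul(k, pB)))   # {kG, Pm + k*pB}

# decryption: subtract nB * (kG) from the second component
hint, body = cipher
recovered = ec_add(body, ec_neg(ec_mul(nB, hint)))
print(recovered == Pm)  # True
```

The round trip works because k·pB = k·nB·G = nB·(k·G), so Bob can reconstruct exactly the mask the sender added.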


The total time taken is:

1. Time taken to map the message to the group: the Map2Group algorithm maps a message M to Fp^m as (x, y). Its most expensive operation is the square root computation, which needs O(m^3) time. If m = 1, i.e. Fp is the desired field as in our case, the time taken is O(1).
2. Time taken to add two points on the elliptic curve is also constant, i.e. O(1).
3. Time taken in finding the cipher text is O(k), where k is the security parameter.

So the total time taken is O(k) + Θ(1).

Security of ECC

The basis for the security of elliptic curve cryptosystems such as the ECDSA is the apparent intractability of the following elliptic curve discrete logarithm problem (ECDLP): given an elliptic curve E defined over Fq, a point P ∈ E(Fq) of order n, and a point Q ∈ E(Fq), determine the integer l, 0 ≤ l ≤ n − 1, such that Q = lP, provided that such an integer exists. [3] gives a good discussion of the various methods developed to solve the ECDLP. Pohlig and Hellman [9] developed an algorithm that reduces the determination of l to the determination of l modulo each of the prime factors of n. Pollard's ρ-method [10] is one of the best known algorithms developed to attack the ECDLP; Gallant, Lambert and Vanstone [11] made it more efficient, taking about √(πn)/2 elliptic curve additions.

Semaev [12], Smart [13], and Satoh–Araki [14] designed algorithms that efficiently compute an isomorphism between E(Fp), where E is a prime-field-anomalous curve, and the additive group of Fp, giving a polynomial-time algorithm for the ECDLP in E(Fp). Menezes, Okamoto and Vanstone [15] developed the famous MOV attack, which uses the Weil pairing to transform the elliptic curve group into a multiplicative group defined over the field Fq^k for some integer k; through this transformation the elliptic curve discrete logarithm problem is reduced to the discrete logarithm problem (DLP) in Fq^k. In [16] Miller discussed the implementation of the index calculus method in elliptic curve groups.

Proposed method to increase the complexity of elliptic curve cryptosystems

In this paper we propose a new idea for enhancing the security of elliptic curve cryptosystems by producing two discrete log problems, each of which has to be solved independently. In ECC the curve on which computation is performed is in the public domain, since it is beneficial to keep the algorithm and its components public to avoid overhead. After doing some computation on this base curve, i.e. the curve available in the public domain, we rotate it about its axis by some appropriate value. This rotated curve is hidden from the adversary: to learn it, the adversary has to solve an elliptic curve discrete logarithm problem (ECDLP). After rotating the curve we map the public keys and the generating point G from the base curve to the rotated curve. This is performed by a mapping function that establishes a one-one mapping between the two curves. All the desired group arithmetic, as performed conventionally, is then carried out on the rotated curve. The user receiving the encrypted message has to use his/her private key with a hint provided in the cipher text to find the position of the curve.

We increase security by producing two discrete log problems using keys of the same size as before, thereby enhancing the security level: the adversary, using any of the sub-exponential-time algorithms explained above, has to apply them to two separate ECDLP problems. In the next section we explain the proposed technique.


An elliptic curve E over Fq is the set of all solutions (x, y) ∈ Fq × Fq to an equation

y^2 = x^3 + ax + b,    (1)

where a, b ∈ Fq and 4a^3 + 27b^2 ≠ 0, together with a special point ∞ called the point at infinity. Let there be two users, Alice and Bob, with private/public key pairs (nA, pA) and (nB, pB) respectively. Suppose that E is an elliptic curve over Fq, and G (the generating point) is an agreed upon (and publicly known) point on the curve.


Alice takes a secret number k1 and multiplies the generating point G by it, as per the group laws defined above.


Since Q lies on the curve let its coordinates be (x’,y’ ).




Rotate the axis of the curve by θ .


Alice maps the public keys pA and pB and the generating point G to the rotated elliptic curve using the mapping function F. Let the mapped images be pA', pB' and G'.

Step 5

Now Alice selects another secret parameter k2 and computes k2*G' and k2*pB'. Here * denotes the group scalar multiplication operation.

Step 6

Alice maps the plain text to Pm using the Map2Group algorithm.

Cipher text generation

Now Alice sends the cipher text Cm to Bob as

Cm = {k1G, k2G', Pm + k2*pB'}

Here k1 is the parameter Alice chooses for the rotation of the curve, G is the generating point used on the initially given curve E1 to find E2, k2 is the number of times the addition is performed, G' is the mapped generating point, and Pm is the plain text.


Bob multiplies his private key nB with the first hint to discover the new curve. From this data Bob calculates the angle by which the curve has been rotated; all further calculations are performed on the rotated curve. Bob then multiplies his private key with the second argument, i.e. k2*G', and subtracts the result from the third argument of the cipher text:

{Pm + k2*pB'} − {k2G'*nB} = {Pm + k2*pB'} − {k2*(G'*nB)} = {Pm + k2*pB'} − {k2*pB'} = Pm


The two curves obtained are isomorphic images of each other: if ø denotes the isomorphism taking the generating point G1 on E1 to G2 on E2, then for any scalar α

ø(αG1) = α ø(G1) = αG2

Development of a one-one function

Let F be a one-to-one function mapping points from E1 to E2, i.e. F: E1 → E2. Let (x, y) be a point on E1. After rotating the curve by θ we obtain the curve E2; let (x', y') be the point on E2 that is the image of (x, y) on E1. F maps between the two groups as

x' = x cos θ − y sin θ

y' = x sin θ + y cos θ

Enhancement in Security

Now the adversary has to solve two elliptic curve discrete log problems instead of one: he must first find the rotated curve, and only then can he perform the desired computation on it. We thus enhance security without increasing the key length. We can analyse the security versus the key length: if the key length is kept the same, the security is increased twofold. For example, the index calculus algorithm takes sub-exponential running time

exp((c + o(1)) (log q^k)^(1/3) (log log q^k)^(2/3))

where we assume that the elliptic curve is defined over the field F_(q^k).

Now the index calculus algorithm has to be applied twice to solve the two ECDLPs. In this way the security is increased twofold.

The majority of algorithms solving the ECDLP exploit the properties and size of the field over which the elliptic curve is defined. As an example, we show the security level reached by the proposed method in comparison to the conventional ECC algorithm. One of the best known algorithms to solve the ECDLP is Pollard's ρ-method [10]. Assume the curve E is defined over the field F_(2^k), and denote the field size parameter 2^k by n. Pollard's ρ-method takes √(πn)/2 steps, where a step is an elliptic curve addition. We measure the computing power of the machines solving the ECDLP in MIPS, i.e. million instructions per second; a MIPS year is the computational power of a computer executing a million instructions per second utilized for one year.

Size of n (bits)    √(πn)/2    MIPS years      MIPS years (proposed method)

160                 2^80       8.5 × 10^11     1.7 × 10^12

186                 2^93       7 × 10^15       1.4 × 10^16

234                 2^117      1.2 × 10^23     2.4 × 10^23

Since our proposed method generates two separate ECDLP problems, Pollard's ρ-method has to be applied to both problems independently.

Implementation efficiency

ECC has a major application in wireless communication. Wireless devices run on battery power, so the limited energy provided by the battery constrains their processing capability. Methods like RSA provide a high level of security, but they consume the power of the wireless devices at both ends, as both encryption and decryption take considerable time and energy. ECC overtakes RSA in this respect: we can use smaller keys than RSA and still provide the same level of security. ECC also takes less time to implement, as the arithmetic performed, i.e. point addition and point doubling over the elliptic curve, is relatively fast compared to arithmetic performed over other fields.

[3] compares the time taken by ECC arithmetic on an elliptic curve defined over a field Fq of 160-bit order with DSA arithmetic over a field Fp with a 1024-bit modulus p; since both provide the same level of security, we compare their implementation times. Multiplying two n-bit numbers takes n^2 bit operations, so a modular multiplication is (1024/160)^2 ≈ 41 times longer than a field multiplication.

1. To calculate kP, where P ∈ E(Fq) and k is a 160-bit integer, we do repeated doubling and addition, which on average requires 160 elliptic curve doublings and 80 elliptic curve additions. Each elliptic curve addition or doubling requires one field inversion and two field multiplications [17], and [18] and [19] showed that one field inversion is equivalent to three field multiplications. [3] showed that computing kP therefore requires the equivalent of 1200 field multiplications, or 1200/41 ≈ 29 1024-bit modular multiplications.

2. To calculate a^k mod p, where k is a 160-bit random number and p is a 1024-bit prime (as performed in DSA), repeated squaring and multiplying requires an average of 240 1024-bit modular multiplications [3].

This shows that arithmetic operations performed on an elliptic curve defined over Fq are nearly 8 times faster than the operations in case 2.
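The arithmetic behind this comparison is easy to check; the back-of-envelope script below restates the text's counts (the variable names are ours):

```python
point_ops  = 160 + 80          # doublings + average additions for a 160-bit k
field_muls = point_ops * 5     # 1 point op = 1 inversion (3 muls) + 2 muls
ratio = (1024 / 160) ** 2      # one 1024-bit modular mul ≈ 41 field muls
modular_muls = field_muls / ratio

print(field_muls, round(modular_muls))   # 1200 and about 29
print(round(240 / modular_muls))         # DSA's 240 is about 8x slower
```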

Let us assume a random 32-bit number k. As explained earlier for the cipher text, in this mechanism we have to make three calculations:

1. multiplication of G with k;
2. multiplication of pB with k;
3. the addition operation (adding Pm to k*pB).

Operations 1 and 2 each require 240 field multiplications, 480 in total. As explained earlier, one addition is equivalent to five field multiplications. So the total time taken in producing the cipher text is equivalent to 485 field multiplications.

Our proposed method may seem more time consuming at first look, but if we select the numbers k1 and k2 prudently we can perform the above calculation in the same time. In our method the cipher text is

Cm = {k1G, k2G', Pm + k2*pB'}

Let us assume that k1 and k2 are 32-bit and 16-bit numbers respectively. The four operations performed are

1. multiplication of k1 with G;
2. multiplication of k2 with G';
3. multiplication of k2 with pB';
4. addition of Pm and k2*pB'.

Operation 1 is equivalent to 240 field multiplications, while operations 2 and 3 are equivalent to 120 field multiplications each. The fourth operation, the addition, is equivalent to one field inversion and two field multiplications, which sums to five field multiplications. In this way the total time taken is equivalent to 240 + 120 + 120 + 5 = 485 field multiplications, exactly equal to the earlier case.
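The cost accounting for both cipher texts can be restated in a few lines (the cost model follows the text; the helper names are ours):

```python
COST_POINT_OP = 5               # 1 inversion (3 muls) + 2 muls per point op

def scalar_mul_cost(bits):
    """bits doublings plus bits/2 additions on average, in field muls."""
    return (bits + bits // 2) * COST_POINT_OP

# conventional cipher {kG, Pm + k*pB} with a 32-bit k:
conventional = 2 * scalar_mul_cost(32) + COST_POINT_OP
# proposed cipher {k1G, k2G', Pm + k2*pB'} with 32-bit k1 and 16-bit k2:
proposed = scalar_mul_cost(32) + 2 * scalar_mul_cost(16) + COST_POINT_OP

print(conventional, proposed)   # both come to 485 field multiplications
```

Halving the bit-length of k2 halves the cost of each of its two scalar multiplications, which is exactly why the two extra operations fit in the original budget.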

Thus our proposed method takes the same implementation time as the previous method, so it can still be used in wireless communication, while providing more security than the conventional elliptic curve cryptographic system.


In this paper we have introduced a technique called Mutarotation and presented an algorithm to implement it. We also compared our proposed method with the previously used technique on the basis of security achieved and implementation time. We have shown that the proposed technique is more secure than the previous method while taking the same time to implement, so it can provide more security to wireless communication without any additional implementation cost.


References

1. V. Kapoor and V. S. Abraham, Elliptic curve cryptography, Department of Computer Engineering, Delhi College of Engineering.

2. A. Jurisic and A. J. Menezes, Elliptic curves and cryptography, Dr. Dobb's Journal, 1997.

3. N. Koblitz (Dept. of Mathematics, University of Washington, Seattle), A. Menezes and S. Vanstone (Dept. of C&O, University of Waterloo, Ontario, Canada), The state of elliptic curve cryptography.

4. B. A. LaMacchia and J. L. Manferdelli, New vistas in elliptic curve cryptography, Microsoft Corporation, Redmond, WA, USA.

5. N. Koblitz, Elliptic curve cryptosystems, Mathematics of Computation, Vol. 48 (1987) pp. 203–209.

6. V. Miller, Uses of elliptic curves in cryptography, Advances in Cryptology—CRYPTO '85, Lecture Notes in Computer Science 218, Springer-Verlag (1986) pp. 417–426.

7. W. Diffie and M. Hellman, New directions in cryptography, IEEE Transactions on Information Theory, Vol. 22 (1976) pp. 644–654.

8. G. Agnew, R. Mullin, I. Onyszchuk and S. Vanstone, An implementation for a fast public-key cryptosystem, Journal of Cryptology, Vol. 3 (1991) pp. 63–79.

9. S. Pohlig and M. Hellman, An improved algorithm for computing logarithms over GF(p) and its cryptographic significance, IEEE Transactions on Information Theory, Vol. 24 (1978) pp. 106–110.

10. J. Pollard, Monte Carlo methods for index computation mod p, Mathematics of Computation, Vol. 32 (1978) pp. 918–924.

11. R. Gallant, R. Lambert and S. Vanstone, Improving the parallelized Pollard lambda search on binary anomalous curves, to appear in Mathematics of Computation.

12. I. Semaev, Evaluation of discrete logarithms in a group of p-torsion points of an elliptic curve in characteristic p, Mathematics of Computation, Vol. 67 (1998) pp. 353–356.

13. N. Smart, The discrete logarithm problem on elliptic curves of trace one, to appear in Journal of Cryptology.

14. T. Satoh and K. Araki, Fermat quotients and the polynomial time discrete log algorithm for anomalous elliptic curves, Commentarii Mathematici Universitatis Sancti Pauli, Vol. 47 (1998) pp. 81–92.

15. A. Menezes, T. Okamoto and S. Vanstone, Reducing elliptic curve logarithms to logarithms in a finite field, IEEE Transactions on Information Theory, Vol. 39 (1993) pp. 1639–1646.

16. V. Miller, Uses of elliptic curves in cryptography, Advances in Cryptology—CRYPTO '85, Lecture Notes in Computer Science 218, Springer-Verlag (1986) pp. 417–426.

17. G. Agnew, R. Mullin, I. Onyszchuk and S. Vanstone, An implementation for a fast public-key cryptosystem, Journal of Cryptology.

18. R. Schroeppel, H. Orman, S. O'Malley and O. Spatscheck, Fast key exchange with elliptic curve systems, Advances in Cryptology—CRYPTO '95.

19. E. De Win, A. Bosselaers, S. Vandenberghe, P. De Gersem and J. Vandewalle, A fast software implementation for arithmetic operations in GF(2^n), Advances in Cryptology—ASIACRYPT '96.

Posted in Uncategorized | Leave a comment

Patent application 2015/MUM/2011 : Securing Transactions Using Secure Multi-factor Authentication And Synchronized Symmetric Key Generation

Patent documents are enlisted below :

  1. IP Docket dated 18 July 2011
  2. PCT – Introduction and Filing
  3. PCT- List of Member Countries
  4. Prov_IN_Drawings
  5. Prov_IN_Filing_receipt
  6. Prov_IN_Specification

Security system for secure authentication, key generation, key distribution and session maintenance, to prevent active/passive phishing, man-in-the-middle, cross-site scripting and other network-based as well as browser-based attacks on online banking transactions.

Problem definition: The textbook definition of security involves three aspects: authentication, secure transmission and data integrity. Conventionally, much stress was laid on the last two, and an enormous amount of work has been done on encryption schemes, hashing, etc. With the role of the Internet becoming more and more profound in the social and economic life of the common man, authentication has become an equally important issue.
During the last decade, online financial frauds, most importantly phishing sites, have become a major concern for online banking, credit card transactions and other online financial services. With the advent of sophisticated tools for phishing, data sniffing and man-in-the-middle attacks, it has become possible to impersonate a legitimate user and perform online financial frauds. The last few years have seen a steep rise in phishing attacks aimed at collecting user-sensitive information, the majority of which were targeted at online banking customers. Banks and their customers are losing a lot of money to these fraudulent practices.
Research Data:
The recent attacks on Facebook, Twitter, PayPal and various banks, resulting in leaks of client-sensitive information, show the level of complexity and sophistication involved in their execution. The monthly report (May 2009) of the Symantec Security Response Anti-Fraud Team shows that nearly 79% of online attacks were targeted at banks. According to one report, 3.6 million adults lost nearly 3.2 billion US dollars between September 1, 2006 and August 31, 2007. The Guardian reports that online banking losses totaled almost £60m in 2009, compared to £52.5m in 2008 and £23.2m in 2005. Gartner Group estimates that data theft through phishing activities costs US banks and credit card issuers an estimated $2.8 billion annually. McAfee's 2009 Q3 report shows that the number of malware samples detected in that quarter reached nearly 5,000,000. Symantec's 2010 Q3 report highlights that 82% of reported phishing attacks were targeted at the financial sector.

1. a. The number of “phishing” attacks in UK, where fraudsters lead customers to fake bank websites via an email that purports to come from their bank, increased by 16% from 2008 to 51,000. This compares to just 1,700 such attacks five years ago. As a result, online banking losses totaled almost £60m in 2009 compared to £52.5m in 2008 and £23.2m in 2005.

b. Phone banking losses, which were recorded for the first time in 2009, totaled £12.1m, with most losses involving customers being duped into disclosing security details through cold calling.

2.     Theft via online banking increased by 14 per cent to £60 million ($95 million) while plastic card losses amounted to £440 million ($700 million). (UK National Fraud Authority)
3.     Institute/Guardian Analytics study finds 40 percent of small and medium businesses change banks after a fraud incident.
4.       Annual Phishing Related Losses Estimated to be as High as $9.4M per Million Customers.
The above reports justify the gravity of the situation concerning the rapid increase in phishing attacks directed specifically at financial institutions and payment portals like PayPal.

Currently employed authentication schemes for online banking
Various methods have been proposed to deal with phishing and to ensure secure authentication. Some use a two-page login in which the second page shows a picture and phrase selected by the user, which a fake site would not have. Some show a picture of a keyboard, on which one "types" a second password with a mouse, to defeat intrusions that record the stream of keystrokes. Some show the keyboard but ask for just three randomly chosen characters, to defeat attacks that steal credentials and reuse them to set up another session later. But none is safe against man-in-the-middle attacks like active phishing, or against Trojans and malware that store the username, password or other secrets the user holds and use them later. Even authenticating the user via cell phone has its drawbacks: with phones becoming more and more computer-like, the risk of malware and Trojans is increasing.
The only tried and trusted option left is the key fob (random number tokenizer) provided by RSA, VeriSign etc., widely used by various firms for logging into their networks and by other corporate users.
In 2007 a highly sophisticated phishing attack was launched on ABN AMRO. Though ABN AMRO uses key fob multi-factor authentication, the attackers were able to install malware on the client side and proceed with what is now known as an "in-session phishing" attack. Various security researchers have shown that it is nearly impossible to stop that kind of attack using the key fob.

1. ICICI Bank uses static numbers on the back of debit cards as a second factor for authentication.
2. After the user logs into his account, HDFC Bank shows the user a picture that the user selected while creating the online account. This ensures the legitimacy of the site, since only HDFC Bank's server knows this client credential. But the picture is a static credential, and it is probable that users will mostly select a picture from the first page of choices; via social engineering one can work out that, from the few pictures on the first page, users mainly select a much smaller set of famous icons that can be remembered later, like the Taj Mahal.
3. CITI, SBI etc. send the user a text message containing a one-time password (OTP) that is used as the second authentication factor.

Concerns regarding current authentication procedures
1. Although banks have started using multi-factor authentication to authenticate users, there is no way to authenticate whether a website is legitimate or a phished/fake website.
2. Authentication methods currently employed for online banking require either static credentials like username/password or a one-time password texted to a cell phone as a second factor, which has its own vulnerabilities. No current method except the RSA key fob, which in turn is too expensive to be issued to all banking customers, provides a mechanism to deliver a one-time password safely at the client's end.
3. Even if an OTP is used as a second factor, all these authentication methods fail under active phishing or man-in-the-middle attacks.
4. Digital certificate forgery leads to identity impersonation by a forged site, and the user provides his credentials to the phished website.
5. No existing mechanism safely produces and delivers client credentials to the legitimate bank's server and provides dual authentication, i.e. the bank's server authenticates the user and the user authenticates the bank.
6. No current technology provides foolproof session security, protecting client credentials from a phished site, man-in-the-middle attacks, or a compromised network.
7. There is a sudden rise in tools for network sniffing, ARP poisoning, key and digital certificate generation, and automated tool kits that make it fairly easy to perform sophisticated attacks like man-in-the-middle and active phishing.
8. Recently there have been many malware/Trojan-based attacks on mobile banking apps. Once the user logs in via a multi-factor authentication mechanism and a secure channel is established, the malware inserts its own transaction packets into the ongoing stream, so that two transactions take place instead of one. Despite secure multi-factor authentication and an SSL channel for data transfer, mobile apps remain prone to malware and Trojan-based attacks that exploit system-level security.

Security weakness in existing authentication and session security techniques
All existing authentication schemes concentrate on providing the user with multi-factor authentication, like OTPs with key fobs and OTPs sent to cell phones. But the adversary is focusing his efforts on hacking the channel and then the session, via forged digital certificates or via phishing. In the case of simple phishing, there is currently no scheme (except HDFC's) in which the user can authenticate the bank's site, i.e. verify whether the website is legitimate. When the user logs into HDFC's online banking portal, he is shown a picture that he selected when he registered for online banking; HDFC thus responds to the user's password with a static picture, and the problem is precisely its static nature. Bank of America uses the same scheme. Most users do not even know the purpose of this picture; by default it is the Taj Mahal in many cases. Using social engineering, this scheme can be cracked, since most users will only select a picture available on the first page. The main concern is that even to reach the page showing the picture the client has to disclose his credentials, thereby becoming vulnerable to phishing. In the case of active phishing, the man in the middle will forward the client's credentials to the legitimate bank's server and relay the response to the client's browser, making him believe he is connected to the legitimate site.
State Bank of India sends the user an SMS containing an OTP, which has to be entered within 3 minutes, otherwise the session expires. There has been a lot of criticism of this approach. Various security experts have explained in detail that the main purpose of cell phone networks is forwarding calls: an SMS can be logged for 8 hours, and if the person shifts to a different circle (the scope of a tower), he is not registered in the new circle until he makes a call, receives a call or restarts his cell phone; during this time all SMS are logged in the previous tower's infrastructure. The cell phone network is designed to be call-centric, so it will not waste resources locating a cell phone just for SMS delivery once it has moved out of the current circle. We did a small survey to collect responses from SBI's net banking customers; many complained about late delivery of the SMS, which led to expiry of the session, resulting in loss of customer faith in the efficiency of the online banking service.
Malware and Trojans installed on a mobile device or PC can exploit system-level vulnerabilities and perform illegitimate financial transactions even when the user logs in with an RSA SecurID token as the multi-factor authentication mechanism and uses SSL for network security.
Attacks like XSS (cross-site scripting) exploit vulnerabilities in browsers and web applications. They steal session credentials such as session IDs, keys, and cookies, and establish a parallel session with the same credentials.

Abstract: A pseudo-random number generator, a third-party key distribution protocol, a random-number-generator synchronization algorithm, multi-session data delivery, a mechanism to protect an installed banking app from malware/Trojan-based attacks, and a two-way one-time-password authentication scheme are disclosed. The pseudo-random number generator is based on multi-curve elliptic-curve cryptography. The authentication scheme follows a challenge–response strategy in which client and server each send a challenge to the other. These fields are passed to an authentication and key-generation server that authenticates the two ends and generates and forwards secure session credentials to the end systems. The underlying authentication scheme, key distribution scheme, and multi-session communication mechanism help detect forged sites, phishing activity and, most importantly, man-in-the-middle attacks, thereby providing secure multi-factor authentication that not only verifies the legitimacy of the user but also provides safe session establishment and protection from cookie and session-credential thefts such as session fixation.
The system as a whole provides a mechanism for authenticating and securing online banking against phishing, man-in-the-middle attacks, and other financial fraud. Once installed at the client end (personal computer or mobile), it provides a two-way secure authentication mechanism: the client application authenticates the (bank's) server, and the bank's server authenticates the client with one-time-password functionality. Before authentication, a handshake protocol generates and performs a secure symmetric key exchange between the client application and the server via a third-party authentication and key distribution server. Once a secure session is established and both ends are authenticated, client and server perform a second level of handshake: they synchronize their PRNGs and thereby generate one-time-padding keys for further communication.

Summary of the invention:
1. The invention addresses the limitations of existing authentication mechanisms by providing security from phishing and man-in-the-middle attacks, and uses a pseudo-random number generator for a one-time-padding scheme that secures data transfer with symmetric-key encryption, bypassing PKI vulnerabilities such as digital-certificate forgery.
2. The present invention provides a secure mechanism to authenticate users in an unsecure and untrustworthy environment via a unique multi-factor authentication algorithm. It addresses the limitation of other multi-factor authentication protocols that are unable to deal with active phishing and man-in-the-middle attacks. The protocol checks not only the legitimacy of the user but also the authenticity of the hosted website, in effect providing two-way authentication.
3. Along with authentication, the invention provides a secure mechanism for key distribution via a third-party server that not only authenticates both communicating ends, i.e. the user and the bank's server, but also generates and securely delivers the session key to both.
4. The algorithm does not use conventional public key infrastructure to exchange keys for the symmetric encryption scheme; instead it uses a third-party server for session-key distribution in a way that provides security from external and internal attacks, classic replays, and interleaving attacks, thereby satisfying the definition of provable security.
5. The protocol also provides "forward secrecy": even if the long-term key is somehow compromised in the future, previously communicated data and keys cannot be recovered.
6. The key distribution protocol is more efficient than the Diffie–Hellman key exchange protocol, as it does not involve large computations over a group to achieve forward secrecy.
7. The invention uses pseudo-random numbers generated at both communicating ends to provide a nonce as a token of key freshness as well as a user-identification credential, thereby eliminating the need for a costly third-party digital certificate. Each time the user wants to communicate, his user ID is a fresh value known only to the authentication server.
8. Once the secure session keys are delivered to both ends, the communicating parties exchange messages to confirm the consistency of the session key received.
9. Once the secure session is established, the client and the bank's server perform a second handshake to synchronize their pre-installed pseudo-random number generators (PRNGs).
10. This algorithm exchanges credential parameters (the PRNG state file) in such a way that both ends agree on a consensus (a common state file) without disclosing their earlier states to each other. (Imagine two people who each hold a secret number and want to agree on a common number without disclosing either their own numbers or the finally reached consensus.)
11. Once the PRNGs reach the same state, OTP keys are generated and semantic security is achieved by encrypting each message with a key of the same length as the message.
12. All communication takes place in sub-sessions within the same session, in multiple bursts, making it computationally infeasible for an attacker in the middle to capture the data and respond accordingly.
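The one-time-padding idea in items 11 and 12 can be illustrated concretely: once both ends hold synchronized generators, each message is XORed with a fresh key of exactly its own length. The sketch below is an assumption-laden stand-in, using a SHA-256-based keystream in place of the elliptic-curve PRNG; the names and construction are illustrative, not the disclosed scheme.

```python
import hashlib

def keystream(shared_seed: bytes, counter: int, length: int) -> bytes:
    """Expand a shared seed into `length` keystream bytes
    (illustrative stand-in for the synchronized ECC PRNG)."""
    out, block = b"", 0
    while len(out) < length:
        out += hashlib.sha256(
            shared_seed + counter.to_bytes(8, "big") + block.to_bytes(4, "big")
        ).digest()
        block += 1
    return out[:length]

def otp_encrypt(shared_seed: bytes, counter: int, message: bytes) -> bytes:
    """One-time-pad style encryption: the key is exactly as long as the message."""
    key = keystream(shared_seed, counter, len(message))
    return bytes(m ^ k for m, k in zip(message, key))

# Decryption is the same XOR with the same (seed, counter) pair.
seed = b"session-shared-seed"
ct = otp_encrypt(seed, 1, b"transfer 100 INR")
assert otp_encrypt(seed, 1, ct) == b"transfer 100 INR"
```

Note that reusing the same (seed, counter) pair for two different messages would break the one-time-pad guarantee, so each burst must draw a fresh keystream segment.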
i. In one aspect, the present invention is directed at providing a two-way, multi-factor, one-time-token-based authentication mechanism that renders authentication functionality for both communicating ends.
ii. In a second aspect, it removes the need for the costly digital certificates required by the bank's server under public key infrastructure, replacing them with a dynamic unique ID at each end, thereby giving even the client an extra credential.
iii. In a third aspect, the invention is directed at providing a secure key distribution facility that replaces public key infrastructure, making it preferable in terms of both security and performance: it is safe from the attacks encumbering PKI and is computationally faster, since PKI relies on complex computations that are expensive and inefficient, especially on battery-operated devices.
iv. In a fourth aspect, the invention protects customers of online banking from phishing, man-in-the-middle attacks, cross-site scripting, and other session-hijacking attacks. The system comprises an application installed at the client end that communicates with the browser, thereby eliminating threats caused by scripts and other session-credential thefts arising from browser vulnerabilities.
v. In a fifth aspect, the invention is directed at providing a one-time-padding facility without the need for pre-shared keys. This becomes feasible through a handshake algorithm that synchronizes two elliptic-curve-based pseudo-random number generators: in a Diffie–Hellman-like parameter exchange over an elliptic curve, seed parameters are exchanged between the communicating parties and a consensus is reached without disclosing either the previously held states or the newly generated consensus.
vi. The sixth aspect of the invention deals with a multi-elliptic-curve-based pseudo-random number generator. The generator's state consists of many curve seed states, and numbers are generated randomly from any of the curves in the seed set. Elliptic curves defined over prime fields of different sizes are used, and the state of each curve keeps changing throughout the PRNG's life cycle.
vii. The state file of this PRNG is much more dynamic and is defined over much larger prime fields, making its output more uniformly distributed than that of a PRNG defined over a single curve.
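A toy sketch of such a multi-curve generator is below, in the spirit of Dual_EC-style designs where the per-curve state advances as s → x(s·P) and the output is x(s·Q). The tiny textbook curves, points, and infinity guards are assumptions for illustration only; a real instantiation would use large standardized prime-field curves.

```python
def ec_add(P, Q, a, p):
    """Point addition on y^2 = x^3 + a*x + b over GF(p); None is the point at infinity."""
    if P is None:
        return Q
    if Q is None:
        return P
    (x1, y1), (x2, y2) = P, Q
    if x1 == x2 and (y1 + y2) % p == 0:
        return None
    if P == Q:
        lam = (3 * x1 * x1 + a) * pow(2 * y1, -1, p) % p
    else:
        lam = (y2 - y1) * pow(x2 - x1, -1, p) % p
    x3 = (lam * lam - x1 - x2) % p
    return (x3, (lam * (x1 - x3) - y1) % p)

def ec_mul(k, P, a, p):
    """Double-and-add scalar multiplication."""
    R = None
    while k:
        if k & 1:
            R = ec_add(R, P, a, p)
        P = ec_add(P, P, a, p)
        k >>= 1
    return R

class MultiCurvePRNG:
    """Round-robin over several curves; per curve, state s -> x(s*P), output x(s*Q)."""
    def __init__(self, curves, seeds):
        self.curves = curves          # list of (a, p, P, Q) tuples
        self.state = list(seeds)      # one scalar state per curve
        self.i = 0
    def next(self):
        a, p, P, Q = self.curves[self.i]
        s = self.state[self.i] or 1               # toy guard against a zero scalar
        T = ec_mul(s, P, a, p)
        self.state[self.i] = T[0] if T else 1     # toy guard: point at infinity
        R = ec_mul(self.state[self.i], Q, a, p)
        self.i = (self.i + 1) % len(self.curves)  # rotate through the curve set
        return R[0] if R else 0

# Two tiny textbook curves: y^2 = x^3 + 2x + 2 mod 17 and y^2 = x^3 + 2x + 3 mod 97.
CURVES = [(2, 17, (5, 1), (6, 3)), (2, 97, (3, 6), (80, 10))]
g1 = MultiCurvePRNG(CURVES, [5, 7])
g2 = MultiCurvePRNG(CURVES, [5, 7])
assert [g1.next() for _ in range(8)] == [g2.next() for _ in range(8)]
```

Two instances seeded with identical states emit identical streams, which is exactly the property the synchronization handshake is meant to establish between client and bank.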

The same mechanism can be used for:
1. Secure login authentication from a computer or cell phone, as needed by banks and financial institutions.
2. Mobile banking schemes.
3. Remote login into a corporate network from a personal computer or cell phone.
4. One-time-padding symmetric encryption, using the PRNG installed at the client end, for transferring highly confidential data.
5. Authentication and data transfer in the cloud.
6. Login authentication and data transfer on government networks.
7. Secure remote desktop applications for login authentication and data transfer, including cell phone to desktop.
8. Protection from phishing and man-in-the-middle attacks when authentication is done via voice recognition or biometrics.
9. Remote desktop applications in 3G networks.
10. Client-end stock trading applications that require authentication and secure data transfer (current SEBI guidelines make 128-bit encryption mandatory for all trading-related data transfer).
11. Bit-by-bit voice encryption for telephone or 3G networks.
System configuration of the client-end system (desktop or mobile client, or the bank's server)
1. Client–server session initiation procedure
2. Multi-factor, two-way authentication protocol (between the client/bank's server and the third-party authentication server)
3. Key exchange protocol (between the client/bank's server and the third-party authentication server)
4. Session creation procedure
5. PRNG synchronization algorithm – to synchronize the client's and the bank's server's pre-installed PRNGs
6. Session credential and key maintenance procedure
7. Encryption/decryption algorithm
8. Data integrity and signature check procedures
9. Threat monitoring procedures
10. Token – outputs a random triplet periodically
11. PRNG – pseudo-random number generator
The token outputs three numbers periodically:
1. UID
2. Token (password)
3. The authentication server's UID for this client
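A hedged sketch of such a token: all three periodic outputs can be derived from one shared secret and a period counter with an HMAC-based PRF. The labels, truncation width, and HMAC construction here are assumptions for illustration, not the disclosed design.

```python
import hmac, hashlib

def token_triplet(shared_secret: bytes, period: int):
    """Derive (UID, one-time password, server-UID) for the current period.
    Both the token and the authentication server can compute the same triplet."""
    def prf(label: bytes) -> int:
        # Keyed PRF over (label, period); labels separate the three outputs.
        mac = hmac.new(shared_secret, label + period.to_bytes(8, "big"),
                       hashlib.sha256).digest()
        return int.from_bytes(mac[:4], "big") % 10**8   # truncate to an 8-digit value
    return prf(b"uid"), prf(b"otp"), prf(b"server-uid")
```

Because the UID itself is recomputed each period, an eavesdropper observing successive logins cannot link them to a single static account identifier.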
Client server session initiation protocol
[Protocol diagram: message exchange between Client and Bank's Server]

Here the client and the (bank's) server initiate a session by exchanging hello messages and then forwarding their hashed unique identification credentials. These credentials are random numbers generated by the token program installed at each end, so the UIDs are dynamic. Once these exchanges are over, they discard this session and each initiates the next protocol with the authentication server.
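The hello exchange can be sketched as each side sending only a hash of its current token-derived UID; the receiver, who can compute the expected UID for this period, recomputes and compares the hash. This is a minimal sketch with hypothetical function names, not the disclosed message format.

```python
import hashlib, hmac

def hello_message(dynamic_uid: int) -> bytes:
    """Send only the hash of the current one-period UID, never the UID itself."""
    return hashlib.sha256(str(dynamic_uid).encode()).digest()

def verify_hello(received: bytes, expected_uid: int) -> bool:
    """Recompute the expected hash for this period and compare in constant time."""
    expected = hashlib.sha256(str(expected_uid).encode()).digest()
    return hmac.compare_digest(received, expected)

assert verify_hello(hello_message(12345678), 12345678)
assert not verify_hello(hello_message(12345678), 87654321)
```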
Authentication and Key exchange protocol
Here we depict the client by the alias A and the bank's server by B.

[Symbol missing] represents a message digest signed by A with the long-lived key pre-shared with authentication server S.

[Symbol missing] represents a message encrypted by A with the long-lived key pre-shared with authentication server S.

[Protocol diagram: message exchange between Client/Bank's Server and the Authentication Server]

The same steps occur between B and server S. A and B then exchange the messages forwarded to them by the authentication server, to concur on the keys received from it.

After the secure key exchange and session setup, the client and the bank's server start communicating securely; they then begin to synchronize their pseudo-random number generators.
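Structurally, the key-exchange step resembles trusted-third-party key distribution in the Needham–Schroeder/Kerberos family: the server S shares a long-lived key with each party and delivers one fresh session key to both. The sketch below is illustrative only, with XOR-against-a-PRF standing in for the symmetric cipher; the key names are assumptions.

```python
import os, hmac, hashlib

def prf_stream(key: bytes, nonce: bytes, n: int) -> bytes:
    """HMAC-SHA256 in counter mode, producing n pseudo-random bytes."""
    out, i = b"", 0
    while len(out) < n:
        out += hmac.new(key, nonce + i.to_bytes(4, "big"), hashlib.sha256).digest()
        i += 1
    return out[:n]

def wrap(long_lived_key: bytes, session_key: bytes, nonce: bytes) -> bytes:
    """'Encrypt' the session key under a long-lived key (XOR with a PRF stream;
    illustrative only -- a real protocol would use authenticated encryption)."""
    pad = prf_stream(long_lived_key, nonce, len(session_key))
    return bytes(a ^ b for a, b in zip(session_key, pad))

# Trusted server S distributes one fresh session key to both ends.
k_as, k_bs = os.urandom(32), os.urandom(32)    # long-lived keys shared with S
session_key, nonce = os.urandom(32), os.urandom(16)
ticket_a = wrap(k_as, session_key, nonce)      # sent to A
ticket_b = wrap(k_bs, session_key, nonce)      # sent to B
# Each end unwraps with its own long-lived key (XOR is its own inverse):
assert wrap(k_as, ticket_a, nonce) == session_key
assert wrap(k_bs, ticket_b, nonce) == session_key
```

Because only S and the respective party know each long-lived key, neither a passive listener nor the other party learns anything beyond the shared session key itself.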

PRNG synchronization
All of this communication takes place once a secure session is established between the client and the bank's server. Both have an elliptic-curve-based pseudo-random number generator pre-installed at their end, with different state files containing group generators, prime-field parameters, constants, and point coordinates. The communication explained below relies on the hardness of the discrete logarithm problem over elliptic curves and helps both parties reach a consensus. Once these messages are exchanged, both PRNGs have the same state file and therefore output the same numbers, which can be used as one-time-padding keys.

The installed PRNG is elliptic-curve based. It contains k curves with generators G1, G2, …, Gk as state parameters. Apart from the generator, each curve's state consists of a seed-calculation point Pi and a random-number-generation point Qi. To synchronize the PRNGs, we need to produce the same state file in both. This is a consensus-generation problem in which no party other than those involved can guess the final state just by seeing the transferred messages; the security of the proposed scheme rests on the discrete logarithm problem in elliptic-curve groups. After the exchange of messages, and without disclosing their initial states in any manner, both parties reach a consensus on the final state. Both PRNGs will then output the same numbers, which can be further used as keys for the one-time-padding mechanism.

Step 1: Alice generates two sets of k numbers (one pair per curve used in the PRNG), {np1, np2, …, npk} and {nq1, nq2, …, nqk}, and multiplies them by the curve generators G1, G2, …, Gk respectively. All multiplication takes place over the respective elliptic curves and follows elliptic-group algebra. Alice then sends the resulting k pairs of coordinates to Bob (the bank's server), much as in a Diffie–Hellman key exchange. Only Alice knows the multipliers (np1, nq1), …, (npk, nqk); even if the session key were compromised, the adversary would have access only to n·G and G, and finding n is infeasible due to the computational hardness of the discrete logarithm problem.
Step 2: In a similar fashion, Bob computes two sets of coordinates, each containing k coordinates, and forwards the numbers to Alice.
Step 3: Alice performs the following computation at her end for all i from 1 to k.

Step 4: Bob performs similar computations for all i from 1 to k.

Since all computation is performed over elliptic curves defined over prime fields or Galois fields, whose groups are abelian (commutative), both Alice and Bob generate the same points at the end of steps 3 and 4.
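The commutativity that steps 3 and 4 rely on can be checked concretely on a toy curve. The curve, base point, and multipliers below are textbook illustrations, not the disclosed parameters; a real instantiation would use standardized large prime-field curves.

```python
def ec_add(P, Q, a, p):
    """Point addition on y^2 = x^3 + a*x + b over GF(p); None is the point at infinity."""
    if P is None:
        return Q
    if Q is None:
        return P
    (x1, y1), (x2, y2) = P, Q
    if x1 == x2 and (y1 + y2) % p == 0:
        return None
    if P == Q:
        lam = (3 * x1 * x1 + a) * pow(2 * y1, -1, p) % p
    else:
        lam = (y2 - y1) * pow(x2 - x1, -1, p) % p
    x3 = (lam * lam - x1 - x2) % p
    return (x3, (lam * (x1 - x3) - y1) % p)

def ec_mul(k, P, a, p):
    """Double-and-add scalar multiplication."""
    R = None
    while k:
        if k & 1:
            R = ec_add(R, P, a, p)
        P = ec_add(P, P, a, p)
        k >>= 1
    return R

# Toy curve y^2 = x^3 + 2x + 2 over GF(17), base point G = (5, 1).
a, p, G = 2, 17, (5, 1)
alice_n, bob_n = 7, 11                # secret multipliers, never transmitted
A = ec_mul(alice_n, G, a, p)          # Alice transmits alice_n * G only
B = ec_mul(bob_n, G, a, p)            # Bob transmits bob_n * G only
# Both sides derive the same consensus point without revealing a multiplier:
assert ec_mul(alice_n, B, a, p) == ec_mul(bob_n, A, a, p)
```

Recovering a multiplier from the transmitted point alone would require solving the elliptic-curve discrete logarithm problem, which is what keeps the consensus state secret from an observer.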
