1 |p pub_23 |a adid_456 ad_daily ad_news

Here p (publisher features) is a namespace where you put all the publisher-related features,

and a (ad features) is a namespace for ad-specific features (you can name the namespaces according to your needs).
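As a quick illustration, a line in this input format can be assembled like so (a hypothetical helper, not part of the post):

```python
def to_vw_line(label, pub_feats, ad_feats):
    # hypothetical helper: build one VW training line using the
    # p (publisher) and a (ad) namespaces described above
    return f"{label} |p {' '.join(pub_feats)} |a {' '.join(ad_feats)}"

print(to_vw_line(1, ["pub_23"], ["adid_456", "ad_daily", "ad_news"]))
# prints: 1 |p pub_23 |a adid_456 ad_daily ad_news
```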

Vowpal Wabbit uses feature hashing, so we don’t need to generate one-hot encodings. While training, say, a logistic regression model, you can even generate quadratic features as follows:

vw -d train_data_file.txt -c --loss_function=logistic --passes 5 -f model_name --l1 0.0000001 --l2 0.00000001 --readable_model readable.txt -q pa -b 26

Here -q pa tells VW to generate quadratic (cross) features between the publisher and ad namespaces.
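Conceptually, -q pa crosses every feature in the p namespace with every feature in the a namespace (VW does this on hashed features internally; this sketch just enumerates the pairs for the example line above):

```python
pub_feats = ["pub_23"]
ad_feats = ["adid_456", "ad_daily", "ad_news"]

# the crossed (quadratic) features -q pa would generate for the example line
quadratic = [f"{p}^{a}" for p in pub_feats for a in ad_feats]
# -> ['pub_23^adid_456', 'pub_23^ad_daily', 'pub_23^ad_news']
```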

There is no need to use Spark for training, since Vowpal Wabbit is an online learner and is really fast even on a single machine, though you can use Spark for training data generation.

On a different note, you can also exploit the domain knowledge of the ad ops guys in this scheme. Today our ad ops guys told us that historically, to win impressions on Adaptv (a video ad exchange), one has to bid around $7 or $8 CPM. So even before the algorithm starts exploring the win rates of different bidding prices, we can feed this in as a prior on the Beta distribution (to start, just take alpha and beta for the $7 and $8 bandit arms such that the mean win rate alpha/(alpha+beta) is 90%). This will reduce the time to converge.
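For concreteness, one way to encode that prior (a sketch; the strength value here is an assumption you would tune, since it controls how much observed data is needed to override the ad-ops hint):

```python
def beta_prior(mean, strength):
    # mean of Beta(alpha, beta) is alpha / (alpha + beta);
    # strength = alpha + beta sets how "sticky" the prior is
    alpha = mean * strength
    beta = strength - alpha
    return alpha, beta

# the $7 and $8 arms start with a 90% mean win rate
alpha, beta = beta_prior(0.9, 10)  # alpha=9.0, beta=1.0
```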

I am working on another approach to find a better solution for the dynamic bandit problem, i.e. when the bidding environment changes and we want to start the exploration again. I am thinking about using ideas from evolutionary game theory; just an intuition that we can use some sort of replicator equation to make copies of bandits or adjust the variance of their win-rate distributions.
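For reference, the discrete-time replicator update that intuition points at looks like this (a generic textbook sketch, not the author's algorithm; "fitness" here could be, for example, how well each bandit variant has tracked the true win rate recently):

```python
def replicator_step(shares, fitness):
    # shares: population fraction of each bandit variant (sums to 1)
    # fitness: payoff of each variant; above-average variants grow,
    # below-average variants shrink
    avg = sum(s * f for s, f in zip(shares, fitness))
    return [s * f / avg for s, f in zip(shares, fitness)]

replicator_step([0.5, 0.5], [2.0, 1.0])
# the fitter variant gains population share
```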

Thanks for the quick response. That makes sense. Any specific reason why you took the pdf of a normal distribution with a low standard deviation?

Ciao,

Martin

Actually I have a “pacing/campaign selection” algorithm that runs before finding the optimal bid price. Given a set of active campaigns, the smooth pacing algorithm outputs a campaign id and its required win rate based on available inventory and hourly unspent budget. E.g. given a request segment Si and three campaigns {C1,C2,C3} with hourly unspent budgets {B1,B2,B3} respectively, it will output C1 and a win rate, let’s say 50%. Now I make a draw from each bandit distribution (the posterior Beta corresponding to each bandit) for campaign C1 and pick the bandit according to the following logic:

bandit_id = 1
max_value = float("-inf")  # a pdf value is a float, not an int
target_win_rate = 0.5

for i, bandit_i in enumerate(bandit_list):
    # this value is the supposed win_rate of the bandit
    x = posterior_draw_from_beta_bandit[i]
    # here we are trying to find how close we are to the target or needed win rate
    # you can use any distance function
    output = norm.pdf(target_win_rate - x, loc=0, scale=0.1)
    if output > max_value:  # keep the bandit whose draw is closest to the target
        max_value = output
        bandit_id = bandit_i
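Put together as a self-contained sketch (using a hand-rolled Gaussian kernel in place of scipy's norm.pdf; the posterior parameters below are made-up toy values):

```python
import math
import random

def gaussian_pdf(z, scale=0.1):
    # pdf of N(0, scale) evaluated at z, same role as norm.pdf above
    return math.exp(-0.5 * (z / scale) ** 2) / (scale * math.sqrt(2 * math.pi))

def pick_bandit(posteriors, target_win_rate=0.5, scale=0.1):
    # posteriors: (alpha, beta) of each bid bin's Beta posterior
    best_id, best_score = 0, float("-inf")
    for i, (a, b) in enumerate(posteriors):
        x = random.betavariate(a, b)  # Thompson-style draw of the bin's win rate
        score = gaussian_pdf(target_win_rate - x, scale)
        if score > best_score:
            best_id, best_score = i, score
    return best_id

random.seed(0)
# bin 0's posterior is concentrated near the 50% target, bin 1 near 1%
pick_bandit([(500, 500), (1, 99)])  # -> 0
```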

Nice post! Though I have a quick question for you:

“At each request the algorithm selects a bid bin/bucket based on its assessment of win rate of that bin. Closer the win rate of the bin is to target win rate more bids will be made at that price” => Do you just scale the sample from the beta distribution by something like 1-|targetWinRate-actualWinRate| before picking the largest sample for the round, or do you use a custom distribution for each arm?
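For what it's worth, the two scorings mentioned here behave quite differently away from the target (a hypothetical side-by-side comparison, not code from the post):

```python
import math

def gaussian_score(diff, scale=0.1):
    # unnormalized version of norm.pdf(diff, loc=0, scale=0.1)
    return math.exp(-0.5 * (diff / scale) ** 2)

def linear_score(diff):
    # the 1 - |targetWinRate - actualWinRate| scaling from the question
    return 1 - abs(diff)

# near the target both scores are high, but the narrow Gaussian
# punishes a distant draw far more harshly than the linear scaling
round(gaussian_score(0.05), 3), round(linear_score(0.05), 3)  # (0.882, 0.95)
round(gaussian_score(0.30), 3), round(linear_score(0.30), 3)  # (0.011, 0.7)
```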

Ciao,

Martin