Since the 1990s, HIV has been treatable with antiretroviral drugs, which, when used in a combination ('cocktail') regimen, can suppress the virus for many years. However, HIV frequently develops drug resistance through mutations in the proteins targeted by the drugs. Many thousands of these drug-resistant protein variants have been characterized. We want to use these data to infer the map between the amino-acid sequence and the properties of the proteins. Since the space of possible mutant proteins is very high-dimensional and still rather sparsely sampled, this is a challenging inference problem that requires innovative approaches.