Thank you, Shmulik, for contributing this amazing piece!
🎩💙🍷🖖
This is such an informative post. Thank you!
Yes, it is! Loved this one
Thanks for the sanity testing and services to the same for the good 😊
Min-P is such an elegant solution to the long-tail problem. The way it dynamically adjusts the threshold based on the top token's probability is something I wish I had understood earlier when tuning my own models. Do you find that combining Min-P with a moderate temperature works better than using Top-P alone?
I haven't tried it yet in production, but I think it's similar to Top-P at moderate temperatures. I recommend reading the full paper if you want to know more!
Hope you liked it 🥂