Related changes
Enter a page name to see changes on pages linked to or from that page. (To see members of a category, enter Category:Name of category). Changes to pages on your Watchlist are shown in bold with a green bullet. See more at Help:Related changes.
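The same kind of data can be pulled programmatically. Below is a minimal sketch, assuming Python with the `requests` library and the public MediaWiki Action API; the seed title, request limits, and user-agent string are illustrative, and the sketch only follows links *from* the seed page and skips API continuation, so it is not a full reimplementation of Special:RelatedChanges.

```python
import requests

API = "https://en.wikipedia.org/w/api.php"
SEED = "Reinforcement learning from human feedback"  # example seed page, not necessarily the one used here

session = requests.Session()
session.headers["User-Agent"] = "related-changes-sketch/0.1"  # placeholder UA

# Step 1: pages linked *from* the seed page. (Special:RelatedChanges also
# includes pages linking *to* it, which this sketch omits.)
resp = session.get(API, params={
    "action": "query",
    "titles": SEED,
    "prop": "links",
    "plnamespace": 0,      # article namespace only
    "pllimit": "max",
    "format": "json",
}).json()
page = next(iter(resp["query"]["pages"].values()))
linked = [link["title"] for link in page.get("links", [])]

# Step 2: recent revisions of a few linked pages. rvlimit only applies to a
# single title per request, so query the pages one at a time.
for title in linked[:5]:
    resp = session.get(API, params={
        "action": "query",
        "titles": title,
        "prop": "revisions",
        "rvlimit": 3,
        "rvprop": "timestamp|user|comment|size",
        "format": "json",
    }).json()
    revs = next(iter(resp["query"]["pages"].values())).get("revisions", [])
    for rev in revs:
        print(f'{rev["timestamp"]}  {title}  {rev.get("user", "?")}  '
              f'({rev.get("comment", "")})')
```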
List of abbreviations (help):
- D: Edit made at Wikidata
- r: Edit flagged by ORES
- N: New page
- m: Minor edit
- b: Bot edit
- (±123): Page byte size change
- Temporarily watched page
11 April 2025
- diff | hist m Perplexity 13:50 (+5) Jmacwiki (talk | contribs) (→Perplexity of a probability distribution: Tweak.)
- diff | hist Mamba (deep learning architecture) 11:45 (+6) ActivelyDisinterested (talk | contribs) (Fixing notelist position and section title)
10 April 2025
- diff | hist Reinforcement learning from human feedback 20:48 (−14) PopoDameron (talk | contribs) (Not an example but rather a clarification on how the formula simplifies in this common case)
8 April 2025
- diff | hist EleutherAI 05:15 (+3) 174.197.131.222 (talk) (→History: m.)
- diff | hist EleutherAI 04:28 (+803) 174.197.131.222 (talk) (→History: update.)
- diff | hist EleutherAI 04:19 (+61) 174.197.131.222 (talk) (→References: ai list.)
6 April 2025
- diff | hist Reinforcement learning from human feedback 18:22 (−5) Alenoach (talk | contribs) (copyedit) Tag: Visual edit
- diff | hist m Reinforcement learning from human feedback 15:52 (0) Kooryan (talk | contribs) (→Kahneman-Tversky Optimization (KTO)) Tag: Visual edit
- diff | hist Reinforcement learning from human feedback 15:52 (+4,354) Kooryan (talk | contribs) (Added KTO, another important DAA) Tag: Visual edit
- diff | hist Reinforcement learning from human feedback 15:39 (−5) Alenoach (talk | contribs) (copyedit) Tag: Visual edit
- diff | hist m Reinforcement learning from human feedback 15:36 (0) Alenoach (talk | contribs) (sentence case) Tag: Visual edit
- diff | hist Reinforcement learning from human feedback 15:34 (+2) Alenoach (talk | contribs) (Moved and copyedited the "Direct Alignment Algorithms" section) Tag: Visual edit
- diff | hist m Reinforcement learning from human feedback 14:58 (+667) Hanpei (talk | contribs) (→Direct Alignment Algorithms)
- diff | hist m Reinforcement learning from human feedback 13:30 (−5) Arjayay (talk | contribs) (Duplicate word removed)
- diff | hist m Perplexity 08:51 (−14) SpectrumMigou (talk | contribs) (Fix formatting of KL divergence formula)
- diff | hist m Reinforcement learning from human feedback 06:24 (+258) JWEEEEEEN (talk | contribs) (edit new section) Tag: Visual edit
- diff | hist Reinforcement learning from human feedback 00:27 (+1,175) Kooryan (talk | contribs) (New section for Direct Alignment Algorithms) Tag: Visual edit
- diff | hist m Reinforcement learning from human feedback 00:22 (−22) Kooryan (talk | contribs) (→(Identity Preference Optimization)) Tag: Visual edit
- diff | hist Reinforcement learning from human feedback 00:19 (+4,053) Kooryan (talk | contribs) (Added identity preference optimization) Tag: Visual edit
5 April 2025
- diff | hist m Reinforcement learning from human feedback 21:44 (+39) Kooryan (talk | contribs) (→Direct preference optimization) Tag: Visual edit
- diff | hist m Reinforcement learning from human feedback 15:58 (+160) Kooryan (talk | contribs) (→Reward model: Modified equation formatting. Modified incorrect definitions of the advantage estimation in the clipped surrogate objective. It is wrong to consider the KL-penalty because the clipped surrogate is a totally different objective. A more technical description is required actually for PPO in this section as well, as many details are incorrect or unclear.) Tag: Visual edit