Spark in me - Internet, data science, math, deep learning, philosophy

@snakers4

All this - lost like tears in rain.
Data science, deep learning, sometimes a bit of philosophy and math. No bs.
Our website
Our chat
DS courses review
Channel geo and language: Russia, Russian
Added to index: 09.05.2017 23:31
Last update: 16.02.2019 18:09
Channel reposts and mentions
5 channel mentions
30 post mentions
41 reposts
Latest posts
Which type of content do you / would you like most on the channel?
  • Weekly / bi-weekly digests;
  • Full articles;
  • Podcasts with actual ML practitioners;
  • Practical bits on real applied NLP;
  • Pre-trained BERT with Embedding Bags for Russian;
  • Paper reviews;
  • Jokes / memes / cats;
77 votes
(2) is valid for models with complex forward pass and models with large embedding layers
PyTorch DataLoader, GIL thrashing and CNNs

Well, all of this seems a bit like magic to me, but hear me out.

I abused my GPU box for weeks running CNNs on 2-4 GPUs.
Nothing broke.
And then my GPU box started shutting down for no apparent reason.

No, this was not:
- CPU overheating (I have a massive cooler, I checked - it works);
- PSU;
- Overclocking;

It also adds to the confusion that AMD has weird temperature readings.

To cut a long story short - if you have a very fast Dataset class and you use PyTorch's DataLoader with num_workers > 0, it can lead to system instability instead of a speed-up.

It is obvious in retrospect, but it is not when you face this issue.
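
A minimal sketch of the situation, assuming a toy in-memory dataset (`FastDataset` and its sizes are hypothetical, not from the original setup). When `__getitem__` is already cheap, `num_workers=0` keeps loading in the main process and avoids spawning worker processes at all:

```python
import torch
from torch.utils.data import Dataset, DataLoader


class FastDataset(Dataset):
    """Hypothetical dataset whose items are already in memory."""

    def __init__(self, n_items=1024, n_features=16):
        self.data = torch.randn(n_items, n_features)

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        # Indexing a tensor is cheap, so there is little for workers to parallelize
        return self.data[idx]


# num_workers=0 loads batches in the main process (no worker processes)
loader = DataLoader(FastDataset(), batch_size=64, shuffle=False, num_workers=0)

n_batches = 0
for batch in loader:
    n_batches += 1
```

Whether workers help or hurt depends on the box and the dataset, so it is probably worth timing both settings before committing to one.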

PyTorch NLP best practices

Very simple ideas, actually.

(1) Multi GPU parallelization and FP16 training

Do not bother reinventing the wheel.
Just use NVIDIA's apex, DistributedDataParallel, DataParallel.
Best examples [here](

(2) Put as much as possible INSIDE of the model

Implement as much of your logic as possible inside of nn.Module,
so that you can seamlessly use all the abstractions from (1).
Also models are more abstract and reusable in general.

(3) Why have a separate train/val loop?

PyTorch 0.4 introduced context managers for gradient tracking (torch.no_grad / torch.enable_grad).

You can simplify your train / val / test loops, and merge them into one simple function.

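import torch
from tqdm import tqdm

# gradients are only tracked outside of validation
context = torch.no_grad() if loop_type == 'Val' else torch.enable_grad()

if loop_type == 'Train':
    model.train()
elif loop_type == 'Val':
    model.eval()

with context:
    for i, (some_tensor) in enumerate(tqdm(train_loader)):
        pass  # do your stuff here

(4) EmbeddingBag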

Use EmbeddingBag layer for morphologically rich languages. Seriously!
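
A minimal sketch of what this can look like, assuming words are split into character n-grams that are hashed into a fixed number of buckets (a fastText-like scheme; the post itself does not specify one). `ngram_ids`, `N_BUCKETS` and `EMB_DIM` are hypothetical names:

```python
import torch
import torch.nn as nn

N_BUCKETS = 1000  # assumed size of the n-gram hash table
EMB_DIM = 8       # assumed embedding size


def ngram_ids(word, n=3, n_buckets=N_BUCKETS):
    """Hypothetical helper: hash character n-grams of a word into bucket ids."""
    padded = f"<{word}>"
    grams = [padded[i:i + n] for i in range(len(padded) - n + 1)]
    return [hash(g) % n_buckets for g in grams]


# One vector per word = mean of the vectors of its n-grams
bag = nn.EmbeddingBag(N_BUCKETS, EMB_DIM, mode="mean")

words = ["кошка", "кошки", "кошке"]
ids, offsets = [], []
for w in words:
    offsets.append(len(ids))  # where this word's n-grams start in the flat list
    ids.extend(ngram_ids(w))

word_vectors = bag(
    torch.tensor(ids, dtype=torch.long),
    torch.tensor(offsets, dtype=torch.long),
)
```

Inflected forms of the same word share most of their n-grams, so they also share most of their parameters.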

(5) Writing trainers / training abstractions

This is a waste of time imho if you follow (1), (2) and (3).
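
As a rough illustration of why a trainer abstraction may not be needed, the one-function loop from (3) can be sketched as a complete runnable example. `run_epoch`, the toy model and the random data are hypothetical:

```python
import torch
import torch.nn as nn


def run_epoch(model, loader, criterion, optimizer=None, loop_type="Train"):
    """Run one pass over `loader` and return the mean loss per batch."""
    is_train = loop_type == "Train"
    if is_train:
        model.train()
    else:
        model.eval()
    # Track gradients only when training
    context = torch.enable_grad() if is_train else torch.no_grad()

    total_loss, n_batches = 0.0, 0
    with context:
        for inputs, targets in loader:
            loss = criterion(model(inputs), targets)
            if is_train:
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()
            total_loss += loss.item()
            n_batches += 1
    return total_loss / max(n_batches, 1)


# Toy usage: a linear model on random data
torch.manual_seed(0)
model = nn.Linear(4, 1)
data = [(torch.randn(8, 4), torch.randn(8, 1)) for _ in range(5)]
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

train_loss = run_epoch(model, data, criterion, optimizer, loop_type="Train")
val_loss = run_epoch(model, data, criterion, loop_type="Val")
```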

(6) Nice bonus

If you follow most of these, you can train on as many GPUs and machines as you want, for any language)

(7) Using tensorboard for logging

This goes without saying.

Russian thesaurus that really works

It knows so many peculiar / old-fashioned and cheeky synonyms for obscene words!

Russian Distributional Thesaurus
Russian Distributional Thesaurus (RDT for short) is a project to build an open distributional thesaurus of the Russian language. At the moment the resource contains several components: word vectors (word embeddings), a word similarity graph (the distributional thesaurus), a set of hypernyms and an inventory of word senses. All resources were built automatically from a corpus of Russian-language books (12.9 billion tokens). Future versions of the resource plan to add word sense vectors for Russian, obtained from the same corpus. The project is developed by representatives of UrFU, Lomonosov Moscow State University and the University of Hamburg. In the past, researchers from South Ural State University, the Technical University of Darmstadt, the University of Wolverhampton and the University of Trento contributed to the project.
Reposted from: thinline72
Old news ... but Attention works

Funnily enough, in the past one of these always held:
- My models did not need attention;
- Attention was implemented by @thinline72;
- The domain was so complicated (NMT) that I had to resort to boilerplate with key-value attention;

It was the first time I / we tried manually building a model with plain self attention from scratch.

And you know - it really adds 5-10% to all of the tracked metrics.

Best plain attention layer in PyTorch - simple, well documented ... and it works in real life applications:
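
For illustration only (this is not the layer referred to above), a plain attention pooling layer over a sequence, one simple form of self-attention, might look like the sketch below. `AttentionPool` and the tensor sizes are hypothetical:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class AttentionPool(nn.Module):
    """Scores each time step, then returns the attention-weighted sum."""

    def __init__(self, hidden_dim):
        super().__init__()
        self.score = nn.Linear(hidden_dim, 1, bias=False)

    def forward(self, x, mask=None):
        # x: (batch, seq_len, hidden_dim); mask: (batch, seq_len), True for real tokens
        scores = self.score(x).squeeze(-1)
        if mask is not None:
            # Padding positions get zero weight after the softmax
            scores = scores.masked_fill(~mask, float("-inf"))
        weights = F.softmax(scores, dim=-1)
        pooled = torch.bmm(weights.unsqueeze(1), x).squeeze(1)
        return pooled, weights


torch.manual_seed(0)
layer = AttentionPool(hidden_dim=16)
x = torch.randn(2, 5, 16)
mask = torch.tensor([[1, 1, 1, 0, 0], [1, 1, 1, 1, 1]], dtype=torch.bool)
pooled, weights = layer(x, mask)
```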

Third 2019 DS / ML digest

Highlights of the week
- quaternions;
- ODEs;

A new paradigm in ML?

Reposted from: Анна
Checked out sentence embeddings in LASER:
- installation guide is a bit messy
- works on FAISS lib, performance is pretty fast (
Second 2019 DS / ML digest

Highlight of the week - Facebook's LASER.

Jupyter widgets + pandas

With the @interact decorator, the IPywidgets library automatically gives us a text box and a slider for choosing a column and number! It looks at the inputs

Serialization of large objects in Python

So far found no sane way for this with 1M chunks / 10GB+ object size.

Of course, chunking / plain txt works.

Feather / parquet - fail with 2+GB size.
Pickle works, but it is kind of slow.
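
A minimal sketch of the chunking route with plain pickle, assuming the object is a list that can be split into independent pieces. `dump_chunks`, `load_chunks` and the toy data are hypothetical, and this says nothing about speed at 10GB+:

```python
import os
import pickle
import tempfile


def dump_chunks(items, path, chunk_size=1000):
    """Write `items` as a sequence of independently pickled chunks."""
    with open(path, "wb") as f:
        for start in range(0, len(items), chunk_size):
            chunk = items[start:start + chunk_size]
            pickle.dump(chunk, f, protocol=pickle.HIGHEST_PROTOCOL)


def load_chunks(path):
    """Yield items one chunk at a time, so the file is never unpickled in one go."""
    with open(path, "rb") as f:
        while True:
            try:
                chunk = pickle.load(f)
            except EOFError:
                break
            yield from chunk


items = [{"id": i, "text": f"row {i}"} for i in range(2500)]
path = os.path.join(tempfile.mkdtemp(), "items.pkl")
dump_chunks(items, path)
restored = list(load_chunks(path))
```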


Downsides of using Common Crawl

Took a look at the Common Crawl data I myself pre-processed last year and could not find abstracts - only sentences.

Took a look at these archives - also only sentences, though they sometimes seem to be in logical order.

You can use any form of CC - but only to learn word representations. Not sentences.

Neat PyTorch hack

(1) If possible, implement your complex loss / logic within your model.forward()
(2) Enjoy the multi-GPU / multi-node training wrappers from APEX, PyTorch DataParallel, DistributedDataParallel etc
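
A minimal sketch of the idea, assuming a toy classifier (`ModelWithLoss` and its sizes are hypothetical). Because the loss is computed in `forward()`, the calling code does not change when the model is wrapped:

```python
import torch
import torch.nn as nn


class ModelWithLoss(nn.Module):
    """Toy classifier that computes its own loss inside forward()."""

    def __init__(self, in_dim=10, n_classes=3):
        super().__init__()
        self.net = nn.Linear(in_dim, n_classes)
        self.criterion = nn.CrossEntropyLoss()

    def forward(self, x, target=None):
        logits = self.net(x)
        if target is None:
            return logits
        # Keep a leading dimension so wrappers that gather per-device
        # outputs along dim 0 have something to concatenate
        loss = self.criterion(logits, target).unsqueeze(0)
        return logits, loss


torch.manual_seed(0)
model = ModelWithLoss()
# model = nn.DataParallel(model)  # optional wrapper, the lines below stay the same
x = torch.randn(8, 10)
y = torch.randint(0, 3, (8,))
logits, loss = model(x, y)
loss.mean().backward()
```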


NLP - Highlight of the week - LASER

- Hm, a new sentence embedding tool?
- Plain PyTorch 1.0 / numpy / FAISS based;
- [Release](, [library](;
- Looks like an off-shoot of their "unsupervised" NMT project;

LASER’s vector representations of sentences are generic with respect to both the
input language and the NLP task. The tool maps a sentence in any language to a
point in a high-dimensional space with the goal that the same statement in any
language will end up in the same neighborhood. This representation could be seen
as a universal language in a semantic vector space. We have observed that the
distance in that space correlates very well to the semantic closeness of the
sentences.

- Alleged pros:
It delivers extremely fast performance, processing up to 2,000 sentences per second on GPU.
The sentence encoder is implemented in PyTorch with minimal external dependencies.
Languages with limited resources can benefit from joint training over many languages.
The model supports the use of multiple languages in one sentence.
Performance improves as new languages are added, as the system learns to recognize characteristics of language families.

They essentially trained an NMT model with a shared encoder for many languages.

I tried training sth similar - but it quickly over-fitted into just memorizing the indexes of words.


Pre-trained BERT in PyTorch

Model code here is just awesome.
Integrated DataParallel / DDP wrappers / FP16 wrappers also are awesome.

FP16 precision training from APEX just works (no idea about convergence though yet).

As for model weights - I cannot really tell, there is no dedicated Russian model.
The only problem I am facing now: with large embedding bags, the batch size is literally 1-4 even for smaller models.

And training models with SentencePiece is kind of feasible for morphologically rich languages, but you will always worry about generalization.

Did not try the generative pre-training (and sentence prediction pre-training), I hope that properly initializing embeddings will also work for a closed domain with a smaller model (they pre-train 4 days on 4+ TPUs, lol).

Why even tackle such models?
Chat / dialogue / machine comprehension models are complex / require one-off feature engineering.
Being able to tune something like BERT on publicly available benchmarks and then on your domain can provide a good way to embed complex situations (like questions in dialogues).

New amazing video by 3B1B