Spark in me - Internet, data science, math, deep learning, philosophy

@snakers4

All this - lost like tears in rain.
Data science, ML, a bit of philosophy and math. No bs.
Our website
- http://spark-in.me
Our chat
- https://t.me/joinchat/Bv9tjkH9JHYvOr92hi5LxQ
DS courses review
- http://goo.gl/5VGU5A
- https://goo.gl/YzVUKf
Channel geo and language
Russia, Russian
Category
Technology


Added to index
09.05.2017 23:31
Last updated
22.05.2019 02:05
1,822 subscribers
~1k reach per post
~539 daily reach
~4 posts / week
56.9% ERR
1.8 citation index
Channel reposts and mentions
8 channel mentions
30 post mentions
44 reposts
Just links
Нейронач
Physics Blues
Нейронач
Just links
Just links
Just links
Админим с Буквой
Main ML_KZ
Нейронач
Нейронач
Red Fox On White Snow
Нейронач
Just links
Нейронач
Нейронач
Нейронач
Админим с Буквой
Just links
Anscombe's Quartet
Админим с Буквой
Админим с Буквой
Food-stained hoodie
Anscombe's Quartet
Блог Шмакова
Anscombe's Quartet
Anscombe's Quartet
Machinelearning
Отраженный свет
Dato ML
Dato ML
DeepLearning ru
Dato ML
Dato ML
Dato ML
DeepLearning ru
Channels cited by @snakers4
Just links
Just links
Neural Networks Engineering
Neural Networks Engineering
Just links
Just links
Just links
Админим с Буквой
Вастрик.Пынь
Bird Born
Loss function porn
Just links
Just links
Админим с Буквой
Bird Born
NVIDIA
Админим с Буквой
Just links
Just links
Just links
Just links
Админим с Буквой
Hacker News
Just links
Админим с Буквой
NVIDIA
Just links
Hacker News
Just links
Админим с Буквой
Just links
Hacker News
Roem.ru
2ch/Двач
Админим с Буквой
Админим с Буквой
Админим с Буквой
Админим с Буквой
Админим с Буквой
Data Science
Just links
Linuxgram 🐧
addmeto
Arseniy's channel
Arseniy's channel
Econerso
Roem.ru
Linuxgram 🐧
Roem.ru
addmeto
Latest posts
New in our Open STT dataset

https://github.com/snakers4/open_stt#updates

- An mp3 version of the dataset;
- A torrent for the mp3 dataset;
- A torrent for the original wav dataset;
- Benchmarks on the public dataset / files with "poor" annotation marked;

#deep_learning
#data_science
#dataset
SWA in the contrib repo of PyTorch )
Reposted from: Just links
https://pytorch.org/blog/stochastic-weight-averaging-in-pytorch/
2019 DS / ML digest 10

Highlights of the week(s)
- New MobileNet;
- New PyTorch release;
- Practical GANs?;

https://spark-in.me/post/2019_ds_ml_digest_10

#digest
#deep_learning
Habr.com / TowardsDataScience post for our dataset

In addition to a github release and a medium post, we also published a habr.com post:
- https://habr.com/ru/post/450760/

Also our post was accepted as an editor's pick on TDS:
- http://bit.ly/ru_open_stt

Share / give us a star / clap if you have not already!

Original release
https://github.com/snakers4/open_stt/

#deep_learning
#data_science
#dataset
PyTorch DP / DDP / model parallel

Finally they made proper tutorials:
- https://pytorch.org/tutorials/beginner/blitz/data_parallel_tutorial.html
- https://pytorch.org/tutorials/intermediate/model_parallel_tutorial.html
- https://pytorch.org/tutorials/intermediate/ddp_tutorial.html

Model parallel = have parts of the same model on different devices
Data Parallel (DP) = wrapper to use multi-GPU within a single parent process
Distributed Data Parallel (DDP) = multiple processes are spawned across a cluster / on the same machine
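
A minimal sketch of the DP case, assuming a hypothetical toy model (the layer sizes and batch shape are made up for illustration). It wraps the model only when more than one GPU is visible and otherwise runs on a single device. DDP additionally needs a torch.distributed process group and a launcher, which is not shown here.

```python
import torch
import torch.nn as nn

# Hypothetical toy model, only for illustration.
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4))

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

if torch.cuda.device_count() > 1:
    # DP: one parent process, each batch is split across the visible GPUs.
    model = nn.DataParallel(model)
model = model.to(device)

# A batch of 8 samples; under DP it would be scattered along dimension 0.
batch = torch.randn(8, 16, device=device)
with torch.no_grad():
    output = model(batch)

print(output.shape)
```

The calling code does not change between one and several GPUs, which is the main appeal of DP over DDP for quick experiments.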

#deep_learning
PyTorch

PyTorch 1.1
https://github.com/pytorch/pytorch/releases/tag/v1.1.0

- Tensorboard (beta);
- DistributedDataParallel new functionality and tutorials;
- Multi-headed attention;
- EmbeddingBag enhancements;
- Other cool, but more niche features:
  - nn.SyncBatchNorm;
  - optim.lr_scheduler.CyclicLR;

#deep_learning
Russian Open Speech To Text (STT/ASR) Dataset
4000 hours of STT data in Russian

Made by us. Yes, really. I am not joking.
It was a lot of work.

The dataset:
https://github.com/snakers4/open_stt/

Accompanying post:
https://spark-in.me/post/russian-open-stt-part1

TLDR:
- On the third release, we have ~4000 hours;
- Contributors and help wanted;
- Let's bring the Imagenet moment in STT closer together!;

Please repost this as much as you can.

#stt
#asr
#data_science
#deep_learning
Poor man's computing cluster

So, when I last checked, Amazon's p3.4xlarge instances cost around US$12 per hour (unless you reserve them for a year). A tower supercomputer from Nvidia costs probably US$40-50k or more (it was announced at around US$69k).

It is not difficult to crunch the numbers and see that 1 month of renting such a machine would cost at least US$8-10k. Also there will be the additional cost / problem of actually storing your large datasets. When I last used Amazon - their cheap storage was sloooooow, and fast storage was prohibitively expensive.


So, why am I saying this?


Let's assume (according to my miner friends' experience) - that consumer Nvidia GPUs can work 2-3 years non-stop given proper cooling and care (test before buying!). Also let's assume that 4xTesla V100 is roughly the same as 7-8 * 1080Ti.

Yeah, I know that you will point out at least one reason why this does not hold, but for practical purposes this is fine (yes, I know that Teslas have some cool features like Nvlink).

Now here is the kicker - modern professional motherboards often boast 2-3 Ethernet ports. And sometimes you can even get 2x10Gbit/s ports (!!!).

This means that you can actually connect at least 2 machines (or maybe you can daisy chain more of them?) into a computing cluster.

Now let's crunch the numbers

According to quotes I collected over the years, you can build a cluster roughly equivalent to Amazon's p3.4xlarge for US$10k (but with storage!) with used GPUs (miners sell them like crazy now). If you buy second-hand drives, motherboards and CPUs, you can lower the cost to US$5k or less.

So a cluster costing US$10k that would serve you at least one year (if you test everything properly and take care of it) is roughly equivalent to:
- 20-25% of DGX desktop;
- 1 month of renting on Amazon;

Assuming that all the hardware will just break in a year:
- It is 4-5x cheaper than buying from Nvidia;
- It is 10x cheaper than renting;

If you buy everything used, then it is 10x and 20x cheaper!
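
A quick sanity check of the arithmetic above, as a minimal sketch. All prices are the rough figures quoted in this post; the 30-day month and non-stop usage are assumptions.

```python
HOURS_PER_MONTH = 24 * 30  # assumption: a 30-day month of non-stop use

cloud_per_hour = 12                  # US$, Amazon instance price quoted above
dgx_low, dgx_high = 40_000, 50_000   # US$, Nvidia tower price range quoted above
diy_new = 10_000                     # US$, DIY cluster with used GPUs
diy_used = 5_000                     # US$, DIY cluster with everything bought used

cloud_per_month = cloud_per_hour * HOURS_PER_MONTH
cloud_per_year = cloud_per_month * 12

print(f"Renting, 1 month: US${cloud_per_month:,}")
print(f"Renting, 1 year:  US${cloud_per_year:,}")

# Compare each DIY budget against buying from Nvidia and renting for a year.
for name, cost in (("DIY, used GPUs", diy_new), ("DIY, all used", diy_used)):
    print(
        f"{name}: {dgx_low / cost:.0f}-{dgx_high / cost:.0f}x cheaper than Nvidia, "
        f"{cloud_per_year / cost:.0f}x cheaper than renting for a year"
    )
```

The ratios only hold under the one-year lifetime assumed above; if the hardware lasts longer, the gap versus renting grows further.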

I would buy that for a dollar!
Ofc you have to invest your free time.

See my calculations here:
http://bit.ly/spark00001

#deep_learning
#hardware
More about STT, also from us ... soon)
Reposted from: Yuri Baburov
The second experimental guest lecture of the course.
One of the course's seminar instructors, Yuri Baburov, will talk about speech recognition and working with audio.

May 1 at 8:40 Moscow time (12:40 Novosibirsk time, 10:40 PM on April 30 PST).

Deep Learning на пальцах 11 - Audio and Speech Recognition (Yuri Baburov)
https://www.youtube.com/watch?v=wm4H2Ym33Io
Tricky rsync flags

Rsync is the best program ever.

I find these flags the most useful:
--ignore-existing (skips files that already exist on the receiver)
--update (skips files that are newer on the receiver, based on timestamps)
--size-only (skips files that match in size, ignoring timestamps)
-e 'ssh -p 22 -i /path/to/private/key' (use a custom ssh port and identity)
Sometimes the first three flags get confusing.
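
To untangle the first three flags, here is a simplified model of the per-file decision, written from my reading of the rsync man page. This is not rsync's real code; FileInfo and would_transfer are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FileInfo:
    """Size in bytes and modification time (seconds) of one file."""
    size: int
    mtime: float


def would_transfer(src: FileInfo, dst: Optional[FileInfo], *,
                   ignore_existing: bool = False,
                   update: bool = False,
                   size_only: bool = False) -> bool:
    """Simplified model: would rsync copy src over dst?"""
    if dst is None:
        return True   # nothing on the receiver yet: always copy
    if ignore_existing:
        return False  # --ignore-existing: never touch files already there
    if update and dst.mtime > src.mtime:
        return False  # --update: the receiver's copy is newer, keep it
    if size_only:
        return src.size != dst.size  # --size-only: timestamps are ignored
    # Default "quick check": copy when size or modification time differs.
    return src.size != dst.size or src.mtime != dst.mtime


src = FileInfo(size=100, mtime=2000.0)
older_same_size = FileInfo(size=100, mtime=1000.0)
newer_other_size = FileInfo(size=90, mtime=3000.0)

print(would_transfer(src, older_same_size))                        # True
print(would_transfer(src, older_same_size, ignore_existing=True))  # False
print(would_transfer(src, newer_other_size, update=True))          # False
print(would_transfer(src, older_same_size, size_only=True))        # False
```

When in doubt on a real transfer, adding --dry-run together with -v shows what rsync would actually do without copying anything.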

#linux
2019 DS / ML digest 9

Highlights of the week
- Stack Overflow survey;
- Unsupervised STT (ofc not!);
- A mix between detection and semseg?;

https://spark-in.me/post/2019_ds_ml_digest_09

#digest
#deep_learning
Using snakeviz for profiling Python code

Why
To profile complicated and convoluted code.
Snakeviz is a cool GUI tool to analyze cProfile profile files.
https://jiffyclub.github.io/snakeviz/

Just launch your code like this (your_script.py is a placeholder for your own script)
python3 -m cProfile -o profile_file.cprofile your_script.py

And then just analyze with snakeviz.

GUI

They have a server GUI and a jupyter notebook plugin.
Also you can launch their tool from within a docker container:
snakeviz -s -H 0.0.0.0 profile_file.cprofile
Do not forget to EXPOSE necessary ports. SSH tunnel to a host is also an option.
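
If you only want to profile one part of a program instead of the whole script, the same kind of file can be written from Python itself. A minimal sketch using only the standard library; slow_sum is a hypothetical workload and the temp-dir output path is an assumption.

```python
import cProfile
import pstats
import tempfile
from pathlib import Path


def slow_sum(n: int) -> int:
    """Hypothetical workload, only here so the profiler has something to measure."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def profile_to_file(path: Path, n: int = 100_000) -> int:
    """Profile slow_sum(n) and dump the stats to `path` for snakeviz."""
    profiler = cProfile.Profile()
    profiler.enable()
    result = slow_sum(n)
    profiler.disable()
    profiler.dump_stats(str(path))
    return result


profile_path = Path(tempfile.gettempdir()) / "profile_file.cprofile"
result = profile_to_file(profile_path)

# Quick text summary of the same file, without opening the GUI.
stats = pstats.Stats(str(profile_path))
stats.sort_stats("cumulative").print_stats(5)
```

The resulting file can then be opened with snakeviz exactly like one produced from the command line.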

#data_science
2019 DS / ML digest number 8

Highlights of the week
- Transformer from Facebook with sub-word information;
- How to generate endless sentiment annotation;
- 1M breast cancer images;

https://spark-in.me/post/2019_ds_ml_digest_08

#digest
#deep_learning