Objective

I often need to slice a portion of data out of a multi-dimensional vector or tensor. NumPy arrays and PyTorch tensors make slicing very easy, and with very similar syntax. In some scenarios I need to work from a plain list instead, and here is one implementation that can do it.

Example 1

Given a simple 2D list of numbers, I want to slice the input into 3 list-like elements.

from:
[[ 1, 2, 3, 4, 5],
[11,12,13,14,15],
[21,22,23,24,25]]
into:
[[ 1, 2, 3], | [ 4, | [ 5,
[11,12,13], | 14, | 15,
[21,22,23]] | 24] | 25]
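
With plain Python lists, one way to get this kind of slicing is a list comprehension over the rows. This is a minimal sketch, not necessarily the implementation the article arrives at: the helper names `slice_columns` and `pick_column` are made up for illustration, and it assumes the second and third pieces are the 4th and 5th columns of the input.

```python
data = [[1, 2, 3, 4, 5],
        [11, 12, 13, 14, 15],
        [21, 22, 23, 24, 25]]


def slice_columns(rows, start, stop):
    # Keep columns start..stop-1 of every row (like array[:, start:stop])
    return [row[start:stop] for row in rows]


def pick_column(rows, index):
    # Collect a single column as a flat list (like array[:, index])
    return [row[index] for row in rows]


first_three = slice_columns(data, 0, 3)
fourth = pick_column(data, 3)
fifth = pick_column(data, 4)

print(first_three)
print(fourth)
print(fifth)
```

Unlike NumPy, a nested list has no notion of axes, so every column operation has to walk the rows explicitly.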

NumPy array


Intention

I have had cases where a dataset is not strictly numerical and does not necessarily fit into a tensor, so I have been looking for a way to manage data loading beyond passing the input to a PyTorch DataLoader object and letting it sample the batches for me automatically. I have done this several times now, so I would like to study it a bit deeper and share it here as a record for my future reference.

Main Reference

PyTorch official reference:

Main Classes / function(s)

Dataset (and its subclasses)


When running code in Colab, there are occasions when I need to debug code that was not developed by me but comes from installed packages, and it is impossible to alter the code inside them.

When the code is written by myself, it is easy to add a line:

from pdb import set_trace; set_trace()

When execution reaches this line, it triggers the debugger:

Problem

Note that when debugging, hitting “n” (next) does not take us to the next line in the code (line 5); one needs to experiment with a few “n” or “s” (step) commands to get to that line.
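
For code that lives in an installed package and cannot be edited, one possible approach (an assumption here, not necessarily the one this article settles on) is to start the debugger from the outside with `pdb.runcall`. The sketch below uses a hypothetical function `add` as a stand-in for package code, and feeds scripted commands through a fake stdin only so that it runs without a keyboard. In a notebook you would call `pdb.runcall(add, 2, 3)` and type the commands yourself.

```python
import io
import pdb


def add(a, b):
    # Hypothetical stand-in for a function inside an installed package
    total = a + b
    return total


# Scripted debugger commands; interactively you would type these at the prompt
commands = io.StringIO("next\np total\ncontinue\n")
captured = io.StringIO()

# readrc=False skips any .pdbrc file; supplying stdout makes pdb read
# its commands from the stdin object instead of the real terminal
debugger = pdb.Pdb(stdin=commands, stdout=captured, readrc=False, nosigint=True)

# runcall stops at the first line of add() without editing add() itself
result = debugger.runcall(add, 2, 3)

print(captured.getvalue())
print("result:", result)
```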


I admire the effort of this article; if you want to embed an interactive graph/plot, follow the approach that article shares.

I found at least 2 providers that help host your Plotly figures in the cloud so that you can embed them in your website, but for me an image can sometimes serve the purpose, if the resolution is high enough.

The simplest way to get an image from a Plotly figure is the “download as image” button it provides.

But sometimes, when the information is densely packed, I would like something with a slightly higher resolution.

And the good…


This is a quick summary of using the Hugging Face Transformers pipeline and the problems I faced.

Pipeline is a very good idea for streamlining some of the operations one needs to handle during NLP processing with the transformers library, including but not limited to:

  1. tokenize the input string
  2. map tokens to IDs (integers)
  3. pass the mapped IDs as a tensor to the model
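
The three steps above can be sketched with a toy example. Everything here (the whitespace tokenizer, the `VOCAB` table, and `toy_model`) is a hypothetical stand-in to show how the data flows, not how the transformers library implements it.

```python
# Toy illustration only: hypothetical vocabulary, tokenizer, and model
VOCAB = {"[UNK]": 0, "this": 1, "is": 2, "a": 3, "test": 4}


def tokenize(text):
    # Step 1: split the input string into tokens (real tokenizers use subwords)
    return text.lower().split()


def tokens_to_ids(tokens, vocab=VOCAB):
    # Step 2: map each token to an integer ID, falling back to [UNK]
    return [vocab.get(token, vocab["[UNK]"]) for token in tokens]


def toy_model(batch_of_ids):
    # Step 3 stand-in: a real model would return hidden states;
    # this one just reports the sequence length of each input
    return [len(ids) for ids in batch_of_ids]


tokens = tokenize("This is a test")
ids = tokens_to_ids(tokens)
batch = [ids]  # models expect a batch dimension, hence the extra list
output = toy_model(batch)

print(tokens, ids, output)
```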

The old way before pipeline:

# Load pretrained model/tokenizer
import torch
from transformers import DistilBertModel, DistilBertTokenizer
model_class, tokenizer_class, pretrained_weights = (DistilBertModel, DistilBertTokenizer, 'distilbert-base-uncased')
tokenizer = tokenizer_class.from_pretrained(pretrained_weights)
model = model_class.from_pretrained(pretrained_weights)
input_ids = torch.tensor([tokenizer.encode("this is a test")])
with torch.no_grad(): …


Background

I have a Python project that runs multiple web services on Flask. I used to debug very inefficiently by adding pdb set_trace lines or print messages to trace execution, but with Visual Studio Code as my main IDE (mainly for JavaScript), I would like to leverage it to make my Python programming more efficient.

Challenges

I use Windows, but I prefer Linux when I develop, so I use WSL (Windows Subsystem for Linux).

I use Conda to manage my Python packages, and the environments all live on the file system inside WSL.

When using Visual Studio Code, it can detect the WSL environment for…


This is a browser extension that generates a QR code for the selected text or the current page URL.

Intention and Background

This project is a rethink of the OCR solution in:

The problem with the OCR solution was that its accuracy and speed were not satisfactory. Consider the same problem of copying text from a PC to use on a mobile phone: one way is to use a middleman like Signal, WhatsApp, or email… But my take is to do it with the camera (OCR in the previous solution, QR code scanning in this one).

What it does

It displays the QR code of the current URL or selected…


This is one of my experiments in learning what I can do with a browser extension. The extension is Firefox-only for now, as I leverage a DNS resolve capability that is not supported in Chrome (yet).

Intention and Background

As data privacy is of growing concern (at least to me), I would like to know what kinds of requests are made underneath when I browse the internet (with my browser). …


This is an NLP exploration during the WFH period.

Intention and background

For a query like “can I get an egg sushi”, can the machine possibly relate it to “tamago sushi” (where tamago is egg in Japanese)?

I would like to see whether any available pretrained models can pick up this kind of association. If that works, it might be possible to take an input query, search a pool of dishes, and pick the closest one.

What I tried

  • Load the language models, have them process some dish names, and look at how they cluster
  • Load all dishes into a Faiss index, try to query with dish name or…

Stephen Cow Chau
