Discussion:
[Bug-apl] Use with word2vec
Fred Weigel
2017-04-29 01:18:54 UTC
Juergen, and other GNU APL experts.

I am exploring neural nets, word2vec and some other AI related areas.

Right now, I want to tie in Google's word2vec trained models (the
billion-word one, GoogleNews-vectors-negative300.bin.gz).

This is a binary file containing a lot of floating point data -- about
3.5GB of data. These are words, followed by cosine distances. I could
attempt to feed this in a slow way, and put it into an APL workspace.
But... I also intend to attempt to feed the data to a GPU. So, what I
am looking for is a modification to GNU APL (and yes, I am willing to
do the work) -- to allow for the complete suppression of normal C++
allocations, etc. and allow the introduction of simple float/double
vectors or matrices (it would help to allow "C"-ish or UTF-8-ish
strings: the data is (C string containing the word) followed by (a
fixed number of floats)... repeated LOTS of times).
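
[Editor's note: for concreteness, a minimal C++ sketch of a reader for
that layout, assuming the usual word2vec .bin convention (an ASCII
header "vocab_size dim", then per entry a space-terminated word
followed by dim raw 32-bit floats, typically with a trailing newline).
Names and error handling are illustrative, not part of GNU APL.]

// Minimal sketch of a word2vec .bin reader (assumed layout, see note).
#include <cstdio>
#include <string>
#include <vector>

int main()
{
    FILE* f = fopen("GoogleNews-vectors-negative300.bin", "rb");
    if (!f) return 1;

    long long vocab = 0, dim = 0;
    fscanf(f, "%lld %lld", &vocab, &dim);     // ASCII header

    std::vector<std::string> words(vocab);
    std::vector<float> vecs(vocab * dim);     // one flat float32 block, ~3.4GB for the big model

    for (long long i = 0; i < vocab; ++i)
    {
        int c;
        while ((c = fgetc(f)) != ' ' && c != EOF)
            if (c != '\n') words[i] += char(c);        // word is space-terminated
        fread(&vecs[i * dim], sizeof(float), dim, f);  // dim raw floats follow
    }
    fclose(f);
}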

The data set(s) may be compressed, so I don't want to read them
directly -- possibly from a shared memory region (64-bit systems only,
of course), or perhaps using shared variables... but I don't think
that would be fast enough.

Anyway, this begins to allow the push into "big data" and AI
applications. Just looking for some input and ideas here.

Many thanks
Fred Weigel
Xiao-Yong Jin
2017-04-29 01:32:30 UTC
If shared variables can go through SHMEM, you can probably interface
CUDA that way without much of a bottleneck.
But with the way GNU APL is implemented now, there are just too many
other limitations on performance with arrays of such size.
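
[Editor's note: a minimal sketch of that shared-memory hand-off, using
POSIX shm_open()/mmap(). The object name "/w2v_vectors" and the
dimensions are illustrative; the CUDA side is only indicated in a
comment. Link with -lrt on older glibc.]

#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>
#include <cstddef>

// Publisher creates the object; consumers map the same name read/write.
float* map_vectors(std::size_t rows, std::size_t cols, bool create)
{
    const std::size_t bytes = rows * cols * sizeof(float);
    int fd = shm_open("/w2v_vectors",
                      create ? (O_CREAT | O_RDWR) : O_RDWR, 0600);
    if (fd < 0) return nullptr;
    if (create && ftruncate(fd, bytes) != 0) return nullptr;

    void* p = mmap(nullptr, bytes, PROT_READ | PROT_WRITE,
                   MAP_SHARED, fd, 0);
    close(fd);                       // the mapping stays valid after close
    // A CUDA-side process mapping the same object could then call
    // cudaHostRegister(p, bytes, cudaHostRegisterDefault) before copying.
    return p == MAP_FAILED ? nullptr : static_cast<float*>(p);
}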
Post by Fred Weigel
Juergen, and other GNU APL experts.
I am exploring neural nets, word2vec and some other AI related areas.
Right now, I want to tie in Google's word2vec trained models (the
billion-word one, GoogleNews-vectors-negative300.bin.gz).
This is a binary file containing a lot of floating point data -- about
3.5GB of data. These are words, followed by cosine distances. I could
attempt to feed this in a slow way, and put it into an APL workspace.
But... I also intend to attempt to feed the data to a GPU. So, what I
am looking for is a modification to GNU APL (and yes, I am willing to
do the work) -- to allow for the complete suppression of normal C++
allocations, etc. and allow the introduction of simple float/double
vectors or matrices (it would help to allow "C"-ish or UTF-8-ish
strings: the data is (C string containing the word) followed by (a
fixed number of floats)... repeated LOTS of times).
The data set(s) may be compressed, so I don't want to read them
directly -- possibly from a shared memory region (64-bit systems only,
of course), or perhaps using shared variables... but I don't think
that would be fast enough.
Anyway, this begins to allow the push into "big data" and AI
applications. Just looking for some input and ideas here.
Many thanks
Fred Weigel
Leslie S Satenstein
2017-04-29 01:50:15 UTC
Hi Fred,
Following up on Xiao-Yong Jin's response.

You did not mention if you need the data in realtime or if you can
work at APL interpreter speed. Do you have a structure for your data?
You mentioned a format of [text][floats] without specifying the size
of the text and the number of floats. Is your data clean, or does it
need to be vetted (NaNs excluded)?
I believe you should create a data dictionary constructed with sqlite.
That data would be loaded into sqlite via some C, C++, or Python code
and subsequently read via shared variables. APL is an interpreter.
What would take hours in APL could take a few minutes by externally
loading the SQL database and then using APL for presentation.
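
[Editor's note: a minimal sketch of that loader idea, using sqlite3's
C API; the schema and names are illustrative. Compile with -lsqlite3.]

#include <sqlite3.h>
#include <vector>

// Assumes a db opened with sqlite3_open() and a table created once:
//   CREATE TABLE vectors(word TEXT PRIMARY KEY, vec BLOB);
// Each row stores the word plus its float32 vector as a single BLOB.
void load_row(sqlite3* db, const char* word, const std::vector<float>& vec)
{
    sqlite3_stmt* st = nullptr;
    sqlite3_prepare_v2(db, "INSERT INTO vectors(word, vec) VALUES(?1, ?2)",
                       -1, &st, nullptr);
    sqlite3_bind_text(st, 1, word, -1, SQLITE_TRANSIENT);
    sqlite3_bind_blob(st, 2, vec.data(),
                      int(vec.size() * sizeof(float)), SQLITE_TRANSIENT);
    sqlite3_step(st);        // run the INSERT
    sqlite3_finalize(st);
}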
It's an interesting idea you have. Can you put out a more formal draft
starter document? Something to fill in the topics below:
Aim:
Data descriptions/quantities:
Vetting and filtering:
Processing speed:
Frequency of use:

Since you propose to do the work, who can estimate the cost?

From: Xiao-Yong Jin <***@gmail.com> To: ***@crisys.com
Cc: GNU APL <bug-***@gnu.org>
Sent: Friday, April 28, 2017 9:32 PM
Subject: Re: [Bug-apl] Use with word2vec



If shared variables can go through SHMEM, you can probably interface
CUDA that way without much of a bottleneck.
But with the way GNU APL is implemented now, there are just too many
other limitations on performance with arrays of such size.
Post by Fred Weigel
Juergen, and other GNU APL experts.
I am exploring neural nets, word2vec and some other AI related areas.
Right now, I want to tie in Google's word2vec trained models (the
billion-word one, GoogleNews-vectors-negative300.bin.gz).
This is a binary file containing a lot of floating point data -- about
3.5GB of data. These are words, followed by cosine distances. I could
attempt to feed this in a slow way, and put it into an APL workspace.
But... I also intend to attempt to feed the data to a GPU. So, what I
am looking for is a modification to GNU APL (and yes, I am willing to
do the work) -- to allow for the complete suppression of normal C++
allocations, etc. and allow the introduction of simple float/double
vectors or matrices (it would help to allow "C"-ish or UTF-8-ish
strings: the data is (C string containing the word) followed by (a
fixed number of floats)... repeated LOTS of times).
The data set(s) may be compressed, so I don't want to read them
directly -- possibly from a shared memory region (64-bit systems only,
of course), or perhaps using shared variables... but I don't think
that would be fast enough.
Anyway, this begins to allow the push into "big data" and AI
applications. Just looking for some input and ideas here.
Many thanks
Fred Weigel
Fred Weigel
2017-04-29 20:57:50 UTC
Leslie

It is not so much "interpreter speed". The data is an array of floats
(32 bit) -- 71,000 to 3,000,000 rows, each with 200 to 300 columns.
Each row will be subject to a vector multiplication for a query
(obviously 71,000 to millions of multiplications, depending on the
number of rows). Yes, I am interested in parallel computation (one of
the reasons I started looking at GNU APL).
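
[Editor's note: in plain C++ that query step is one dot product per
row, embarrassingly parallel across rows. A minimal sketch; the OpenMP
pragma is just one possible backend (compile with -fopenmp), and the
dimensions come from the thread above.]

#include <cstddef>

// scores[r] = dot(row r of vecs, query); cosine similarity if rows
// and query are unit-length.
void query_scores(const float* vecs,   // rows x cols, row-major
                  const float* query,  // cols
                  float* scores,       // rows (output)
                  std::size_t rows, std::size_t cols)
{
    #pragma omp parallel for
    for (std::ptrdiff_t r = 0; r < std::ptrdiff_t(rows); ++r)
    {
        const float* row = vecs + std::size_t(r) * cols;
        float s = 0.0f;
        for (std::size_t c = 0; c < cols; ++c)
            s += row[c] * query[c];
        scores[r] = s;
    }
}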

The data is completely clean -- no NANs, etc. Each row corresponds to a
word from a corpus. The word list is separate when computation begins
(but, in the model data, interleaved; I extract and build the memory
structures separately).

My test model is 71,000 x 200 floats; the "standard" model is
3,000,000 x 300 floats (3.5GB of memory).

The use is for low-end AI (alternate word/concept selection, basic
analogies) to begin the process of deriving "meaning" from documents.
I figure around one billion operations per word in a document for this
processing. I am looking at APL for specification and testing, and at
deployment on GPGPU (OpenCL or CUDA) -- for example, via Futhark or
something like that.

FredW
Hi Fred,
Following up on Xiao-Yong Jin's response.
You did not mention if you need the data in realtime or if you can
work at APL interpreter speed. Do you have a structure for your data?
You mentioned a format of [text][floats] without specifying the size
of the text and the number of floats. Is your data clean, or does it
need to be vetted (NaNs excluded)?
I believe you should create a data dictionary constructed with sqlite.
That data would be loaded into sqlite via some C, C++, or Python code
and subsequently read via shared variables. APL is an interpreter.
What would take hours in APL could take a few minutes by externally
loading the SQL database and then using APL for presentation.
It's an interesting idea you have. Can you put out a more formal draft
starter document? Something to fill in the topics below:
Aim:
Data descriptions/quantities:
Vetting and filtering:
Processing speed:
Frequency of use:

Since you propose to do the work, who can estimate the cost?
 Sent: Friday, April 28, 2017 9:32 PM
 Subject: Re: [Bug-apl] Use with word2vec
  
 
If shared variables can go through SHMEM, you can probably interface
CUDA that way without much of a bottleneck.
But with the way GNU APL is implemented now, there are just too many
other limitations on performance with arrays of such size.
Post by Fred Weigel
Juergen, and other GNU APL experts.
I am exploring neural nets, word2vec and some other AI related areas.
Right now, I want to tie in Google's word2vec trained models (the
billion-word one, GoogleNews-vectors-negative300.bin.gz).
This is a binary file containing a lot of floating point data -- about
3.5GB of data. These are words, followed by cosine distances. I could
attempt to feed this in a slow way, and put it into an APL workspace.
But... I also intend to attempt to feed the data to a GPU. So, what I
am looking for is a modification to GNU APL (and yes, I am willing to
do the work) -- to allow for the complete suppression of normal C++
allocations, etc. and allow the introduction of simple float/double
vectors or matrices (it would help to allow "C"-ish or UTF-8-ish
strings: the data is (C string containing the word) followed by (a
fixed number of floats)... repeated LOTS of times).
The data set(s) may be compressed, so I don't want to read them
directly -- possibly from a shared memory region (64-bit systems only,
of course), or perhaps using shared variables... but I don't think
that would be fast enough.
Anyway, this begins to allow the push into "big data" and AI
applications. Just looking for some input and ideas here.
Many thanks
Fred Weigel
Fred Weigel
2017-04-29 20:26:06 UTC
Thanks!

I'll probably go with SHMEM for future CUDA/OpenCL use (I was thinking
along those lines). I don't yet need the typical size -- the model I
am working with this weekend is vector8.bin, which is 71,000 x 200
floats (71,000 words, each with 200 floats = 57MB) in size, but the
*big* one is much larger.

Fred Weigel
Post by Xiao-Yong Jin
If shared variables can go through SHMEM, you can probably interface
CUDA that way without much of a bottleneck.
But with the way GNU APL is implemented now, there are just too many
other limitations on performance with arrays of such size.
Post by Fred Weigel
Juergen, and other GNU APL experts.
I am exploring neural nets, word2vec and some other AI related areas.
Right now, I want to tie in Google's word2vec trained models (the
billion-word one, GoogleNews-vectors-negative300.bin.gz).
This is a binary file containing a lot of floating point data -- about
3.5GB of data. These are words, followed by cosine distances. I could
attempt to feed this in a slow way, and put it into an APL workspace.
But... I also intend to attempt to feed the data to a GPU. So, what I
am looking for is a modification to GNU APL (and yes, I am willing to
do the work) -- to allow for the complete suppression of normal C++
allocations, etc. and allow the introduction of simple float/double
vectors or matrices (it would help to allow "C"-ish or UTF-8-ish
strings: the data is (C string containing the word) followed by (a
fixed number of floats)... repeated LOTS of times).
The data set(s) may be compressed, so I don't want to read them
directly -- possibly from a shared memory region (64-bit systems only,
of course), or perhaps using shared variables... but I don't think
that would be fast enough.
Anyway, this begins to allow the push into "big data" and AI
applications. Just looking for some input and ideas here.
Many thanks
Fred Weigel
Juergen Sauermann
2017-04-29 11:04:03 UTC
Hi Fred,

I have not fully understood what you want to do exactly, but it looks
to me as if you want to go for native GNU APL functions. Native
functions provide the means to bypass the GNU APL interpreter itself
to the extent desired. For example, you can use APL variables but not
the APL parser, or the APL parser but not the implementation of
primitives, or whatever else you are up to.

As to plain double vectors, it is very difficult to introduce them as
a new built-in data type because that change would affect every APL
primitive, every APL operator, )LOAD, )SAVE, )DUMP, and a lot more.

However, you can have a look at (the top level of) the implementation
of the matrix divide primitive, which is doing what you are maybe
after. The implementation of matrix divide expects either a double
vector or a complex<double> vector as argument(s) and returns such a
vector as result. Before and after the computation of matrix divide, a
conversion between APL values and the plain double or complex vector
is performed. This conversion is very lightweight. If you have a
homogeneous GNU APL value, say all ravel items being double, then that
value is almost like a C double *. The difference is a space between
adjacent ravel elements. In other words (expressed in APL):

C_vector ←→ 1 0 1 0 ... / APL_vector

I can provide you with more information if you want to go along this
path.

/// Jürgen

On 04/29/2017 03:19 AM, Fred Weigel wrote:
Post by Fred Weigel
Juergen, and other GNU APL experts.
I am exploring neural nets, word2vec and some other AI related areas.
Right now, I want to tie in Google's word2vec trained models (the
billion-word one, GoogleNews-vectors-negative300.bin.gz).
This is a binary file containing a lot of floating point data -- about
3.5GB of data. These are words, followed by cosine distances. I could
attempt to feed this in a slow way, and put it into an APL workspace.
But... I also intend to attempt to feed the data to a GPU. So, what I
am looking for is a modification to GNU APL (and yes, I am willing to
do the work) -- to allow for the complete suppression of normal C++
allocations, etc. and allow the introduction of simple float/double
vectors or matrices (it would help to allow "C"-ish or UTF-8-ish
strings: the data is (C string containing the word) followed by (a
fixed number of floats)... repeated LOTS of times).
The data set(s) may be compressed, so I don't want to read them
directly -- possibly from a shared memory region (64-bit systems only,
of course), or perhaps using shared variables... but I don't think
that would be fast enough.
Anyway, this begins to allow the push into "big data" and AI
applications. Just looking for some input and ideas here.
Many thanks
Fred Weigel
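
[Editor's note: Juergen's "space between adjacent ravel elements" can
be illustrated with a small stand-in for GNU APL's real Cell class
(the struct below is hypothetical). Extracting a plain double* is then
just a strided copy -- the 1 0 1 0 ... / compression, written in C++.]

#include <cstddef>
#include <vector>

struct Cell            // hypothetical: payload plus per-cell bookkeeping
{
    double value;      // the "1" the compress expression keeps
    long   tag;        // the "0" it drops
};

// Strided copy over the per-cell padding, yielding a contiguous C vector.
std::vector<double> to_c_vector(const Cell* ravel, std::size_t len)
{
    std::vector<double> out(len);
    for (std::size_t i = 0; i < len; ++i)
        out[i] = ravel[i].value;
    return out;
}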
Fred Weigel
2017-05-01 05:10:37 UTC
Juergen,

This is useful -- I was looking at LApack.cc already. It is in line
with what I need (as a template).

I am not worried about saving these things, but I have a 3,000,000 x
300 array of C float, and do a 300-element vector by 300-element
multiply on each of the 3 million rows in a "typical" processing step.
I don't want to convert to C double (that would increase memory from
3.6GB to 7.2GB). I don't really want to copy the data at all! I can
generate a descriptor to the data (memory pointer, dimensions). I
think I want to plant the data into a shared memory region (and, in
future, pass it to a GPU).

I think I want to do some specific functions on the data -- right now
I pass row sets to GNU APL using the API, and execute APL code using
the API. However, the control is exclusively from outside APL, meaning
I cannot experimentally analyze using APL.

I can work on the model given by LApack.cc, and supply some functions
which (basically) provide a "virtual memory/workspace".

The main problem with these array sizes is saving and loading -- this
array would be around 30GB in GNU APL (as far as I can tell). If ever
saved, it would then take 300GB. I can convert from float to double,
and create the Cell structures, but I would want to simply mmap() the
thing into GNU APL (and, of course, never have the thing participate
in memory management). Again, I was leaning towards partial mapping,
because when I start with tensors, the arrays will be sparse.

So, two real problems: (1) how to deal with LARGE non-sparse matrices,
and (2) how to deal with LARGE sparse matrices.

I really like the expression afforded by APL.

It may be possible to use the APL parser, and provide new
implementations of primitives -- thanks for that idea.

LApack.cc seems to provide something I can start with -- the actual
LARGE arrays won't change, so this provides a good demarcation point
and a start for something workable.

Thanks!
Fred Weigel
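
[Editor's note: a minimal sketch of the mmap() idea described above:
map the raw float file once and pass a descriptor (pointer plus
dimensions) around, so the 3.6GB is never copied or widened to double.
The file name and descriptor struct are illustrative.]

#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>
#include <cstddef>

struct MatrixDesc { const float* data; std::size_t rows, cols; };

MatrixDesc map_matrix(const char* path, std::size_t rows, std::size_t cols)
{
    int fd = open(path, O_RDONLY);
    struct stat sb;
    fstat(fd, &sb);   // expect sb.st_size == rows * cols * sizeof(float)

    // Read-only, demand-paged: pages are faulted in as rows are touched,
    // and the mapping never participates in workspace memory management.
    void* p = mmap(nullptr, std::size_t(sb.st_size), PROT_READ,
                   MAP_PRIVATE, fd, 0);
    close(fd);        // the mapping stays valid after close
    return { static_cast<const float*>(p), rows, cols };
}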
    Hi Fred,

    I have not fully understood what you want to do exactly, but it
    looks to me as if you want to go for native GNU APL functions.
    Native functions provide the means to bypass the GNU APL
    interpreter itself to the extent desired. For example, you can use
    APL variables but not the APL parser, or the APL parser but not
    the implementation of primitives, or whatever else you are up to.

    As to plain double vectors, it is very difficult to introduce them
    as a new built-in data type because that change would affect every
    APL primitive, every APL operator, )LOAD, )SAVE, )DUMP, and a lot
    more.

    However, you can have a look at (the top level of) the
    implementation of the matrix divide primitive, which is doing what
    you are maybe after. The implementation of matrix divide expects
    either a double vector or a complex<double> vector as argument(s)
    and returns such a vector as result. Before and after the
    computation of matrix divide, a conversion between APL values and
    the plain double or complex vector is performed. This conversion
    is very lightweight. If you have a homogeneous GNU APL value, say
    all ravel items being double, then that value is almost like a C
    double *. The difference is a space between adjacent ravel
    elements. In other words (expressed in APL):

    C_vector ←→ 1 0 1 0 ... / APL_vector

    I can provide you with more information if you want to go along
    this path.

    /// Jürgen
    On 04/29/2017 03:19 AM, Fred Weigel wrote:
Juergen, and other GNU APL experts.
I am exploring neural nets, word2vec and some other AI related areas.
Right now, I want to tie in Google's word2vec trained models (the
billion-word one, GoogleNews-vectors-negative300.bin.gz).
This is a binary file containing a lot of floating point data -- about
3.5GB of data. These are words, followed by cosine distances. I could
attempt to feed this in a slow way, and put it into an APL workspace.
But... I also intend to attempt to feed the data to a GPU. So, what I
am looking for is a modification to GNU APL (and yes, I am willing to
do the work) -- to allow for the complete suppression of normal C++
allocations, etc. and allow the introduction of simple float/double
vectors or matrices (it would help to allow "C"-ish or UTF-8-ish
strings: the data is (C string containing the word) followed by (a
fixed number of floats)... repeated LOTS of times).
The data set(s) may be compressed, so I don't want to read them
directly -- possibly from a shared memory region (64-bit systems only,
of course), or perhaps using shared variables... but I don't think
that would be fast enough.
Anyway, this begins to allow the push into "big data" and AI
applications. Just looking for some input and ideas here.
Many thanks
Fred Weigel