Rodrigo Parra

GSoC 2014 Sum Up

Being a participant of GSoC 2014 as a Sugar developer has been an amazing experience for me. I learned and did a lot, though there are many more things I wish I could have done. Being introduced to mailing lists, IRC meetings and the whole open-source project workflow was a very rewarding journey, both professionally and personally.

As a part of the Sugarlistens project, I wrapped the Pocketsphinx speech-recognition library in order to offer Activity developers a ‘friendlier’, easier-to-start-with API. The first weeks were mostly about design and benchmarking of prospective implementation architectures, as I shared in previous blog posts.

Maze

After finishing a first prototype of the Sugarlistens library, I decided to take it for a spin through two simple use cases:

  1. Starting activities from the home view.
  2. Implementing a voice interface for a simple Activity (Maze).

The following video shows the results of this phase:

Source code for the speech-enabled Maze Activity is available here: https://github.com/rparrapy/maze

Turtle Blocks

The next goal was to target a somewhat more complex Activity. In this case, Turtle Blocks was chosen for being something of a flagship of the Sugar learning environment, and because of its helpful and persistent maintainer.

For Turtle Blocks I developed a boolean block that returns True if the command pronounced by the user equals its text value. Thanks to the conventions established for the directory structure and file names of speech-enabled Activities, I was able to build the language model (a JSGF grammar) at runtime based on block values, achieving better accuracy.
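
The comparison itself is simple. Here is a rough sketch of the idea in Python; the class and method names are illustrative only, not the actual code in the turtle-listens plugin:

class ListenToBlock(object):
    """Toy model of the boolean block: True when the last spoken command
    equals the block's text value."""

    def __init__(self, text_value):
        self.text_value = text_value.strip().lower()

    def evaluate(self, last_command):
        if last_command is None:
            return False
        return last_command.strip().lower() == self.text_value

# e.g. ListenToBlock('forward').evaluate('Forward') returns True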

The speech recognition block allows Turtle Blocks programmers to do pretty awesome stuff, like what is shown in the next video:

Speech recognition integration was developed as a Turtle Blocks plugin; its source code can be found here: https://github.com/rparrapy/turtle-listens

Querying the Journal

Sugar stores a registry of Activities opened by the user, so they can resume their work where they left off. To my mentor and me, this looked like a great candidate for some speech-recognition goodness.

Combine Sugarlistens with the Sugar Datastore API and some timestamp arithmetic, and there you go:

The Sugar source code, with a speech-recognition branch containing these changes, can be found here: https://github.com/rparrapy/sugar
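
The gist of the Journal query is roughly the sketch below. It is only an approximation (the exact datastore query parameters differ, and the bundle id is a placeholder), not the code from the branch above:

import time

from sugar3.datastore import datastore

def most_recent_entry(bundle_id, max_age_days=7):
    """Return the newest Journal entry for an activity, or None."""
    cutoff = int(time.time()) - max_age_days * 24 * 60 * 60
    entries, _count = datastore.find({'activity': bundle_id})
    recent = [entry for entry in entries
              if int(entry.metadata.get('timestamp', 0)) >= cutoff]
    if not recent:
        return None
    return max(recent, key=lambda entry: int(entry.metadata['timestamp']))

# e.g. most_recent_entry('org.example.MyActivity')  # hypothetical bundle id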

Sugarlistens Icon

Now, this is all great, but let's say you are not into the whole voice-commands thing (you must be fun at parties). For that case, I implemented a device icon to turn speech recognition on/off.

Seriously, it was a needed feature and you can see it in action here:

Source code for the Sugarlistens icon is available here: https://github.com/rparrapy/listen-trailicon

Packaging Up

The last part of the project was about packaging things up to make setting up Sugarlistens as easy as possible. To that end, both the core sugarlistens library and the listen-trailicon device icon include a genrpm.sh bash script in their root folders.

As the script name kindly suggests, it generates a .rpm package that includes the dependency definitions and other configurations needed to get speech recognition up and running with Sugar.

Final Words

Allow me a little redundancy here: the last few months have been great. Many thanks to my mentor tch, who was great to work with; to Walter Bender, who provided lots of helpful tips, especially during the Turtle Blocks period; and to all the folks from the IRC channel and the mailing list.

GSoC Update: More Turtle Love

Finally, after some struggle with Turtle Blocks’ codebase, I got to implement (almost) everything I wanted for my speech plugin.

Changes since the last update include:

  • No more 'start listening' block, since I was able to add the binding code to the plugin's start method.
  • Dynamic grammar building: the JSGF grammar is now produced on the fly when the user runs the program, by inspecting the text values of the 'listen to' blocks (a sketch follows below).
    The full list of supported words depends on the phonetic dictionary, which can normally contain thousands of words.

Both changes seek to reduce the complexity of using voice commands as part of a Turtle Blocks program, and I think they improve the overall user experience.
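
To give an idea of what the dynamic grammar building amounts to, here is a hedged sketch; the function name and grammar name are mine, not the plugin's:

def build_jsgf_grammar(block_values):
    """Build a JSGF grammar that accepts exactly the given spoken commands."""
    commands = sorted(set(value.strip().lower()
                          for value in block_values if value.strip()))
    return '\n'.join([
        '#JSGF V1.0;',
        'grammar turtle_commands;',
        'public <command> = %s;' % ' | '.join(commands),
    ])

# Blocks with the values 'forward', 'left' and 'right' would produce:
#   #JSGF V1.0;
#   grammar turtle_commands;
#   public <command> = forward | left | right;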

Do check out the source code, here: https://github.com/rparrapy/turtle-listens

Special thanks to Walter Bender, whose patience was thoroughly tested with my questions :)

GSoC Update: Turtle Blocks Speech Palette

This week has been mainly about two things:

  • RPM packaging: in order to make Sugar Listens easily available to all Sugar users.
    This subject is still a bit rough, as I'm struggling with some issues running the speech engine as a daemon on startup.
  • Turtle Blocks integration: Turtle Blocks was chosen as another proof-of-concept Sugar Listens use case.

With help from the maintainer and the guys from FING, I implemented a Turtle Blocks plugin to add some speech-recognition-related blocks. The currently implemented blocks are:

  • A 'start listening' block, which starts tracking voice commands said by the user.
  • A 'listen to' conditional block, which receives a text parameter and returns True if the last command said by the user equals its parameter, False otherwise.

Currently, only the last command is remembered. With help from the maintainer, I plan to try out keeping a list of the N last commands instead, with each block remembering the last command it evaluated.

This may be helpful to avoid blocking inside a loop, and it can even be more natural inside 'long' loops (as it would allow more than one command to be evaluated in a loop iteration).
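
One possible shape for that idea, strictly as a sketch (none of this exists in the plugin yet, and all names are hypothetical): a bounded history of commands, with each block tracking how far into the history it has already looked.

from collections import deque

class CommandHistory(object):
    """Keep the last N recognized commands; blocks consume them independently."""

    def __init__(self, size=10):
        self._commands = deque(maxlen=size)
        self._total = 0  # commands recognized so far, ever

    def push(self, command):
        self._commands.append(command)
        self._total += 1

    def since(self, seen):
        """Return (commands newer than position `seen`, new position)."""
        missed = min(self._total - seen, len(self._commands))
        fresh = list(self._commands)[-missed:] if missed > 0 else []
        return fresh, self._total

# Each 'listen to' block would keep its own `seen` position, so a long loop
# could react to several different commands within a single iteration.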

Another issue worth discussing is the actual list of valid commands. Perhaps having some blocks to build a grammar on the fly, before the 'start listening' block, could be interesting?

The source code of the plugin can be found here:

https://github.com/rparrapy/turtle-listens

TL;DR: Here’s a video showing Turtle Blocks integration with a basic grammar:

GSoC Update: Sugar Listens in Action

Time for Sugar Listens news. It’s been a while since my last post, so a lot has been going on regarding the project:

  • Sugar Listens now uses D-Bus' system bus, so a couple of extra steps are needed to set it up (for now); a minimal subscription sketch follows this list.

    To make this change as painless as possible, I improved the docs of the repository. I included detailed instructions to get the project up and running, so be sure to check the repository description here: https://github.com/rparrapy/sugarlistens

  • I adapted a simple Sugar Activity to integrate speech recognition. I chose Maze, because it seemed very well suited for a proof-of-concept.

    If you are an Activity developer and want a sneak peek of the Sugar Listens API (or if you just want to have fun playing Maze with your voice), be sure to check it out here: https://github.com/rparrapy/maze/tree/speech-recognition

  • I also implemented a speech-recognition device icon, to be able to launch some Activities from the Sugar Favorites View.

    The source code and instructions on how to run it can be found here: https://github.com/rparrapy/sugar/tree/speech-recognition
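
Regarding the first point, subscribing to results over the system bus looks roughly like this with dbus-python. The interface and signal names below are placeholders, not necessarily the ones sugarlistens registers:

import dbus
from dbus.mainloop.glib import DBusGMainLoop
from gi.repository import GLib

def on_result(text):
    print('recognized: %s' % text)

DBusGMainLoop(set_as_default=True)
bus = dbus.SystemBus()
bus.add_signal_receiver(on_result,
                        signal_name='ResultReady',               # hypothetical
                        dbus_interface='org.sugarlabs.listens')  # hypothetical

GLib.MainLoop().run()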

What’s next?

There’s still plenty of work ahead.

First, I want to improve Sugar Listens distribution, so I’m looking into rpm packaging.

TurtleArt speech recognition integration is also on the roadmap. Some great ideas were mentioned during IRC meetings with Walter Bender himself, and I'm really looking forward to working on this. I've started a thread on the mailing list about this subject, so feel free to jump in (or leave a message here if you are lazy :)).

That’s it for now, so stay tuned for the next update.

GSoC Week 3: Show me the code

After some clean up, I’ve released the code I’ve been working on for the past few weeks. Keep in mind that Sugar Listens is still a work in progress, although the uploaded code is a good indicator of the project goals and architecture.

The core project, which exposes Pocketsphinx results through D-Bus, is available at: https://github.com/rparrapy/sugarlistens.

A very simple Sugar Activity for testing purposes can be found at: https://github.com/rparrapy/sugarlistens-livedemo.

How to try it out

These steps were run on Fedora 20, with Sugar 0.100 installed.

To test Sugar Listens with the simple demo Activity, you should first install some dependencies:

sudo yum install pocketsphinx pocketsphinx-libs pocketsphinx-plugin pocketsphinx-devel pocketsphinx-python pocketsphinx-models git python-setuptools

Afterwards, clone and install the sugarlistens github project:

git clone https://github.com/rparrapy/sugarlistens.git

cd sugarlistens

python setup.py develop

In the parent folder, clone and install the sugarlistens-livedemo github project:

cd ..

git clone https://github.com/rparrapy/sugarlistens-livedemo.git

cd sugarlistens-livedemo

python setup.py develop

In a terminal window, run the recognition server included in the sugarlistens project:

cd ..

python sugarlistens/sugarlistens/recognizer.py

Without killing the server process, launch the Livedemo Activity from the Sugar desktop. Livedemo basically prints everything it recognizes; nothing fancy, but it does its job.

For a list of accepted words for the Livedemo activity, check its dictionary, which can be found at sugarlistens-livedemo/speech/en/dictionary.dic.

Last thoughts

Have I missed something? Do you have any trouble running this basic example? Please let me know. I’ve also enabled Disqus comments, to make it easier to get feedback from the community.

GSoC Week 2: Benchmark Results

Time for another GSoC update. This week I’ll share the memory and time benchmark results, comparing the two approaches mentioned in last week’s post (and it seems like we have a winner).

Memory Benchmark

Tests were made with the hub4wsj_sc_8k English model that can be found in the pocketsphinx-models package in Fedora 20.

Basically, for each approach, I launched 10 instances of a minimal graphical application (that printed out every recognized sentence), and memory measurements were taken with top.

Results are shown in the following table:

#          Centralized (MB)    Decentralized (MB)
1          24.3                47.0
2          24.3                47.0
3          24.3                47.0
4          24.3                47.0
5          24.3                46.8
6          24.3                42.7
7          24.3                41.9
8          24.3                41.4
9          24.3                41.1
10         24.3                40.6
Daemon     27.9                0
Total      270.9               442.5

The difference of roughly 170 MB in memory usage is hard to ignore, and it is an important advantage of the centralized approach.

Time Benchmark

Again, the hub4wsj_sc_8k model was used, this time for benchmarking both approaches in terms of processing time. A 40-second voice recording was used, and the time it took two minimal examples to process this input was measured.

What this test tried to estimate was the cost of the IPC calls, so one example script used D-Bus and the other one received the results directly from the GStreamer pipeline.

The recording contained 10 sentences. The test was run 10 times. That gives a total of 100 recognized sentences. Results are shown in the following table:

#        GStreamer (seconds)    GStreamer + D-Bus (seconds)
1        2.415799856            2.40635800362
2        2.55440592766          2.62581205368
3        2.49058699608          2.38606405258
4        2.538159132            2.47606492043
5        2.45916199684          2.38926792145
6        2.45758104324          2.55321598053
7        2.41265892982          2.37823200226
8        2.41265892982          2.77139401436
9        2.4366440773           2.33737802505
10       2.57993578911          2.83359718323
Total    24.75759268            25.15738416

This time the results don't differ that much. A total time difference of 0.3997914791 seconds for a hundred recognized sentences can easily be ignored, which means the delay caused by the IPC calls is practically unnoticeable.

Summing up

The centralized approach was shown to save a significant amount of memory, and the delay introduced by D-Bus turned out to be almost insignificant. This is an important lesson for the Sugar Listens implementation, and I will definitely try to stick with this architecture.

GSoC Week 1: Design, Design, Design

As some of you might know, I got selected as a student for this year's Google Summer of Code program. I'm coding for Sugar Labs, and my mentor is tch.

The project I’m currently working on is called Sugar Listens, and it seeks to bring speech recognition capabilities to Sugar. If you want to read it, my proposal can be found here.

I expect GSoC 2014 to be a great learning experience, and I intend to blog about it regularly to share my accomplishments, my mistakes and any thoughts that come to my mind during this journey.

So, what’s in it for Sugar users? 

As a user, wouldn't it be great to start your favorite Activity by saying: "Sugar, open Activity X"?

Think about programming with Turtle Blocks, testing your memory with Memorize or listening to your music with JAMedia without having to touch your keyboard.

This is exactly what Sugar Listens aims for: allowing Sugar users to interact with Sugar Activities, and the Sugar Desktop itself, through voice commands.

What about Activity developers?

An Activity developer shouldn't have to worry about speech-recognition engine internals unless they really want to. So, in order to bring speech recognition to Sugar Activities, offering a higher level of abstraction through an API seems like the way to go.

Sugar Listens will provide developers with a simple API to easily integrate voice commands with Sugar Activities.

Basically, the questions that an Activity developer should ask themselves are:

  • What kind of commands should my awesome Activity react to?
    This will be answered by creating a JSGF grammar to define the accepted language.

  • What will my awesome Activity do in response to each voice command?
    This will be answered by binding listening patterns (Python regex) to listening functions that perform the intended changes in the Activity.

System Architecture Showdown

This first week has been mostly about making design decisions about Sugar Listens. Although all the pieces of the puzzle are there, finding the right way to put them together is very important.

Regarding system architecture, two approaches are currently being considered and compared:

First Approach: Centralized Sugar Listens

[diagram: centralized architecture]

In this approach, Sugar Activities are basically thin clients receiving results from a central process through D-Bus. This process holds an instance of a GStreamer pipeline to communicate with Pocketsphinx.

Pros:

  • Saves memory: only one instance of Pocketsphinx is needed.
  • More general approach: the only dependency for Activities would be D-Bus, which is already extensively used in Sugar.
    This could be beneficial for future (potentially not Pocketsphinx-based) backend alternatives.

Cons:

  • IPC message load: the delay introduced by IPC calls should be considered.
  • Context switching for the central process: the central process should somehow be aware of Activity switches, in order to use the appropriate resources (language model and acoustic model for Pocketsphinx).

Second Approach: Decentralized Sugar Listens

[diagram: decentralized architecture]

In this approach, each Sugar Activity holds an instance of a GStreamer pipeline to communicate with Pocketsphinx. There's no centralized process and no need for IPC.

Pros:

  • Simpler design: talk about KISS, this is as simple as it gets. No model switching, no IPC.
  • No IPC message load: should be more responsive since there is no IPC delay.

Cons:

  • Memory usage: each GStreamer pipeline ‘costs’ memory, which can be quite a valuable resource.
  • Less general approach: a D-Bus-based default implementation could make it easier to just ‘plug in’ alternative backends.

Intuitively, I lean towards the first approach. However, I'm running some benchmarks to compare both approaches in terms of memory and responsiveness. As tch likes to say: “Even if you have a gut feeling about something, it’s always better to have some numbers to back you up.”
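
To make the first approach a bit more concrete, here is a minimal sketch of the central process side using dbus-python. The bus name, object path and interface are placeholders, and the GStreamer/Pocketsphinx plumbing is left out:

import dbus
import dbus.service
from dbus.mainloop.glib import DBusGMainLoop
from gi.repository import GLib

IFACE = 'org.sugarlabs.listens'    # hypothetical
PATH = '/org/sugarlabs/listens'    # hypothetical

class RecognizerService(dbus.service.Object):
    def __init__(self, bus):
        name = dbus.service.BusName(IFACE, bus=bus)
        dbus.service.Object.__init__(self, name, PATH)

    @dbus.service.signal(dbus_interface=IFACE, signature='s')
    def ResultReady(self, text):
        """Broadcast a recognized sentence to every connected client."""

if __name__ == '__main__':
    DBusGMainLoop(set_as_default=True)
    service = RecognizerService(dbus.SessionBus())
    # The Pocketsphinx/GStreamer result callback would call service.ResultReady(text).
    GLib.MainLoop().run()

Activities would then be thin clients that only add a signal receiver for ResultReady.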

API Sketch

Directory Structure

To integrate a Sugar Activity with Sugar Listens, a speech folder with the following structure must be defined inside the Activity folder:

ACTIVITY_DIR/
    speech/
        en/
            language.gram
            dictionary.dic
            feat.params
            mdef
            means
            .
            .
            .
        es/
        fr/
        .
        .
        .

Basically, inside the speech folder there are folders for each supported language. These folders are named using standard i18n codes.

All speech related resources will be stored in these folders. Each Activity can provide its own language model in a file named language (file extension may vary according to the language model type), its own phonetic dictionary in a file named dictionary.dic and its own acoustic model in a Pocketsphinx compatible format (every other file in the example).

If you had trouble understanding what the last paragraph was about, don’t worry! Default models will be included if available. That being said, writing your own grammar is not that hard and will give you greater control over voice commands (and should improve accuracy, too).

If you want to learn a bit more about these terms, and speech recognition in general, here’s some light reading.
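
Coming back to the directory convention: locating the right folder at runtime could look like the small sketch below. The helper name and the fallback-to-English policy are my own assumptions, not part of the convention:

import locale
import os

def speech_resource(activity_dir, filename, default_lang='en'):
    """Locate a speech resource (e.g. 'language.gram') for the user's language."""
    lang = (locale.getdefaultlocale()[0] or default_lang).split('_')[0]
    path = os.path.join(activity_dir, 'speech', lang, filename)
    if not os.path.exists(path):
        # Fall back to the default language if the Activity does not ship
        # resources for the user's locale.
        path = os.path.join(activity_dir, 'speech', default_lang, filename)
    return path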

Available methods

The main goal for the API is to provide developers with a simple, easy-to-use way of binding command patterns to listeners. This will be achieved by using the following methods, which should be available for all Activities:

  • listen(listener)
    Binds the listener to all recognized voice commands.
    listener is a function that receives recognized text as its only parameter, like myfunction(text).

  • listen_to(pattern, listener)
    Binds the listener to all recognized voice commands that match the given pattern.
    pattern is a Python regex string that can have named groups.
    listener is a function that receives the original text, the matched pattern, and either has a parameter for each named group defined in pattern or has a keywords parameter that can hold all named parameters.
    For example, for the following call to listen_to:

    listen_to('go from (?P<orig>%s) to (?P<dest>%s)', myfunction)

    myfunction can be defined as any of the following:

    def myfunction(text, pattern, orig, dest)
    def myfunction(text, pattern, **kwargs)

    Returns an id that identifies the (pattern, listener) tuple and can be used for listener removal.

  • mute(listener)
    Unbinds the listener from all patterns.

  • mute_only(id)
    Unbinds the (pattern, listener) tuple identified by the id.

  • start_listening()
    Starts capturing voice commands. Any speech related configuration should happen before calling this method.

  • pause_listening()
    Pauses voice command capture. In a GStreamer-based implementation it might translate to setting the pipeline to GST_STATE_NULL or GST_STATE_PAUSED.

  • stop_listening([pattern])
    Unbinds all listeners from the given pattern.
    If no pattern is given, unbinds all listeners from all patterns.

Even though additional configuration methods might be included for speech-recognition-savvy developers, the above methods describe the basic intended API for Sugar Listens.
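
To illustrate the intended listen_to semantics (a pattern with named groups, dispatched to a listener that receives the matched values), here is a self-contained toy. The SimpleRecognizer class is a stand-in written just for this example, not the Sugar Listens library:

import re

class SimpleRecognizer(object):
    def __init__(self):
        self._bindings = []

    def listen_to(self, pattern, listener):
        self._bindings.append((re.compile(pattern), listener))
        return len(self._bindings) - 1   # id usable by mute_only()

    def feed(self, text):
        # Stand-in for a recognition result arriving from the engine.
        for regex, listener in self._bindings:
            match = regex.match(text)
            if match:
                listener(text, regex.pattern, **match.groupdict())

def myfunction(text, pattern, orig, dest):
    print('going from %s to %s' % (orig, dest))

recognizer = SimpleRecognizer()
recognizer.listen_to(r'go from (?P<orig>\w+) to (?P<dest>\w+)', myfunction)
recognizer.feed('go from kitchen to garden')   # prints: going from kitchen to garden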

Also, keep in mind this project is currently in a pre-alpha state, so changes can and will be made.

Final thoughts

Well, this was a pretty long post. If you read until this point, thank you very much for your time and I hope I did not bore you to death :)

Things to do for the next update include:

  • Showing results for the time and memory benchmarks comparing the two approaches.
  • Sharing the benchmark code as the initial commit for the Sugar Listens public repository.

As always, any constructive feedback is welcome!

Premio Elena Ammatuna de Cuento Corto 2013

Yesterday I took part in the award ceremony of the Premio Elena Ammatuna de Cuento Corto 2013, at the Gran Hotel del Paraguay.

Beyond the prize itself (I received the second honorable mention), it was an excellent occasion to share with my family and with other people who share a taste for literature.

It doesn't hurt to thank once again the Fundación Lazos de Cultura "Elena Ammatuna" for the opportunity and for the work they have been doing for years.

For those interested, the book with all the awarded stories is on sale and can be purchased by contacting the Fundación.

Finally, those who want to read "La carga del héroe", the story that earned me the distinction, can do so here.

I Hope I'm Wrong

Today I am 23 years old and I have the blessing, thanks to God, my parents and my own effort, that my future, my conscience and my opinion belong to me and not to a political party. I have achievements and mistakes to my name, and above all many goals and dreams yet to fulfill. Today I also carry a great disappointment.

Today, April 21, 2013, I took part in a presidential election for the second time. It was, however, the first time I voted out of obligation rather than hope. That's right: I voted today so that my conscience would allow me to complain tomorrow.

Today I voted in an election that invited us to choose the least bad option, because the best one was either absent or had no chance of winning. Even so, not without many doubts, I decided to vote for the candidate who seemed most capable to me, even if my vote turned out to be 'useless' in the eyes of many.

And while my vote gets lost in the tiny percentage obtained by its recipient, it is sad to think of all the consciences bought with a 50,000 Gs. bill or, worse still, blindly handed over to the cursed idea that one votes for the party color and not for the person.

And while the ink fades from my finger, it hurts to think that the leaders elected today may be exactly the ones this nation deserves and that, who would have thought, with enough money one can buy a country.

Today, just a few hours from hearing a result that seems a foregone conclusion, I think that the only new course for Paraguay, my country, is to go from bad to worse. How could I think otherwise, while the same people who have spent years mocking their nation are out celebrating?

Today, as rarely before, I would like to be wrong.

Fuzzic: Fuzzy string searching for music library querying

This is the first programming-related entry and, as you, a smart reader, might have noticed, it is written in English. In fact, all entries about software development, coding, hacking and so on are going to be like that.

That way, I get to blog and practice my writing skills. Talk about killing two birds with one stone!

The idea behind this post came to my mind while trying out Xnoise. This media player, unlike Banshee, Rhythmbox and others, focuses on the playlist while leaving the music collection as a sidebar with search capabilities.

To make it short: I liked the idea, I liked the player in general, but the search bar felt as if it was lacking something. Let me explain with an example.

Let’s say I have a music track with the following metadata:

Artist: The Cranberries
Album: No Need to Argue
Title: Zombie

I would expect this track to show up by typing 'cranberries zombie' in the search bar, but it doesn't. Taking it further, if I typed 'crnbrrs zmb' it would be a good feature (IMHO) to get the mentioned song as a result.

See where I'm going? If you have used Sublime Text 2, you probably know what I mean already: I'm talking about Ctrl+P, ladies and gentlemen!


The formal name for this awesome feature is approximate string matching (AKA fuzzy string searching) and it seemed fun, perhaps even useful, to apply it to a media-player-like track search. A little googling around and I was able to find the tools for the job (there's a toy sketch after the list):

  • eyeD3 made the .mp3 metadata extraction a piece of cake.
  • FuzzyWuzzy translated approximate string matching to one easy method call.
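
Here is the toy sketch: fuzzy-matching a query against "artist - album - title" strings. The library list is made up for the example (in Fuzzic those strings come from tags extracted with eyeD3), and FuzzyWuzzy's process module does the heavy lifting:

from fuzzywuzzy import process

library = [
    'The Cranberries - No Need to Argue - Zombie',
    'The Cranberries - No Need to Argue - Ode to My Family',
    'Radiohead - OK Computer - Karma Police',
]

for query in ('cranberries zombie', 'crnbrrs zmb'):
    best, score = process.extractOne(query, library)
    print('%-20s -> %s (score %d)' % (query, best, score))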

The resulting source code (less than 100 LOC) can be found at:
https://bitbucket.org/rparra/fuzzic/

Fuzzic is far from being a big deal, but trying it out with my own music collection gave me the impression that the idea behind it might have some potential.

I plan to write an entry on how to try Fuzzic with your own music library soon, but in the meantime, it’s not hard at all to figure it out by reading the source code.

PS: Want to play around with fuzzy string searching and FuzzyWuzzy yourself? This article is great.
