Fixing common Unicode mistakes with Python — after they’ve been made
Update: not only can you fix Unicode mistakes with Python, you can fix Unicode mistakes with our open source Python package ftfy. It’s on PyPI and everything. You have almost certainly seen text on a...
Fixing Unicode mistakes and more: the ftfy package
There’s been a great response to my earlier post, Fixing common Unicode mistakes with Python. This is clearly something that people besides me needed. In fact, someone already made the code into a web...
How to make an orderly transition to Python Requests 1.0 instead of running...
There’s a lovely Python module for making HTTP requests, called requests. We use it at Luminoso. A bunch of code we depend on uses it. Our API customers use it. Basically everyone uses it because it’s...
ftfy (fixes text for you) 4.0: changing less and fixing more
ftfy is a Python tool that takes in bad Unicode and outputs good Unicode. I developed it because we really needed it at Luminoso — the text we work with can be damaged in several ways by the time it...
wordfreq: Open source and open data about word frequencies
Often, in NLP, you need to answer the simple question: “is this a common word?” It turns out that this leaves the computer to answer a more vexing question: “What’s a word?” Let’s talk briefly about...
wordfreq 1.2 is better at Chinese, English, Greek, Polish, Swedish, and Turkish
In a previous post, we introduced wordfreq, our open-source Python library that lets you ask “how common is this word?”...