Python RegEx

A RegEx, or regular expression, is a sequence of characters that forms a search pattern.

RegEx can be used to check whether a string contains the specified search pattern.

RegEx Module

Python has a built-in module called re, which can be used to work with regular expressions.

Import the re module:

import re

RegEx in Python

Once you have imported the re module, you can start using regular expressions:

Example

Search the string to see if it starts with "The" and ends with "Spain":

import re

txt = "The rain in Spain"
# ^The anchors the match at the start of the string, Spain$ at the end
x = re.search("^The.*Spain$", txt)
print(x is not None)

RegEx Functions

The re module offers a set of functions that let us search a string for a match:

Function	Description
findall	Returns a list containing all matches
search	Returns a Match object if there is a match anywhere in the string
split	Returns a list where the string has been split at each match
sub	Replaces one or many matches with a string

Metacharacters

Metacharacters are characters with a special meaning:

Character	Description	Example
[]	A set of characters	"[a-m]"
\	Signals a special sequence (can also be used to escape special characters)	"\d"
.	Any character (except newline character)	"he..o"
^	Starts with	"^hello"
$	Ends with	"planet$"
*	Zero or more occurrences	"he.*o"
+	One or more occurrences	"he.+o"
?	Zero or one occurrences	"he.?o"
{}	Exactly the specified number of occurrences	"he{2}o"
|	Either or	"falls|stays"
()	Capture and group
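
To make the table above more concrete, here is a minimal sketch that runs a handful of these metacharacters against one sample string. The sample text "hello planet" and the names patterns and results are illustrative choices, not part of the original tutorial.

```python
import re

txt = "hello planet"  # illustrative sample string

patterns = [
    r"[a-m]",        # any single character from a to m
    r"he..o",        # "he", any two characters, then "o"
    r"^hello",       # "hello" at the start of the string
    r"planet$",      # "planet" at the end of the string
    r"he.?o",        # "he", at most one character, then "o" (no match here)
    r"l{2}",         # exactly two consecutive "l" characters
    r"hello|world",  # either "hello" or "world"
]

# Map each pattern to the list of matches findall() reports for it.
results = {p: re.findall(p, txt) for p in patterns}

for pattern, matches in results.items():
    print(pattern, "->", matches)
```

Patterns are written as raw strings (the r prefix) so that backslashes, where present, reach the regex engine unchanged.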

Special Sequences

A special sequence is a \ followed by one of the characters in the list below, and it has a special meaning:

Character	Description	Example
\A	Returns a match if the specified characters are at the beginning of the string	"\AThe"
\b	Returns a match where the specified characters are at the beginning or at the end of a word (the "r" in the beginning is making sure that the string is being treated as a "raw string")	r"\bain" r"ain\b"
\B	Returns a match where the specified characters are present, but NOT at the beginning (or at the end) of a word (the "r" in the beginning is making sure that the string is being treated as a "raw string")	r"\Bain" r"ain\B"
\d	Returns a match where the string contains digits (numbers from 0-9)	"\d"
\D	Returns a match for any character that is NOT a digit	"\D"
\s	Returns a match where the string contains a white space character	"\s"
\S	Returns a match for any character that is NOT a white space character	"\S"
\w	Returns a match where the string contains any word characters (letters from a to z and A to Z, digits from 0-9, and the underscore _ character)	"\w"
\W	Returns a match for any character that is NOT a word character	"\W"
\Z	Returns a match if the specified characters are at the end of the string	"Spain\Z"
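
The following sketch tries several of the special sequences from the table on a single string. The sample sentence and the variable names are assumptions made for illustration only.

```python
import re

txt = "The rain in Spain costs 25 euros"  # illustrative sample string

starts_with_the = re.search(r"\AThe", txt) is not None   # "The" at the very start
ends_with_euros = re.search(r"euros\Z", txt) is not None  # "euros" at the very end

ain_at_word_start = re.findall(r"\bain", txt)  # "ain" beginning a word
ain_at_word_end = re.findall(r"ain\b", txt)    # "ain" ending a word
ain_inside_word = re.findall(r"\Bain", txt)    # "ain" NOT at the start of a word

digits = re.findall(r"\d", txt)    # each digit character
words = re.findall(r"\w+", txt)    # runs of word characters

print(starts_with_the, ends_with_euros)
print(ain_at_word_start, ain_at_word_end, ain_inside_word)
print(digits)
print(words)
```

The raw-string prefix matters most for \b: in an ordinary string literal, "\b" is a backspace character rather than a word boundary.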

Sets

A set is a group of characters inside a pair of square brackets [] with a special meaning:

Set	Description
[arn]	Returns a match where one of the specified characters (`a`, `r`, or `n`) is present
[a-n]	Returns a match for any lower case character, alphabetically between `a` and `n`
[^arn]	Returns a match for any character EXCEPT `a`, `r`, and `n`
[0123]	Returns a match where any of the specified digits (`0`, `1`, `2`, or `3`) are present
[0-9]	Returns a match for any digit between `0` and `9`
[0-5][0-9]	Returns a match for any two-digit number from `00` to `59`
[a-zA-Z]	Returns a match for any character alphabetically between `a` and `z`, lower case OR upper case
[+]	In sets, `+`, `*`, `.`, `|`, `()`, `$`, `{}` have no special meaning, so `[+]` means: return a match for any `+` character in the string
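
As a quick check of the set rules above, this sketch applies a few sets to one string. The sample text and variable names are illustrative assumptions.

```python
import re

txt = "rain at 11:45 + 7"  # illustrative sample string

any_of_arn = re.findall(r"[arn]", txt)        # a, r, or n
letters = re.findall(r"[a-zA-Z]", txt)        # any ASCII letter
low_digits = re.findall(r"[0123]", txt)       # only the digits 0, 1, 2, 3
two_digit = re.findall(r"[0-5][0-9]", txt)    # two-digit numbers from 00 to 59
literal_plus = re.findall(r"[+]", txt)        # + is an ordinary character inside a set

print(any_of_arn)
print(letters)
print(low_digits)
print(two_digit)
print(literal_plus)
```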

The findall() Function

The findall() function returns a list containing all matches.

Example

Print a list of all matches:

import re

txt = "The rain in Spain"
x = re.findall("ai", txt)
print(x)

The list contains the matches in the order they are found.

If no matches are found, an empty list is returned:

Example

Return an empty list if no match was found:

import re

txt = "The rain in Spain"
x = re.findall("Portugal", txt)
print(x)
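
The ordering is easier to see when the matches differ from one another. The sketch below uses a pattern chosen for illustration (any run of word characters ending in "ain") and also shows that an empty result can be tested directly, because an empty list is falsy in Python.

```python
import re

txt = "The rain in Spain"

# Matches are reported left to right, in the order they occur in txt.
ordered = re.findall(r"\w*ain", txt)
print(ordered)

# An empty list is falsy, so the "no match" case can be handled with a plain if.
missing = re.findall(r"Portugal", txt)
if not missing:
    print("No match")
```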

The search() Function

The search() function searches the string for a match and returns a Match object if there is one.

If there is more than one match, only the first occurrence is returned:

Example

Search for the first white-space character in the string:

import re

txt = "The rain in Spain"
x = re.search(r"\s", txt)

print("The first white-space character is located in position:", x.start())

If no matches are found, the value None is returned:

Example

Make a search that returns no match:

import re

txt = "The rain in Spain"
x = re.search("Portugal", txt)
print(x)
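
Because search() can return None, calling a method such as .start() on the result without checking it first raises an AttributeError when nothing matched. A minimal sketch of a guard follows; the helper name first_match_start and its -1 return convention are hypothetical, introduced only for this illustration.

```python
import re

def first_match_start(pattern, text):
    """Return the start index of the first match, or -1 when there is none.

    Hypothetical helper for illustration; not part of the re module.
    """
    m = re.search(pattern, text)
    if m is None:
        return -1
    return m.start()

found = first_match_start(r"\s", "The rain in Spain")
not_found = first_match_start(r"Portugal", "The rain in Spain")
print(found)
print(not_found)
```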

The split() Function

The split() function returns a list where the string has been split at each match:

Example

Split at each white-space character:

import re

txt = "The rain in Spain"
x = re.split(r"\s", txt)
print(x)

You can control the number of splits by specifying the maxsplit parameter:

Example

Split the string only at the first occurrence:

import re

txt = "The rain in Spain"
x = re.split(r"\s", txt, maxsplit=1)
print(x)
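
One detail worth knowing when splitting on a single-character pattern such as \s: two adjacent separators produce an empty string between them. The sketch below, using a sample string with a double space chosen for illustration, compares \s with \s+ and shows maxsplit passed by keyword.

```python
import re

txt = "The  rain in Spain"  # note the two spaces after "The"

single = re.split(r"\s", txt)                 # splits at every whitespace character
runs = re.split(r"\s+", txt)                  # treats a run of whitespace as one separator
limited = re.split(r"\s+", txt, maxsplit=1)   # stops after the first split

print(single)
print(runs)
print(limited)
```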

The sub() Function

The sub() function replaces the matches with the text of your choice:

Example

Replace every white-space character with the number 9:

import re

txt = "The rain in Spain"
x = re.sub(r"\s", "9", txt)
print(x)

You can control the number of replacements by specifying the count parameter:

Example

Replace the first 2 occurrences:

import re

txt = "The rain in Spain"
x = re.sub(r"\s", "9", txt, count=2)
print(x)
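
To see how count changes the result, this short sketch runs the same substitution with three values. A count of 0, which is the default, means every match is replaced. The dictionary name replaced is an illustrative choice.

```python
import re

txt = "The rain in Spain"

# count=0 (the default) replaces every match; a positive count caps the replacements.
replaced = {n: re.sub(r"\s", "9", txt, count=n) for n in (0, 1, 2)}

for n, result in replaced.items():
    print(n, result)

print(txt)  # sub() returns a new string; the original is unchanged
```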

Match Object

A Match object is an object containing information about the search and the result.

Note: If there is no match, the value None is returned instead of a Match object.

Example

Do a search that returns a Match object:

import re

txt = "The rain in Spain"
x = re.search("ai", txt)
print(x)  # prints the Match object

The Match object has properties and methods used to retrieve information about the search and the result:

.span() returns a tuple containing the start and end positions of the match.
.string returns the string passed into the function.
.group() returns the part of the string where there was a match.

Example

Print the position (start and end position) of the first match.

The regular expression looks for any word that starts with an upper case "S":

import re

txt = "The rain in Spain"
x = re.search(r"\bS\w+", txt)
print(x.span())

Example

Print the string passed into the function:

import re

txt = "The rain in Spain"
x = re.search(r"\bS\w+", txt)
print(x.string)

Example

Print the part of the string where there was a match.

The regular expression looks for any word that starts with an upper case "S":

import re

txt = "The rain in Spain"
x = re.search(r"\bS\w+", txt)
print(x.group())
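
The three members relate to one another in a simple way: slicing the original string with the positions from .span() gives the same text as .group(). The sketch below checks this, guarding against None first; the variable names start, end, and sliced are illustrative.

```python
import re

txt = "The rain in Spain"
x = re.search(r"\bS\w+", txt)

if x is not None:
    start, end = x.span()        # end is exclusive, like a slice bound
    sliced = x.string[start:end]
    print(start, end)
    print(sliced == x.group())
```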
