SLUG for non-latin [message #73643] Fri, 27 February 2009 19:39
leenya
Messages: 10
Registered: February 2009
Location: Syria
Junior Member
Hi

here is the famous slugify class we have:

class Slug
{
    static public function slugify($text)
    {
        // replace all non letters or digits by -
        $text = preg_replace('/\W+/', '-', $text);

        // trim and lowercase
        $text = strtolower(trim($text, '-'));

        return $text;
    }
}

But unfortunately it does not work for non-Latin letters or complex languages:
Arabic, Greek, Chinese, ..

Does anyone know a better regular expression or something?
I'm googling it, but all I have found so far is for WordPress, and I am not good enough at regular expressions to separate the relevant part from the WordPress-specific code.
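
One possible direction, as a minimal sketch rather than a tested symfony solution: PCRE's /u modifier makes preg_replace treat the text as UTF-8, and the Unicode properties \p{L}, \p{M} and \p{N} match letters, combining marks and digits of any script. The class name UnicodeSlug is hypothetical, and the sketch assumes the input is UTF-8.

```php
<?php
// Hypothetical variant of the Slug class above; not part of symfony.
class UnicodeSlug
{
    public static function slugify($text)
    {
        // /u treats pattern and subject as UTF-8. Anything that is not a
        // letter, combining mark or digit (in any script) becomes "-".
        $text = preg_replace('/[^\p{L}\p{M}\p{N}]+/u', '-', $text);

        // preg_replace() returns null when the subject is not valid UTF-8.
        if ($text === null) {
            return '';
        }

        $text = trim($text, '-');

        // Use the multibyte-aware lowercase when mbstring is available;
        // otherwise fall back to strtolower().
        return function_exists('mb_strtolower')
            ? mb_strtolower($text, 'UTF-8')
            : strtolower($text);
    }
}

echo UnicodeSlug::slugify('Hello, World!'), "\n";   // hello-world
echo UnicodeSlug::slugify('مرحبا بالعالم'), "\n";
```

The second call keeps the Arabic letters and only turns the space into a dash, because Arabic letters are matched by \p{L}.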

[Updated on: Sat, 28 February 2009 14:46]

Re: SLUG for non-latin [message #73711 is a reply to message #73643] Sat, 28 February 2009 22:37
leenya
OK, here is some of what I found. I don't know if it helps to figure out what to do..

The WordPress guy said that, for such URLs to be allowed in the address and passed on as a URI, we edit the .htaccess file, replacing:

RewriteRule ^([a-zA-Z0-9]+/)?(wp-.*) $2 [L]
RewriteRule ^([a-zA-Z0-9]+/)?(.*\.php)$ $2 [L]

with either:
RewriteRule ^([a-zA-Z0-9_-\x7f-\xff]+/)?(wp-.*) $2 [L]
RewriteRule ^([a-zA-Z0-9_-\x7f-\xff]+/)?(.*\.php)$ $2 [L]

OR:
RewriteRule ^(.+/)?(wp-.*) $2 [L]
RewriteRule ^(.+/)?(.*\.php)$ $2 [L]

-------

A symfony .htaccess has:

# we check if the .html version is here (caching)
RewriteRule ^$ index.html [QSA]
RewriteRule ^([^.]+)$ $1.html [QSA]
RewriteCond %{REQUEST_FILENAME} !-f

# no, so we redirect to our front web controller
RewriteRule ^(.*)$ index.php [QSA,L]

----------
I suppose we need to edit the first block of that..

------
As for the slugs, I am thinking that
the regular expression, instead of matching anything that is not a word character
/\W+/

should do something like remove the symbol set, the number set and whitespace
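
That idea could be sketched with Unicode property classes, assuming UTF-8 input and a PCRE build with Unicode support. The helper name strip_non_letters is hypothetical and does not appear anywhere in symfony.

```php
<?php
// Hypothetical helper illustrating the "remove the unwanted sets" idea.
function strip_non_letters($text)
{
    // \p{P} punctuation, \p{S} symbols, \p{N} numbers,
    // \p{Z} separators (including spaces), \p{C} control characters.
    // Each run of such characters is collapsed into a single "-".
    $result = preg_replace('/[\p{P}\p{S}\p{N}\p{Z}\p{C}]+/u', '-', $text);

    // preg_replace() returns null when the subject is not valid UTF-8.
    return $result === null ? '' : trim($result, '-');
}

echo strip_non_letters('Καλημέρα, κόσμε 2009!'), "\n";
```

Everything that is left is letters (and combining marks) from whatever script the text uses, so nothing here depends on the text being Latin.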



Re: SLUG for non-latin [message #73712 is a reply to message #73643] Sat, 28 February 2009 22:57
leenya
I actually realized that what is messing up the string is not the $text = preg_replace('/\W+/', '-', $text);
but the $text = strtolower(trim($text, '-'));

since only Latin letters have case sensitivity.
So I am trying to find a way to verify whether $text is ASCII or not..

in Java it goes like this

boolean isValidASCII = "text".matches("[\u0000-\u007f]+");

boolean isValidPrintableASCII = "text".matches("[ -~]+");

I'm searching for a method in PHP to do the same..

if( preg_match($text, '[-~]+') ){
$text = strtolower(trim($text, '-'));
}

This is not working; it does not seem to return a bool..
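
For reference, preg_match() takes the pattern first and the subject second, the pattern needs delimiters, and the return value is 1, 0 or false rather than a bool. A minimal sketch of the printable-ASCII check under those rules; the helper name is_printable_ascii is hypothetical:

```php
<?php
// Hypothetical helper; the name is not from the thread.
function is_printable_ascii($text)
{
    // \x20-\x7E is the range from space to tilde. \A and \z anchor the
    // match to the whole string, so one stray byte makes it fail.
    return preg_match('/\A[\x20-\x7E]+\z/', $text) === 1;
}

$text = 'Hello World';
if (is_printable_ascii($text)) {
    $text = strtolower(trim($text, '-'));
}
echo $text, "\n";   // hello world
```

Comparing against 1 turns the result into a real bool, and also treats the false that preg_match() returns on error as "not ASCII".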
Re: SLUG for non-latin [message #73713 is a reply to message #73643] Sat, 28 February 2009 23:06
leenya
I found it!

This will work for any Unicode text..
It works like a charm for any language, but there is a real bug in it: I can't remove the symbol set from non-ASCII text.
For non-ASCII text it only replaces spaces and trims.

<?php
class Slug
{
    static public function slugify($text)
    {
        if (!(mb_ereg("[^\w\s\.\-]", $text))) {
            $text = preg_replace('/\W+/', '-', $text);
            $text = strtolower(trim($text, '-'));
        }
        else
        {
            $text = str_replace(" ", "-", $text);
            $text = trim($text, '-');
        }
        return $text;
    }
}

The rewrite rule in .htaccess needs to be modified as well.
Replace the following line

RewriteRule ^([^.]+)$ $1.html [QSA]

with
RewriteRule ^([a-zA-Z0-9_-\x7f-\xff]+/)$ $1.html [QSA]


This lets you navigate around and get objects and everything, but it writes the URLs with raw UTF-8, which is really ugly..
I am trying to find a solution for this..

[Updated on: Sun, 01 March 2009 10:30]

Re: SLUG for non-latin [message #75319 is a reply to message #73713] Fri, 20 March 2009 18:40
joshcoady
Messages: 52
Registered: June 2008
Location: Rohnert Park, CA
Member

This is what we use to take care of unicode chars:

<?php
$slug = iconv('UTF-8', 'ASCII//TRANSLIT', $input_str);
$slug = strtolower($slug);
$slug = preg_replace('/[^a-z0-9]+/i', ' ', $slug);
$slug = trim($slug);
$slug = str_replace(' ', '-', $slug);
?>


Josh Coady
Re: SLUG for non-latin [message #85549 is a reply to message #75319] Sat, 19 September 2009 20:25
Dvir
Messages: 33
Registered: January 2009
Location: Israel
Member
HELP! I changed my .htaccess to
RewriteRule ^([a-zA-Z0-9_-\x7f-\xff]+/)$ $1.html [QSA]
as you said. I checked the regexp and it works fine with the Hebrew language,
but when I click on the URL www.ablades.com/web/חיפוש.html
[www.site.com/"HEBREW WORD"]
it still sends me to a 404. How can I check how the route is implemented and where it goes wrong? I tried changing every file's encoding to UTF-8 without signature (BOM). Please help me debug the route issue and find where it goes wrong.
Which file do I need to test first?
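
A possible first check, offered as a sketch and not as a diagnosis of this particular 404: browsers normally send non-ASCII path segments as percent-encoded UTF-8 bytes, so it helps to see what the Hebrew slug looks like on the wire and to compare that with what the front controller receives. The variable names below are illustrative only.

```php
<?php
// The slug from the URL above, assumed to be stored as UTF-8.
$slug = 'חיפוש';

// rawurlencode() percent-encodes every byte outside the unreserved set,
// which is the form the slug takes inside an HTTP request line.
$encoded = rawurlencode($slug);
echo $encoded, "\n";   // %D7%97%D7%99%D7%A4%D7%95%D7%A9

// Decoding gives back the original bytes, so the route lookup can be
// compared against the decoded value.
var_dump(rawurldecode($encoded) === $slug);   // bool(true)

// In the front controller, logging the raw request URI shows what arrived:
// error_log($_SERVER['REQUEST_URI']);
```

If the logged request URI shows different bytes from the encoded form printed here, the link is probably being generated in another encoding; if the bytes match, the rewrite rule or the routing rule is the next place to look.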


http://www.vise.co.il
my symfony project:http://www.ablades.com
Dvir Levanon Programmer